Istanbul Municipality's digital archive division confirmed this week that a structured duplicate-image replacement programme is formally moving into its operational phase, targeting an estimated 340,000 redundant file entries spread across the city's centralised heritage and urban-planning image repositories. The problem did not arrive overnight. It accumulated across more than 20 years of overlapping digitisation campaigns, each launched by a different administration with different software standards and no shared protocol for deduplication.
The timing matters because the archive serves a practical function beyond history. Planners drawing on the IBB — Istanbul Büyükşehir Belediyesi — image database use it to assess building stock in high-risk zones ahead of urban renewal projects. With Istanbul still processing the lessons of the February 2023 Kahramanmaraş earthquake, accurate and non-duplicated photographic records of neighbourhood-level construction are no longer a bureaucratic nicety. They are an operational requirement.
Three Digitisation Waves, Three Sets of Orphaned Files
The roots of the duplication problem trace to the early 2000s, when the Istanbul Metropolitan Municipality ran its first large-scale scan of Ottoman-era planning documents held at the Atatürk Library in Beyoğlu. That project produced roughly 80,000 image files in TIFF format. A second wave came between 2009 and 2012, when a UNESCO-affiliated heritage project covering the Historic Peninsula (the area bounded by the old Theodosian Walls to the west and the Bosphorus coastline) rescanned many of the same documents as higher-resolution JPEG files for web access. Neither project cross-referenced the other's output.
The third and most disruptive wave followed the 2023 earthquake response. District municipalities in Kadıköy, Üsküdar, and Fatih each launched independent photographic surveys of residential buildings to feed into IBB's earthquake risk mapping system. Contractors used at least four different image-management platforms, and file-naming conventions varied by district. By late 2024, internal audits found that some building facade photographs appeared in the central system as many as seven times under different filenames, each copy tagged to a different project code.
The Fatih district survey alone — covering approximately 12,000 registered structures between Aksaray and Yedikule — generated around 47,000 image submissions, of which technical staff later estimated more than a third were functionally identical to files already held in the central repository.
What the Clean-Up Actually Involves
The current replacement programme, administered through IBB's Geospatial Data Directorate and supported by Boğaziçi University's urban informatics unit, works in two stages. The first is algorithmic: perceptual hashing software compares image fingerprints across the full archive and flags matches above a 96 percent similarity threshold. The second stage is human review, where municipal archivists — eight are currently assigned to the project at the directorate's offices near Saraçhane — assess flagged pairs and either confirm deletion of the lower-quality copy or merge associated metadata before replacement.
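The algorithmic stage can be illustrated with a minimal Python sketch. The directorate has not said which perceptual hashing algorithm it uses, so the simple "average hash" below is an assumption, as are the function names, the hypothetical filenames and the tiny 8x8 grayscale grids standing in for downscaled photographs. Only the 96 percent threshold comes from the programme itself. The sketch flags candidate pairs and leaves the decision to a human reviewer, as the second stage does.

```python
from itertools import combinations

# Similarity threshold quoted for the programme; everything else is illustrative.
SIMILARITY_THRESHOLD = 0.96


def average_hash(pixels):
    """Fingerprint a grayscale grid: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def similarity(hash_a, hash_b):
    """Share of fingerprint bits that agree (1.0 means identical fingerprints)."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    differing = sum(a != b for a, b in zip(hash_a, hash_b))
    return 1.0 - differing / len(hash_a)


def flag_duplicates(images, threshold=SIMILARITY_THRESHOLD):
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    hashes = {name: average_hash(pixels) for name, pixels in images.items()}
    flagged = []
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        score = similarity(hash_a, hash_b)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    return flagged


# Hypothetical 8x8 grids: a gradient, a uniformly brightened copy, an inverted image.
gradient = [[r * 8 + c for c in range(8)] for r in range(8)]
brighter = [[value + 10 for value in row] for row in gradient]
inverted = [[63 - value for value in row] for row in gradient]

images = {
    "fatih_0001.tif": gradient,
    "kadikoy_0457.jpg": brighter,
    "uskudar_0912.jpg": inverted,
}

for name_a, name_b, score in flag_duplicates(images):
    print(f"review: {name_a} vs {name_b} (similarity {score:.2%})")
```

With a 64-bit fingerprint, a 96 percent threshold tolerates at most two differing bits. Comparing every pair, as this sketch does, would not scale to an archive of hundreds of thousands of files; a production system would presumably index the fingerprints rather than compare them exhaustively.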
The programme formally began in March 2026. As of the end of June, roughly 91,000 duplicate entries had been resolved, according to figures the directorate shared with the IBB planning committee. The full process is scheduled for completion by the first quarter of 2027, at which point the cleaned archive is expected to underpin an updated version of the city's public earthquake-risk map, last revised in November 2024.
For residents and researchers who use the IBB's open data portal, accessible through the municipality's data.ibb.gov.tr platform, the visible change will be fewer of the broken or mismatched image thumbnails that have long plagued the neighbourhood-profile pages for older districts like Balat and Zeyrek. Urban planners applying for permits in zones flagged under Istanbul's 2019 Urban Transformation Law will find that building-history searches return a single clean record rather than a cascade of near-identical copies that slows processing. The deduplication effort will not rewrite the archive's history. It will simply make what is already there usable.