Istanbul's two largest municipal digitisation projects are sitting on a problem nobody publicly advertised: tens of thousands of duplicate photographs clogging their servers, slowing public access portals, and inflating storage costs at a time when the city's IT budget is already stretched by post-earthquake infrastructure reviews. The Istanbul Metropolitan Municipality's digital archive unit — which manages the online collections feeding the İBB Kütüphaneleri network — began a formal deduplication audit in March 2026, targeting an estimated 40,000 redundant image files accumulated since 2018.
Why now? The answer is partly practical and partly political. The municipality under Mayor Ekrem İmamoğlu has made open-data access a signature policy, and the public-facing portal İstanbul Arşivi has seen registered users grow steadily since 2022. Duplicate entries — the same Galata Tower photograph filed under three different catalogue numbers, for example — erode trust in the archive and waste staff hours on manual triage. With heritage documentation accelerating after the 2023 Kahramanmaraş earthquakes triggered fresh anxiety about structural loss across Türkiye, getting the visual record right has taken on sharper urgency.
What Istanbul Is Actually Doing
The deduplication push involves two distinct workflows. The first is automated: the municipality contracted with a local software firm based in Maslak's Teknokent technology park to deploy perceptual hashing algorithms — tools that detect visually similar images even when file names, metadata, or resolutions differ. The second is human: a review team at the Atatürk Kitaplığı on Cumhuriyet Caddesi in Beyoğlu is manually assessing any image pair flagged as a probable duplicate before deletion is authorised. Nothing is erased without a librarian sign-off. The archive holds scanned materials going back to the late Ottoman period, so the margin for error is low.
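The municipality has not published the algorithm its contractor uses, so the following is only a minimal sketch of one of the simplest perceptual hashes, the "average hash". It assumes an image has already been decoded into rows of grayscale values; the function names, the 8x8 hash size, and the threshold of 5 differing bits are illustrative choices, not details from the Istanbul project.

```python
from typing import List

Gray = List[List[float]]  # rows of grayscale values in the range 0-255


def shrink(pixels: Gray, size: int) -> Gray:
    """Downsample an image to size x size by averaging rectangular blocks."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # each block has at least one row
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per cell, set if the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Flag a pair for human review when their hashes differ in few bits (threshold is an assumption)."""
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

Because the hash depends only on coarse brightness patterns, a rescaled copy of a photograph lands on the same or a nearby hash even though its file name, metadata, and pixel dimensions differ. A pair flagged this way is only a candidate, which is why a human review step like the one described above still matters.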
The Fatih Sultan Mehmet Vakıf Üniversitesi's digital humanities department has been brought in as an academic oversight body, checking methodology against international museum standards. The university's involvement matters: it gives the process a layer of independent validation that the municipality, in its current politically charged relationship with the central government, can point to if the methodology is challenged.
Storage costs are real. Commercial cloud pricing for cultural institutions in Türkiye has risen alongside the lira's depreciation — local IT procurement managers have described annual storage contracts jumping by more than 60 percent in lira terms between 2023 and 2025, though exact municipal figures are not publicly available. Cutting redundant files directly reduces those costs.
How Istanbul Compares With Other Cities
Istanbul is not alone, and it is not the fastest. Amsterdam's Stadsarchief completed a deduplication project across its 750,000-image collection in 2024, publishing its open-source hashing methodology on GitHub and prompting several other European city archives to adopt similar pipelines. Barcelona's Arxiu Municipal ran a comparable exercise in 2023, cutting its digitised photograph holdings by roughly 18 percent after removing duplicates and near-duplicates — a figure the archive cited in its annual report. London's Wellcome Collection has built perceptual hashing into its ongoing ingest workflow so duplicates are caught at the point of upload rather than discovered years later.
Istanbul's approach is closer to the London model in ambition — building a permanent check rather than a one-time cleanup — but its execution timeline is slower. The Maslak-based software contract runs through December 2026, meaning the bulk of the automated review will not be complete until the end of the year. Amsterdam finished a comparable volume in roughly eight months.
The gap matters for anyone relying on İstanbul Arşivi for research. Scholars working on the historical fabric of Karaköy or Balat, neighbourhoods whose streetscapes changed dramatically in the twentieth century, routinely hit duplicate catalogue entries that send them down dead ends. Once the audit is done, the municipality says search results on the public portal will be consolidated — one definitive record per image, with all known variant files cross-referenced rather than listed as separate entries.
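The consolidation the municipality describes, one definitive record per image with variant files cross-referenced, can be pictured with a short sketch. Everything here is hypothetical: the `Record` and `Consolidated` types, the `image_key` field standing in for a reviewer-confirmed duplicate group, the catalogue-number format, and the rule of keeping the lowest catalogue number as the definitive one. The archive has not said how it will pick the surviving record.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    catalogue_no: str  # hypothetical identifier format
    image_key: str     # hypothetical: id of a reviewer-confirmed duplicate group


@dataclass
class Consolidated:
    canonical_no: str                                   # the one record shown in search results
    variants: List[str] = field(default_factory=list)   # cross-referenced duplicates


def consolidate(records: List[Record]) -> List[Consolidated]:
    """Collapse records that share an image_key into one entry with cross-references."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[rec.image_key].append(rec.catalogue_no)

    result = []
    for key in sorted(groups):
        numbers = sorted(groups[key])
        # Assumption: the lowest catalogue number becomes the definitive record.
        result.append(Consolidated(canonical_no=numbers[0], variants=numbers[1:]))
    return result
```

In this sketch, three entries for the same photograph would come back as a single search result carrying two cross-references, rather than as three separate hits.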
For residents and researchers, the practical advice is straightforward: if you are citing images from İstanbul Arşivi in academic or journalistic work right now, note the catalogue number and check it again after January 2027, when the consolidated database is scheduled to go live. Records may be renumbered as part of the cleanup, and early citations could become orphaned links if the old identifiers are retired without a redirect system — something the Atatürk Kitaplığı team is still working to resolve before the switch is flipped.
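A redirect system of the kind still under discussion could be as simple as a lookup table from retired identifiers to current ones. The sketch below is an assumption about how such a table might be resolved, not a description of what the Atatürk Kitaplığı team is building; the function name, the hop limit, and the example catalogue numbers are invented for illustration.

```python
from typing import Dict


def resolve(catalogue_no: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow retired-identifier redirects until reaching a current catalogue number."""
    seen = set()
    current = catalogue_no
    while current in redirects:
        # Guard against loops or very long chains in a badly maintained table.
        if current in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long starting at {catalogue_no}")
        seen.add(current)
        current = redirects[current]
    return current


# Hypothetical table: an identifier renumbered twice during the cleanup.
example_redirects = {"IA-0307": "IA-0099", "IA-0099": "IA-0012"}
current_id = resolve("IA-0307", example_redirects)  # follows both hops
```

With a table like this in place, a citation recorded before the cleanup would still lead to the surviving record, and an identifier that was never retired would simply resolve to itself.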