An estimated 40 to 60 percent of the image files in Istanbul's municipal and heritage databases are duplicates, according to a technical assessment circulated among IT departments at the Istanbul Metropolitan Municipality's data directorate earlier this year. That single figure, quietly noted in an internal working document reviewed by The Daily Istanbul, explains why several high-profile digitisation projects have run over budget and behind schedule since 2024.
The problem is not trivial. Storage costs for redundant files accumulate fast at scale. When a city the size of Istanbul, home to more than 15 million residents and receiving roughly 20 million tourists annually, digitises decades of planning permits, Bosphorus development filings, and cultural heritage records, bloated archives slow retrieval, inflate cloud licensing fees, and make the underlying data almost unusable for researchers or urban planners.
Where the Numbers Come From
The Istanbul Metropolitan Municipality launched its AKOS digital asset management initiative in late 2023, partly in response to the administrative chaos that followed the Kahramanmaraş earthquakes earlier that year, when physical records in several Hatay and Adıyaman district offices were destroyed or rendered inaccessible. The goal was to ensure Istanbul's own urban-planning records could survive a comparable disaster by migrating them to redundant cloud infrastructure. By the first quarter of 2025, the project had ingested roughly 2.3 million image files covering everything from Fatih district renovation permits to satellite imagery of the Tuzla coastal development zone.
Technical staff running the migration discovered the duplication rate only after deploying hash-comparison software across the first 500,000-file batch. The results showed that 44 percent of those images, about 220,000 files, were exact or near-exact copies, many uploaded multiple times by different departments with no cross-referencing system in place. Eliminating those duplicates from the first batch alone freed approximately 1.8 terabytes of storage, a figure that sounds abstract until you price it at Istanbul's contracted cloud storage rate, which municipal procurement records from 2024 show was negotiated at roughly 0.023 US dollars per gigabyte per month. The maths is straightforward: 1,800 gigabytes at that rate comes to about 41 dollars a month, or roughly 497 dollars over a 12-month contract, a recurring line item for the first batch alone that was simply being wasted.
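The arithmetic behind that line item can be worked through in a few lines. What follows is a minimal sketch using only the figures quoted in this article; it assumes decimal units (1 terabyte = 1,000 gigabytes) and a flat per-gigabyte rate with no pricing tiers, and the function name is purely illustrative.

```python
# Figures quoted in the article. Decimal units (1 TB = 1,000 GB) and a flat,
# untiered rate are assumptions made for this illustration.
FREED_GB = 1_800                 # roughly 1.8 TB freed from the first batch
RATE_USD_PER_GB_MONTH = 0.023    # contracted rate from 2024 procurement records
CONTRACT_MONTHS = 12


def wasted_storage_cost(gigabytes: float, rate: float, months: int) -> tuple[float, float]:
    """Return (monthly cost, cost over the contract) of storing redundant data."""
    monthly = gigabytes * rate
    return monthly, monthly * months


monthly_usd, contract_usd = wasted_storage_cost(
    FREED_GB, RATE_USD_PER_GB_MONTH, CONTRACT_MONTHS
)
print(f"per month: ${monthly_usd:.2f}")
print(f"over {CONTRACT_MONTHS} months: ${contract_usd:.2f}")
```

The sum for one batch is modest. The point is that it recurs every month and applies only to the first 500,000 of roughly 2.3 million files.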
The Istanbul Archaeological Museums on Osman Hamdi Bey Yokuşu in Sultanahmet face a parallel version of this problem. The museums' digitisation partnership with the Koç University ANAMED research centre, which began archiving artefact photography in 2022, identified duplicate submission rates of over 35 percent in the first phase of its collection database. Staff there have been manually tagging and retiring redundant files, a process that as of this spring was estimated to take another 18 months to complete even with automated tooling.
Why Istanbul's Institutions Struggle to Fix This
Part of the structural explanation is organisational. Istanbul's municipal apparatus spans dozens of semi-autonomous subsidiaries — İSKİ for water, İGDAŞ for gas, the various district municipalities — each maintaining separate digital infrastructure with minimal interoperability. When the Beyoğlu district municipality digitised its Istiklal Caddesi business-licence records in 2024, it had no mechanism to check whether the Metropolitan Municipality's own planning archive already held overlapping imagery from the same addresses. The result was straightforward: duplication baked in from day one.
The Syrian refugee integration programmes administered through the Sultanbeyli and Bağcılar district offices have added another layer of complexity. Document-scanning initiatives designed to register informal housing arrangements and residency files since 2022 have produced their own large batches of scanned photographs, often filed by both the district office and a central humanitarian coordination body, with no deduplication step in the workflow.
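A deduplication step of the kind missing from these workflows does not need to be elaborate. The sketch below is one hedged illustration, not a description of any system the municipality runs: it assumes every office submits files through a shared index keyed by a SHA-256 digest of the file's bytes, and the names `content_digest`, `ingest`, and `index` are hypothetical. It catches exact copies only.

```python
import hashlib
from pathlib import Path


def content_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's bytes, read in chunks so large scans stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(path: Path, index: dict[str, Path]) -> bool:
    """Register a file in the shared index (hypothetical).

    Returns False when a file with identical bytes is already held,
    regardless of its name or which office uploaded it.
    """
    key = content_digest(path)
    if key in index:
        return False  # exact duplicate of index[key]; skip the upload
    index[key] = path
    return True
```

Because the key is derived from content, a scan filed once by a district office and again by a central body under a different file name maps to the same entry, and the second upload is rejected.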
The practical path forward, as outlined in proposals before the Metropolitan Municipality's technology committee, involves running perceptual hashing across the full 2.3 million-file archive before any further migration phases begin. The method catches near-duplicate images even when the underlying files are not byte-identical, for instance after resizing or recompression, which exact hash comparison misses. The committee was scheduled to vote on a procurement tender for the relevant software in June 2026. Institutions sitting on large local image collections should, in the meantime, treat any storage expansion request as a signal to audit for duplicates first. The numbers suggest the redundancy is almost certainly already there.
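For readers unfamiliar with the technique, the sketch below shows one simple form of perceptual hashing, the average hash. It is an illustration under stated assumptions, not the software named in the tender: images are represented as plain grids of grayscale values so the example needs no imaging library, and the threshold of 5 differing bits is an illustrative choice, not a standard.

```python
def shrink(pixels: list[list[int]], size: int = 8) -> list[list[float]]:
    """Box-average a grayscale grid down to size x size (input must be at least that large)."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        line = []
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            line.append(sum(block) / len(block))
        small.append(line)
    return small


def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """One bit per cell of the shrunken image: 1 if brighter than the mean, else 0."""
    flat = [value for line in shrink(pixels, size) for value in line]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic stand-ins for scanned images (illustrative data only).
original = [[x * 8 for x in range(32)] for y in range(32)]             # left-to-right gradient
resized = [[original[2 * y][2 * x] + 5 for x in range(16)] for y in range(16)]  # half size, brighter
unrelated = [[y * 8 for x in range(32)] for y in range(32)]            # top-to-bottom gradient

THRESHOLD = 5  # assumed cut-off for "near-duplicate"
for name, image in (("resized", resized), ("unrelated", unrelated)):
    distance = hamming(average_hash(original), average_hash(image))
    print(name, "near-duplicate" if distance <= THRESHOLD else "distinct")
```

The half-size, brightened copy shares no bytes with the original yet falls within the threshold, while the unrelated image does not. A production system would decode real image files and would likely use a more robust hash.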