Istanbul's Digital Archivists Are Drowning in Duplicate Images — and the Numbers Reveal Why
A quiet crisis in how the city's institutions store and catalogue visual records is costing organisations time, money, and irreplaceable historical data.

At least one in three digital images stored across Istanbul's major municipal and heritage institutions is a duplicate, according to internal audits reviewed by archivists working within the city's cultural sector. The problem is not new, but the scale is becoming impossible to ignore as storage costs climb alongside the Turkish lira's continued weakness against the dollar.
The issue is urgent because Istanbul Metropolitan Municipality's digital transformation drive, part of a broader smart-city initiative that accelerated after the 2019 local elections, has pushed dozens of departments to digitise physical records at speed. That haste has consequences. When scanning teams work in parallel across different buildings, the same document, photograph, or architectural drawing routinely ends up saved under multiple filenames, in multiple folders, on multiple servers. The metadata rarely matches. Finding the authoritative copy becomes a research project in itself.
Cloud storage is not cheap when you are paying in lira. Enterprise-grade object storage from major providers costs roughly $0.02 to $0.025 per gigabyte per month at mid-2026 rates. For an institution holding, say, 200 terabytes of unaudited image data, a realistic figure for a large municipal archive, that works out to roughly $4,000 to $5,000 a month, a bill that can swing by tens of thousands of lira depending on the exchange rate on any given day. Duplicates, by conservative industry estimates cited in digital preservation literature, typically account for between 30 and 40 percent of total image storage volume in organisations that lack automated deduplication workflows.
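The arithmetic behind that bill can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the calculation assumes decimal units (1 terabyte = 1,000 gigabytes), and the two exchange rates are placeholders chosen to show the effect of a currency move, not actual quotes.

```python
def monthly_cost_usd(terabytes, usd_per_gb_month):
    """Monthly storage bill in dollars, assuming 1 TB = 1,000 GB."""
    return terabytes * 1000 * usd_per_gb_month


def duplicate_cost_usd(monthly_bill_usd, duplicate_share):
    """Portion of the monthly bill spent on storing duplicates."""
    return monthly_bill_usd * duplicate_share


def lira_swing(monthly_bill_usd, rate_a, rate_b):
    """Change in the lira bill when the rate moves from rate_a to rate_b."""
    return abs(rate_b - rate_a) * monthly_bill_usd


ARCHIVE_TB = 200  # archive size used in the article's example

low_bill = monthly_cost_usd(ARCHIVE_TB, 0.02)    # lower price per GB
high_bill = monthly_cost_usd(ARCHIVE_TB, 0.025)  # upper price per GB

print(f"Monthly bill: ${low_bill:,.0f} to ${high_bill:,.0f}")
print(f"Spent on duplicates: ${duplicate_cost_usd(low_bill, 0.30):,.0f} "
      f"to ${duplicate_cost_usd(high_bill, 0.40):,.0f}")

# Placeholder exchange rates (lira per dollar), for illustration only.
print(f"Lira swing on the upper bill: {lira_swing(high_bill, 40, 45):,.0f}")
```

Under these assumptions, a move of a few lira per dollar is enough to shift the monthly bill by tens of thousands of lira, even though the dollar price has not changed.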
Two Istanbul institutions have begun confronting this directly. The İstanbul Araştırmaları Enstitüsü (Istanbul Research Institute), based on İstiklal Caddesi in Beyoğlu, launched an internal deduplication audit of its photographic holdings in the first quarter of 2026. Separately, the Atatürk Kitaplığı — the city's main public library on Millet Caddesi in Fatih — is understood to be evaluating software tools to identify redundant files across its digital newspaper and photograph collections, some of which date to the late Ottoman period and carry significant heritage value. Neither institution provided official comment for this article.
The numbers inside individual projects tell the same story. A digitisation project covering historic Bosphorus waterfront yalıs, the timber mansions that line the strait from Beşiktaş to Sarıyer, reportedly generated upward of 40,000 raw image files over eighteen months of fieldwork. Preliminary checks found that roughly 12,000 of those files, about 30 percent, were near-identical duplicates created when photographers bracketed exposures and field teams failed to cull images before upload. At an average uncompressed file size of 25 megabytes, those redundant files alone consumed approximately 300 gigabytes of primary storage, plus backup copies.
The technical fix is well understood: run a deduplication pass across the file system, using cryptographic hashes to catch byte-identical copies and perceptual hashes to catch near-identical ones. The organisational problem is not. Archivists must decide which copy is canonical before they can delete anything. For heritage images, that decision requires human judgment: a photograph taken a half-second earlier or later, at a marginally different angle, may capture detail the other does not. Deleting the wrong file is permanent. That caution, entirely reasonable, means the work moves slowly and expensively.
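A minimal sketch of the simpler, hash-based half of that fix is shown below. It is not drawn from any tool the institutions named here are using; the function names are hypothetical, and SHA-256 is an assumed choice of hash. It only groups byte-identical files for review and deliberately deletes nothing, leaving the choice of canonical copy to an archivist.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return groups of files under root whose contents are byte-identical.

    Filenames and folders are ignored: two files land in the same group
    only if their SHA-256 digests match.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates` on an archive folder returns a list of groups, each a list of paths to the same bytes saved under different names. It will not catch bracketed exposures or rescans of the same print, which differ at the byte level; those need perceptual hashing.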
Several European city archives, including those in Amsterdam and Vienna, have published open-source toolkits for managing exactly this problem over the past three years. Istanbul's institutions have largely not adopted them, partly because documentation is in English or German and partly because procurement cycles within municipal structures can stretch eighteen months or longer.
The practical path forward involves three steps that archivists and information scientists broadly agree on: first, freeze new uploads to unaudited legacy folders while the audit runs; second, deploy perceptual hashing tools — several are available without licensing costs — to flag near-duplicates for human review; third, establish a single naming convention enforced at the point of ingest, not retroactively. Organisations that have completed this sequence report storage savings of 25 to 35 percent within twelve months. For Istanbul's archives, that could mean meaningful budget relief at a moment when every lira saved in one department is a lira available somewhere else.
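The second step, flagging near-duplicates for human review, can be illustrated with an "average hash", one of the simplest perceptual hashes. The sketch below is an assumption-laden illustration rather than a description of any specific tool: it expects each image to have been reduced already to a small grid of grayscale values (8 by 8 is common) by an imaging library of the archive's choice, the function names are hypothetical, and the distance threshold of 5 bits is an arbitrary starting point that would need tuning against real holdings.

```python
def average_hash(pixels):
    """Build a bit pattern from a small grayscale grid (a list of rows).

    Each bit records whether a pixel is brighter than the grid's mean,
    so uniform brightness changes leave the hash unchanged.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def flag_near_duplicates(hashes, max_distance=5):
    """Return (name, name, distance) for pairs close enough to review.

    hashes maps a file name to its average hash. Nothing is deleted:
    the output is a review list for an archivist.
    """
    names = sorted(hashes)
    flagged = []
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            distance = hamming_distance(hashes[first], hashes[second])
            if distance <= max_distance:
                flagged.append((first, second, distance))
    return flagged
```

Comparing every pair, as this sketch does, is only practical for small batches such as a single fieldwork upload; a collection-wide audit would need an indexing strategy on top. The design choice worth keeping at any scale is that the tool produces a review list and leaves deletion to a person.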
About this article
Published by The Daily Istanbul