The Daily Perth

Perth's Duplicate Image Problem: The Numbers Exposing a Hidden Crisis in Local Digital Archives

Thousands of duplicated images are clogging government, council and media databases across Perth — and the cost of cleaning them up is measurable, if still largely ignored.

By Perth News Desk · Published 5 July 2026, 5:51 am

Updated 5 July 2026, 1:45 pm

Photo: Harrison Reilly on Pexels

Western Australia's public sector is sitting on what is estimated to be tens of thousands of duplicate digital images across its agency databases, a problem that has quietly inflated storage costs, slowed content workflows and undermined the integrity of public records at organisations stretching from the City of Perth's King Street offices to the State Records Office in Alexander Drive, Midland.

The issue has sharpened this year as several WA government departments migrate legacy content into new cloud-based asset management systems, a process required under the state's Digital Strategy 2025–2028. When agencies move old file servers into centralised repositories, duplicate images — the same photograph catalogued under two file names, three timestamps or four department subfolders — surface in bulk. Migration teams at some departments have reportedly found duplication rates above 30 percent in unstructured image libraries, according to general findings published by the Australian Digital Alliance in its 2025 records management review, though specific agency figures have not been made public.

What the Data Actually Shows

The scale of duplication in digital image libraries is not unique to Perth, but WA's rapid infrastructure expansion has made it acute. Metronet alone has generated thousands of site-progress photographs since construction began on the Morley-Ellenbrook Line. Those images are captured by contractors, engineers, communications teams and drone operators, and are frequently uploaded to multiple platforms without deduplication protocols in place. The project spans more than 21 kilometres of new rail corridor and involves at least a dozen separate contractor organisations, each managing its own documentation systems.

In practical terms, storing a duplicate image costs the same as storing the original. Cloud storage on enterprise platforms used by WA government agencies typically runs between $0.02 and $0.05 per gigabyte per month, depending on redundancy tier. A library of 500,000 images — not unreasonable for a major infrastructure agency across a five-year build — can consume several terabytes. If 30 percent of those files are duplicates, an agency is effectively paying for storage it gains nothing from, month after month.
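The arithmetic can be made concrete with a short sketch. It uses the 500,000-image library, the 30 percent duplication rate and the $0.02 to $0.05 price range quoted above; the 8 MB average image size is an assumption chosen for illustration, not a figure from any agency, and the function name is hypothetical.

```python
def wasted_storage_cost(image_count, avg_image_mb, duplicate_rate, price_per_gb_month):
    """Monthly cost, in dollars, of storing the duplicate share of an image library."""
    # Decimal gigabytes (1 GB = 1000 MB); avg_image_mb is an assumed input.
    total_gb = image_count * avg_image_mb / 1000
    duplicate_gb = total_gb * duplicate_rate
    return duplicate_gb * price_per_gb_month


# 500,000 images at an assumed 8 MB each is 4 TB, of which 30 percent is duplicated.
low = wasted_storage_cost(500_000, 8, 0.30, 0.02)
high = wasted_storage_cost(500_000, 8, 0.30, 0.05)
print(f"${low:.2f} to ${high:.2f} per month")
```

Under these assumptions the duplicate share is 1.2 TB, and the result scales linearly with both average file size and duplication rate, so a library of larger raw or drone images would cost proportionally more.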

The City of Stirling, which manages one of Perth's largest municipal communications archives, has been piloting an automated deduplication tool since March 2026 as part of a broader digital asset management overhaul. The program runs hash-matching algorithms — essentially digital fingerprints — against the council's entire image repository. Early internal benchmarks, referenced in the council's March 2026 ordinary meeting agenda, suggested the tool identified duplicates at a rate that could reduce active storage volume by roughly a quarter, though the council has not published final figures.
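The council has not said which tool or hash function it uses, so the following is only a minimal sketch of how hash-matching works in general. It assumes SHA-256 from Python's standard library and a hypothetical set of image file extensions, and it groups files whose contents are byte-for-byte identical regardless of file name or folder.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; a real archive may hold other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group image files under root by content hash and keep groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

An exact hash of this kind only catches identical files. A photograph that has been resized, cropped or re-saved at a different compression level produces a different digest, so near-duplicates would need a separate technique such as perceptual hashing.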

Why It Matters Beyond Storage Bills

The financial cost is real but secondary. The more significant problem is what duplicate images do to public records. Under the State Records Act 2000, WA agencies have obligations around the authenticity and accessibility of their records. When the same image exists in four locations under different metadata (different dates, different descriptions, different access permissions), it creates ambiguity about which version is the authoritative record. That ambiguity matters all the more in an AUKUS environment, where defence-related imagery associated with projects at HMAS Stirling on Garden Island is subject to additional classification and record-keeping scrutiny.
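One way to surface that ambiguity is to compare the metadata attached to each copy of the same image. The sketch below is illustrative only: the record layout, the field names (`sha256`, `date`, `description`, `access`) and the function name are assumptions, not drawn from any agency's system or from the Act.

```python
from collections import defaultdict


def metadata_conflicts(records, fields=("date", "description", "access")):
    """For each content hash stored more than once, list the fields whose values disagree."""
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record["sha256"]].append(record)

    conflicts = {}
    for digest, copies in by_hash.items():
        if len(copies) < 2:
            continue  # a single copy cannot conflict with itself
        differing = {
            field: sorted({str(copy.get(field)) for copy in copies})
            for field in fields
            if len({copy.get(field) for copy in copies}) > 1
        }
        if differing:
            conflicts[digest] = differing
    return conflicts


# Hypothetical catalogue entries: the first two share a hash but not their metadata.
records = [
    {"sha256": "abc", "path": "comms/site.jpg", "date": "2024-03-01",
     "description": "Site visit", "access": "public"},
    {"sha256": "abc", "path": "eng/IMG_0042.jpg", "date": "2024-03-04",
     "description": "Site visit", "access": "internal"},
    {"sha256": "def", "path": "comms/launch.jpg", "date": "2024-05-10",
     "description": "Launch", "access": "public"},
]
print(metadata_conflicts(records))
```

A report like this does not decide which copy is authoritative. It only shows a records officer where the copies disagree, so that the decision can be made and documented.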

Perth-based digital archiving firm Datasphere Group, which works with local government and resources sector clients in the CBD and Osborne Park, has described the deduplication process as a necessary precondition before any serious AI-assisted content search can be deployed. Duplicate images confuse training datasets, inflate search result counts and produce false matches when image recognition tools are applied to large libraries.

For agencies and organisations beginning or continuing digital migration this financial year, the practical advice from records management practitioners is consistent: run a deduplication audit before migrating, not after. Cleaning a 200,000-image archive before upload takes days; cleaning it after it has been ingested, indexed and cross-referenced across multiple systems can take months. The City of Perth's digital asset team, based at its Hay Street administrative centre, has scheduled a full repository audit for the September 2026 quarter, making it one of the first metropolitan councils to formalise the process on the public record.

The numbers are not glamorous, but they add up fast.

About this article

Published by The Daily Perth

This article was produced by The Daily Perth editorial desk and covers news in Perth. See our editorial standards for how we use AI.
