Perth's public-sector digital archives are sitting on a problem measured in terabytes. Across agencies from the City of Perth's planning portal to the State Records Office of Western Australia in the Alexander Library Building at the Perth Cultural Centre, the accumulation of duplicate images (identical or near-identical files stored under different filenames) has reached a scale that is now registering on IT budget reviews.
The trigger for closer scrutiny is timing. The WA government's 2025–26 budget allocated additional capital to digitise heritage collections and support Metronet station documentation, pushing the volume of imagery flowing into state systems to levels not previously encountered in a single financial year. When intake accelerates, so does duplication — and so does the cost of fixing it.
What the Data Actually Shows
Industry benchmarks from digital asset management research — including studies published by the Chartered Institute of Library and Information Professionals — suggest that between 20 and 40 percent of files in unmanaged image repositories are exact or near-exact duplicates. Apply even the conservative end of that range to a mid-sized government archive holding 500,000 image assets, and you are talking about 100,000 redundant files consuming server capacity, slowing search retrieval times, and creating version-control headaches for staff.
Storage costs vary sharply depending on infrastructure, but managed enterprise-grade cloud storage in Australia is broadly priced in the range of $25 to $50 per terabyte per month. In a 10-terabyte image repository bloated by 30 percent unnecessary duplication, 3 terabytes are redundant, which translates to roughly $75 to $150 per month (about $900 to $1,800 a year) in pure waste, before factoring in the labour hours staff spend manually checking which version of a file is authoritative.
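The arithmetic behind those figures can be checked with a few lines of Python. This is only a worked sketch: the function name `monthly_waste` is hypothetical, and every input comes from the estimates quoted above rather than from any agency's actual invoices.

```python
def monthly_waste(total_tb, duplicate_percent, price_per_tb_month):
    """Monthly cost of storing redundant data, in dollars."""
    redundant_tb = total_tb * duplicate_percent / 100
    return redundant_tb * price_per_tb_month


# Inputs from the text: 10 TB repository, 30 percent duplication,
# $25 to $50 per terabyte per month.
low = monthly_waste(10, 30, 25)
high = monthly_waste(10, 30, 50)

print(f"Monthly waste: ${low:.0f} to ${high:.0f}")
print(f"Annual waste:  ${low * 12:.0f} to ${high * 12:.0f}")
```

Changing the three inputs gives a first estimate for any other repository, though it leaves out the staff time that is likely to be the larger cost.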
The City of Stirling, which manages one of the largest local government digital records programmes in Western Australia, has been among the councils quietly grappling with this. The volume of site imagery generated through development applications along Wanneroo Road and the Scarborough foreshore redevelopment has compounded the problem since 2023.
Replacement Without a Plan Creates New Problems
The instinct when duplicate images are detected is to delete or overwrite. But rushed replacement without a deduplication protocol can break metadata chains — the indexed links that tell a records system which image belongs to which file, permit, or heritage listing. That breakage is itself a compliance risk under the State Records Act 2000 (WA), which mandates the integrity of official records.
The WA State Records Office updated its digital recordkeeping guidance in 2024, but the practical implementation of image-specific deduplication policy remains uneven across local government authorities. Libraries and archives that have adopted dedicated digital asset management platforms, which use perceptual hashing to flag visually similar images even when filenames differ, report significant reductions in manual review time. Perceptual hashing generates a compact numerical fingerprint from an image's visual content rather than from the bytes of the file, so two differently compressed versions of the same photograph produce identical or nearly identical fingerprints and are flagged as duplicates even though their file sizes and checksums differ.
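To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash", in plain Python. It is an illustration under stated assumptions, not the method any particular platform uses: the 4x4 pixel grids are hypothetical stand-ins for photographs that have already been decoded, converted to grayscale and downscaled, and the function names are invented for this example.

```python
def average_hash(pixels):
    """Return a bit string: '1' where a pixel is brighter than the image mean.

    `pixels` is a small grayscale grid (a list of rows, values 0-255).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical thumbnail of a photograph.
original = [
    [200, 210, 40, 30],
    [190, 205, 35, 25],
    [60, 50, 220, 230],
    [55, 45, 215, 225],
]
# The same picture after lossy recompression: each value shifts slightly.
recompressed = [
    [198, 212, 43, 28],
    [193, 203, 33, 27],
    [58, 53, 222, 228],
    [57, 44, 213, 227],
]
# A different picture: the bright and dark regions are swapped.
unrelated = [
    [30, 40, 210, 200],
    [25, 35, 205, 190],
    [230, 220, 50, 60],
    [225, 215, 45, 55],
]

same = hamming_distance(average_hash(original), average_hash(recompressed))
different = hamming_distance(average_hash(original), average_hash(unrelated))
print(same)       # 0: the fingerprints match despite different pixel values
print(different)  # 16: every bit differs
```

In practice a tool would treat any pair whose distance falls under a chosen threshold as a candidate duplicate for review. Where to set that threshold is a policy decision, and more robust hash variants exist than the one sketched here.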
The Perth-based technology sector, particularly firms clustered around the Technology Park precinct in Bentley, has seen growing demand for this kind of remediation work from both state government and resources companies managing enormous photographic catalogues from mine sites and offshore infrastructure.
For organisations beginning to audit their own holdings, the practical starting point is a baseline count: total image assets, file size distribution, and the date range of ingestion. Comparing images for visual similarity across any repository of more than 10,000 files is not a realistic proposition without automated tooling: 10,000 files alone yield roughly 50 million possible pairs, and the manual labour would run to weeks. Open-source tools exist, but enterprise environments in WA are increasingly turning to vendors who can integrate deduplication into existing records management platforms rather than running it as a separate process.
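A baseline of that kind can be sketched with the Python standard library alone. The example below is a starting point under several assumptions: the function name `baseline_audit` and the list of file extensions are hypothetical, file modification time stands in for the ingestion date (a real records system would hold that in its own metadata), and the SHA-256 comparison finds only byte-identical copies, not the near-duplicates that perceptual hashing targets.

```python
import hashlib
import statistics
from collections import defaultdict
from pathlib import Path

# Assumed set of image file types; adjust to the repository being audited.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def baseline_audit(root):
    """Summarise image files under `root` and group byte-identical copies."""
    sizes = []
    modified_times = []
    by_digest = defaultdict(list)

    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        stat = path.stat()
        sizes.append(stat.st_size)
        modified_times.append(stat.st_mtime)
        # Reads the whole file into memory; very large scans would be
        # better hashed in chunks.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest[digest].append(path)

    return {
        "total_files": len(sizes),
        "total_bytes": sum(sizes),
        "smallest_bytes": min(sizes, default=0),
        "median_bytes": statistics.median(sizes) if sizes else 0,
        "largest_bytes": max(sizes, default=0),
        # Modification timestamps (seconds since the epoch) as a proxy
        # for the ingestion date range.
        "earliest_modified": min(modified_times, default=None),
        "latest_modified": max(modified_times, default=None),
        "duplicate_groups": [p for p in by_digest.values() if len(p) > 1],
    }
```

Calling `baseline_audit` on a directory returns the counts described above plus a list of groups of identical files. The function only reports; it deletes nothing, so the decision about which copy is authoritative stays with records staff.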
The next test for WA's digital recordkeeping infrastructure will come as Metronet station documentation accelerates through 2026 and 2027, with construction photography and planning imagery arriving from Ellenbrook, Morley, and Yanchep simultaneously. Getting deduplication policy settled before that wave arrives is the practical priority — the cost of sorting it afterwards will be considerably higher.