Siloed development and inconsistent data standards create integration barriers at scale, making fed tech interoperability as much a governance and cultural challenge as a technical one
by Intelliworx
One of the biggest challenges the federal government faces in modernizing its technology is integration. GAO reports have cited interoperability as a key challenge that has hindered several high-profile technology projects.
Why is integration such a challenge?
“[Government] Systems don’t talk to each other. They weren’t designed with that goal in mind, so it creates a big, big gap in what’s needed right now versus what’s available,” the EPA’s CIO Carter Farmer told FedScoop in a piece exploring the proliferation of technical debt.
Many of the legacy systems the government relies on are decades old. In a sense, systems integration is a bit like trying to plug a modern smartphone into an old landline telephone jack and expecting the contacts to synchronize and the apps to update.
Unstructured data, interoperability and tech debt
Legacy systems were built at a time when network connectivity was slow, processing power was resource-intensive, and storage space was expensive. These physical realities governed the design and architecture available at the time.
This contributed to two high-level technical obstacles to modernization that the government wrestles with today: flat files and batch processing. Flat files store data as plain text, with little or no enforced structure. Plain text lacks the standardized format and the relationships between records that another system needs to make use of the data.
If you’ve ever downloaded a spreadsheet in .csv (comma-separated values) format, you’ve worked with flat file data. You have to spend some time formatting the spreadsheet before you can start interpreting the data, running formulas, or creating graphs.
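To make the spreadsheet example concrete, here is a minimal Python sketch of reading a flat file. The file contents, column names and the `parse_salary` helper are invented for illustration; the point is only that every value arrives as text and the consuming system has to decide what it means.

```python
import csv
import io

# Hypothetical flat-file export; columns and values are invented for illustration.
RAW = """id,name,hired,salary
001,Ada Smith,03/04/1999,"52,000"
002,Raj Patel,1999-04-03,48000
"""

def read_flat_file(text):
    """Return each row as a dict. The csv module hands back strings only."""
    return list(csv.DictReader(io.StringIO(text)))

def parse_salary(value):
    """Interpret a salary string that may contain thousands separators."""
    return int(value.replace(",", ""))

rows = read_flat_file(RAW)

# Nothing in the file says that "id" keeps its leading zeros, that "hired"
# uses two different date formats, or that "salary" is a number.
total = sum(parse_salary(row["salary"]) for row in rows)
print(rows[0]["id"], type(rows[0]["salary"]).__name__, total)
```

The cleanup rules live in the consumer's code rather than in the data, which is why each new consumer tends to repeat the work.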
Data is stored in legacy systems in the same way. It works within the system it was made for, but sharing that data with another system is a heavy lift. For example, extracting just the zip code from a plain text address that includes the street number, street name, suite number, city, state, and zip code requires some sort of adaptor, middleware, or an extract-transform-load (ETL) process.
For another system to use the zip code, technologists would need to develop a specialized software script. That script would extract just the zip code from the old system, transform it into a standardized format, and then load it into the new system. A separate script has to be written for each set of data you want to get from an old system and share with a new one.
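The zip code script described above might look something like the following Python sketch. It is an illustration under stated assumptions, not a description of any agency's actual process: the sample addresses, the `locations` table and the function names are hypothetical, and the pattern assumes the zip code is the last thing in the address.

```python
import re
import sqlite3

# Hypothetical legacy records: one free-text address per entry.
LEGACY_ADDRESSES = [
    "1200 Pennsylvania Ave NW, Suite 300, Washington, DC 20460",
    "77 Main St, Springfield, IL 62701-1234",
    "PO Box 9, Nowhere, TX",  # no zip code present
]

# Assumption: the zip code ends the address, as 5 digits with an optional -4 extension.
ZIP_PATTERN = re.compile(r"\b(\d{5})(?:-\d{4})?\s*$")

def extract_zip(address):
    """Extract and transform: return the 5-digit zip code, or None if absent."""
    match = ZIP_PATTERN.search(address)
    return match.group(1) if match else None

def run_etl(addresses, conn):
    """Load each address and its standardized zip code into the new system."""
    conn.execute("CREATE TABLE IF NOT EXISTS locations (address TEXT, zip TEXT)")
    rows = [(address, extract_zip(address)) for address in addresses]
    conn.executemany("INSERT INTO locations VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT zip FROM locations ORDER BY rowid").fetchall()

# An in-memory SQLite database stands in for the new system.
conn = sqlite3.connect(":memory:")
loaded = run_etl(LEGACY_ADDRESSES, conn)
print(loaded)
```

Even this small script carries decisions that need upkeep: what to do with addresses that have no zip code, whether to keep the four-digit extension, and how to handle formats the pattern does not anticipate.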
Each script is another piece of software that has to be updated, secured and maintained. That maintenance has a cost, which technologists refer to as “tech debt.” The federal government maintains ~5,000 systems, each with a mountain of data. Consequently, the government spends 80% of its technology budget maintaining its legacy systems.
Batch processing in off-peak hours
If the flat file explains how data was designed to be stored, batch processing describes how it was designed to be processed. Typically, data was processed in batches of high-volume work, outside of normal working hours, when fewer technology resources were being used by employees.
Batch processing is still used in many cases. That’s why when you file a tax return, the status is updated in steps. The system confirms a tax return was received, but it might take a day before the return is processed and the status is updated to show that it was accepted or that a refund is pending.
That’s a challenge for integration because many modern systems are designed for instant or “real-time” communication. Getting a real-time system to “talk” to a legacy system that batch processes data requires complicated synchronization or duplicate data sets. It’s yet another part of the tech infrastructure that can break and needs to be maintained.
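The gap between the two styles can be shown with a toy model in Python. The `BatchLedger` class and the account name are hypothetical; the sketch only illustrates that an update accepted during the day is not visible to a real-time reader until the batch has run.

```python
class BatchLedger:
    """Toy model of a system that applies updates only when a batch runs."""

    def __init__(self):
        self.balances = {}  # what readers can see
        self.pending = []   # updates accepted but not yet applied

    def submit(self, account, amount):
        # The update is accepted right away but is not reflected in balances.
        self.pending.append((account, amount))

    def balance(self, account):
        return self.balances.get(account, 0)

    def run_batch(self):
        """Apply all pending updates, as an overnight job would."""
        applied = len(self.pending)
        for account, amount in self.pending:
            self.balances[account] = self.balances.get(account, 0) + amount
        self.pending.clear()
        return applied

ledger = BatchLedger()
ledger.submit("acct-1", 250)
before = ledger.balance("acct-1")  # a real-time reader sees the old value
applied = ledger.run_batch()
after = ledger.balance("acct-1")   # the update is visible only after the batch
print(before, applied, after)
```

A modern system that needs the current balance either has to wait for the batch or keep its own copy of the pending updates, which is the duplicate data set mentioned above.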
Siloed development, disruption and sheer scale
Such challenges aren’t unique to the government. Large companies also have legacy systems that have accrued similar tech debt. Private banking, personal investments, and payroll processing often still run on batch processing. When you pay off your credit card, for example, the payment often isn’t reflected until the batch is processed overnight.
What’s different about the federal government is that, historically, each agency set about building technology systems independently. There are more than 400 agencies and subagencies, and many approached the design and architecture of IT systems differently – to meet the unique needs of their mission.
Where a single large company needs to build internal consensus on standards – a difficult task in its own right – the federal government has to get all those agencies, each the size of a large company, on the same page: different missions, stakeholders, cultures, leadership styles, and different infrastructure.
Why can’t AI just fix this?
A common misconception is that artificial intelligence (AI) will be able to fix these problems. Text has long been viewed as unstructured data, and yet generative AI seems to work well with it. The catch is that generative AI works because the text has first been structured into a model: a large language model (LLM).
Generative AI is certainly more complex than autocomplete, which guesses the next word based on statistical probability. However, at a fundamental level, it’s a similar, albeit more powerful, pattern-matching system that can’t work outside the structure of its model.
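For a sense of what guessing the next word from statistical probability means, here is a deliberately simple Python sketch of the autocomplete idea. It is not how an LLM is built; the training sentence and the function names are made up for illustration.

```python
from collections import Counter, defaultdict

def train(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def suggest(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical training text.
corpus = (
    "the system sends the file and the system reads the file "
    "then the system stops"
)
model = train(corpus)
print(suggest(model, "the"))     # most frequent word after "the" in the corpus
print(suggest(model, "ledger"))  # a word outside the training text
```

The second call returns nothing useful because "ledger" never appeared in the training text, a small-scale version of a system that cannot work outside the structure of its model.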
Data standardization, as with any technology, is crucial to effective AI implementation. That’s why there’s so much commentary from technologists cautioning against rushing AI projects before governance and data standards have been addressed. The risk is that, in another 10 or 20 years, AI systems will present the same challenges that have dogged the modernization of legacy systems.
Centralized data standards are key
The federal government is working to streamline technology acquisition, foster shared resources and pool purchasing power to control costs. This is a noble endeavor that helps fulfill the government’s role while being a good steward of taxpayer money.
Yet competing standards still exist, and so technical debt continues to accrue. Building consensus is a complicated and monumental challenge. It’s a technical challenge for sure, but it’s also a cultural one. We don’t pretend to have all the answers, but we’re confident that consensus on data standards and governance is a crucial first step.
* * *
Intelliworx has been providing purpose-built software to the federal government for 20 years and currently serves 40+ federal government agencies. The company is a certified service-disabled veteran-owned small business (SDVOSB) and is FedRAMP-authorized.
