Ataccama Eyes Data Quality Cracks In AI Infrastructure Desert

Posted by Adrian Bridgwater, Senior Contributor


AI is smart, but only as smart as we make it. This basic industry truth holds because the intelligence in artificial intelligence correlates directly with its ability to access, ingest and analyze data patterns. Whatever level of semantic reasoning, pattern recognition and non-deterministic agentic intellect a software system aims to achieve is a function of the data it is fed.

But this is not Costco or Walmart; more does not always mean we’re getting a good deal. In the realm of software engineering and data science that services AI, there is also a pressing need to make sure systems at this level can access high-quality data. Not wholly dissimilar to the way a good chef fusses over the provenance of their poultry, fish and vegetables, AI data doesn’t need to be organic (synthetic data is actually really good eats here), but it does need a quality stamp to tell us where it has come from, what its potency is… and indeed what its shelf life might be.

While the current decade has been a cacophony of technology messages related to AI, the watershed moment when agentic AI services start to play a role in working enterprise organizations is just about upon us. Far more sophisticated than chatbots with scripted auto-responses, autonomous systems with an ability to understand intent are starting to edge us towards what could be called generalized functional automation.

AI Happened, Infrastructure Didn’t

Mike McKee, CEO of unified data trust platform company Ataccama thinks that we’re at an inflexion point right now. It’s a point where AI agents are surfacing inside workflows that are conversational, responsive and fast; but it’s also a point where the data infrastructure needed to underpin our new AI services looks a little flaky in places. It’s like AI happened, but a commensurate level of intelligent infrastructure didn’t. Or at least, not at the same level.

Given that McKee (a man who runs a data quality company powered by data lineage, governance and observability functions) would naturally want to advocate an amplified focus on AI data infrastructure, what points does he raise that offer deeper insights here? It’s true that only this year have we seen standards like the Model Context Protocol and Google’s open Agent2Agent protocol arrive, so to suggest that the technology industry is working in overdrive to build new connectivity and data management backbone infrastructures for the AI age doesn’t seem too outrageous.

As we move on from standalone software bots to multiple AI agents that work together, the impact of poor data in the DNA of one part of a wider networked system starts to sound like more of an issue if it can be shared and reinterpreted elsewhere.

“It’s the point where we can mark a new architecture,” said McKee. “Not one large, centralized model, but many smaller, specialized agents working in coordination. It’s the same kind of shift we saw when monolithic mainframe and client-server applications gave way to cloud services, application programming interfaces and the wider birth of cloud-native platforms.”

Unified AI Demands Unified Data

Going back to the pile-it-high retailers, Walmart is reported to have recently implemented a number of domain-specific “super agents” for shoppers, employees, suppliers and developers. Each is focused, but together they form what has been called a “unified AI layer” that runs across the length and breadth of the organization, moving it towards enterprise-wide orchestration of AI agents.

With this level of autonomy, the Ataccama CEO suggests that the question is no longer “can we build the agent?” It’s “can we trust it with our data and our decisions?” because when AI agents are exposed to data that is incomplete, outdated or noncompliant, even well-intentioned automation (not that AI normally gets an emotional worthiness grading) could lead to unwanted outcomes.

“This whole backdrop is why the emerging preeminent trend in this space is a pressing need for a smarter data trust layer; one that sits between agents and the data they consume,” asserted McKee. “Before an agent runs off to analyze a dataset or recommend an action, it should be able to ask: a) is this data accurate and up to date? b) is it approved for this use? and c) does this action comply with internal company policies and regulatory obligations? That trust check needs to happen in real time, not after the fact, not buried in a quarterly review – and it needs to come with explainability and confidence signals built in, so teams can act on agent output knowing the data is accurate and the reasoning is sound.”
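To make that idea concrete, here is a minimal sketch of what such a pre-action trust check might look like in code. It is purely illustrative: the dataclass names, thresholds and policy tags are invented for this article (they are not Ataccama APIs), and the logic simply walks through McKee’s three questions and returns a verdict that carries both a confidence score and human-readable reasons.

    from dataclasses import dataclass, field
    from datetime import datetime, timedelta, timezone

    # Hypothetical metadata an agent might receive alongside a dataset.
    @dataclass
    class DatasetMetadata:
        name: str
        last_validated: datetime              # when quality checks last passed (UTC)
        approved_uses: set[str] = field(default_factory=set)
        policy_tags: set[str] = field(default_factory=set)   # e.g. {"pii_unmasked"}

    # Hypothetical verdict object: a yes/no plus the signals McKee describes.
    @dataclass
    class TrustVerdict:
        allowed: bool
        confidence: float                     # 0.0-1.0 signal passed to the caller
        reasons: list[str]                    # explainability: why the call was made

    MAX_STALENESS = timedelta(hours=24)
    BLOCKED_TAGS = {"pii_unmasked", "export_restricted"}

    def trust_check(meta: DatasetMetadata, intended_use: str) -> TrustVerdict:
        """Ask the three questions before an agent acts on the data."""
        reasons: list[str] = []
        confidence = 1.0

        # a) Is the data accurate and up to date?
        age = datetime.now(timezone.utc) - meta.last_validated
        if age > MAX_STALENESS:
            reasons.append(f"data last validated {age} ago")
            confidence -= 0.4

        # b) Is it approved for this use?
        if intended_use not in meta.approved_uses:
            reasons.append(f"'{intended_use}' is not an approved use")
            confidence -= 0.5

        # c) Does the action comply with policy and regulation?
        blocked = meta.policy_tags & BLOCKED_TAGS
        if blocked:
            reasons.append(f"policy tags block autonomous use: {sorted(blocked)}")
            confidence -= 0.5

        return TrustVerdict(allowed=confidence >= 0.6,
                            confidence=max(confidence, 0.0),
                            reasons=reasons or ["all checks passed"])

The specifics matter less than the shape: the check runs before the agent acts, and the verdict it returns is itself auditable.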

This Is Not Governance, This Is Engineering

Discussing this subject in a closed press briefing this month, McKee suggested that the industry is now at a point where many enterprises are already treating data trust as an engineering priority, not as a governance task. What that means in practical software and data engineering terms is a set of processes and a platform approach dedicated to embedding quality, lineage, policy and compliance signals into the very fabric of how agents access and use data.

“Getting this right isn’t easy. Orchestrating how agents interact, especially across business units and platforms, is a tough problem… and few companies have cracked it at scale,” explained McKee. “What’s emerging is a new discipline: agent architecture. It’s about building smarter agents and structuring the environment in which they operate, securely, predictably and accountably. The use of MCP helps standardize how agents connect and exchange context. Teams can ask questions like: ‘What should our MCP servers do? How do we design orchestration that’s modular, compliant and scalable?’ The lesson is that standards matter, as does structure. You end up with cleaner integration and coordination when agents speak a common language… and that’s what separates agentic chaos from agentic scale.”
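Where MCP fits is also easy to sketch. Assuming the open-source MCP Python SDK (the `mcp` package), a single-purpose server that exposes one governed capability to any compliant agent might look roughly like this; the server name, tool and figures are invented for illustration, not a real deployment.

    # A minimal, illustrative MCP server exposing one governed data capability.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("sales-data")  # hypothetical server name

    @mcp.tool()
    def regional_sales(region: str, quarter: str) -> dict:
        """Return sales figures for one region and quarter.

        In a real deployment this would query an approved, lineage-tracked
        dataset; here it returns a canned figure purely for illustration.
        """
        return {"region": region, "quarter": quarter, "revenue_usd": 1_250_000}

    if __name__ == "__main__":
        mcp.run()  # serves the tool over the standard MCP transport

Because every such server speaks the same protocol, adding another data source to the agent fleet becomes a matter of standing up another small server rather than writing a bespoke integration each time.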

New Engineering Discipline: Agent Architecture

A practical example here could be a team member asking about sales performance and getting a response that explains results, recommends next steps and includes a confidence score, all backed by governed, policy-aligned data.

  • This is not: “I’ll get back to you in a few weeks with a report.”
  • Instead, this is: “Here’s what’s happening. Here’s why… and here’s what to do about it, with the audit trail to back it up and give you confidence in the decision you’re about to make.”
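As a rough illustration of what that second kind of answer might carry as a structured payload (the field names and figures below are hypothetical, not a vendor schema):

    # Hypothetical shape of an agent's answer: insight, recommendation,
    # confidence and an audit trail, rather than a bare number.
    agent_response = {
        "question": "Why did EMEA sales dip in Q3?",
        "finding": "Q3 EMEA revenue fell quarter-on-quarter, driven by churn in two enterprise accounts.",
        "recommended_action": "Trigger retention outreach for the at-risk renewal cohort.",
        "confidence": 0.82,                      # how much weight to give the finding
        "audit_trail": {
            "datasets": ["crm.opportunities", "billing.invoices"],   # illustrative names
            "lineage_checked": True,
            "policies_applied": ["gdpr_masking", "finance_approved_use"],
            "generated_at": "2025-11-04T09:30:00Z",
        },
    }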

“Early adopters are already seeing compression in planning cycles and operational workflows. When agents can tap into reliable data, the insight-to-action gap closes dramatically. When humans trust those insights, they’re more willing to delegate tasks and make faster, bolder decisions,” said Ataccama’s McKee. “It’s still early. Most companies don’t yet have more than a few agents in production. Many are still experimenting on the edges of the business and that’s normal. But the momentum is real. Multi-agent systems are coming and the companies that benefit the most will be the ones that prepare now by standardizing how agents access data, enforcing policies in real time and building the trust infrastructure that lets AI move fast without breaking things. Autonomy and accountability don’t have to be at odds, but they do have to be architected together.”

The company’s most recent products launched to serve this space include the Ataccama ONE data trust platform v16.1. This version introduces new data lineage and connectivity capabilities, including enhanced diagram export for audit and compliance use cases and improved lineage visualization tools. It also expands “pushdown processing” (where computations are moved closer to the data source) to cloud platforms such as Azure Synapse and Google BigQuery.
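Pushdown is easiest to see with a small, generic example: rather than pulling millions of rows out of the warehouse to profile them, a quality rule is translated into SQL and executed where the data lives, with only a summary leaving the platform. The sketch below uses Google’s official BigQuery client library; the project, table and completeness rule are invented for illustration and have nothing to do with Ataccama’s own implementation.

    # Illustrative pushdown: the completeness check runs inside BigQuery as SQL,
    # so only a one-row summary ever leaves the warehouse.
    from google.cloud import bigquery   # official Google BigQuery client library

    client = bigquery.Client()

    # Hypothetical rule: how many customer records are missing an email address?
    query = """
        SELECT
          COUNT(*)                              AS row_count,
          COUNTIF(email IS NULL OR email = '')  AS missing_email
        FROM `my_project.crm.customers`
    """

    row = next(iter(client.query(query).result()))
    completeness = 1 - row.missing_email / max(row.row_count, 1)
    print(f"email completeness: {completeness:.2%}")

The same rule run client-side would mean exporting the whole table first, which is exactly the cost that pushdown processing avoids.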

Competitive Analysis: Augmented Data Quality Vendors

This discussion has gravitated towards data provenance, data quality and data lineage. Although those terms are all widely accepted measures of worth in the technology industry, analyst house coverage of this market leans on the term “augmented data quality” to classify the vendors.

Ataccama holds a respectable enough upper-right quadrant position in this year’s market analysis of this space by Gartner, but Qlik is also a frontrunner. After its 2023 acquisition of Talend, Qlik has data quality tools that work alongside a broader set of data integration and data analytics services. Another key player, Informatica, is known for its data profiling, cleansing and validation services, which reside within the logically named Informatica Data Quality offering and come with pre-built rules and accelerators.

Procurement decisions in the data quality space will now likely be governed by a platform’s cloud-native alignment, its machine learning and AI-first data quality controls (this stuff used to be a manual process), its capacity to handle complex, distributed data sets and… crucially, its ability to span data quality, data governance and master data management while (perhaps also) dovetailing with business intelligence platforms.

There’s of course no such thing as a free lunch in data quality, i.e. if you want additional automation and wider data validation, data ingest and data enrichment services (something that Experian is well known for in this space), then you typically have to pay for it. Also on the roster here are IBM and SAS (big vendors with large, established enterprise heritage, so it makes sense), then DQLabs (strong on AI-powered self-service), Ab Initio, Irion and Precisely (the result of a Syncsort and Pitney Bowes merger), a company with both data integrity and data quality solutions.

Beyond Garbage In, Garbage Out

By now, even the layperson understands that artificial intelligence data is a garbage-in, garbage-out conundrum. This is not just a conversation about high-quality data requirements for AI (although it is that too); that’s now table stakes. We’re now moving on to realize that in hybrid multi-cloud environments, distributed data spreads across a whole variety of software systems and tools.

Feeding that variegated data stream into AI agents and making sure that the right agent gets the right fuel is (as already stated) a progression past compliance and governance (although it involves both of those too); it is a stage where organizations can “operationalize automated lineage” into the agentic stream.


