Algoscale

Manufacturing Data Foundation: A Blueprint

Plant floor, ERP, quality, and maintenance data live in four silos that never join. Here's the manufacturing data foundation architecture that fixes it.

Neeraj Agarwal

Founder & CEO, Algoscale

A plant director is in a Monday operations review. The CFO asks a single question: what was our overall equipment effectiveness across the network last month? Three answers come back. The MES team reports 71%. The corporate finance team reports 64%, derived from ERP downtime codes. The reliability team reports 78%, derived from the historian and the maintenance schedule. None of them are lying. They are reading from three different systems that were never designed to agree.

This is the everyday manufacturing data problem. Not “we don’t have enough data.” Not “we need more sensors.” The data is already there — it is sitting in four operational silos that don’t share an identity model, a clock, or a definition of what counts as a downtime event. Every Industry 4.0, smart-factory, predictive-maintenance, or generative-AI-on-the-plant-floor initiative quietly bottoms out on the same problem: the layer underneath isn’t ready.

This post describes the data foundation pattern we deploy when a manufacturer needs that layer to actually exist. It's drawn from our big data engagements with discrete and process manufacturers, and it's the prerequisite to the AI capability layer most plants are now trying to deploy.

Why manufacturing data is uniquely hard

Three things make manufacturing data harder than the rest of the enterprise data landscape, and they show up at every plant:

The ISA-95 boundary is a real boundary, not a diagram. The Purdue reference model puts SCADA, PLCs, and historians on one side (operational technology) and ERP, planning, and analytics on the other (information technology). The model has survived for more than three decades because the boundary is enforced by physical security posture, network segmentation, and two engineering cultures that have rarely been on the same team. Crossing it is not a connector problem; it is an organizational and security one.

Timestamps don’t agree. A PLC’s timestamp comes from its onboard clock or a plant-local NTP server. A historian’s timestamp may have been re-stamped on ingest. An ERP work order is timestamped when a human pressed save. A CMMS event is timestamped on close-out, which can be hours after the actual event. Joining four systems on a clock requires explicit timestamp normalization, and most data lakes silently treat the timestamps as equivalent and produce wrong analytics for years.

The asset taxonomy is multilingual. SCADA calls it MTR-A14-PUMP-3. The historian calls it \\PI\Site1.Pump3.RPM. SAP calls it equipment 10047831. The CMMS calls it Pump 3, Line A. The MES calls it Asset_LineA_03. None of these auto-join. Without a canonical asset master, every cross-system query is a fresh manual reconciliation.

Compound these three and you understand why the typical manufacturing data warehouse project ships a year late, costs three times the budget, and gets quietly retired when the next platform vendor walks in.

The four data domains

A working manufacturing data foundation joins four operational data domains on a single asset, a single clock, and a shared semantic model. Each domain owns something specific and disclaims everything else.

| Domain | What it owns | What it does not own | Typical cadence |
|---|---|---|---|
| Plant floor (OT): PLCs, SCADA, historian (PI, AVEVA, Aspen, Canary) | Real-time process state, sensor telemetry, alarms, machine status | The business reason for a state change; the planned schedule; the work order | 1 Hz to 50 kHz; sub-second on events |
| ERP (SAP, NetSuite, Oracle, Infor) | Production orders, BOMs, costs, planned schedules, inventory commitments | Real-world execution state; quality outcomes; equipment condition | Transactional; sub-second on writes, batch on settlement |
| Quality: LIMS, QMS, electronic batch records, SPC | In-process and finished-goods test results, non-conformance reports, batch genealogy, calibration records | Where the deviation happened in real-world process time; the machine condition during the deviation | Per-batch or per-test; minutes-to-hours latency |
| Maintenance (CMMS): Maximo, SAP PM, eMaint, Fiix | Work orders, planned maintenance schedules, parts consumed, failure modes, asset hierarchy | The real-time process signature of a failure; the production cost of the downtime | Event-driven; latency from minutes to days depending on close-out discipline |

Two implications follow. First, no single domain has the answer to any interesting business question. OEE requires plant floor for run state, ERP for planned production, quality for first-pass yield, and maintenance for unplanned-downtime classification. Root-cause analysis on a quality excursion requires the historian trace at the moment of the deviation, the batch record, the work-order history of the equipment, and the operator and shift assignment from ERP. Genealogy and traceability require all four. The interesting questions live at the join, not inside any one system.
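The first implication becomes concrete in a minimal Python sketch of the conventional OEE decomposition (availability × performance × quality), with each input tagged by the domain that owns it. The field names, the rule that planned maintenance is excluded from planned time, and the shift numbers are illustrative assumptions. They are not a prescribed OEE definition. Plants differ on exactly which stops count against availability, and that is part of why the three numbers in the opening disagree.

```python
from dataclasses import dataclass


@dataclass
class OeeInputs:
    """Hypothetical per-asset, per-shift inputs; comments name the owning domain."""
    shift_minutes: float          # ERP: scheduled shift length
    planned_stop_minutes: float   # CMMS: planned maintenance, excluded from planned time
    run_minutes: float            # historian: time the asset was actually running
    ideal_cycle_seconds: float    # ERP routing: ideal seconds per unit
    total_units: int              # historian/MES: units produced
    good_units: int               # QMS: units that passed first time


def oee(x: OeeInputs) -> dict:
    """Conventional OEE decomposition: availability x performance x quality."""
    planned_minutes = x.shift_minutes - x.planned_stop_minutes
    availability = x.run_minutes / planned_minutes
    performance = (x.ideal_cycle_seconds * x.total_units) / (x.run_minutes * 60)
    quality = x.good_units / x.total_units
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


result = oee(OeeInputs(shift_minutes=510, planned_stop_minutes=30, run_minutes=420,
                       ideal_cycle_seconds=30, total_units=760, good_units=740))
print(round(result["oee"], 3))  # 0.771
```

If any one of the four domains is missing, one of the three factors cannot be computed. The usual result is a team substituting a proxy from whichever system it can reach.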

Second, the source-of-truth assignment per attribute is not negotiable. Run state is the historian. Planned production is ERP. First-pass yield is the QMS. Failure code is the CMMS. When two systems disagree — and they will — the foundation needs a written rule for which one wins, per attribute. Without that rule, every monthly review is a debate about which screen to trust.
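A written source-of-truth rule can be as small as a lookup table. The sketch below uses hypothetical attribute and system names. It returns the owner's value and also reports which systems disagreed. That way a disagreement becomes a data-quality signal instead of a debate.

```python
# Hypothetical written rule: one owning system per attribute.
SOURCE_OF_TRUTH = {
    "run_state": "historian",
    "planned_production": "erp",
    "first_pass_yield": "qms",
    "failure_code": "cmms",
}


def resolve(attribute: str, reported: dict) -> tuple:
    """Pick the owning system's value and list the systems that disagreed with it.

    `reported` maps source system -> value. A missing rule or a missing owner value
    raises instead of silently falling back to another system.
    """
    owner = SOURCE_OF_TRUTH.get(attribute)
    if owner is None:
        raise KeyError(f"no source-of-truth rule for {attribute!r}")
    if owner not in reported:
        raise LookupError(f"{owner} did not report {attribute!r}")
    winner = reported[owner]
    dissenters = sorted(s for s, v in reported.items() if v != winner)
    return winner, dissenters


value, disagreed = resolve("run_state", {"historian": "RUNNING", "mes": "DOWN", "erp": "RUNNING"})
print(value, disagreed)  # RUNNING ['mes']
```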

The four integration problems

Joining the four domains breaks down into four engineering problems that have to be solved deliberately. Skip any one and the foundation falls over.

Identity. A canonical asset master that resolves every system's local identifier into one stable ID. It lives outside any one source system, has a named owner, and has a deprecation path for when assets are retired or rebuilt. The discipline is the same as in master data management; the integration work here is essentially equipment MDM. The asset master also resolves the location hierarchy (site, line, cell, asset) so reporting rolls up consistently across plants that were built by three different engineering firms in three different decades.
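Here is a minimal sketch of the resolution table behind an asset master. Several details are hypothetical: the canonical ID format (AST-000123), an SAP re-numbering after a rebuild, and a made-up cell name. Each mapping carries a validity interval. A retired equipment number therefore still resolves for historical queries but not for current ones.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class IdMapping:
    system: str                          # source system, e.g. "sap", "cmms", "mes"
    local_id: str                        # identifier as that system spells it
    canonical_id: str                    # stable asset-master ID (hypothetical format)
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the mapping is still current


class AssetMaster:
    """Sketch of a versioned resolution table plus a location hierarchy."""

    def __init__(self, mappings, hierarchy):
        self._mappings = list(mappings)
        self._hierarchy = dict(hierarchy)  # canonical_id -> (site, line, cell)

    def resolve(self, system: str, local_id: str, at: datetime) -> str:
        """Map a local ID to the canonical ID that was valid at time `at`."""
        for m in self._mappings:
            if (m.system == system and m.local_id == local_id
                    and m.valid_from <= at
                    and (m.valid_to is None or at < m.valid_to)):
                return m.canonical_id
        raise LookupError(f"{system}:{local_id} has no mapping at {at.isoformat()}")

    def location(self, canonical_id: str) -> tuple:
        return self._hierarchy[canonical_id]


def utc(y, mo, d):
    return datetime(y, mo, d, tzinfo=timezone.utc)


REBUILD = utc(2024, 1, 1)  # hypothetical date the pump was rebuilt and re-numbered in SAP
master = AssetMaster(
    mappings=[
        IdMapping("sap", "10047831", "AST-000123", utc(2015, 1, 1), REBUILD),
        IdMapping("sap", "10052210", "AST-000123", REBUILD),
        IdMapping("cmms", "Pump 3, Line A", "AST-000123", utc(2015, 1, 1)),
        IdMapping("mes", "Asset_LineA_03", "AST-000123", utc(2015, 1, 1)),
    ],
    hierarchy={"AST-000123": ("Site1", "Line A", "Cell 2")},
)

# A historical query on the retired SAP number still resolves to the same asset.
print(master.resolve("sap", "10047831", utc(2023, 6, 1)))  # AST-000123
```

The mapping keeps validity intervals and never overwrites IDs in place. That is what lets the deprecation path work without breaking last year's reports.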

Time. Every signal arrives at the lake stamped with the source system's clock, the source system's idea of what the timestamp means, and the source system's tolerance for clock drift. The foundation needs an explicit time-normalization pass on ingest: convert to UTC, record the source timestamp separately, record the ingestion timestamp separately, and record an uncertainty bound on the timestamp itself. Historian timestamps are typically trustworthy to within ±50 ms; for ERP timestamps, ±15 minutes is often the best you can hope for. Mixing them in a query without acknowledging that asymmetry is how you get analytics that drift.
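The normalization pass can be sketched as a record that keeps all four pieces side by side. A join test then respects the asymmetry: two records count as possibly the same moment only if their uncertainty windows overlap. The per-source bounds come from the rough figures above. The plant offset is a hypothetical fixed UTC-5; production code would use `zoneinfo` so daylight-saving transitions are handled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Uncertainty bounds per source, taken from the rough figures in the text; tune per plant.
SOURCE_UNCERTAINTY = {
    "historian": timedelta(milliseconds=50),
    "erp": timedelta(minutes=15),
}


@dataclass(frozen=True)
class NormalizedTime:
    event_utc: datetime      # best estimate of when the event happened, in UTC
    source_raw: str          # timestamp exactly as the source system sent it
    ingested_utc: datetime   # when the foundation received the record
    uncertainty: timedelta   # +/- bound on event_utc

    def window(self):
        return self.event_utc - self.uncertainty, self.event_utc + self.uncertainty


def normalize(source: str, raw: str, source_tz: timezone, ingested_utc: datetime) -> NormalizedTime:
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:  # naive source timestamp: interpret it in the source's zone
        parsed = parsed.replace(tzinfo=source_tz)
    return NormalizedTime(parsed.astimezone(timezone.utc), raw, ingested_utc,
                          SOURCE_UNCERTAINTY[source])


def could_be_same_moment(a: NormalizedTime, b: NormalizedTime) -> bool:
    """Join two records on time only if their uncertainty windows overlap."""
    a_lo, a_hi = a.window()
    b_lo, b_hi = b.window()
    return a_lo <= b_hi and b_lo <= a_hi


PLANT_TZ = timezone(timedelta(hours=-5))  # hypothetical fixed offset; real code would use zoneinfo
now = datetime(2024, 3, 4, 20, 0, tzinfo=timezone.utc)
stop = normalize("historian", "2024-03-04T14:02:10.120", PLANT_TZ, now)
order_near = normalize("erp", "2024-03-04T14:12:00", PLANT_TZ, now)
order_far = normalize("erp", "2024-03-04T14:30:00", PLANT_TZ, now)
print(could_be_same_moment(stop, order_near), could_be_same_moment(stop, order_far))  # True False
```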

Semantics. A run state of “DOWN” from the MES, a status code of “08” from the PLC, and a maintenance reason of “Mechanical Failure - Bearing” from the CMMS all describe the same event from three perspectives. Reconciling them requires a shared event taxonomy maintained by reliability engineers and the data team together. The most common failure mode here is treating taxonomy reconciliation as a one-time exercise. It is a living artifact — when a new asset class arrives or a reliability program changes its failure-mode codes, the taxonomy moves with it.
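A shared taxonomy can start as an explicit mapping from (source, local code) to a canonical event class. This sketch uses hypothetical class names (UNPLANNED_STOP, BEARING) and a made-up new PLC code. The design choice worth copying is that unknown codes are queued for review; they are never dropped or guessed. That queue is what keeps the taxonomy a living artifact.

```python
from typing import Optional, Tuple

# (source system, local code) -> (canonical event class, failure mode if known)
EVENT_TAXONOMY = {
    ("mes", "DOWN"): ("UNPLANNED_STOP", None),
    ("plc", "08"): ("UNPLANNED_STOP", None),
    ("cmms", "Mechanical Failure - Bearing"): ("UNPLANNED_STOP", "BEARING"),
}

REVIEW_QUEUE = []  # unmapped codes waiting for reliability + data team review


def classify(source: str, code: str) -> Tuple[str, Optional[str]]:
    """Translate a source-local code into the shared taxonomy."""
    key = (source, code)
    if key not in EVENT_TAXONOMY:
        REVIEW_QUEUE.append(key)  # surface the gap instead of guessing
        return ("UNMAPPED", None)
    return EVENT_TAXONOMY[key]


for src, code in [("mes", "DOWN"), ("plc", "08"),
                  ("cmms", "Mechanical Failure - Bearing"), ("plc", "11")]:
    print(src, code, "->", classify(src, code))
```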

Latency. A foundation that serves operations needs sub-five-minute paths for decision-tier consumers (an operator deciding whether to call maintenance) and tolerant batch paths for analytical-tier consumers (the OEE dashboard refreshed nightly). The architectural consequence is two read paths off the same foundation: a hot path through a bus to an operational store, and a batch path through the lake to a warehouse or lakehouse for analytics. Serving both from one storage layer is a common mistake: exception alerts reach the control-room dashboard 15 minutes late, and the floor stops trusting it.
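The two read paths amount to a fan-out at the bus. In the sketch below, a dict stands in for the operational store and a list for the next lake commit. The class and field names are hypothetical. Every event goes to the batch path, and only decision-tier events also update the hot store's latest-state view.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    asset_id: str
    event_class: str
    decision_tier: bool  # True if an operator is expected to act on it within minutes


@dataclass
class Router:
    """Fan-out from the bus: everything goes to the lake, decision-tier events also go hot."""
    hot_store: dict = field(default_factory=dict)   # stand-in for the operational store
    lake_batch: list = field(default_factory=list)  # stand-in for the next lake commit

    def publish(self, event: Event) -> None:
        self.lake_batch.append(event)               # batch path keeps full history
        if event.decision_tier:
            self.hot_store[event.asset_id] = event  # hot path keeps latest state per asset

    def flush_lake(self) -> list:
        """Hand the accumulated batch to a (hypothetical) lake writer and start a new one."""
        batch, self.lake_batch = self.lake_batch, []
        return batch


router = Router()
router.publish(Event("AST-000123", "UNPLANNED_STOP", decision_tier=True))
router.publish(Event("AST-000123", "CYCLE_COMPLETE", decision_tier=False))
print(len(router.hot_store), len(router.flush_lake()))  # 1 2
```

Because each path has its own storage, the hot store can stay small and fast while the lake keeps complete history for the nightly OEE job.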

Two architectural paths — Unified Namespace and the data lake — and where each belongs

There’s a real architectural debate happening in industrial data right now, and a manufacturer scoping a foundation will hear both sides loudly. They are not mutually exclusive. They live at different ISA-95 layers and they solve different problems.

The Unified Namespace (UNS) approach comes from the OT side. An MQTT broker sits at the heart of the plant. Every device publishes its current state into a hierarchical topic tree (enterprise/site/area/line/asset/state), and every consumer (HMI, MES, edge analytics, even a cloud bridge) subscribes to the topics it cares about. State is the source of truth; events are derived. The UNS solves integration sprawl on the plant floor: instead of 47 point-to-point integrations between historians, MES, and SCADA systems, every system speaks to the broker. Sparkplug B, which standardizes the topic namespace and payload encoding, is the specification many production deployments build on.
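The sketch below shows how consumers subscribe to parts of the tree. It covers the topic layout and MQTT-style filter matching: `+` matches exactly one level, and `#` matches the remaining levels, including the parent. The enterprise, site, and asset names are hypothetical. The matcher is simplified: it does not validate filters or handle `$`-prefixed system topics. Sparkplug B defines its own topic namespace, so read this as the plain-MQTT style of UNS tree.

```python
def uns_topic(enterprise: str, site: str, area: str, line: str, asset: str,
              leaf: str = "state") -> str:
    """Build a topic in the enterprise/site/area/line/asset/state tree."""
    return "/".join([enterprise, site, area, line, asset, leaf])


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style filter matching: '+' matches exactly one level, '#' matches the rest.

    Simplified: does not validate filters or special-case '$'-prefixed system topics.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' is only valid as the last level, and it also matches the parent itself
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


topic = uns_topic("acme", "site1", "packaging", "lineA", "pump3")
print(topic)                                        # acme/site1/packaging/lineA/pump3/state
print(topic_matches("acme/site1/+/lineA/#", topic))  # True: edge analytics on one line
print(topic_matches("acme/+/+/+/+/state", topic))    # True: every asset's state
print(topic_matches("acme/site2/#", topic))          # False: another plant
```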

The cloud lakehouse approach comes from the IT side. A central object store (S3, ADLS, GCS) holds the raw and curated data in an open table format (Iceberg, Delta, Hudi), connectors pull from each source system on appropriate cadences, and analytical and ML workloads run on top. Microsoft Fabric and Databricks are the two stacks most large manufacturers we work with land on, with the historian tapped via OPC UA or a vendor-specific connector. The lakehouse solves the analytical and ML problem: long-term history, complex joins, training datasets for predictive models.

The honest answer is that a serious manufacturing data foundation needs both, with a clean boundary between them.

  • The UNS lives at ISA-95 levels 2 and 3 — on the plant network, real-time, sub-second, the system of record for current state. It is what an operator’s HMI reads. It is what an edge anomaly detector subscribes to.
  • The lakehouse lives at level 4 and above — in the cloud or a corporate data center, optimized for historical analysis, ML training, and cross-plant rollups. The UNS bridges into it through a one-way push that lands canonical events in the lake on a defined cadence.

The mistake most projects make is treating these as competing answers. Pick one and you ship a foundation that solves half the problem. Combine them with a clean boundary and you get an architecture that the OT team trusts on the floor and the IT team can build analytics and AI on top of.

The S.C.A.L.E. pattern for manufacturing

The reference shape we deploy on plant data-foundation engagements has five layers, mapped to the S.C.A.L.E. data foundation we anchor every heavy-vertical project on.

  • Connect. Historian tap (PI Adapter, AVEVA Hub, or OPC UA direct), ERP CDC (SAP via SLT or Daffodil, NetSuite via SuiteQL, Oracle via GoldenGate), QMS / LIMS connectors, CMMS extracts. On the OT side, a UNS broker (HiveMQ or equivalent) collects current state from PLCs and SCADA. Each connector normalizes into the canonical event and asset schemas on ingress — translation happens at the edge of the foundation, not at every consumer.
  • Centralize. A unified event bus (Kafka, Event Hubs, or Kinesis depending on the cloud) plus a temporal lakehouse (Iceberg or Delta on object storage). The lake is the long-term system of record; the bus is the live transport. The bus also carries the UNS bridge events so cloud consumers can read plant state in near-real-time.
  • Conform. Asset master, location hierarchy, event taxonomy, and the time-normalization service. This is the layer that makes everything else possible. The bulk of the engineering effort goes here and the bulk of the business value compounds from getting it right. This is also where the bidirectional resolution tables live — every source-system ID is mapped to the canonical ID, with version history so a retired equipment number doesn’t break a historical query.
  • Consume. Two read paths off the lake — an operator-facing low-latency hot store (Postgres or DynamoDB or a serving layer fed by the bus) and the analytical lakehouse for OEE dashboards, quality investigations, ML training, and corporate finance reporting. Power BI or Tableau sit on the lakehouse for executive reporting; thin operator HMIs read the hot store.
  • Govern. Access boundaries (a plant sees its own data; corporate sees the network; regulators see their slice), audit logging on every read, and a regulated-mode compliance posture where it applies: FDA 21 CFR Part 11 for life sciences manufacturers, IATF 16949 traceability for automotive, and the EU Machinery Regulation for equipment placed on the EU market. The data engineering work to enforce these access boundaries cleanly across the lakehouse and the hot store is where most "we'll fix it later" architectures eventually break.
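The three access boundaries in the Govern layer can be expressed as one decision function that both the lakehouse and the hot store consult. Every read attempt is written to an audit log. The roles, dataset names, and tuple-based log are assumptions for illustration. In practice this logic would usually be pushed into the platform's own row-level security features. Nothing here by itself satisfies 21 CFR Part 11 or any other regime.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Principal:
    name: str
    role: str                                       # "plant", "corporate", or "regulator"
    site: Optional[str] = None                      # set for plant users
    allowed_datasets: FrozenSet[str] = frozenset()  # set for regulators


AUDIT_LOG = []  # (who, role, row_site, dataset, allowed) for every read attempt


def can_read(p: Principal, row_site: str, dataset: str) -> bool:
    if p.role == "corporate":
        allowed = True                              # corporate sees the network
    elif p.role == "plant":
        allowed = row_site == p.site                # a plant sees its own data
    elif p.role == "regulator":
        allowed = dataset in p.allowed_datasets     # regulators see their slice
    else:
        allowed = False                             # unknown roles are denied by default
    AUDIT_LOG.append((p.name, p.role, row_site, dataset, allowed))
    return allowed


plant_user = Principal("plant_user_1", "plant", site="site1")
finance = Principal("finance_analyst", "corporate")
auditor = Principal("regulator_1", "regulator", allowed_datasets=frozenset({"batch_records"}))
print(can_read(plant_user, "site1", "oee"), can_read(plant_user, "site2", "oee"))  # True False
print(can_read(auditor, "site1", "batch_records"), can_read(auditor, "site1", "energy"))  # True False
```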

Cloud choice tracks the corporate footprint. Azure-heavy manufacturers (typical of the discrete world that already runs on Dynamics or SAP-on-Azure) land on Fabric or Databricks-on-Azure with Event Hubs and ADLS. AWS-heavy manufacturers land on MSK + S3 + Iceberg + Athena/EMR. GCP is rare in industrial settings but does appear at the more tech-forward CPG and chemicals operators. The pattern is the same; the parts catalog differs.

What a working foundation unlocks

The foundation is not the deliverable. The capability layer it enables is. With the four domains joined, the asset master clean, and the event taxonomy honest, a manufacturer can finally ship:

  • A single, defensible OEE number that reconciles across MES, ERP, and historian — same answer in the plant manager’s daily standup and the CFO’s quarterly review.
  • Quality root-cause analysis that links a non-conformance back to the historian trace at the moment of deviation, the operator and shift, the lot of incoming material, and the maintenance history of the line. What used to take a week of cross-system spreadsheet work becomes a query.
  • Predictive maintenance that actually scales: the four-signal problem is solved once upstream, instead of being re-solved separately for each of 600 machines. See our deeper write-up on why predictive maintenance pilots stall for the use-case specifics.
  • Energy and sustainability reporting that joins consumption data from the historian to production output from MES — Scope 1 and 2 emissions allocated to product, line, and shift rather than guessed at site level.
  • Genealogy and traceability for recall management, regulatory reporting, and supplier-quality investigations.
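The root-cause item can be shown as a single query against an in-memory SQLite database. The sketch assumes asset IDs are already canonical and timestamps already normalized to UTC text. The table layout, the `vibration_mm_s` tag, the ten-minute look-back, and all rows are hypothetical. The incoming-material lot join is omitted for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ncr (ncr_id TEXT, asset_id TEXT, detected_utc TEXT);
CREATE TABLE historian (asset_id TEXT, ts_utc TEXT, tag TEXT, value REAL);
CREATE TABLE work_orders (asset_id TEXT, closed_utc TEXT, failure_code TEXT);
CREATE TABLE shifts (asset_id TEXT, start_utc TEXT, end_utc TEXT, operator TEXT);
"""

# Timestamps are UTC 'YYYY-MM-DD HH:MM:SS' text, so string comparison orders them correctly.
ROOT_CAUSE_SQL = """
SELECT n.ncr_id,
       s.operator,
       (SELECT w.failure_code FROM work_orders w
         WHERE w.asset_id = n.asset_id AND w.closed_utc <= n.detected_utc
         ORDER BY w.closed_utc DESC LIMIT 1) AS last_failure_code,
       (SELECT MAX(h.value) FROM historian h
         WHERE h.asset_id = n.asset_id AND h.tag = 'vibration_mm_s'
           AND h.ts_utc BETWEEN datetime(n.detected_utc, '-10 minutes')
                            AND n.detected_utc) AS peak_vibration
FROM ncr n
JOIN shifts s
  ON s.asset_id = n.asset_id
 AND n.detected_utc >= s.start_utc AND n.detected_utc < s.end_utc
WHERE n.ncr_id = ?
"""


def root_cause(conn: sqlite3.Connection, ncr_id: str):
    """Return (ncr, operator on shift, last failure code, peak vibration before detection)."""
    return conn.execute(ROOT_CAUSE_SQL, (ncr_id,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO ncr VALUES ('NCR-1', 'AST-000123', '2024-03-04 19:12:00')")
conn.executemany("INSERT INTO historian VALUES (?, ?, ?, ?)", [
    ("AST-000123", "2024-03-04 18:30:00", "vibration_mm_s", 9.9),  # outside the window
    ("AST-000123", "2024-03-04 19:05:00", "vibration_mm_s", 4.1),
    ("AST-000123", "2024-03-04 19:10:00", "vibration_mm_s", 7.8),
])
conn.executemany("INSERT INTO work_orders VALUES (?, ?, ?)", [
    ("AST-000123", "2024-01-05 10:00:00", "SEAL"),
    ("AST-000123", "2024-02-20 10:00:00", "BEARING"),
])
conn.execute("INSERT INTO shifts VALUES ('AST-000123', '2024-03-04 14:00:00', "
             "'2024-03-04 22:00:00', 'operator_17')")
print(root_cause(conn, "NCR-1"))  # ('NCR-1', 'operator_17', 'BEARING', 7.8)
```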

Each of these is a year of work on its own without the foundation. With the foundation, each is a quarter.

Where AI agents fit on top

Once the foundation is real, the next layer — AI agents working alongside operators and reliability engineers — becomes tractable. We will go deep on the agent layer in an upcoming post; the short version is that an agent doing anomaly triage on a production line needs a stable asset identity (so its actions don’t double-fire on the same machine under two different IDs), a temporal event store (so it can reason about what was known when), and reliable resolution to the work-order system (so its escalations get routed to a real maintenance technician with a real parts-availability check).
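"What was known when" is an as-of read over two timestamps: when an event happened and when the foundation learned about it. This minimal in-memory sketch uses hypothetical event names. It shows why an agent replaying 19:05 should not see a CMMS close-out that only arrived at 23:40. Open table formats such as Iceberg and Delta also offer snapshot time travel, a coarser version of the same idea.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StoredEvent:
    asset_id: str
    event_class: str
    event_utc: datetime     # when it happened (after time normalization)
    ingested_utc: datetime  # when the foundation learned about it


def known_as_of(events, asset_id: str, as_of: datetime) -> list:
    """What the foundation knew about an asset at `as_of`, ordered by event time."""
    visible = [e for e in events if e.asset_id == asset_id and e.ingested_utc <= as_of]
    return sorted(visible, key=lambda e: e.event_utc)


def t(h, m):
    return datetime(2024, 3, 4, h, m, tzinfo=timezone.utc)


log = [
    StoredEvent("AST-000123", "VIBRATION_ALARM", t(19, 2), t(19, 2)),
    # The CMMS close-out for the failure arrives hours after the event itself.
    StoredEvent("AST-000123", "BEARING_FAILURE", t(19, 0), t(23, 40)),
]
print([e.event_class for e in known_as_of(log, "AST-000123", t(19, 5))])
# ['VIBRATION_ALARM']
print([e.event_class for e in known_as_of(log, "AST-000123", t(23, 59))])
# ['BEARING_FAILURE', 'VIBRATION_ALARM']
```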

This is the same pattern we deploy in our generative AI in manufacturing work: agents absorbing routine triage, root-cause first passes, and shift-handover synthesis. The agent is the last 20% of the work and the most visible 80% of the value. The foundation is the unglamorous prerequisite.

Where to start — a 30-day foundation audit

If you’re scoping this work, the highest-leverage first step is not a vendor RFP or a platform selection. It’s a four-week audit that establishes what you already have, what’s missing, and what the sequenced build actually looks like for your plants.

  • Week 1 — Domain inventory. Catalog every plant-floor system (historian, SCADA, MES, OPC UA gateway), every ERP and CMMS instance, every QMS and LIMS. Count the hidden ones — spreadsheets a reliability engineer maintains, Access databases on a shared drive, the part-number aliases someone keeps in OneNote.
  • Week 2 — Identity audit. Pick 50 assets across plants. Manually walk each one through every system that references it. How many systems agree on what equipment it is? How often does the chain break? Same exercise for materials, work centers, and shifts.
  • Week 3 — Time and event audit. For a known historical event (a production stop, a quality excursion), pull the corresponding records from every system. Compare timestamps. Compare the event taxonomies. How wide does the time window have to be before all four agree something happened?
  • Week 4 — Use-case sequencing. Inventory the AI, analytics, and reporting initiatives currently scoped or stuck. For each, map which of the four domains it needs and which of the four integration problems are blocking it. The pattern that emerges is your priority list.
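Each of the week 2 and week 3 exercises reduces to a number worth tracking. The hypothetical helpers below compute two of them. The first is the share of sampled assets whose identity chain breaks somewhere. The second is the width of the window needed before every system has recorded a known event. The sample uses three assets rather than fifty and assumes timestamps were already converted to UTC.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def identity_break_rate(walks: Dict[str, Dict[str, Optional[str]]]) -> float:
    """Share of sampled assets whose chain breaks in at least one system.

    walks maps a sampled asset -> {system: local ID found, or None if not found}.
    """
    broken = sum(1 for systems in walks.values() if any(v is None for v in systems.values()))
    return broken / len(walks)


def agreement_window(timestamps: Dict[str, datetime]) -> timedelta:
    """How wide a window must be before every system has recorded the event."""
    return max(timestamps.values()) - min(timestamps.values())


walks = {
    "pump 3, line A": {"sap": "10047831", "cmms": "Pump 3, Line A", "mes": "Asset_LineA_03"},
    "pump 4, line A": {"sap": "10047832", "cmms": None, "mes": "Asset_LineA_04"},
    "mixer 1, line B": {"sap": "10049001", "cmms": "Mixer 1, Line B", "mes": "Asset_LineB_01"},
}
stop = {
    "historian": datetime(2024, 3, 4, 19, 2, 10),
    "mes": datetime(2024, 3, 4, 19, 2, 40),
    "erp": datetime(2024, 3, 4, 19, 12, 0),
    "cmms": datetime(2024, 3, 4, 23, 40, 0),
}
print(round(identity_break_rate(walks), 2))  # 0.33
print(agreement_window(stop))                # 4:37:50
```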

The output of those four weeks is a sequenced foundation build with the connectors, the identity model, the UNS-to-lake boundary, and the consumption path that match your operation. The build itself runs 6 to 12 months for a multi-plant network. The use cases that were stalled — PdM, OEE, quality analytics, the agent layer — start shipping inside the second half of that window, not after it.

The Monday operations review with three different OEE numbers doesn't go away because you bought a platform. It goes away because the four domains finally agree on what they're measuring, the asset master is owned by a named person, and the foundation underneath is the one source the executive team has learned to trust. That is what a manufacturing data foundation actually looks like, and it lives in the integration layer below the dashboard, not in the dashboard itself.

Neeraj has led AI and data engagements for Fortune 500 clients across finance, healthcare, and retail. He writes about what actually ships — not what looks good in a slide.
