Why Predictive Maintenance Pilots Stall
Most enterprise predictive maintenance pilots stall before payback. The fix isn't more sensors — it's the data foundation underneath. Here's the pattern.
Predictive maintenance is the AI use case manufacturers have been told to want for a decade. Vendors quote 95% success rates. Boards greenlight the budget. Vibration sensors get bolted on, a data scientist trains an anomaly model, and the kickoff deck promises a 30% reduction in unplanned downtime.
Then 18 months later, plant managers are still walking the floor with clipboards.
Industry data is brutal on this gap: roughly 60–70% of predictive maintenance initiatives fail to hit their ROI target inside 18 months, even when the underlying ML works. The most-cited reason isn’t model accuracy. It’s that the data the model needed never showed up — clean, governed, and joinable across vibration, thermal, ERP, and maintenance logs — at the same time.
We’ve seen the same failure pattern across half a dozen plants in our big data in manufacturing engagements. What follows is what’s actually under that 60–70% number, and the data foundation that turns it around.
The four-signal problem
A predictive maintenance model that works in production needs four signal classes joined on the same asset and the same clock:
- High-frequency machine telemetry — vibration, current draw, acoustic, sometimes thermal — at 1 kHz to 50 kHz, streamed off PLCs through a SCADA layer or directly off edge gateways.
- Process variables — temperature, pressure, flow, RPM — typically already in a historian (PI, AVEVA, Aspen) at 1 Hz or slower.
- ERP and maintenance system records — work orders, planned-maintenance schedules, parts consumption, failure codes — from SAP PM, IBM Maximo, eMaint, or whatever the CMMS happens to be.
- Human-generated context — shift logs, technician notes, photos, supplier reports — usually trapped in PDFs, free text fields, or on a shared drive nobody has indexed.
Each of those four lives in a different system, with a different owner, a different cadence, and a different definition of “the asset.” The vibration team calls it MTR-A14-PUMP-3. SAP calls it equipment 10047831. The CMMS calls it Pump 3, Line A. None of them join without a master record — and nobody owns one.
This is the silent failure mode. The model works on the lab dataset where someone hand-stitched the join. In production, the join breaks every shift change, and the model scores against partial inputs it was never trained on.
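To make the join problem concrete, here is a minimal sketch of the deterministic mapping an asset master provides. The three local identifiers are the ones above; the canonical ID, the schema, and the resolver function are hypothetical illustrations, not a product design.

```python
# Minimal sketch of a canonical asset map. The local identifiers come from the
# example above; the canonical ID and structure are illustrative assumptions.
ASSET_MASTER = {
    "PUMP-LINE-A-03": {                      # canonical asset ID (hypothetical)
        "scada_tag": "MTR-A14-PUMP-3",       # vibration / SCADA system
        "sap_equipment": "10047831",         # SAP PM equipment number
        "cmms_asset": "Pump 3, Line A",      # CMMS asset name
    },
}

def canonical_id(system: str, local_id: str) -> str:
    """Resolve any system's local identifier to the canonical asset ID."""
    for cid, aliases in ASSET_MASTER.items():
        if aliases.get(system) == local_id:
            return cid
    # Fail loudly: an unmapped ID is exactly the silent join break described above.
    raise KeyError(f"unmapped {system} id: {local_id}")

print(canonical_id("sap_equipment", "10047831"))  # -> PUMP-LINE-A-03
```

The point is not the five lines of Python — it’s that the mapping is deterministic, owned, and raises an error instead of silently dropping rows when a new asset shows up unmapped.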
What “data foundation” actually means here
When we say a manufacturing client needs a data foundation before the PdM model is worth deploying, we mean four specific things — not “a data lake,” which is a noun, not a capability:
- An asset master that survives reorgs. One canonical asset ID per piece of equipment, with a deterministic mapping to every system’s local identifier — the SCADA tag, the SAP equipment number, the CMMS asset code. Maintained by a named owner, with a deprecation path when assets get retired or rebuilt.
- A time-aligned signal store. High-frequency telemetry, historian readings, and discrete events (work orders, alarms, shift changes) all queryable on a common timeline, with consistent timestamp normalization across plants and time zones. This is non-trivial — historian timestamps and ERP timestamps regularly disagree by minutes.
- A failure-mode taxonomy that the data actually conforms to. “Bearing failure” in the work order needs to map to the vibration signature for bearing failure, the thermal pattern for bearing failure, and the parts consumed when a bearing is replaced. Without this, supervised models have no labels and unsupervised models have no validation.
- Data contracts between OT and IT. When the SCADA team renames a tag because someone re-cabled a sensor, the model downstream needs to find out before the next score. Most plants we walk into have no contract here at all — the score quietly degrades for weeks until a maintenance manager notices the alerts went silent.
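As a sketch of what “time-aligned” means in practice, here is a pandas as-of join that attaches the most recent work-order context to each telemetry row, with both clocks normalized to UTC first. The column names, the sample rows, and the 5-minute tolerance are illustrative assumptions.

```python
import pandas as pd

# Hypothetical feeds, already keyed on the canonical asset ID and normalized to UTC.
telemetry = pd.DataFrame({
    "ts": pd.to_datetime(["2024-03-01 06:00:00", "2024-03-01 06:00:01",
                          "2024-03-01 06:00:02"], utc=True),
    "asset_id": "PUMP-LINE-A-03",
    "vibration_rms": [0.41, 0.43, 0.97],
})
work_orders = pd.DataFrame({
    "ts": pd.to_datetime(["2024-03-01 05:59:30"], utc=True),  # ERP clock, UTC-normalized
    "asset_id": "PUMP-LINE-A-03",
    "failure_code": ["BRG-01"],
})

# As-of join: each telemetry row picks up the most recent work order for the
# same asset, within a tolerance that absorbs minutes-level clock disagreement.
joined = pd.merge_asof(
    telemetry.sort_values("ts"), work_orders.sort_values("ts"),
    on="ts", by="asset_id", direction="backward",
    tolerance=pd.Timedelta("5min"),
)
```

The `tolerance` parameter is the interesting knob: it encodes, explicitly, how much the historian and ERP clocks are allowed to disagree before a match is refused.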
Without these four, no model is going to ship. With them, even a fairly basic anomaly model starts catching real failures inside the first quarter.
The architecture that works
The reference shape we deploy on plant-floor PdM engagements looks like this:
- Edge layer — gateways co-located with the PLCs, doing first-pass filtering, downsampling, and buffering when the network drops. Raw 50 kHz vibration doesn’t go to the cloud; engineered features (RMS, kurtosis, spectral peaks) do.
- Plant historian stays. Don’t try to replace AVEVA or PI with a generic time-series database — too much existing tooling depends on it. Tap it.
- Cloud lakehouse — typically Microsoft Fabric or Databricks on Azure for the manufacturers we work with — as the joined long-term store. ISA-95 Level 3/4 boundary lives here: historian and CMMS data flow up, model scores flow back down to the SCADA HMI and to the Maximo work-order queue.
- Feature store that produces the same features for training and serving. This is the layer that breaks most often when teams skip it.
- Asset-master service — the canonical join key. Often sits beside MDM if the company already has one, or gets stood up as a lightweight service if not.
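A sketch of the edge-layer feature engineering, assuming NumPy/SciPy is available on the gateway. The 50 kHz rate comes from the figures above; the synthetic 120 Hz tone and the exact feature set are illustrative, not a claim about any specific bearing signature.

```python
import numpy as np
from scipy.stats import kurtosis

def vibration_features(window: np.ndarray, fs: float) -> dict:
    """Engineered features sent upstream instead of raw samples."""
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    peak = freqs[np.argmax(spectrum[1:]) + 1]        # skip the DC bin
    return {
        "rms": float(np.sqrt(np.mean(window ** 2))),
        "kurtosis": float(kurtosis(window, fisher=False)),  # 3.0 for a Gaussian
        "spectral_peak_hz": float(peak),
    }

fs = 50_000                               # 50 kHz sample rate, as above
t = np.arange(fs) / fs                    # one second of signal
signal = np.sin(2 * np.pi * 120 * t)      # synthetic 120 Hz tone for illustration
feats = vibration_features(signal, fs)    # spectral_peak_hz comes out near 120
```

Three floats per window cross the network instead of 50,000 samples per second — that ratio is the whole argument for the edge layer.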
The sensor vendors are usually fine. The PLC programmers are fine. The data scientists are fine. The thing that breaks is the plumbing in between — and the org structure that doesn’t have a single owner for it.
Why most pilots get stuck at the proof-of-concept
Three patterns we see repeatedly:
- The pilot was scoped to one machine on one line. Eight weeks in, the model works. The client asks “great, how do we roll it to the other 600 machines?” — and the answer is “rebuild the data pipeline 600 times” because the foundation was never generalized. Most pilots die here.
- Maintenance data was treated as ground truth without being audited. A 2026 industry survey found 71% of maintenance teams are running PM programs that have never been validated against actual failure data. If the work orders are wrong, the supervised labels are wrong, and the model learns the wrong signal. We’ve audited CMMS exports where roughly 1 in 4 “bearing failures” were actually pump cavitation logged under the wrong failure code.
- Nobody owns the productionization handoff. Data science demonstrates the model in a Jupyter notebook against a CSV. There’s no MLOps pipeline, no monitoring, no rollback path, and no operations runbook. The model gets emailed around as a slide deck.
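A label audit can start as something this simple: cross-check each claimed failure code against the parts that were actually consumed on the work order. The export schema, codes, and part names below are hypothetical.

```python
import pandas as pd

# Hypothetical CMMS export: claimed failure code vs. parts actually consumed.
wo = pd.DataFrame({
    "work_order": [1001, 1002, 1003, 1004],
    "failure_code": ["BEARING", "BEARING", "BEARING", "CAVITATION"],
    "parts_consumed": [["bearing-6205"], [], ["impeller-kit"], ["impeller-kit"]],
})

def suspect(row):
    # A "bearing failure" that consumed no bearing part is a suspect label.
    return row.failure_code == "BEARING" and not any(
        "bearing" in p for p in row.parts_consumed)

wo["suspect_label"] = wo.apply(suspect, axis=1)
audit_rate = wo["suspect_label"].mean()   # fraction of labels to re-examine
```

On this toy export, half the bearing labels fail the cross-check — the real audits land closer to the 1-in-4 figure above, but the mechanism is the same.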
The S.C.A.L.E. pattern for plant-floor PdM
The way we sequence this on a real engagement looks more like industrial integration than data science:
- Weeks 1–4 — asset master and signal mapping. Walk the plant. Build the canonical asset ID list with the maintenance team. Map every SCADA tag, every SAP equipment number, every CMMS code. This is unglamorous and load-bearing.
- Weeks 4–8 — historian tap and ERP/CMMS extraction. Stand up the joined long-term store. Validate that the same asset, queried across all four signal classes for a known historical failure, returns coherent data. If it doesn’t, the model definitely won’t.
- Weeks 8–12 — failure-mode taxonomy and label audit. Sit down with reliability engineers and reconcile what’s actually in the work orders against the engineering failure modes for each asset class. Throw out or relabel the bad records.
- Weeks 12–16 — train a baseline model on one asset class (rotating equipment is usually first). Ship it to a shadow environment. Measure precision/recall against the audited label set, not the raw CMMS.
- Week 16+ — gradually swap traffic. Wire the model output into Maximo as a work-order suggestion, not an autonomous action. Operators stay in the loop until the model has earned trust.
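Scoring the shadow deployment reduces to set arithmetic over audited work orders: of the alerts the model raised, how many matched a real failure, and of the real failures, how many were alerted. The work-order IDs below are hypothetical.

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Score shadow-mode alerts against the audited failure set."""
    tp = len(predicted & actual)                      # alerts that were real failures
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical quarter: work orders the model flagged vs. audited true failures.
alerts = {"WO-1001", "WO-1007", "WO-1042"}
true_failures = {"WO-1001", "WO-1042", "WO-1090"}
p, r = precision_recall(alerts, true_failures)        # 2/3 precision, 2/3 recall
```

The key discipline is the denominator: `true_failures` comes from the audited label set of weeks 8–12, never from the raw CMMS export.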
This is the S.C.A.L.E. data-foundation pattern applied to a plant: standardize the data layer first, then layer capabilities on top. The model is the last 20% of the work, not the first.
The economics, briefly
When the foundation is right, the same surveys that report 60–70% failure rates also show that adopters who do address the data layer realize 40–55% maintenance cost reductions and 30–45% improvements in asset availability, with payback inside 12–24 months for most large-asset operations. The bimodal outcome distribution is the tell: PdM either works very well or it doesn’t work at all. There isn’t much middle ground, and which side you land on is almost entirely a function of the data work that happens before model training.
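A back-of-envelope version of that payback math, with illustrative plant numbers (the spend and build-cost figures are assumptions, not from the surveys):

```python
# Back-of-envelope payback check with illustrative numbers.
annual_maintenance_spend = 4_000_000      # $/yr for a mid-size plant (assumption)
cost_reduction = 0.40                     # low end of the 40-55% range above
foundation_plus_model_cost = 2_000_000    # one-time build cost (assumption)

annual_savings = annual_maintenance_spend * cost_reduction    # $1.6M/yr
payback_months = 12 * foundation_plus_model_cost / annual_savings
# -> 15 months, inside the 12-24 month window the surveys report
```

Even at the low end of the reported reduction range, the math clears — which is why the outcome distribution is bimodal: the foundation either exists and the savings compound, or it doesn’t and there are no savings to count.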
Where to start
If you have a stalled PdM pilot, the highest-leverage diagnostic is almost never re-tuning the model. It’s a four-week audit of the asset master, the historian-to-ERP join, and the maintenance-record label quality. That’s the work that decides whether the pilot scales to the next 600 machines or quietly dies.
If you’re scoping a new initiative, start with the data foundation pattern above, not the sensor RFP. The vendors will all sell you a great pilot. Only the foundation will get you past it. The generative AI in manufacturing playbook covers the agent layer that sits on top once the foundation is in place — but the order matters, and the order is foundation first.
Founder & CEO, Algoscale
Neeraj has led AI and data engagements for Fortune 500 clients across finance, healthcare, and retail. He writes about what actually ships — not what looks good in a slide.