

PPM vs reactive maintenance: the true cost of running it to failure

The real cost of reactive maintenance rarely shows up on the maintenance line of a P&L — it hides in downtime, overtime, scrap, energy and shortened asset life. Here's how to compare fairly and decide where a planned programme actually earns its keep.

9 min read · By Lutrom Engineering

Every operations director we speak to has an opinion on planned preventive maintenance. Some run rigorous programmes and audit them. Others have decided PPM is over-engineering and reactive is cheaper. Both camps can point to numbers. The difference is usually in which numbers they count.

This piece is about counting them fairly — and about being honest about where a planned programme actually earns its keep and where it doesn't.

Definitions, quickly

Reactive maintenance — sometimes called run-to-failure — means you fix the asset when it stops working. There is no scheduled intervention. This is the default for assets that don't warrant anything more sophisticated.

Planned preventive maintenance (PPM) — scheduled interventions at defined intervals, typically time-based (every 3 months, annually) or usage-based (every 1,000 hours, every 100,000 cycles). Tasks are pre-agreed: lubrication, filter changes, torque checks, test operation, cleaning.

Condition-based maintenance (CBM) — interventions triggered by evidence, not the calendar. Vibration surveys, thermal imaging, oil analysis, current signature analysis. Work is only done when the data says it's needed.

Most industrial sites benefit from a blend of all three. The mistake is picking one label for the whole site.

The costs reactive-only sites miss

A pure reactive strategy underprices the true cost of a failure. The invoice from the contractor is the visible part. The costs you rarely see on the same line are:

  • Lost output. Even a two-hour line stop on a moderately profitable production line can dwarf the annual PPM cost for the assets on that line.
  • Overtime and expedited parts. Emergency call-outs, weekend rates and next-morning courier fees carry a substantial premium over planned work.
  • Scrap and WIP write-off. Product in the machine at the point of failure is often written off, especially in food, pharma and any process with temperature or dwell-time control.
  • Consequential damage. A single-point failure can take down bearings, seals, drives, gearboxes and controllers with it, so reactive fixes tend to be broader, and pricier, than the underlying issue.
  • Energy penalty. Poorly maintained motors, drives and compressors run less efficiently. On a plant of any size, the energy penalty of a badly-tuned compressed air system or a contaminated chiller alone can exceed the cost of the PPM that would prevent it.
  • Insurance and audit exposure. Insurers, BRC/IFS/GMP auditors and health & safety inspectors expect evidence that safety-critical and process-critical assets are maintained on a defensible schedule. Purely reactive sites increasingly struggle at audit.
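To make the comparison concrete, here is a minimal sketch of a total-cost-of-failure calculation for a single breakdown. The class name, field names and every figure are hypothetical and chosen only to show the arithmetic. It covers the per-event costs in the list above; the energy penalty and audit exposure are ongoing costs and are left out.

```python
from dataclasses import dataclass


@dataclass
class FailureEvent:
    """One unplanned breakdown. All fields are illustrative assumptions."""
    contractor_invoice: float      # the visible cost
    downtime_hours: float          # length of the line stop
    margin_per_hour: float         # contribution lost per hour of stopped output
    overtime_and_expedite: float   # call-out premiums and courier fees
    scrap_value: float             # product and WIP written off
    consequential_repairs: float   # knock-on damage to other components

    def lost_output(self) -> float:
        return self.downtime_hours * self.margin_per_hour

    def total_cost(self) -> float:
        return (
            self.contractor_invoice
            + self.lost_output()
            + self.overtime_and_expedite
            + self.scrap_value
            + self.consequential_repairs
        )


# Hypothetical figures, not taken from any real site.
event = FailureEvent(
    contractor_invoice=1_200.0,
    downtime_hours=2.0,
    margin_per_hour=3_000.0,
    overtime_and_expedite=450.0,
    scrap_value=800.0,
    consequential_repairs=1_500.0,
)

hidden = event.total_cost() - event.contractor_invoice
print(f"Invoice: {event.contractor_invoice:,.0f}")
print(f"Total cost of failure: {event.total_cost():,.0f}")
print(f"Share not on the invoice: {hidden / event.total_cost():.0%}")
```

Run it with your own numbers for one recent breakdown. The useful output is not the total itself but how much of it never appeared on the maintenance line.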

Where reactive still makes sense

Not every asset justifies PPM. A sensible criticality-based approach will always leave a category of items on run-to-failure — and that's correct. Typical candidates:

  • Off-the-shelf items with no consequential effect on production (e.g. task lighting, emergency light batteries handled on failure at the next planned visit)
  • Redundant assets — where an installed standby means a single failure has no operational impact
  • Assets scheduled for replacement in the near term where deep intervention isn't warranted

Structuring a defensible PPM programme

A PPM programme that survives contact with reality has four ingredients: a criticality assessment, a strategy per class, a task list for each planned asset, and honest review cycles.

  1. Criticality. Score every asset — or asset family — on production impact, safety risk, environmental risk and cost of failure. A simple 3×3 matrix (impact vs likelihood) is enough to separate the top tier from the rest.
  2. Strategy. Assign a maintenance approach per criticality band: predictive/condition-based for the most critical, time or usage-based PPM for the next tier, run-to-failure for the rest. Write it down. Refuse to move an asset up or down without a documented reason.
  3. Task list. For each planned asset, define the tasks (check, adjust, replace, test) with the interval and the acceptance criteria. Tasks that never produce findings should be challenged; tasks that repeatedly find issues should have their interval shortened.
  4. Review. At least annually — and after every significant unplanned event — review the plan. If a bearing that was on 12-month PPM failed at 9 months, the interval was wrong. If a component has been changed three times without evidence of wear, the interval is too aggressive.
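The first two steps can be sketched in a few lines. This is one possible reading, not a standard: it assumes impact and likelihood are each rated 1 to 3, multiplies them to get a score, and uses cut-offs of 6 and 3 to split the bands. The function names, cut-offs and asset names are all hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    CONDITION_BASED = "predictive / condition-based"
    PLANNED = "time- or usage-based PPM"
    RUN_TO_FAILURE = "run-to-failure"


def criticality_score(impact: int, likelihood: int) -> int:
    """Score on a 3x3 matrix: impact and likelihood each rated 1 (low) to 3 (high)."""
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2 or 3, got {value}")
    return impact * likelihood


def assign_strategy(score: int) -> Strategy:
    """Map a score to a band. The cut-offs (6 and 3) are assumptions."""
    if score >= 6:
        return Strategy.CONDITION_BASED
    if score >= 3:
        return Strategy.PLANNED
    return Strategy.RUN_TO_FAILURE


# Hypothetical asset register: name -> (impact, likelihood).
assets = {
    "main air compressor": (3, 2),
    "packing line conveyor drive": (2, 2),
    "task lighting": (1, 1),
}

for name, (impact, likelihood) in assets.items():
    score = criticality_score(impact, likelihood)
    print(f"{name}: score {score} -> {assign_strategy(score).value}")
```

Keeping the rule this explicit is the point: when someone wants to move an asset between bands, they have to change a rating and say why.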

The metrics that actually move

You don't need a CMMS with 40 dashboards. Three numbers, tracked consistently, will tell you whether the programme is working:

  • MTBF on your top-tier assets — trending up over quarters is the single best sign that PPM is landing.
  • MTTR — planned work should slowly shorten the time it takes to recover from any incident, because engineers know the kit and the spares are on the shelf.
  • Planned : unplanned hours ratio — a healthy industrial site typically runs at 70-80% planned. If yours is inverted, most of the gains from PPM are still on the table.
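None of the three needs specialist software. A minimal sketch, assuming the common definitions (MTBF as operating hours divided by failure count, MTTR as the average repair duration, and the planned share as planned hours over total engineering hours), might look like this. The function names and sample figures are hypothetical.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating hours divided by failure count."""
    if failures == 0:
        return float("inf")  # no failures recorded in the period
    return operating_hours / failures


def mttr(repair_hours: list[float]) -> float:
    """Mean time to repair: average duration of the recorded repairs."""
    if not repair_hours:
        return 0.0
    return sum(repair_hours) / len(repair_hours)


def planned_share(planned_hours: float, unplanned_hours: float) -> float:
    """Fraction of engineering hours spent on planned work."""
    total = planned_hours + unplanned_hours
    if total == 0:
        raise ValueError("no engineering hours recorded")
    return planned_hours / total


# Hypothetical quarter for one top-tier asset.
repairs = [4.0, 6.5, 1.5, 4.0]          # hours per breakdown repair
print(f"MTBF: {mtbf(2_000.0, len(repairs)):.0f} h")
print(f"MTTR: {mttr(repairs):.1f} h")
print(f"Planned share: {planned_share(300.0, 100.0):.0%}")
```

What matters is the trend over quarters, so compute the figures the same way each time and from the same source of hours.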

Where to start

If you are moving from mostly-reactive to a structured programme, the highest-return first move is almost always a thermographic and vibration sweep of the plant. It generates a prioritised list of actual, evidenced problems — not a theoretical schedule. Fix those first, then use the same survey findings to build a defensible PPM plan around your genuinely critical assets. The programme pays for itself long before you finish rolling it out.

Common objections — and honest answers

When we make the case for PPM to clients still running mostly reactive, the same three objections come up. They're fair objections. They also have honest answers.

  • "We can't take assets offline for scheduled work." Most PPM tasks require far less downtime than the breakdowns they prevent. Where a shutdown genuinely isn't possible, condition monitoring — thermal imaging, vibration analysis, ultrasound leak detection — is done live and produces the same early-warning benefit without stopping production.
  • "Our engineers know the plant — they'll spot problems." They will spot some. Site engineers under time pressure spot the things they walk past every day. A structured survey deliberately looks where nobody normally does (the top of switchgear, the back of drives, the loft above the boiler house) and consistently finds issues local knowledge misses.
  • "It sounds expensive." Almost every PPM programme we help clients set up pays for itself within the first twelve months from avoided breakdown costs alone — before you count the longer asset life, the lower energy consumption or the easier audits.
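The payback claim is easy to test against your own figures. The sketch below is a simple payback calculation, assuming you can estimate the first-year programme cost and the breakdown cost you expect to avoid each month. The function name and both numbers are hypothetical.

```python
def payback_months(programme_cost: float, avoided_cost_per_month: float) -> float:
    """Simple payback: months of avoided breakdown cost needed to cover the programme."""
    if avoided_cost_per_month <= 0:
        raise ValueError("avoided cost per month must be positive")
    return programme_cost / avoided_cost_per_month


# Hypothetical figures: first-year programme cost and monthly avoided breakdown cost.
months = payback_months(programme_cost=24_000.0, avoided_cost_per_month=3_000.0)
print(f"Payback in {months:.1f} months")
```

This deliberately ignores longer asset life, lower energy use and easier audits, so it is a conservative view of the return.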

The point of a good maintenance strategy is not to eliminate breakdowns — that is not achievable. It is to make sure the breakdowns that do happen are the ones you chose to accept, not the ones that took you by surprise.

Frequently asked questions

Isn't reactive maintenance always cheaper?

On paper, reactive spend looks lower because you only pay when something breaks. In practice, the cost of the breakdown itself — lost production, expedited parts, overtime, potentially scrapped WIP and knock-on failures — usually dwarfs the price of the planned intervention that would have prevented it. A fair comparison has to include total cost of failure, not just the invoice from the contractor.

Where does PPM stop paying off?

Low-consequence, easily-replaced assets rarely justify heavy PPM — think emergency lighting battery packs that are simply replaced on failure, or standardised commodity motors on non-critical duty. The rule of thumb is: if a failure interrupts production, damages other assets or triggers a safety event, plan it in; if the failure is contained and the fix is trivial, run it to failure.

How do I structure a PPM programme without over-engineering it?

Start with a criticality assessment — rank assets by production impact, safety risk and cost of failure. Assign a maintenance strategy per asset: run-to-failure, time-based, condition-based or predictive. Only the top two tiers usually warrant scheduled visits; the rest are handled by regular thermographic and vibration sweeps that catch issues before they escalate.

What metrics tell me if my strategy is working?

The three most useful are MTBF (mean time between failures) for critical assets — trending up is good; MTTR (mean time to repair) — trending down is good; and the ratio of planned to unplanned engineering hours — a healthy industrial site typically runs at around 70-80% planned. If your ratio is inverted, PPM is likely to pay back quickly.
