A team with strong reliability-centered maintenance KPIs is better set up to succeed at root cause analysis.

Root cause analysis (RCA) is a systematic approach to identifying and understanding the underlying problem that led to an undesirable result. The goal of the process is to pinpoint how and why a given failure happened so you can eliminate the root cause and prevent the same problem from recurring. 

When carried out correctly, root cause analysis can help a maintenance team shift to a preventive rather than corrective or emergency maintenance strategy, ultimately decreasing unplanned downtime, saving time, and reducing costs.  

The 5 Basic Steps of Root Cause Analysis

When your organization recognizes that a specific problem continues to crop up (such as the same belt repeatedly snapping on a piece of machinery), it is time to carry out root cause analysis.

At a fundamental level, root cause analysis can be broken down into 5 basic steps: (1) define the problem; (2) gather data; (3) identify contributing factors; (4) identify the root cause; and (5) implement and monitor changes. 

1. Define the Problem

The first step in root cause analysis is to define the problem you wish to solve in a clear and descriptive manner. Include relevant data that supports why the issue needs to be addressed, like the amount of downtime or associated costs. Be specific at this stage, as it will help your team stay focused on the task at hand, saving time during all following steps.

2. Gather Data

After defining the problem, it is time to collect as much data as reasonably possible. This data may include asset ages, operating times, maintenance histories, environmental conditions, organizational impacts, and anything else that may be relevant to the defined problem. An ideal way to collect, organize, and eventually review comprehensive and consistent data is to use a cloud-based Computerized Maintenance Management System (CMMS) like eMaint.

3. Identify Contributing Factors

Once sufficient data is collected, it’s time to organize and analyze the data to identify all potential factors that could have contributed to the defined problem. Pay special attention to any anomalous data captured during the sequence of events prior to the problem occurring. This is made easier by utilizing CMMS software integrated with data-collection hardware like the remote sensors offered by Fluke Reliability. 
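As a rough illustration of what "anomalous data" can mean in practice, the sketch below flags sensor readings that fall far outside a known-good baseline. The function name, the vibration values, and the three-sigma threshold are all assumptions made for this example; they are not taken from eMaint or any Fluke Reliability product.

```python
from statistics import mean, stdev

def flag_anomalies(baseline, recent, n_sigma=3.0):
    """Return (index, value) pairs from `recent` that lie more than
    n_sigma sample standard deviations from the baseline mean.

    `baseline` needs at least two readings taken while the asset was healthy.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    upper = mu + n_sigma * sigma
    lower = mu - n_sigma * sigma
    return [(i, x) for i, x in enumerate(recent) if x > upper or x < lower]

# Hypothetical vibration readings (mm/s): a healthy baseline, then the
# readings captured in the run-up to the failure.
baseline_readings = [2.0, 2.1, 1.9, 2.0, 2.0]
readings_before_failure = [2.1, 2.05, 2.9, 3.4]

flagged = flag_anomalies(baseline_readings, readings_before_failure)
for index, value in flagged:
    print(f"reading {index}: {value} mm/s is outside the baseline band")
```

A fixed sigma band is only one simple choice. Assets with seasonal or load-dependent behavior would likely need a baseline that accounts for operating conditions.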

4. Identify the Root Cause

After identifying and mapping the causal factors that led to the defined problem, you should be able to identify its true root cause. The more specifically you can pin down the root cause, the easier and more effective your solutions will be. Remember, a root cause needs to be something addressable, whether it is a physical issue (materials failure), a human issue (errors or oversights), or an organizational issue (defective process or policy).

5. Implement and Monitor Changes

With the root cause identified, now it’s time to take action. The goal here is not to just patch up the problem so you can continue with the status quo, but to implement a permanent solution that will prevent the problem from happening again in the future. And to make sure the problem is really corrected, it’s also important to continue capturing and analyzing data after your solution is implemented – a task again made much easier with the use of a CMMS like eMaint paired with remote asset-monitoring sensors like those from Fluke Reliability.
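One simple way to check whether a fix is holding is to compare the failure rate over equal operating periods before and after the change. The sketch below assumes you can export a failure count and operating hours for each period; the function name and the figures are hypothetical.

```python
def failures_per_1000_hours(failure_count: int, operating_hours: float) -> float:
    """Normalize a failure count to failures per 1,000 operating hours."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return 1000.0 * failure_count / operating_hours

# Hypothetical figures for the same asset before and after the corrective change.
rate_before = failures_per_1000_hours(failure_count=6, operating_hours=4000.0)
rate_after = failures_per_1000_hours(failure_count=1, operating_hours=4000.0)

print(f"before: {rate_before} failures per 1,000 h")
print(f"after:  {rate_after} failures per 1,000 h")
```

With only a handful of failures in each period, a drop like this is suggestive rather than conclusive, which is one reason to keep monitoring over a longer window.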

Root Cause Analysis Best Practices

Because root cause analysis is a methodical, potentially time-consuming process, it’s important to ensure that a problem is significant enough that dedicating the time and resources needed to correct it will pay off in the long run. If a failure is fast and inexpensive to fix, or happens very infrequently, then root cause analysis may not be necessary. 

Although it depends on your organization and industry, it is typically best to only carry out root cause analysis when a problem: 

  • results in significant costs;
  • impacts mission-critical assets;
  • occurs on a regular basis;
  • endangers employee health or safety;
  • or otherwise impedes your organization’s goals.
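The criteria above can be sketched as a simple screening check. The field names and the cost and recurrence thresholds below are placeholders chosen for illustration; each organization would set its own.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    """Hypothetical summary of a recurring failure; all fields are illustrative."""
    annual_cost: float
    mission_critical: bool
    occurrences_per_year: int
    safety_risk: bool
    impedes_org_goals: bool = False

def warrants_rca(problem: Problem,
                 cost_threshold: float = 10_000.0,
                 recurrence_threshold: int = 3) -> bool:
    """Return True if the problem meets at least one of the listed criteria."""
    return (
        problem.annual_cost >= cost_threshold
        or problem.mission_critical
        or problem.occurrences_per_year >= recurrence_threshold
        or problem.safety_risk
        or problem.impedes_org_goals
    )

# The repeatedly snapping belt: cheap to fix each time, but it keeps recurring.
snapping_belt = Problem(annual_cost=4_000.0, mission_critical=False,
                        occurrences_per_year=6, safety_risk=False)
# A rare, inexpensive, low-risk failure.
one_off_fault = Problem(annual_cost=500.0, mission_critical=False,
                        occurrences_per_year=1, safety_risk=False)

print(warrants_rca(snapping_belt))
print(warrants_rca(one_off_fault))
```

Meeting any single criterion is treated as enough here, which mirrors the "or" in the list. A stricter team could require two or more.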

There is a range of problem-solving techniques you can use to help you gather data, identify contributing factors, and zero in on the root cause of a problem. These include fault tree analysis, fishbone diagram analysis, failure mode and effects analysis (FMEA), and the popular “5 Whys” technique.

The basic premise of all these methods is that you start with the final undesired outcome, and then ask “Why did it happen?” List out the possible causes, and then continue drilling down on what could have caused each step of the failure until you eventually identify the true crux of the issue. 
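The drill-down described above can be pictured as a tree: the undesired outcome sits at the top, each answer to "Why did it happen?" becomes a child, and the deepest answers are the candidate root causes. The sketch below is a minimal illustration of that idea. The class, the function, and the belt example are invented for this purpose and do not represent any formal fault tree or 5 Whys tool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Cause:
    """One answer to 'Why did it happen?', with its own deeper causes."""
    description: str
    causes: List["Cause"] = field(default_factory=list)

def causal_chains(node: Cause, path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return every chain from the undesired outcome down to a leaf cause."""
    path = path + (node.description,)
    if not node.causes:
        return [path]  # nothing deeper was recorded: a candidate root cause
    chains: List[Tuple[str, ...]] = []
    for cause in node.causes:
        chains.extend(causal_chains(cause, path))
    return chains

# Hypothetical analysis of the snapping-belt problem.
outcome = Cause("Belt snapped", [
    Cause("Belt was misaligned", [
        Cause("Pulley bearing was worn", [
            Cause("Lubrication task missing from PM schedule"),
        ]),
    ]),
    Cause("Belt was over-tensioned", [
        Cause("No tension spec in work instructions"),
    ]),
])

for chain in causal_chains(outcome):
    print(" <- ".join(chain))
```

Each printed chain ends in a candidate root cause. Deciding which candidate is the true one still depends on the data gathered in the earlier steps.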

No matter how you tailor your process, the key to successful root cause analysis is making sure you have all the data you need to fully understand and address the issue. Fortunately, comprehensive data on machine diagnostics, work order records, preventive maintenance (PM) schedules, and more are all at your fingertips when you use the right CMMS.