Introduction
If a near miss occurs on your plant, the first question to ask is: what could have happened if it hadn’t been a near miss, but instead a catastrophic event like a plant explosion?
Investigating incidents and near misses is a fundamental part of the process safety framework and features in two core sources of process safety guidance: the Center for Chemical Process Safety (CCPS) guidelines (USA) and the Energy Institute framework (UK).
These process safety management frameworks emphasise learning as a core pillar and focus on how you and your organisation should proactively manage and drive improvement.
The goal of any investigation is to prevent a specific event from happening again.
We often hear about the importance of “learning lessons,” but as humans, we are prone to forgetting or getting distracted. That’s why modern incident investigations must focus on uncovering the root cause of an issue.
Only through identifying and addressing root causes can we achieve meaningful improvement and ensure that similar events are prevented. Without this analysis, the same vulnerabilities are likely to persist, leaving your organisation at risk of repeating the same mistakes.
Why Do We Investigate Events and What Are We Investigating?
When there is an unintended release of energy or toxic material, or a fire or explosion involving flammable material or combustible dust, the aim of the investigation is to understand the failure and address the weaknesses in the management system to prevent similar incidents from recurring.
The safety triangle illustrates how unsafe conditions can accumulate.

Initially, these unsafe conditions and practices might not result in any incidents, but they can build up over time, leading first to near misses and then to incidents. You must be able to pinpoint your missing safeguards to improve your systems moving forward.
Controlling Hazards
Controlling hazards will always involve multiple layers of protection, each with varying degrees of effectiveness. Although equipment design is the foundation (and should be the strongest layer of defence), it cannot eliminate all hazards.
To address residual risk, additional layers of independent protection must be implemented.
Engineering controls, such as automated systems that maintain specific pressure and temperature levels, may act as an independent layer.
However, these controls are not foolproof, which is why administrative controls, while weaker, are also essential. These measures include the use of personal protective equipment (PPE) and personnel training.
An incident often results from the failure of one or more protective measures—or from the absence of a control that was never considered.
The goal of an effective safety system is to include as many opportunities as possible to break a chain of events. By removing a single link in the sequence of events, an incident can be prevented altogether.
Put another way, an event represents the failure of one or more of these protective layers, or the absence of a layer that we never thought to put in.
Incident Analysis Framework
Every incident investigation technique relies on a structured framework to understand what happened and determine how to prevent it from happening again. A typical framework will recognise that incidents and near misses are, as mentioned above, a chain of events.
These events often stem from deficiencies in a process safety management system rather than from an individual. The root cause is always a management system failure and the causal factors might lie in poor initial design, the failure of an engineering control, or an over-reliance on administrative controls.
It’s important to recognize that an event rarely results from a single failure. Instead, multiple elements within the management system have broken down.
Do not stop the investigation after identifying just one issue.
You must dig deeper to uncover additional failures that may have played a role. Only by addressing all contributing factors can the risk of recurrence be effectively minimized.
How Do We Identify Potential Causes?
All events result from a failure in your management system. The key is to identify which specific components of the system failed. This is a recurring theme throughout this blog and the incident investigation process: pinpointing the exact elements of the management system that broke down.
If your investigation hasn’t uncovered something that the company or management can change, then you haven’t reached the root cause yet.
ECF Approach
The Events and Causal Factors (ECF) approach begins with constructing a timeline—a sequential account of the events leading up to the incident or near miss. This helps you to understand the chain of events and ask questions about why the incident happened.
For example, consider a scenario where a pressure safety valve activates to relieve pressure to a safe location. Why did the column pressure rise to a dangerous level in the first place? Was the Standard Operating Procedure (SOP) inadequate, or was the operator not properly trained on it? Did the operator resort to unauthorized shortcuts because the system made the task unnecessarily difficult?
The investigation should not focus on blaming an individual. Instead, you aim to delve deeper into the causal factors to uncover the root cause: the management system failure that contributed to the incident. Identifying this failure enables you to make meaningful recommendations to prevent similar issues in the future.
Your process should document the sequence of events step by step, from what initially occurred to what happened next, and so on, leading up to the incident or near miss.
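As a minimal sketch of such a timeline, the Python below records each event with a time and any conditions noted against it, then lists them in sequence. The TimelineEvent class, its field names and the example entries (loosely based on the pressure safety valve scenario above, with invented times and details) are illustrative assumptions and not part of any standard ECF tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineEvent:
    """One step in the sequence of events (hypothetical structure)."""
    when: datetime
    what: str
    conditions: list[str] = field(default_factory=list)  # conditions noted at this step


def build_timeline(events):
    """Return the events in chronological order, regardless of entry order."""
    return sorted(events, key=lambda e: e.when)


# Illustrative entries only; times and details are invented for the example.
events = [
    TimelineEvent(datetime(2024, 1, 1, 10, 20), "Pressure safety valve lifts",
                  ["Column pressure above relief set point"]),
    TimelineEvent(datetime(2024, 1, 1, 10, 0), "Operator starts column",
                  ["SOP unclear on start-up sequence"]),
    TimelineEvent(datetime(2024, 1, 1, 10, 10), "Column pressure begins to rise"),
]

timeline = build_timeline(events)
for step, event in enumerate(timeline, start=1):
    print(f"{step}. {event.when:%H:%M} {event.what}")
    for condition in event.conditions:
        print(f"     condition: {condition}")
```

Keeping the conditions attached to the event they relate to makes it easier to ask, step by step, why each one was present.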
Causal Factors
When analysing conditions, we should always ask two key questions: Can we manage or change these conditions? Would removing the condition prevent the event from occurring or reduce its severity? If the answer to either question is no, then we have not identified a causal factor. If the answer to both is yes, we label the condition and investigate further to trace it back to the root cause.
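The two questions can be sketched as a simple screening step. This is only an illustration, and it assumes that a condition is labelled a causal factor only when both answers are yes. The Condition class, its field names and the example conditions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """A condition found during the investigation (hypothetical structure)."""
    description: str
    can_be_managed: bool               # Can we manage or change this condition?
    removal_prevents_or_reduces: bool  # Would removing it prevent the event or reduce its severity?


def is_causal_factor(condition):
    """Assumption: both questions must be answered yes."""
    return condition.can_be_managed and condition.removal_prevents_or_reduces


# Invented examples for illustration.
conditions = [
    Condition("Operator not trained on the SOP", True, True),
    Condition("Ambient temperature on the day", False, False),
]

causal_factors = [c.description for c in conditions if is_causal_factor(c)]
print(causal_factors)
```

Each condition that passes this screen would then be traced back to its root cause.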
Causal factors can stem from several areas. As a result, you should examine the conditions and failure modes, the design, and the state of the equipment before, during, and after an event.
Consider factors like temperature, pressure, flow, and the status of instruments. Were alarms functioning properly? Did any interlocks fail to activate? These elements represent the strongest layers of independent protection and must be scrutinized.
Another area to investigate is people. Were appropriate standard operating procedures in place? What actions were the individuals involved taking, and what were they thinking at the time? Were they adequately trained?
You should also consider their position relative to the equipment—if someone was injured, were they in a hazardous area or “line of fire”? Was supervision adequate? Were there communication breakdowns about the tasks that needed to be completed? And finally, you should evaluate their fitness for duty—what was their state of mind?
Environmental factors can also serve as causal contributors. Was the area well-lit? Was excessive noise present? Was poor housekeeping a factor?
Your investigation doesn’t stop after identifying causal factors. You must then delve deeper to uncover the root cause, which is always a failure in the management system.
How Do We Get to the Root Cause?
To identify the root cause, we use a systematic approach, such as the “five whys” method. This involves repeatedly asking why the causal factor occurred. Specifically, we ask why the relevant component of the management system failed at the time of the incident. It’s important to remember that there are often multiple root causes, so the investigation should not stop after identifying just one.
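The repeated questioning can be sketched as a short loop. This is an illustration rather than a prescribed tool: the five_whys function, its parameters and the example answers are hypothetical, and in a real investigation the answers come from evidence and interviews, not from a prepared list.

```python
def five_whys(causal_factor, answers, max_depth=5):
    """Pair each 'why?' question with its answer, up to max_depth levels.

    The last answer reached is treated as a candidate root cause.
    """
    chain = []
    current = causal_factor
    for answer in answers[:max_depth]:
        chain.append((f"Why: {current}?", answer))
        current = answer  # the next question asks why this answer was true
    return chain


# Invented example for illustration.
causal_factor = "operator opened the wrong valve"
answers = [
    "the valve labels were missing",
    "labels were removed during a modification and never replaced",
    "the modification was not reviewed under Management of Change",
    "the Management of Change procedure does not cover minor modifications",
]

chain = five_whys(causal_factor, answers)
for question, answer in chain:
    print(question)
    print(f"  Because {answer}.")

candidate_root_cause = chain[-1][1]
print("Candidate root cause:", candidate_root_cause)
```

Because there are often several root causes, the same walk would be repeated for each causal factor, and the chain can stop before the fifth question once it reaches something management can change.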
A root cause is a specific breakdown within a broader category of basic causes. Common examples include missing elements, such as inadequate training or Management of Change systems that have not been properly implemented or followed.
Root causes can also involve flaws, such as incorrect equipment design, improper construction, or insufficient procedures. Sometimes, a hazard may not have been adequately identified in a Process Hazard Analysis (PHA).
These root causes can often be further subdivided and classified, providing deeper insights into the problem.
Conclusion
An effective incident investigation is essential to preventing the same event from occurring again.
All events can be traced back to a failure within a management system. To uncover this failure, you should begin by constructing a timeline of events to identify the causal factors along the way. From there, a systematic approach, such as the five whys method, will help you to delve deeper into the causal factors until the root cause within the management system is discovered.
Once the root cause is identified, you can then develop recommendations to address and correct the management system failure.
At Sigma-HSE, we take the time to understand your business and processing needs. We are experts in applying strategic incident investigation practices for enterprises of any size.
Recording and implementing the findings of an effective incident investigation can be difficult to manage, but by partnering with Sigma-HSE you can drive your process safety performance, enhance product quality and grow your competitive edge.