Incident management is one of the 17 service management practices of ITIL 4, and it is one you need to know in depth. You need to be able to recall its definition, its purpose, the terms associated with it, and how it works. The purpose of incident management is to minimize the negative impact of incidents by restoring normal service operation as quickly as possible. Simply put, it is what you are going to do when something goes wrong.
But what is an incident? An incident is an unplanned interruption to a service or reduction in the quality of a service. Every incident needs to be logged, managed to the agreed-upon target resolution times, and prioritized. Design your incident management practice appropriately by thinking about it from different perspectives with different incidents. You want to take your incidents and categorize them based on the different impacts they have. Is it a major incident or a minor incident? Is it an information security incident or is it a management and operations incident? All of these are things that you have to think about as you go through your incident management practice.
When it comes to prioritizing your incidents, do it based on an agreed classification that everyone in the organization knows. Whatever classification it is, you need to know it, your customers need to know it, and everyone who supports the systems needs to know it. You should also have a way to say what would categorize an incident as high or low. Is it based on a dollar amount? Is it based on the person's position in the company? Is it going to be based on how much the thing costs? Is it going to be based on how long you estimate it's going to be down? All of those are valid ways to classify the incident. It really depends on how your organization decides to do it. You also need to ensure that the incidents with the highest business impact are resolved first. That means you would more than likely want to prioritize fixing something that is affecting 10,000 users over something that is affecting 5 users.
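To make the idea of "highest business impact first" concrete, here is a minimal Python sketch. The `Incident` fields and the scoring rule (users affected multiplied by estimated hours of downtime) are assumptions chosen for illustration; ITIL 4 does not prescribe a formula, and your organization's agreed classification may look quite different.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    title: str
    users_affected: int
    est_downtime_hours: float


def business_impact(incident: Incident) -> float:
    # Hypothetical scoring rule: users affected times estimated hours down.
    # A real organization would substitute its own agreed classification.
    return incident.users_affected * incident.est_downtime_hours


def prioritize(incidents: list[Incident]) -> list[Incident]:
    # Highest business impact first.
    return sorted(incidents, key=business_impact, reverse=True)


queue = prioritize([
    Incident("Printer offline on floor 3", users_affected=5, est_downtime_hours=2.0),
    Incident("Email service outage", users_affected=10_000, est_downtime_hours=1.0),
])

for incident in queue:
    print(incident.title, business_impact(incident))
```

With this rule, the outage affecting 10,000 users lands at the front of the queue ahead of the one affecting 5 users, even though the smaller incident is expected to last longer.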
In incident management, you want to use a tool to log and manage your incidents. A tool lets you link incidents to configuration items, changes, problems, known errors, and other records. A good system makes the knowledge management piece a lot easier and helps you resolve incidents much faster. These tools can also match a new incident against other incidents, problems, and known errors.
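As a rough illustration of what linking and matching can look like inside such a tool, the sketch below models an incident record with links to other records and a naive keyword match against known errors. The class names, fields, IDs, and the keyword-overlap rule are all hypothetical; real incident management tools use their own data models and far more capable matching.

```python
from dataclasses import dataclass, field


@dataclass
class KnownError:
    error_id: str
    keywords: set[str]
    workaround: str


@dataclass
class IncidentRecord:
    incident_id: str
    description: str
    # Links to other records, stored as IDs (hypothetical naming scheme).
    configuration_items: list[str] = field(default_factory=list)
    related_changes: list[str] = field(default_factory=list)
    related_problems: list[str] = field(default_factory=list)
    known_errors: list[str] = field(default_factory=list)


def match_known_errors(record: IncidentRecord, catalog: list[KnownError]) -> list[KnownError]:
    # Naive matching: a known error matches if any of its keywords
    # appears as a word in the incident description.
    words = set(record.description.lower().split())
    return [ke for ke in catalog if ke.keywords & words]


catalog = [
    KnownError("KE-1", {"vpn", "timeout"}, "Restart the VPN client"),
    KnownError("KE-2", {"printer"}, "Reinstall the printer driver"),
]

record = IncidentRecord(
    incident_id="INC-1001",
    description="VPN connection drops with timeout error",
    configuration_items=["CI-vpn-gateway-01"],
)

matches = match_known_errors(record, catalog)
record.known_errors = [ke.error_id for ke in matches]

for ke in matches:
    print(ke.error_id, ke.workaround)
```

Once the match is stored on the record, whoever picks up the incident can see the documented workaround immediately instead of diagnosing from scratch.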
Incidents can also be escalated to higher levels of support. Oftentimes, your service desk, the first person who's taking in the incident, isn't going to be the person who's going to resolve it. Instead, if they can't solve it at their level, they'll raise it to tier two support, and if tier two can't do it, it'll go to tier three support, and you'll keep going up until you get to the right level of support to get that resolved. This routing is typically going to be based on the incident's category. Anyone working on an incident should provide quality and timely updates back into the incident management system, because your service desk analyst is the frontline. Analysts are the ones talking to the customer, and when the customer asks if their ticket has been resolved, the analyst needs to know the answer. If you've been working the incident and you haven't been logging it in the incident management system, the analyst doesn't know the updates, and the customer doesn't get the communication.
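The tiered escalation and update logging described above can be sketched in a few lines of Python. The routing table, team names, and `Ticket` class are hypothetical and only illustrate the idea of category-based routing with every update written back to the record.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: category -> support tiers, lowest first.
ESCALATION_PATHS = {
    "network": ["service desk", "network operations", "network engineering"],
    "application": ["service desk", "application support", "development team"],
}


@dataclass
class Ticket:
    ticket_id: str
    category: str
    tier: int = 0  # index into the escalation path for this category
    updates: list[str] = field(default_factory=list)

    @property
    def assigned_team(self) -> str:
        return ESCALATION_PATHS[self.category][self.tier]

    def log_update(self, note: str) -> None:
        # Every update is recorded so the service desk can relay it to the customer.
        self.updates.append(f"[{self.assigned_team}] {note}")

    def escalate(self, reason: str) -> bool:
        path = ESCALATION_PATHS[self.category]
        if self.tier + 1 >= len(path):
            self.log_update(f"Cannot escalate further: {reason}")
            return False
        self.log_update(f"Escalating: {reason}")
        self.tier += 1
        return True


ticket = Ticket("INC-1001", "network")
ticket.escalate("Not resolvable with the service desk runbook")
ticket.log_update("Investigating switch configuration")

print(ticket.assigned_team)
for line in ticket.updates:
    print(line)
```

Because each note is stamped with the team that wrote it, the service desk analyst can read the ticket history and give the customer an accurate status without chasing the resolver team.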
Incident management requires a high level of collaboration within and between teams. For example, if you're solving the issue, you need to tell the service desk, and you also need to work with other teams. When working to diagnose and resolve an incident, your user might end up using a self-help option, or they might call the service desk. The service desk might contact your support team, your suppliers, or even your partners. All of this is collaboration and communication. Some incidents are major incidents. If you have a really major incident, one of the things you might want to do is create a temporary cross-functional team to address that incident. In addition, you may be doing an incident response based on a disaster recovery operation, and if this is the case, you want to bring in the disaster recovery team as well, because they know the plans and policies for overcoming that disaster.
Incident management relies heavily on collaboration to facilitate information sharing and learning, as well as to help solve incidents more efficiently and more effectively. Thinking about this through the lens of the value chain activities, which ones are going to be affected? In improve, you're going to be thinking about the incident records in your incident management system. These are a key input to your improve activities. Go in there and find the frequency and severity of different incidents to figure out where you need to put more resources, time, or money to improve.
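As a small illustration of mining incident records for improvement, the sketch below counts incidents per category and averages their severity. The record layout, the sample data, and the 1 to 5 severity scale are made up for the example; a real analysis would pull records from your incident management system.

```python
from collections import Counter, defaultdict

# Made-up sample records; severity uses an assumed 1 (low) to 5 (high) scale.
records = [
    {"category": "email", "severity": 3},
    {"category": "email", "severity": 4},
    {"category": "vpn", "severity": 2},
    {"category": "email", "severity": 2},
]


def summarize(records):
    # Count incidents per category and total up their severities.
    counts = Counter(r["category"] for r in records)
    totals = defaultdict(int)
    for r in records:
        totals[r["category"]] += r["severity"]
    summary = {
        category: {"count": n, "avg_severity": totals[category] / n}
        for category, n in counts.items()
    }
    # Most frequent categories first.
    return dict(sorted(summary.items(), key=lambda kv: kv[1]["count"], reverse=True))


summary = summarize(records)
for category, stats in summary.items():
    print(category, stats["count"], stats["avg_severity"])
```

A category that is both frequent and severe is a natural candidate for more resources, time, or money.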
From the engage activity, think about the fact that incidents are visible to everyone. Your users and customers are going to know about them, so engage with them and let them know ahead of time how you're solving these problems. From design and transition, the incident might occur in a test environment, which happens during design. It might also happen during release and deployment, which is part of transition.
Either way, make sure the incident management practice is there to help ensure that the incidents are resolved in a timely and controlled manner. From an obtain/build perspective, think about the fact that incidents occur in development environments too. As you're obtaining and building things, things break. The incident management practice has things in place to help with those as well, because you want to make sure you're figuring those things out early, before moving into transition and then from transition over to deliver and support. Deliver and support is where you'll spend a lot of time in incident management, because this practice is a significant contributor to supporting your users. The deliver and support value chain activity includes identifying and resolving incidents and problems and communicating that back to the user. And that goes back into engage again. It all comes down to how you identify, diagnose, and solve all these problems as quickly and efficiently as possible.