Last time we discussed knowledge as it relates to the possession of experience (tacit knowledge) as well as possession of factual information (explicit knowledge) – and the important distinction between data, information and knowledge.
We saw that knowledge is data and information that has been interpreted and placed in context, drawing on previous learning and experience. We also recognized how inherently difficult it is to transfer knowledge from one person to the next. It’s crucial to understand these things when trying to capture, formalise and communicate expert knowledge in a form that is easy for others to consume.
Capture, store and share knowledge for troubleshooting
Capturing knowledge in a structured form becomes even more relevant when we look at troubleshooting. To use captured knowledge in a troubleshooting scenario, we must store our data and information in such a way that we can apply our interpretation and previous knowledge to it. We need a model-based approach for getting our knowledge into the computer.
Technologies such as fault trees, flowcharts, simple wiki-based web sites and other document-based approaches are often used for collecting and sharing expert knowledge and experience. The problem, however, is that we often lose our way in the myriad of information when searching for answers that can help us fix a problem, because there are no dynamic elements that help us navigate or that use the equipment state to increase precision. Another problem is that it really isn’t knowledge, because there is no common interpretation of all this information and people simply understand complex information in different ways. I might jump to one conclusion and you might jump to another. So how can we apply the same interpretation an expert would within a certain problem area in a consistent way, regardless of the skills of the user? How do we capture this important “know-how” and distribute it to other people for use in troubleshooting?
Consider the Cambridge Dictionary’s definition of troubleshooting:
Troubleshooting: discovering why something does not work effectively and making suggestions about how to improve it.
The “why” in this definition tells us that every problem has a cause. There might be many potential causes, but typically only one is the real reason for the problem. We might see many symptoms or additional causes as side effects of the root cause, but our job is to find the real culprit and solve the problem at hand.
What is a cause?
A cause can be anything from “the device isn’t turned on” to “component X failed on electrical board Y”.
The depth of a cause will depend on who the intended audience is and what level of detail is needed.
For example, suppose a leaking printer toner cartridge is the cause of a printer problem and the intended audience is consumers. Capturing the cause as a “leaking toner cartridge” would be sufficient. If instead the intended audience were technicians or engineers, we could go a level deeper and list the cause as “defective toner cartridge end seal”. The context and audience are very important. It’s equally important to carefully consider the wording when formalising tacit knowledge: what might seem intuitive and obvious to one person might not mean much to the next (even within the same target audience).
So, unless a single-word cause is very descriptive and understandable, we should consider adding a few more words. For example, “Battery” might mean something to you at the moment you capture that cause, but for the next person updating the system a couple of months later it would make more sense if you had written “Battery depleted” or “Battery defective”. These are subtle differences that have a big impact on the usage and maintenance of a knowledge base.
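To make the wording advice concrete, here is a minimal Python sketch of how a cause could be stored with one label per audience, plus a simple check that flags single-word labels for review. The `Cause` class, the audience names and the `vague_labels` helper are assumptions made up for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """A possible reason for a problem, worded per audience (hypothetical model)."""
    cause_id: str
    labels: dict[str, str] = field(default_factory=dict)  # audience -> wording

    def label_for(self, audience: str, fallback: str = "consumer") -> str:
        # Use the fallback wording if none was written for this audience.
        return self.labels.get(audience, self.labels[fallback])


def vague_labels(causes: list[Cause]) -> list[tuple[str, str, str]]:
    """Flag single-word labels such as "Battery" so an author can review them."""
    return [
        (cause.cause_id, audience, text)
        for cause in causes
        for audience, text in cause.labels.items()
        if len(text.split()) < 2
    ]


toner = Cause("toner-leak", {
    "consumer": "Leaking toner cartridge",
    "technician": "Defective toner cartridge end seal",
})
battery = Cause("battery", {"consumer": "Battery"})

print(toner.label_for("technician"))
print(toner.label_for("engineer"))  # no engineer wording, so the consumer label is used
print(vague_labels([toner, battery]))
```

A word count is of course only a rough signal. A flagged label still needs a person to decide whether “Battery depleted” or “Battery defective” is what was meant.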
…
Looking again at the last part of the Cambridge definition, “making suggestions about how to improve it”, we see that we of course need corrective actions that solve the causes of the problem, and we need to structure all of this so that we can interpret the data and information we get the same way an expert would.
What is an action?
Actions are the solutions to the causes. An action is any step taken within the troubleshooting process that requires performing an actual task. This can range from physically moving a device to changing a software configuration, carried out step by step until we reach the one action that solves the root cause. Actions must be supported by easy-to-read explanations that instruct the user on how to perform the task, supplemented by images and videos to make things clear. The explanation is an important part, as this is where we can communicate best practices for performing the task at hand. Take this action as an example: “Check that the Voltage and Current from the batteries is correct”. A very skilled technician would probably know what to do, but for many others it wouldn’t be obvious: there might be more than one battery in the machine, the expected voltage and current need to be understood, and maybe the measurements should be taken while the machine is running, maybe not. That’s where high quality content comes into play.
Experience has taught us that complex knowledge is much easier to digest for even the best engineers when served in small manageable pieces of content one step at a time compared to endless PDF documents and manuals. This actually holds true for both consumer troubleshooting and field service troubleshooting.
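As a rough illustration of an action served in small pieces, the Python sketch below links an action to the cause it solves and breaks its explanation into short steps with optional media. The `Action` and `Step` classes, the cause id and the file name are all hypothetical, and the step texts are placeholders rather than real service instructions.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    instruction: str
    media: list[str] = field(default_factory=list)  # image or video file names


@dataclass
class Action:
    title: str
    solves: str  # id of the cause this action addresses
    steps: list[Step] = field(default_factory=list)

    def render(self) -> list[str]:
        """Return the explanation as numbered lines, one small piece per step."""
        return [
            f"{number}. {step.instruction}"
            for number, step in enumerate(self.steps, start=1)
        ]


check_battery = Action(
    title="Check that the voltage and current from the batteries are correct",
    solves="battery-depleted",
    steps=[
        Step("Locate the battery described in this guide.", ["battery_location.png"]),
        Step("Measure the voltage across the battery terminals."),
        Step("Compare the reading with the expected value for this machine."),
    ],
)

for line in check_battery.render():
    print(line)
```

Keeping the steps as separate records, instead of one block of text, is what makes it possible to show them one at a time and to attach an image or video to exactly the step it explains.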
…
The most efficient approach
Structuring troubleshooting knowledge using causes and actions enables us to share knowledge with other people and make sure everybody troubleshoots at the same high level, because it mimics the way humans reason. This is how we actually think when we troubleshoot anything from personal computers to cars, trucks, wind turbines and everything else. We might not be aware of it as we go step by step through the corrective actions we usually try, but what the mind really does is try to identify the root cause of the problem and solve the issue based on past experience and know-how. This line of thinking is what we should try to mimic when creating software that helps others solve the same issues as efficiently as possible: identify the relevant causes and perform the corresponding corrective actions.
Whenever we troubleshoot in our heads, we always consider what’s most likely (based on available information and data) and the effort we need to put into doing something about it. You can solve any computer problem by buying a new computer, but it’s simply too costly. You could also reboot your computer every time (and most likely you will), but if the data clearly tells us that it’s pointless (the screen is broken or the cable doesn’t work), it won’t help at all. We need a way to organize our knowledge into causes and actions, but we also need to consider cause probabilities and the cost or effort required to perform the corresponding actions in order to determine the best and most efficient approach at any given time in the troubleshooting process.
Next time we will look into how we actually “make suggestions about how to improve it” by creating an optimal sequence using probabilities, time, and cost, and how observations of symptoms can dramatically affect the troubleshooting process.
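One simple way to weigh likelihood against effort is to try actions in order of probability divided by cost. The Python sketch below shows that heuristic under strong simplifying assumptions: exactly one cause is present, each action fixes exactly one cause, and an action always works if its cause is the real one. The probabilities and costs are made-up numbers, and the heuristic is offered as an illustration only, not as a description of how any specific product computes its sequence.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cause: str
    action: str
    probability: float  # chance that this cause is the real one (made up)
    cost: float         # effort to perform the action, here in minutes (made up)


def order_by_efficiency(candidates: list[Candidate]) -> list[Candidate]:
    """Try likely and cheap actions first: sort by probability / cost."""
    return sorted(candidates, key=lambda c: c.probability / c.cost, reverse=True)


def expected_cost(sequence: list[Candidate]) -> float:
    """Average effort if we stop as soon as one action fixes the problem."""
    total = 0.0
    still_unsolved = 1.0
    for candidate in sequence:
        total += still_unsolved * candidate.cost  # we only pay if we get this far
        still_unsolved -= candidate.probability
    return total


candidates = [
    Candidate("Broken screen", "Replace the screen", probability=0.2, cost=120),
    Candidate("Defective cable", "Replace the cable", probability=0.5, cost=10),
    Candidate("Software hang", "Reboot the computer", probability=0.3, cost=2),
]

plan = order_by_efficiency(candidates)
for candidate in plan:
    print(candidate.action)
print(round(expected_cost(plan), 1))
```

With these numbers the reboot comes first even though the cable is the most likely cause, because the reboot is so cheap to try. If an observation rules the reboot out (the screen is visibly broken, say), its probability drops and it moves down the list.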
About Dezide
We have 20 years of experience in helping businesses of all sizes capture, organize, and optimize expert knowledge, and we work with clients ranging from the world’s largest enterprises in the wind industry, mining sector, and air compressors to consumer printing and telecom.
Get in touch and see why they trust Dezide to build brilliant knowledge bases powering the world’s best service organizations.