Artificial intelligence: to explain or not to explain?
Prof. Dr. Wojciech Samek, Professor of Machine Learning and Communication at the Technical University of Berlin, Head of the Artificial Intelligence Department and the Explainable AI Group at the Fraunhofer Heinrich Hertz Institute (HHI), and member of the Technological Enablers and Data Science working group of Plattform Lernende Systeme.
Modular systems designed by humans can be highly complex. Modern AI systems exceed this complexity many times over, yet the functions and qualities of individual neurons in these models remain largely unclear. As a result, their outputs are often not comprehensible, which is problematic in many areas. How can explainable AI (XAI) methods and trends contribute to improving trust and quality?
Do we really need to understand AI in order to use and trust it? A common view is that we don't – after all, we take medicines whose exact mechanism of action is not yet fully understood. What matters, it is argued, are good evaluation methods that can be used to test the performance of AI. But this is precisely where the problem begins. For years, AI models were evaluated solely on the basis of performance metrics. With the development of explainability methods, however, it has become apparent that models with good performance do not always ‘understand’ their task, but can cheat particularly effectively. For example, a classifier recognised horses not by the animal itself but by a copyright watermark that appeared in many of the training images of horses.
Explainability as a game changer
Explainability is therefore crucial for detecting errors in AI models at an early stage and ensuring that the model's decision-making processes are comprehensible and meaningful. This applies to both horse image classifiers and hallucinating language models. But explainability offers even more: for example, explainable models have been used to discover a whole new structural class of antibiotics. Explainability is also gaining importance from a legal perspective, for example through new regulations such as the EU AI Act, which requires transparency in certain AI applications. Germany is very well positioned in the field of explainability. Not only have many fundamental techniques been developed here, but some of the leading researchers are also based here. This knowledge and locational advantage should be leveraged to create more trustworthy and verifiable AI.
Three waves of explainability research
- Explanations of individual predictions
The first methods aimed to explain individual model decisions by visualising the influence of individual input dimensions (e.g. pixels) on the prediction. Various methods were developed to calculate these explanations. For example, the Layer-wise Relevance Propagation (LRP) method is based on the idea of redistributing the prediction backwards through the network. Neurons that contributed more to the decision receive a proportionally larger share of the total relevance. The relevance values assigned to each pixel of the input image show which image areas were decisive for the AI's decision (a minimal sketch of this backward redistribution follows after this list).
- Understanding the model itself
The second wave of explainability research aimed to better understand the AI model itself. With the help of the activation maximisation method, for example, it is possible to show which features individual neurons encode (see the sketch after this list). The Concept Relevance Propagation (CRP) method extends this type of explanation and allows the role and function of individual neurons in model decisions to be analysed. These second-wave XAI methods form the basis of the emerging field of mechanistic interpretability, which analyses functional subnetworks (‘circuits’) in the model.
- Holistic understanding
The aim of the latest methods in XAI research is to gain a systematic understanding of the model, its behaviour and its representations. Methods such as SemanticLens attempt to understand the function and quality of each individual component (neuron) in the model. This holistic understanding allows for systematic, automatable model testing, e.g. whether a skin cancer model really follows the medical ABCDE rule.
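To make the relevance redistribution behind LRP more concrete, here is a minimal sketch for a tiny fully connected ReLU network using the basic epsilon rule. The network, its random weights and the eight-dimensional toy input are illustrative assumptions, not a real image classifier; production implementations (e.g. in the Zennit or Captum libraries) additionally handle convolutional layers and further propagation rules.

```python
# Minimal sketch of Layer-wise Relevance Propagation (LRP) with the epsilon rule.
# Toy network and input are illustrative assumptions, not a trained model.
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 8 inputs -> 4 hidden units (ReLU) -> 3 output logits.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)

def forward(x):
    a1 = np.maximum(0, x @ W1 + b1)      # hidden activations
    logits = a1 @ W2 + b2                # output logits
    return a1, logits

def lrp_dense(a, W, b, relevance, eps=1e-6):
    """Redistribute relevance from a layer's output back to its input (epsilon rule)."""
    z = a @ W + b                                      # pre-activations of the upper layer
    s = relevance / (z + eps * np.where(z >= 0, 1.0, -1.0))
    c = s @ W.T                                        # backward pass of relevance messages
    return a * c                                       # stronger contributors receive more relevance

x = rng.normal(size=8)                   # toy "image" with 8 pixels
a1, logits = forward(x)
target = int(np.argmax(logits))

# Start with all relevance on the predicted class, then propagate backwards.
R_out = np.zeros_like(logits)
R_out[target] = logits[target]
R_hidden = lrp_dense(a1, W2, b2, R_out)
R_input = lrp_dense(x, W1, b1, R_hidden)

print("relevance per input dimension:", np.round(R_input, 3))
print("conservation check:", R_input.sum(), "should be close to", logits[target])
```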
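Activation maximisation from the second wave can likewise be sketched in a few lines: starting from noise, the input is optimised by gradient ascent so that one chosen unit responds as strongly as possible. The tiny convolutional network, the unit index and the regularisation strength below are illustrative assumptions; real feature visualisations use trained models and stronger image priors.

```python
# Minimal sketch of activation maximisation: synthesise an input that strongly
# activates one chosen unit, revealing which feature that unit encodes.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy feature extractor standing in for a real vision model (untrained, for illustration).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
model.eval()

unit = 5                                               # which feature/neuron to visualise
x = torch.randn(1, 3, 32, 32, requires_grad=True)      # start from random noise
optimizer = torch.optim.Adam([x], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    activation = model(x)[0, unit]                     # activation of the chosen unit
    # Maximise the activation, with a small L2 penalty to keep the input bounded.
    loss = -activation + 1e-3 * x.pow(2).sum()
    loss.backward()
    optimizer.step()

print(f"final activation of unit {unit}: {model(x)[0, unit].item():.3f}")
# x.detach() now holds a (toy) visualisation of the feature this unit responds to.
```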
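The kind of component-level audit described for the third wave can be illustrated in a deliberately simplified form: assume an explainability pipeline has already assigned a textual concept description to every neuron of a skin cancer classifier, and an automated check then flags descriptions that do not match the ABCDE vocabulary (asymmetry, border, colour, diameter, evolving). The neuron labels and the matching rule below are made-up placeholders, not the output or API of SemanticLens.

```python
# Hypothetical component-level audit: flag neurons whose assigned concept
# description does not relate to the task-relevant (ABCDE-style) vocabulary.
neuron_concepts = {
    0: "irregular lesion border",
    1: "asymmetric shape",
    2: "blue-white veil colouring",
    3: "skin marker ink",          # suspicious: an imaging artefact, not a medical feature
    4: "ruler edge in the image",  # suspicious: measurement tool, not the lesion
}

task_relevant = {"asymmetric", "border", "colour", "colouring", "diameter",
                 "evolving", "lesion", "veil"}

def audit(concepts, allowed):
    """Return neurons whose description shares no word with the allowed vocabulary."""
    flagged = []
    for idx, description in concepts.items():
        words = set(description.replace("-", " ").split())
        if not words & allowed:
            flagged.append((idx, description))
    return flagged

for idx, description in audit(neuron_concepts, task_relevant):
    print(f"neuron {idx}: '{description}' looks unrelated to the ABCDE criteria")
```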
The future of explainability research
With the development of increasingly complex models, explainability will continue to gain importance, both as a tool for human-AI interaction and for the systematic analysis, testing and improvement of models. Large language models in particular offer an ideal basis for specifically investigating the role of individual components and actively controlling the model, for example to avoid hallucinations. The methods are thus evolving from pure explanation to targeted intervention options – a decisive step towards the safe and responsible use of modern AI systems.
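As a final illustration of the step from explanation to intervention, the sketch below suppresses the activation of a single unit at inference time via a forward hook, a simple way of testing how a component identified by XAI influences the model's output. The toy model and the choice of the 'suspect' unit are assumptions for the example; interventions on real language models act on their internal representations in the same spirit.

```python
# Sketch of a targeted intervention: ablate one unit's activation at inference
# time via a forward hook ("explain, then intervene"). Toy model for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

suspect_unit = 7   # e.g. a unit an XAI analysis has linked to an unwanted behaviour

def ablate(module, inputs, output):
    output = output.clone()
    output[:, suspect_unit] = 0.0        # zero out the suspect unit's activation
    return output                        # returned tensor replaces the module output

x = torch.randn(1, 16)
with torch.no_grad():
    before = model(x)
    handle = model[1].register_forward_hook(ablate)   # hook on the ReLU output
    after = model(x)
    handle.remove()

print("logits before intervention:", before.squeeze())
print("logits after  intervention:", after.squeeze())
```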