Explainable AI Cheat Sheet

Your high-level guide to the set of tools and methods that helps humans understand AI/ML models and their predictions.

Cheat sheet



A brief overview of the Explainable AI cheat sheet with examples.


If you're interested in learning more, here is a non-exhaustive list of links and resources. We'll keep adding to it. Feel free to suggest valuable resources on the Issues page.


Interpretable Machine Learning (IML) - Christoph Molnar
Explainability for NLP - Isabelle Augenstein [video]
NLP Highlights: Interpreting NLP Model Predictions - Sameer Singh [audio]
Please Stop Doing "Explainable" ML - Cynthia Rudin [video]

Interpretable Models

StatQuest: K-nearest neighbors, Clearly Explained [video]
IML: Chapter 4 [online book]
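K-nearest neighbors counts as interpretable because each prediction can be justified by pointing at the training examples that produced it. A minimal sketch, assuming a toy 2-D dataset and Euclidean distance (both illustrative choices, not from the resources above):

```python
import math
from collections import Counter

def knn_explain(train, query, k=3):
    """Classify `query` by majority vote of its k nearest training points.

    `train` is a list of (features, label) pairs. Returns the predicted label
    together with the neighbors that produced it; those neighbors are the explanation.
    """
    # Sort training points by Euclidean distance to the query (math.dist needs Python 3.8+).
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    neighbors = by_distance[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    prediction = votes.most_common(1)[0][0]
    return prediction, neighbors

train = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.8), "cat"),
    ((0.9, 1.1), "cat"),
    ((5.0, 5.0), "dog"),
    ((5.2, 4.9), "dog"),
]
label, why = knn_explain(train, (1.1, 1.0), k=3)
print(label)  # cat
for features, neighbor_label in why:
    print(features, neighbor_label)
```

The explanation is simply "these three similar cases were labeled cat", which a non-expert can check by eye.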

Model agnostic methods

IML: Chapter 3 [online book]
SHAP [software]
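SHAP is built on Shapley values from cooperative game theory: each feature's attribution is its average marginal contribution over all coalitions of the other features. The sketch below computes them exactly by brute force. It is not the `shap` package's API (which relies on faster estimators such as KernelSHAP and TreeSHAP), and it assumes a single baseline input to stand in for "missing" features, whereas SHAP usually averages over a background dataset.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction by enumerating all coalitions.

    Features outside a coalition take the matching value from `baseline`.
    Cost is O(2^n) model calls, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Use x for features in the coalition and the baseline everywhere else.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

def model(z):
    # Toy model with an interaction between features 1 and 2.
    return 2 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(model, x, baseline)
print(phis)  # ≈ [2.0, 3.0, 3.0]
print(sum(phis), model(x) - model(baseline))  # both ≈ 8.0
```

The interaction term `z[1] * z[2]` is split evenly between features 1 and 2, and the attributions sum to `model(x) - model(baseline)` (the efficiency property).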

Model specific methods

Axiomatic Attribution for Deep Networks - Integrated Gradients [paper]
Attention is not Explanation [paper]
Attention is not not Explanation [paper]
A Benchmark for Interpretability Methods in Deep Neural Networks [paper]
Sanity Checks for Saliency Maps [paper]
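Integrated Gradients (first paper above) attributes a prediction by integrating the model's gradient along a straight path from a baseline to the input and scaling by the input difference. A minimal sketch using a midpoint Riemann sum; the toy function `f` and its hand-written gradient are assumptions standing in for a real network, where the gradient would come from an autodiff framework.

```python
def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_f(z) must return the gradient of the model output at input z.
    Attribution_i = (x_i - baseline_i) * mean of dF/dz_i along the
    straight line from baseline to x.
    """
    n = len(x)
    totals = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval on [0, 1]
        z = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(z)
        for i in range(n):
            totals[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]

def f(z):
    # Toy differentiable "model".
    return z[0] ** 2 + 3 * z[0] * z[1]

def grad_f(z):
    # Analytic gradient of f.
    return [2 * z[0] + 3 * z[1], 3 * z[0]]

attributions = integrated_gradients(grad_f, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(attributions)  # ≈ [4.0, 3.0]
print(sum(attributions), f([1.0, 2.0]) - f([0.0, 0.0]))  # both ≈ 7.0
```

The last line checks completeness: the attributions add up to the difference between the output at the input and at the baseline. The choice of baseline matters and is itself a modeling decision.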

Example based methods

IML: Chapter 6: Example-Based Explanations [online book]
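Example-based methods explain a prediction by pointing at specific data points. One simple idea, sketched here, is the nearest "unlike" neighbor: the closest known example that received a different prediction, a crude form of counterfactual explanation. The loan rule `approve` and the data are hypothetical.

```python
import math

def nearest_unlike_neighbor(model, data, query):
    """Return the closest example in `data` whose prediction differs from the
    query's, or None if every example gets the same prediction."""
    query_pred = model(query)
    candidates = [z for z in data if model(z) != query_pred]
    if not candidates:
        return None
    return min(candidates, key=lambda z: math.dist(z, query))

def approve(applicant):
    # Hypothetical loan model: approve if income - 0.5 * debt > 20.
    income, debt = applicant
    return income - 0.5 * debt > 20

data = [(30.0, 10.0), (25.0, 5.0), (18.0, 2.0), (40.0, 50.0), (22.0, 20.0)]
query = (20.0, 8.0)  # 20 - 0.5 * 8 = 16, so this applicant is rejected
counterfactual = nearest_unlike_neighbor(approve, data, query)
print(counterfactual)  # (25.0, 5.0)
```

Read as: "an applicant like you, but with income 25 and debt 5, was approved." Dedicated counterfactual methods search the input space directly instead of restricting themselves to existing examples.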

Neural representation methods

Feature Visualization [article]
Multimodal Neurons in Artificial Neural Networks (more recent feature visualization work) [article]
SVCCA, PWCCA [papers]
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) [paper]
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties [paper]
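Probing, as in the last paper above, trains a simple classifier on frozen representations to test whether a property can be decoded from them. A minimal sketch with a logistic-regression probe; the synthetic "embeddings" (a property stored in dimension 0, noise elsewhere) are an assumption standing in for real sentence-encoder outputs.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_probe(xs, ys, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # d(log loss)/d(logit)
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

def make_embeddings(rng, n, dim=4):
    """Hypothetical frozen embeddings: dimension 0 encodes a binary property
    (say, singular vs plural subject); the other dimensions are Gaussian noise."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        xs.append([sign * (0.5 + rng.random())] + [rng.gauss(0.0, 1.0) for _ in range(dim - 1)])
        ys.append(y)
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_embeddings(rng, 200)
test_x, test_y = make_embeddings(rng, 100)
w, b = train_probe(train_x, train_y)
preds = [1 if predict_proba(w, b, x) > 0.5 else 0 for x in test_x]
accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(f"probe accuracy: {accuracy:.2f}")
print("probe weights:", [round(wi, 2) for wi in w])
```

High held-out accuracy suggests the property is linearly decodable from the representation; it does not by itself show that the model uses that information for its predictions.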

Other methods

Beyond Accuracy: Behavioral Testing of NLP models with CheckList [paper]
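CheckList organizes behavioral tests into types such as Minimum Functionality Tests (MFT), Invariance tests (INV) and Directional Expectation tests (DIR). A hand-rolled sketch of an INV test, assuming a hypothetical keyword-based `toy_sentiment` model in place of a real NLP model; it does not use the `checklist` package's API.

```python
def toy_sentiment(text):
    """Hypothetical stand-in for a real sentiment model: keyword counting."""
    positive = {"great", "love", "excellent"}
    negative = {"terrible", "hate", "awful"}
    words = text.lower().replace(".", "").split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def invariance_test(model, template, fillers):
    """INV test: swapping the filler (here, a person's name) should not change
    the prediction. Returns the inputs whose prediction differs from the first."""
    texts = [template.format(name=name) for name in fillers]
    reference = model(texts[0])
    return [t for t in texts if model(t) != reference]

failures = invariance_test(
    toy_sentiment,
    "{name} had a great flight.",
    ["Maria", "Ahmed", "John", "Mei"],
)
print(failures)  # []
```

An empty list means the model passed this test case; any returned sentence is a concrete failure to inspect, which is the point of behavioral testing over aggregate accuracy.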


The Mythos of Model Interpretability [paper]
Towards falsifiable interpretability research [paper]


Your feedback is welcome on the Issues page.