Three Mind-Expanding Books for Causal and Non-Causal Data Scientists to Read in 2024
Hint: Knowing something about the data generating process will almost always put you ahead of the pack.
The Data Science Revolution was largely driven by the idea that we can look at the data, find patterns and leverage them to our benefit.
These ideas were fueled by rising hopes that growing computational resources and unprecedented data availability would allow us to automate our decisions, scientific discovery and business analyses.
This turned out to work. At least to an extent.
Jim Simons, a famous investor and mathematician, was able to successfully exploit predictive techniques in his investment strategies. Neural networks powered (and still power) some (partially) autonomous vehicles.
But there’s also a second, less visible part to this story.
Jim Simons also lost a lot of money using the predictive paradigm, and autonomous vehicles often fail when facing even slightly unusual conditions.
The three books we discuss in this blog post have one thing in common.
They all show how understanding the data generating process (rather than just looking at the patterns in the data) can help us make better decisions and enrich our understanding of the world.
Each does it differently.
1. “The Book of Why: The New Science of Cause and Effect”
This book, written by Judea Pearl and Dana Mackenzie, is an absolute classic when it comes to causality and causal inference. It has been eye-opening to an entire generation of data scientists, researchers and practitioners.
Pearl shows the “why” behind “why” — why it is important to ask why questions and why it’s critical that we understand which methods to use in order to answer them (hint: understanding the structure of the data generating process is crucial).
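To make the hint above a bit more concrete, here is a minimal simulated sketch (my own toy example, not taken from the book). It assumes a hypothetical hidden common cause z that drives both x and y, with no arrow from x to y at all. All names and coefficients are made up for illustration.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residuals(y, z):
    """What is left of y after removing its linear dependence on z."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    b = slope(z, y)
    return [(yi - my) - b * (zi - mz) for yi, zi in zip(y, z)]


def simulate(n=20_000, seed=0):
    """Toy data generating process: z -> x and z -> y, but no x -> y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # common cause
    x = [zi + rng.gauss(0, 1) for zi in z]         # x depends on z only
    y = [2 * zi + rng.gauss(0, 1) for zi in z]     # y depends on z only
    return x, y, z


x, y, z = simulate()
naive = slope(x, y)                                    # pattern in the data
adjusted = slope(residuals(x, z), residuals(y, z))     # after accounting for z
print(f"naive slope:    {naive:.2f}")
print(f"adjusted slope: {adjusted:.2f}")
```

In this setup the naive slope should land close to 1 even though x has no effect on y, while the adjusted slope should land close to 0. The pattern is real, but only knowing the structure tells you it is not causal.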
Most of my podcast’s guests either started their journey into causality with this book or read it later in their career.
Highly recommended to anyone interested in improving their data skills.
🟡 “The Book of Why” by Judea Pearl & Dana Mackenzie (print, Kindle, audiobook)
2. “Antifragile: Things That Gain from Disorder”
From cherry-picking to linear models and lack of understanding of fat-tailed distributions, Nassim Taleb is a fierce critic of common practices in science and industry.
One of the theses in the book is that trying to control randomness in complex systems, although it might seem beneficial in the short term, can badly backfire in the longer run.
In the book, Taleb shares his belief that talking about causes and effects might not be meaningful in the case of complex, non-linear and potentially cyclic systems.
I am not convinced by his pessimistic position here. We know that meaningful interventions in dynamical systems are possible, but we need to know what we’re doing (Naftali Weinberger explains it here).
What Taleb calls “naive interventions” can lead to dramatic, unwanted consequences.
Another important concept in the book is that of fat- and long-tailed distributions, which can easily derail any traditional learning algorithm trained on finite-sized samples.
Fat and long tails are critically important in most complex areas from finance to autonomous driving.
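A rough way to get a feel for this (a sketch of my own, not something from the book) is to check how much of a sample's total comes from its single largest observation. The Pareto shape parameter alpha=1.1 and the sample size below are arbitrary choices made for illustration.

```python
import random


def top_share(values):
    """Fraction of the total contributed by the single largest observation."""
    return max(values) / sum(values)


def draw_samples(n=10_000, alpha=1.1, seed=0):
    """Draw a thin-tailed and a fat-tailed sample of the same size."""
    rng = random.Random(seed)
    thin = [abs(rng.gauss(0, 1)) for _ in range(n)]      # half-normal
    fat = [rng.paretovariate(alpha) for _ in range(n)]   # Pareto, heavy tail
    return thin, fat


thin, fat = draw_samples()
print(f"thin-tailed: largest point is {top_share(thin):.4%} of the total")
print(f"fat-tailed:  largest point is {top_share(fat):.4%} of the total")
```

In the thin-tailed sample no single point matters much. In the fat-tailed one, a single draw can account for a visible chunk of the total, so anything estimated from a finite sample (starting with the mean) can shift a lot depending on whether that draw happened to be in your data.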
A great and mind-expanding read.
🟡 “Antifragile” by Nassim Nicholas Taleb (print, Kindle, audiobook)
3. “Chaos: Making a New Science”
In the age of machine learning, we got used to the thought that if something is unpredictable, it clearly lacks structure — and maybe collecting measurements of more variables could make it more predictable.
The truth is that chaotic systems can produce chaotic behavior very systematically, based on a set of (possibly) very simple rules.
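The logistic map is a classic textbook illustration of this point. The sketch below is my own minimal version; the function name, the starting value 0.2 and the tiny perturbation are arbitrary choices. It runs the same one-line rule from two almost identical starting points.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate the rule x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)   # almost the same starting point

# Gap between the two trajectories at each step
gaps = [abs(p - q) for p, q in zip(a, b)]
print(f"gap after 5 steps: {gaps[5]:.2e}")
print(f"largest gap within 100 steps: {max(gaps):.2f}")
```

The rule is fully deterministic and fits in one line, yet the two trajectories stay close only for the first few dozen steps and then bear no resemblance to each other. Measuring more variables would not help here; the unpredictability comes from the structure itself.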
James Gleick’s book is an excellent introduction to chaos theory, dynamical systems and complexity. His engaging style and love for a good story (from Oppenheimer to Lorenz and much more) make it a great read!
If you were ever interested in fractals, emergence, complexity or chaos — this is a must-read.
Each of these books brings a unique perspective to the table, and each presents an idea that questions a broadly accepted status quo.
Taken together, these books are a great inspiration to question the assumptions that our contemporary data culture takes for granted.
Sometimes just fitting models to data is not enough to answer the questions that are the most important to us.
Understanding this is power.
I hope these mind-expanding books will inspire you as much as they inspired me (to say the least, I wouldn’t have written my book if I hadn’t read “The Book of Why”).
Finally, I’d love to learn from you: what are your favorite mind-expanding books?
PS: The best way to let me know if you liked this story is by clapping. You can clap more than once 👏🏼👏🏼👏🏼
This article contains affiliate links. If you decide to purchase a book using one of them, a small part of the revenue will help me create more free content for you.