Think Stats


Think Stats is almost a book about data analysis, which is a good thing. The main issue with statistics books is that they can be a bit dry. Often they are repackaged as popular science books, which are fun but not technical enough to use in your day job.

Think Stats lands right in the middle. You learn about basic statistics, but the chapters are laid out like a data analysis book. This means it fits really nicely into a data science syllabus. In other words, it has tools and techniques you can actually use.

It spends a long time on histograms and probability distributions, then moves on to hypothesis testing, the gold standard of all science. I like how it reiterates that many standard tests are only valid under normality. Towards the end it reaches linear regression and simple time series analysis. So this should be considered a beginner's book.

The only thing that I wasn’t a fan of was the amount of code. This isn’t quite as bad as some other books, because the code is well abstracted, but I continue to be confused about why tech books feel the need to fill their pages with code and accompanying explanations. It’s a waste of trees and detracts from the fundamentals. Paper is not the right place for code; git is for code.

If you want a well-written introduction to statistics, with an emphasis on doing something useful with your data, then this is what I would recommend.