Doing Data Science Book Review


Doing Data Science, written by Cathy O’Neil and Rachel Schutt, is a welcome sight in the male-dominated tech industry. Here are two people who are eloquent and interesting. And unfortunately, it is for this very reason that the book failed to live up to my expectations.

I love Cathy’s no-nonsense, down-to-earth style, but most of this book isn’t written by Cathy! It is written in the third person about lectures given by other people! This wasn’t made clear to me before I read the book, so I was disappointed to find it disjointed and scattered. It is hard to keep data science consistent at the best of times, without complicating matters by adding multiple authors.

This meant that there were both overlaps and gaps, and each chapter was completely different from the last. One chapter would be what I consider beginner-level, then the next would jump to what I consider advanced.

Also, this book was heavily statistics-focused. Some sections had a lot of math and all of the code was presented in R. I don’t mind that too much, but this is probably not the right fit for people coming from Engineering.

But this book is different to others. It aims to give you a sense of what “real” data science is like. For example, the vast majority of the time is not spent modelling. This is what I emphasise on my training courses, so I connected with the authors on this point.

There were some real insights too. I particularly like:

“It’s good to be smart, but being able to learn fast is even better: run experiments quickly to learn quickly.”

And:

“What model would be best for educating the data scientist?”

That’s a really interesting thought. It flips the idea of modelling on its head. You can only build the “best” model when you completely understand the domain. So instead of trying to model to improve performance, model to improve your knowledge of the domain.

I would recommend this book because there is some insightful content and it is wonderfully different to all of the other list-all-the-things style machine learning books. It also acknowledges that data science in business is different to what you read on Kaggle et al., which I completely agree with. I just wish people would stop doing the multi-author thing. It never works well.

So don’t expect to go back to work with some new ideas for your code; read it on holiday instead.