There are probably thousands upon thousands of tutorials, articles, videos, and blog posts on all things data science on the internet now. Yet I’m still a big fan of books.
Men who have made these discoveries before us are not our masters, but our guides.
So let books also be your guide in your data science journey along with the tutorials, articles, and videos. And to help you get started or to add to your collection, below is a list of some great, free books on different aspects of data science.
The main thing that differentiates a data scientist from a data analyst or statistician is the ability to write code. It’s no secret that the two biggest languages for data science are Python and R. Both have their advantages and disadvantages, and it’s not going to hurt if you learn one over the other.
Automate the Boring Stuff is a great resource both for beginners in Python programming and for programmers with years of experience, as it has so many useful examples that you can put to use. I have found it especially helpful as someone newer to the Python language but not new to programming in general.
The R Programming wikibook is a great resource for starting to learn the R programming language. It leans more heavily on statistics and math, but the whole reason R exists in the first place is to have a language for those calculations.
On the flip side of the point above, a data scientist knows more statistics than the average programmer. Statistics is a huge field in itself, so even a basic knowledge of it can set you apart from the rest.
OpenIntro Statistics is the textbook used in Coursera’s Statistics with R specialization. I’ve been going through this book while taking the classes and have found it very helpful as another resource for understanding statistics.
Think Stats is another introductory statistics book, but it introduces the statistics with Python code rather than with formulas. For Bayesian statistics, there’s also a companion book, Think Bayes.
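To give a feel for that code-first style, here is a minimal sketch of my own. It is not taken from Think Stats, and the sample numbers and function names are made up for illustration. It spells out the mean and the population variance as plain Python, then checks the result against the standard library's `statistics` module.

```python
import statistics


def mean(values):
    """Average: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: the average squared distance from the mean."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


# Hypothetical sample, invented purely for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean:", mean(sample))
print("variance:", variance(sample))

# Cross-check the hand-written version against the standard library.
print("stdlib variance:", statistics.pvariance(sample))
```

Reading the loop inside `variance` tells you the same thing the formula does, which is the appeal of learning statistics this way if you already think in code.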
There are quite a lot of data science books out there already. However, these two are among the best I’ve come across.
The Python Data Science Handbook by Jake VanderPlas is a great reference that covers getting started with Jupyter notebooks, understanding data with pandas, visualizations with matplotlib, and even some machine learning with scikit-learn. This book goes through much of what a data scientist might do during their day.
R for Data Science is similar to the Python book above, but goes through these things with R and R packages such as dplyr for analyzing data and ggplot2 for visualizations. It is written by Hadley Wickham, who wrote many of the R packages used for data science.
An Introduction to Statistical Learning with Applications in R sounds like a statistics book, and it actually is one. However, it also covers many of the machine learning algorithms you’ll come across. After this one, feel free to dive into its big brother, The Elements of Statistical Learning.
Hands-On Machine Learning with Scikit-Learn and TensorFlow is better if you get the printed book, which is the best I’ve read so far on machine learning. The free version is only the Jupyter notebooks and doesn’t include all of the text. But you can still get a good idea of the code and examples the book offers.