If you are excited about data science and machine learning but have yet to fully dive in, this course will build the skills you need to make a smooth transition into the field. Data exploration is the foundation of all good data analysis, and it is where data scientists spend the vast majority of their time.
This bootcamp is designed to introduce you to data exploration using the Python programming language. Through this intense, weeklong program you will master the skills necessary to manipulate, visualize, and explore datasets to extract valuable insights.
Key Learning Areas
Python has become the de facto language of choice for data scientists in the corporate world. It is easy to learn, has a concise and expressive syntax, and is capable of supporting an organization’s analytics requirements. The primary focus of this course will be the powerful and popular Pandas library. To be an effective data scientist, you need to master both the syntax of the tools and the concepts that make a good analysis.
You will learn:
- How to write modern and idiomatic Pandas
- How to explore data in Jupyter Notebooks
- How to select subsets of data
- The split-apply-combine paradigm
- How to transform any dataset into a tidy one
- A systematic routine for doing exploratory data analysis
- Visualization with Matplotlib and Seaborn
- Applied machine learning with Scikit-Learn
- How to deliver an engaging data story
The ultimate goal of this course is to teach ‘end-to-end’ data analysis. Upon completion, you will be able to ingest any messy dataset, transform it so that it is tidy, detect and correct anomalies, make insightful and beautiful visualizations, run machine learning models, and deliver a final product.
The course uses the latest version of Python 3 and Pandas to teach both the syntax and concepts of performing data exploration. All material is delivered in Jupyter Notebooks and includes over 200 exercises with detailed solutions.
Course Introduction and Setup
- A thorough pre-course assignment (a refresher on Python fundamentals)
- Installation of Anaconda
- Environment setup
- Introduction to Jupyter Notebooks
Introduction to Pandas
- Anatomy of a DataFrame and Series
- Subset selection
- Boolean indexing
- Arithmetic operations
- Common and simple methods
- Method chaining
- Automatic alignment of the index
- Grouping columns, aggregating columns, aggregating functions
- Complex grouping
- Custom aggregations
- Filtering, transforming, and applying when grouping
- What is tidy data?
- Learning melt, stack, pivot, unpivot
- Reshaping with help from index names
- Identifying the most common types of tidy data
- Tidying a variety of messy datasets
- Extracting data with regular expressions
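To give a flavor of the Pandas topics listed above, here is a minimal sketch of boolean indexing, method chaining, split-apply-combine, reshaping with melt, and regular-expression extraction. The toy data and every variable name are hypothetical and are not taken from the course notebooks.

```python
import pandas as pd

# Hypothetical toy data: one row per employee
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "dept": ["eng", "eng", "ops", "ops"],
    "salary": [100, 120, 80, 90],
})

# Boolean indexing: keep only the rows where the condition is True
high_paid = df[df["salary"] > 95]

# Method chaining: filter, sort, and take the top rows in one expression
top_two = (
    df.query("salary > 85")
      .sort_values("salary", ascending=False)
      .head(2)
)

# Split-apply-combine: split by dept, apply mean, combine into one Series
mean_by_dept = df.groupby("dept")["salary"].mean()

# Tidy data: melt turns one column per year into one row per observation
wide = pd.DataFrame({"city": ["A", "B"], "2019": [1, 2], "2020": [3, 4]})
tidy = wide.melt(id_vars="city", var_name="year", value_name="value")

# Regular expressions: pull the four-digit year out of a string column
codes = pd.Series(["item-2019", "item-2020"])
years = codes.str.extract(r"(\d{4})", expand=False)
```

Each step returns a new object rather than modifying `df` in place, which is what makes chaining methods together practical.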
Exploratory Data Analysis
- Object-oriented interface of Matplotlib
- Plotting with Pandas
- Plotting with Seaborn
- Developing a data analysis routine
- Creating a data dictionary
- Univariate vs Bivariate analysis
- Categorical vs Continuous data
- Feature engineering with string columns
- Detecting and handling outliers
- Developing a reproducible report
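As an illustration of the outlier topic above, here is a small sketch that flags outliers with the 1.5 × IQR rule and then clips them. The choice of the IQR rule is an assumption made for this example, as the outline does not say which detection method the course teaches, and the data is invented.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value
s = pd.Series([10, 12, 11, 13, 12, 95], name="value")

# Univariate summary: the quartiles define the interquartile range (IQR)
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# Assumed rule for this sketch: flag points beyond 1.5 * IQR from the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]

# One possible way to handle them: clip values to the fences
clipped = s.clip(lower=lower, upper=upper)
```

Clipping is only one option. Depending on the analysis, it may be more appropriate to investigate the flagged rows, remove them, or leave them in place.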
Machine Learning Preparation
- Completed after EDA
- Missing value imputation
- Encoding categorical variables
- Establishing a baseline
- Parsimonious modeling
- Cross-validation
- Applying mathematical transformations
- More feature engineering
- Scaling and normalization
- Data science take-home interview assignment
- EDA with HR Analytics
- Kaggle Competition
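The preparation steps in this section (missing value imputation, categorical encoding, scaling, establishing a baseline, and cross-validation) might be combined in Scikit-Learn roughly as follows. This is a sketch under stated assumptions: the dataset, the column names, and the choice of logistic regression are hypothetical and do not come from the course material.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: a numeric column with gaps and a categorical column
X = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 38, 29, 50, np.nan, 45, 23, 36, 48],
    "dept": ["eng", "ops", "eng", "ops", "eng", "ops"] * 2,
})
y = pd.Series([0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1])

# Numeric columns: impute missing values with the median, then scale
numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
# Categorical columns: one-hot encode, ignoring categories unseen in training
categorical = OneHotEncoder(handle_unknown="ignore")
prep = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["dept"]),
])

# Baseline: always predict the most frequent class
baseline = make_pipeline(prep, DummyClassifier(strategy="most_frequent"))
# A simple candidate model (an assumption for this sketch)
model = make_pipeline(prep, LogisticRegression())

# Cross-validation: the preprocessing is refit on the training part of each fold
baseline_scores = cross_val_score(baseline, X, y, cv=3)
model_scores = cross_val_score(model, X, y, cv=3)
```

Keeping the preprocessing inside the pipeline means the imputation and scaling statistics are learned only from the training portion of each fold, which avoids leaking information from the held-out data. Comparing `model_scores` against `baseline_scores` shows whether the model adds anything over the trivial predictor.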
This course is for data analysts and data scientists who desire to learn how to programmatically explore, analyze, and model data with Python.
Attendees should have a basic understanding of the Python programming language. A thorough pre-course assignment will be provided for those who need to solidify the fundamentals of Python.