2 Exploring data

This unit focuses on data visualization and data wrangling. Specifically, we cover the fundamentals of data and data visualization, confounding variables, and Simpson’s paradox, as well as the concept of tidy data, data import, data cleaning, and data curation. We end the unit with web scraping and introduce the idea of iteration in preparation for the next unit. Students are also introduced to the toolkit in this unit: R, RStudio, R Markdown, Git, and GitHub.

2.1 Visualising data

Unit 2 - Deck 1: Data and visualisation

Unit 2 - Deck 2: Visualising data with ggplot2

Unit 2 - Deck 3: Visualising numerical data

Unit 2 - Deck 4: Visualising categorical data

2.2 Wrangling and tidying data

Unit 2 - Deck 5: Tidy data

JSS :: Tidy data (Wickham, Journal of Statistical Software)

Unit 2 - Deck 6: Grammar of data wrangling

Unit 2 - Deck 7: Working with a single data frame

Unit 2 - Deck 8: Working with multiple data frames

Unit 2 - Deck 9: Tidying data

2.3 Importing and recoding data

Unit 2 - Deck 10: Data types

Unit 2 - Deck 11: Data classes

Unit 2 - Deck 12: Importing data

Unit 2 - Deck 13: Recoding data

2.4 Communicating data science results effectively

Unit 2 - Deck 14: Tips for effective data visualization

Unit 2 - Deck 15: Scientific studies and confounding

Unit 2 - Deck 16: Simpson’s paradox

Unit 2 - Deck 17: Doing data science

2.5 Web scraping and programming

Unit 2 - Deck 18: Web scraping

Unit 2 - Deck 19: Scraping top 250 movies on IMDB

Unit 2 - Deck 20: Web scraping considerations

Unit 2 - Deck 21: Functions

Unit 2 - Deck 22: Iteration