Week 01: Course outline and logistics


TOPICS: The following topics will be covered this week.
  • Logistics
    • Course logistics, coverage, and policies
    • Software tools: R, Tableau Public
    • Platforms and supplementary software: RStudio, GitHub, LaTeX
    • Primary R libraries, such as ggplot2, tidyverse, and shiny
  • Introduction
    • Goals for visualization
    • Steps for Visualizing Data
    • Data Foundations
  • Basics of R and RStudio
    • R, RStudio, and relevant packages
    • R Markdown for communications
  • Class Notes
ASSIGNMENTS: Software Installation and Registration
  • Software Installation
    • Required Software: R, RStudio, LaTeX (MiKTeX), Tableau Public
    • Required Registrations: GitHub, shinyapps.io, RPubs
  • No D2L Written Submission Due This Week!

Week 02: R Programming Review and Base R Graphics


TOPICS: The following topics will be covered this week.
  • Basics of R Programming
    • Data types
    • Control and loops
    • R functions
  • Base R Graphics
    • Base R graphic functions
    • Basic charts
  • Class Notes
ASSIGNMENT: Make a scatter plot based on the penguin data set
  • Information about the data set
  • Steps for creating the visualization using the base plot functions
    • Download the penguins data set and then upload it to your GitHub data repository.
    • Find an image of a penguin and upload the image to the GitHub repository.
    • Read the data set into R directly from the GitHub data repository.
    • Open an R Markdown document. You are encouraged to use my template for all assignments.
    • Choose the numerical variables bill_length_mm and flipper_length_mm from the penguins data set to make a scatter plot, and color the points by species.
    • The point size should be proportional to the body_mass_g of the corresponding penguin.
    • Insert the image of the penguin you uploaded to the GitHub repository into the scatter plot.
    • Add a legend to enhance the readability of your plot.
    • Make sure the axes of your plot are appropriately labeled. The title of the plot should reflect its content.
    • Knit the RMD to HTML format and upload your HTML file to your GitHub repository.
  • Submission Requirements
  • D2L Submission Due: 11:59 PM, 2/13/2022. Please submit your work through the D2L dropbox.
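A minimal base R sketch of the plotting steps above. The six rows are made-up stand-ins that only share the column names used in this assignment; in your own work you would read the CSV from your GitHub repository instead (the URL in the comment is a placeholder, not a real address). The colors and the cex scaling constant are arbitrary choices.

```r
# Made-up stand-in rows with the assignment's column names. In practice, e.g.:
# penguins <- read.csv("https://raw.githubusercontent.com/<user>/<repo>/main/penguins.csv")
penguins <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  bill_length_mm = c(39.1, 38.6, 46.1, 50.0, 46.5, 50.9),
  flipper_length_mm = c(181, 190, 211, 230, 192, 196),
  body_mass_g = c(3750, 3800, 4500, 5700, 3500, 3900)
)

# One fill color per species, looked up by species name for every point
species_levels <- sort(unique(penguins$species))
species_cols <- setNames(c("darkorange", "purple", "cyan4"), species_levels)
point_col <- species_cols[penguins$species]

# Symbol size (cex) proportional to body mass; the mean mass maps to cex = 1.5
point_cex <- 1.5 * penguins$body_mass_g / mean(penguins$body_mass_g)

# pch = 21 draws filled circles: bg is the fill, col is the boundary
plot(penguins$bill_length_mm, penguins$flipper_length_mm,
     pch = 21, bg = point_col, col = "gray30", cex = point_cex,
     xlab = "Bill length (mm)", ylab = "Flipper length (mm)",
     main = "Flipper Length vs. Bill Length of Penguins by Species")
legend("topleft", legend = species_levels, pt.bg = species_cols, pch = 21,
       title = "Species", bty = "n")
```

Inserting the penguin image is not shown. Assuming the image is a PNG, one option is to read it with `png::readPNG()` and draw it on the open plot with base R's `rasterImage(image, xleft, ybottom, xright, ytop)`, where the four positions are in data coordinates.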

Week 03: Foundations and Ethics of Data Visualization


TOPICS: The following topics will be covered this week.
  • Foundations of Data Visualization
    • Principles of data visualization
    • Building blocks of data visualization: Marks and Channels
    • Design elements in visualizations: visual encoding and color scheme
    • Strategies for making good visualizations
  • Ethics of data visualization
    • Basics of scientific ethics
    • Avoiding
  • Class Notes
ASSIGNMENTS: Finalize the visualization based on the penguin data.
  • Review the notes for this week.
  • Week #2 Assignment Due: Sunday, 2/13/2022

Week 04: Data Management for Data Visualizations


TOPICS: The following topics will be covered this week.
  • Data management tasks for data visualization
    • Merging and joining relational data sets
    • Subsetting a data set by selecting rows and columns
  • Important base R commands for data management
    • merge(), which(), and R object accessors
  • dplyr commands
    • Mutating joins and mutate()
  • Tidy data and code: joins and pipe operator (%>%)
  • Class Notes
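The base R tools listed above can be sketched on two tiny made-up tables that share a key column `id` (the table and column names are hypothetical). `merge()` performs the join, and `which()` returns the row indices used to subset rows, while a character vector of names selects columns.

```r
# Two small relational tables (hypothetical) sharing the key "id"
penguin_obs <- data.frame(id = 1:4,
                          species = c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
                          body_mass_g = c(3750, 5000, 3450, 3700))
penguin_site <- data.frame(id = c(1, 2, 3, 5),
                           island = c("Torgersen", "Biscoe", "Dream", "Biscoe"))

# Inner join: keeps only ids present in both tables (ids 1, 2, 3)
inner <- merge(penguin_obs, penguin_site, by = "id")

# Left join: keeps every row of penguin_obs; id 4 has no match, so island is NA
left <- merge(penguin_obs, penguin_site, by = "id", all.x = TRUE)

# which() gives the indices of rows meeting a condition
heavy_rows <- which(left$body_mass_g > 3600)

# Subset by rows (indices) and columns (names) with the [rows, cols] accessor
heavy <- left[heavy_rows, c("id", "species", "island")]
```

With dplyr, the same joins are `inner_join(x, y, by = "id")` and `left_join(x, y, by = "id")`, and row/column subsetting is `filter()` and `select()`, usually chained with `%>%`.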
ASSIGNMENTS: Assignment #2 has two components.
  • Use the principles of data visualization introduced in week #3 to refine the plot based on the iris data set. The elements you are expected to consider are:
    • Marks and channels: point shapes, size, colors, etc.
    • Legends and annotations (text and images): font size and face, colors, positions, etc.
    • Title and tick mark labels: font size and face, colors
  • Create a subset of the penguin data set that satisfies the following conditions:
    • Delete all records with at least one missing value.
    • Include only Adelie and Gentoo penguins from the Biscoe and Torgersen islands.
    • Include only penguins with body_mass_g greater than 3500 grams and less than 5000 grams.
    • Rescale body_mass_g by dividing it by 4000, and name the result BMI (BMI = body_mass_g / 4000).
    • Exclude the variables X (observation ID), sex, year, and body_mass_g from the subset.
    • Using the resulting data set:
      • Create the same scatter plot as the one you are asked to refine in component 1.
      • Write a paragraph comparing the relationship between the two variables across species.
  • Assignment #2 Due: Sunday, 2/27/2022
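The subsetting steps in component 2 can be sketched in base R, one line per condition. The eight rows are made up and only mimic the column names mentioned in the assignment (X, species, island, sex, year, and the measurements); the real file is larger.

```r
# Made-up rows mimicking the penguin file's columns (one row has a missing value)
penguins <- data.frame(
  X = 1:8,
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Adelie", "Gentoo", "Adelie"),
  island = c("Torgersen", "Biscoe", "Biscoe", "Biscoe", "Dream", "Dream", "Biscoe", "Torgersen"),
  bill_length_mm = c(39.1, 37.8, 46.1, 50.0, 46.5, 39.5, NA, 40.3),
  flipper_length_mm = c(181, 174, 211, 230, 192, 186, 215, 195),
  body_mass_g = c(3750, 3400, 4500, 5700, 3500, 3800, 4700, 3650),
  sex = c("male", "female", "female", "male", "female", "female", "male", "female"),
  year = c(2007, 2007, 2008, 2008, 2009, 2007, 2009, 2008)
)

# 1. Drop every record with at least one missing value
complete <- na.omit(penguins)

# 2-3. Species, islands, and the open body-mass interval (3500, 5000)
keep <- complete$species %in% c("Adelie", "Gentoo") &
  complete$island %in% c("Biscoe", "Torgersen") &
  complete$body_mass_g > 3500 & complete$body_mass_g < 5000
sub <- complete[keep, ]

# 4. Rescaled body mass, named BMI as in the assignment
sub$BMI <- sub$body_mass_g / 4000

# 5. Drop the excluded variables by name
sub <- sub[, !(names(sub) %in% c("X", "sex", "year", "body_mass_g"))]
```

`read.csv()` already turns the text `NA` into missing values; if the real file codes missing data differently, pass that code through the `na.strings` argument before calling `na.omit()`.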

Week 05: ggplot revisited


TOPICS: The following topics will be covered this week.
  • Foundations of Data Visualization
    • Components of ggplot()
    • Basic statistical graphics with ggplot()
    • Charts for a single variable
    • Charts for two variables
  • Class Notes
ASSIGNMENTS: Finalize assignment #2 and prepare a data set for assignment #3.
  • Review the notes for this week [both ggplot() and gganimate() functions]
  • Prepare a single data set based on data set #12 on the Project Data Set page.
    • Reshape the Income Per Person data set into long (longitudinal) format so that the resulting data set has three columns: country, year, and income.
    • Do the same for Life Expectancy in Years so that the resulting data set has three columns: country, year, and life expectancy.
    • Merge/join the two long data sets above into a new data set named LifeExpIncom with the variables country, year, lifeExp, and income.
    • Merge LifeExpIncom with the country region data so that the resulting data set contains income, life expectancy, and country region.
    • Merge that result with the population size data so that the final data set contains income, life expectancy, population size, and country region.
  • Assignment #2 Due: Sunday, 2/27/2022
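A base R sketch of the reshape-and-merge steps, assuming each source file is "wide" with a `country` column followed by one column per year. The tables are made-up placeholders for countries "A" and "B", and `to_long()` is a hypothetical helper written for this example.

```r
# Hypothetical helper: wide table (country + one column per year) -> long format
to_long <- function(wide, value_name) {
  years <- setdiff(names(wide), "country")
  long <- data.frame(
    country = rep(wide$country, times = length(years)),  # A, B, A, B, ...
    year = rep(as.integer(years), each = nrow(wide)),     # 2014, 2014, 2015, 2015, ...
    value = unlist(wide[years], use.names = FALSE)        # stacked column by column
  )
  names(long)[names(long) == "value"] <- value_name
  long
}

# Made-up wide tables; check.names = FALSE keeps year column names like "2014"
income <- data.frame(country = c("A", "B"), "2014" = c(1900, 12500),
                     "2015" = c(1850, 12700), check.names = FALSE)
life <- data.frame(country = c("A", "B"), "2014" = c(57.5, 79.0),
                   "2015" = c(57.9, 79.3), check.names = FALSE)
pop <- data.frame(country = c("A", "B"), "2014" = c(13.6e6, 30.1e6),
                  "2015" = c(14.1e6, 30.5e6), check.names = FALSE)
regions <- data.frame(country = c("A", "B"), region = c("Africa", "Americas"))

income_long <- to_long(income, "income")
life_long <- to_long(life, "lifeExp")

# Join on both keys so each country-year pair lines up
LifeExpIncom <- merge(life_long, income_long, by = c("country", "year"))
with_region <- merge(LifeExpIncom, regions, by = "country")
final <- merge(with_region, to_long(pop, "population"), by = c("country", "year"))
```

If the files are read with `read.csv()` defaults, year columns come back as `X2014`, `X2015`, and so on; either pass `check.names = FALSE` or strip the prefix with `sub("^X", "", years)`. `tidyr::pivot_longer(cols = -country, names_to = "year", values_to = "income")` is a common tidyverse alternative to the helper.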

Week 06: Interactive Plots with plotly()


TOPICS: The following topics will be covered this week.
  • Foundations of Data Visualization
    • plotly package and syntax
    • interactive statistical charts
    • Plotly map – a simple example of the interactive map
  • Class Notes
    • Interactive Plot with plotly() [updated version]: HTML | RMD
ASSIGNMENTS: Assignment #3: Based on the life expectancy data set you prepared last week, select one or more appropriate plots discussed this week to visualize the relationship between income and life expectancy.
  • Make an interactive scatter plot to display the association between life expectancy and income for the year 2015. [required]
    • Set the point size to be proportional to the population size
    • Use different colors for different countries.
    • Choose a transparency level that keeps overlapping points visible.
    • Choose a point-boundary color that makes partially overlapping points easy to distinguish.
    • Include the country name and population size in the hover text.
  • Make an animated scatter plot that shows the pattern of change in the relationship between life expectancy and income over the years. [required]
    • Set the point size to be proportional to the population size
    • Use different colors for different regions.
    • Choose a transparency level that keeps overlapping points visible.
    • Choose a point-boundary color that makes partially overlapping points easy to distinguish.
  • Assignment #3 Due: Sunday, 3/13/2022
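A sketch of both required plots with the plotly R package, assuming the merged data set has the columns country, region, year, income, lifeExp, and population. The six rows are made up. `sizes` sets the marker size range, `opacity` handles overlap, `marker$line` draws the point boundary, and `frame`/`ids` drive the animation; the log income axis and the timing values are optional choices.

```r
library(plotly)

# Made-up rows in the layout of the merged Week 5 data set
gap <- data.frame(
  country = rep(c("A", "B", "C"), times = 2),
  region = rep(c("Africa", "Americas", "Asia"), times = 2),
  year = rep(c(2014, 2015), each = 3),
  income = c(1900, 12500, 6000, 1850, 12700, 6300),
  lifeExp = c(57.5, 79.0, 70.2, 57.9, 79.3, 70.6),
  population = c(13.6e6, 30.1e6, 95e6, 14.1e6, 30.5e6, 96e6)
)

# 1) Interactive scatter plot for 2015: color by country, size by population
g2015 <- subset(gap, year == 2015)
p2015 <- plot_ly(
  g2015, x = ~income, y = ~lifeExp, type = "scatter", mode = "markers",
  color = ~country,
  size = ~population, sizes = c(10, 60),
  marker = list(opacity = 0.6, line = list(color = "black", width = 1)),
  text = ~paste0(country, "<br>Population: ",
                 formatC(population, format = "d", big.mark = ",")),
  hoverinfo = "text"
)
p2015 <- layout(p2015, title = "Life Expectancy vs. Income, 2015",
                xaxis = list(title = "Income per person", type = "log"),
                yaxis = list(title = "Life expectancy (years)"))

# 2) Animated scatter plot: one frame per year, color by region;
#    ids keeps each country's marker identity across frames
p_anim <- plot_ly(
  gap, x = ~income, y = ~lifeExp, type = "scatter", mode = "markers",
  color = ~region, size = ~population, sizes = c(10, 60),
  frame = ~year, ids = ~country,
  marker = list(opacity = 0.6, line = list(color = "black", width = 1)),
  text = ~country, hoverinfo = "text"
)
p_anim <- animation_opts(p_anim, frame = 800, transition = 400)
```

Printing `p2015` or `p_anim` in an R Markdown chunk renders the interactive widget in the knitted HTML.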

Week 07: Visualizing Spatial Information Using Maps


TOPICS: The following topics will be covered this week.
  • R Maps
    • Basic types of maps: choropleth and scatter map
    • R map libraries
    • Choropleth map for aggregated data
    • Scatter map for individual-level data
  • Class Notes
    • Visualizing spatial information: HTML | RMD
ASSIGNMENTS: Assignment #3: Based on the life expectancy data set you prepared in Week 5, select one or more appropriate plots discussed in Week 6 to visualize the relationship between income and life expectancy.
  • Make an interactive scatter plot to display the association between life expectancy and income for the year 2015. [required]
    • Set the point size to be proportional to the population size
    • Use different colors for different countries.
    • Choose a transparency level that keeps overlapping points visible.
    • Choose a point-boundary color that makes partially overlapping points easy to distinguish.
    • Include the country name and population size in the hover text or pop-ups.
  • Make an animated scatter plot that shows the pattern of change in the relationship between life expectancy and income over the years. [required]
    • Set the point size to be proportional to the population size
    • Use different colors for different regions.
    • Choose a transparency level that keeps overlapping points visible.
    • Choose a point-boundary color that makes partially overlapping points easy to distinguish.
  • Choose an appropriate R map library to create an interactive map of the gas station data that shows information about each gas station.
    • Gas Station Data Set
    • Take a random sample of 500 US gas stations to plot on the map.
    • Include the state, county, address, and ZIP code in the hover text/pop-ups.
  • Assignment #3 Due: Sunday, 3/13/2022
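One way to do the gas station map is with the leaflet package. The station table below is synthetic, and its column names (State, County, Address, Zip, Latitude, Longitude) are assumptions; rename them to match the real file. `sample()` draws 500 row indices without replacement, and the popup string joins the four required fields.

```r
library(leaflet)

# Synthetic stand-in for the gas station file (hypothetical column names)
set.seed(533)
n_all <- 2000
stations <- data.frame(
  State = sample(c("IL", "WI", "IN"), n_all, replace = TRUE),
  County = paste("County", sample(1:50, n_all, replace = TRUE)),
  Address = paste(sample(100:9999, n_all, replace = TRUE), "Main St"),
  Zip = sprintf("%05d", sample(60000:62999, n_all, replace = TRUE)),
  Latitude = runif(n_all, 37, 43),
  Longitude = runif(n_all, -91, -85)
)

# Random sample of 500 stations (row indices drawn without replacement)
sampled <- stations[sample(nrow(stations), 500), ]

# Popup HTML with the four required fields; <br> starts a new line
sampled$popup <- paste0("<b>State:</b> ", sampled$State,
                        "<br><b>County:</b> ", sampled$County,
                        "<br><b>Address:</b> ", sampled$Address,
                        "<br><b>ZIP:</b> ", sampled$Zip)

station_map <- leaflet(sampled) %>%
  addTiles() %>%                                   # OpenStreetMap base tiles
  addCircleMarkers(lng = ~Longitude, lat = ~Latitude, radius = 4,
                   color = "navy", fillOpacity = 0.6, stroke = FALSE,
                   popup = ~popup,                  # shown on click
                   label = ~paste(State, Zip))      # shown on hover
```

Popups appear on click and labels on hover, so the two together cover both "hover" and "popup" in the assignment wording.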

Week 08: Advanced Maps


TOPICS: The following topics will be covered this week.
  • More R Maps
    • Thematic Maps
    • Shapefiles: creating new shapefiles and modifying existing shapefiles
    • Case Studies: spatial data - aggregated and individual data
  • Class Notes
    • Visualizing spatial information [same as last week's note]: HTML | RMD
    • More on Shapefiles: HTML | RMD
    • HTML Slidy Presentation [Updated]: HTML | RMD
    • HTML Slidy Presentation (manually modified with a Tableau chart) [Updated]: HTML
ASSIGNMENTS: Assignment #4: Create interactive maps to show the 2020 presidential election results.
Part I.
  • Data sets: Presidential Election Data | FIPS to Geocode
  • Data management tasks:
    • Extract only the 2020 presidential election data (variable: year).
    • Include only Democratic and Republican votes (variable: party).
    • Include the variables state_po, county_name, county_fips, party, and candidatevotes.
    • Merge the above data with the FIPS to Geocode data, using the FIPS code as the primary key.
    • Create an interactive choropleth map that displays the presidential election results at the county level, using two different colors to represent the two parties. You can use any R library for this part of the assignment.

  • Assignment #4 Due: Sunday, 4/3/2022
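A base R sketch of the data management tasks in Part I, plus one extra step (finding each county's winning party) that a two-color map needs. The rows are made up. The party labels `DEMOCRAT`/`REPUBLICAN` and the geocode columns `fips`, `lat`, `lon` are assumptions; check them against the actual files.

```r
# Made-up rows using the column names listed in the assignment
elect <- data.frame(
  year = c(2016, 2020, 2020, 2020, 2020, 2020),
  state_po = "ST",
  county_name = c("COUNTY A", "COUNTY A", "COUNTY A", "COUNTY A", "COUNTY B", "COUNTY B"),
  county_fips = c(1001, 1001, 1001, 1001, 1003, 1003),
  party = c("DEMOCRAT", "DEMOCRAT", "REPUBLICAN", "LIBERTARIAN", "DEMOCRAT", "REPUBLICAN"),
  candidatevotes = c(999, 300, 500, 20, 700, 400)
)
# Hypothetical geocode table keyed by FIPS
geocode <- data.frame(fips = c(1001, 1003), lat = c(32.5, 31.1), lon = c(-86.6, -87.7))

# 2020 only, two major parties only, required columns only
keep_cols <- c("state_po", "county_name", "county_fips", "party", "candidatevotes")
e2020 <- elect[elect$year == 2020 & elect$party %in% c("DEMOCRAT", "REPUBLICAN"), keep_cols]

# Winning party per county: sort by county, then by votes (descending), keep first row
e2020 <- e2020[order(e2020$county_fips, -e2020$candidatevotes), ]
winners <- e2020[!duplicated(e2020$county_fips), ]

# Merge with the geocodes using FIPS as the key
county_results <- merge(winners, geocode, by.x = "county_fips", by.y = "fips")
```

County FIPS codes are five digits (two for the state, three for the county). If one file stores them as numbers and the other as zero-padded text, convert with `sprintf("%05d", county_fips)` before merging. Drawing the choropleth itself also needs county boundary polygons joined on the same FIPS code.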

Week 09: Basic Statistical Charts with Tableau


TOPICS: The following topics will be covered this week.
  • Introduction to Tableau: please install Tableau Public on your computer and sign up for a Tableau Public account to save your visualizations.
  • Basic Statistical Charts
    • Bar chart, pie chart, donut chart
    • Histogram and density curves
    • Time series plot (line plot)
    • Scatter plot and bubble plot
    • Maps and tree maps
  • Derived variables and Tableau built-in functions
  • Filters
  • Class Notes: Setting up a web page on GitHub [Updated]
ASSIGNMENTS: Assignment #4: Create interactive maps to show the 2020 presidential election results.
Part I.
  • Data sets: Presidential Election Data | FIPS to Geocode
  • Data management tasks:
    • Extract only the 2020 presidential election data (variable: year).
    • Include only Democratic and Republican votes (variable: party).
    • Include the variables state_po, county_name, county_fips, party, and candidatevotes.
    • Merge the above data with the FIPS to Geocode data, using the FIPS code as the primary key.
    • Create an interactive choropleth map that displays the presidential election results at the county level, using two different colors to represent the two parties. You can use any R library for this part of the assignment.

Part II.
  • Use Tableau to create the map described in Part I of this assignment.
    • Create the map with Tableau and publish it on Tableau's public server.
    • Embed the map in your R Markdown document using an IMG tag so that it also appears in the knitted HTML file.

  • Assignment #4 Due: Sunday, 4/3/2022

Week 10: Interactive Visualization and Dashboard with Tableau


TOPICS: The following topics will be covered this week.
ASSIGNMENTS: Assignment #5: Tableau Dashboard and Story Point.
  • Details are in the last section of the class notes.
  • Assignment #5 Due: Sunday, 4/17/2022

Week 11: Getting Started with Shiny Apps


TOPICS: The following topics will be covered this week.
ASSIGNMENTS: Assignment #6: R Shiny Apps.
  • Part I:
    Pick a commonly used probability distribution, use it to generate a set of random numbers, and then make an appropriate visualization to show the distribution. Depending on the type of distribution you choose, the visualization can be any basic chart, such as a histogram, a boxplot, or a probability histogram. The Shiny app should be similar to Case Study 1 in this week's class notes.
  • Assignment #6 Due: Wednesday, 5/04/2022
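A minimal sketch of a Part I app. It is not the Case Study 1 code from the class notes. The three distributions, their parameters, and the input IDs are arbitrary choices for illustration. A discrete distribution is shown as relative frequencies (a probability histogram); a continuous one as a density-scaled histogram with a kernel density curve.

```r
library(shiny)

ui <- fluidPage(
  titlePanel("Simulating a probability distribution"),
  sidebarLayout(
    sidebarPanel(
      selectInput("dist", "Distribution",
                  choices = c("Normal(0, 1)" = "norm",
                              "Exponential(rate = 1)" = "exp",
                              "Poisson(lambda = 4)" = "pois")),
      sliderInput("n", "Number of random values", min = 10, max = 5000, value = 500),
      numericInput("seed", "Random seed", value = 533)
    ),
    mainPanel(plotOutput("dist_plot"))
  )
)

server <- function(input, output, session) {
  # Regenerate the sample whenever the distribution, size, or seed changes
  sim <- reactive({
    req(input$seed)
    set.seed(input$seed)
    switch(input$dist,
           norm = rnorm(input$n),
           exp  = rexp(input$n, rate = 1),
           pois = rpois(input$n, lambda = 4))
  })

  output$dist_plot <- renderPlot({
    x <- sim()
    if (input$dist == "pois") {
      # Discrete: relative frequency of each observed count
      barplot(table(x) / length(x), xlab = "x", ylab = "Relative frequency",
              main = "Simulated Poisson(4) sample")
    } else {
      # Continuous: density-scaled histogram plus a kernel density curve
      hist(x, breaks = 30, freq = FALSE, xlab = "x",
           main = paste("Simulated", input$dist, "sample"))
      lines(density(x), lwd = 2)
    }
  })
}

app <- shinyApp(ui, server)
# shiny::runApp(app)  # launch interactively
```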

Week 12: Data Analysis with R Shiny Apps


TOPICS: The following topics will be covered this week.
ASSIGNMENTS: Assignment #6: R Shiny Apps.
  • Part I:
    Pick a commonly used probability distribution, use it to generate a set of random numbers, and then make an appropriate visualization to show the distribution. Depending on the type of distribution you choose, the visualization can be any basic chart, such as a histogram, a boxplot, or a probability histogram. The Shiny app should be similar to Case Study 1 in this week's class notes.
  • Part II: Graphical Analysis of Penguin Data with the R Shiny App
    Perform an analysis of the penguin data similar to the one I did for the iris data. The penguin data contain two categorical variables, species and location; you may use either or both of them to define filters.

  • Assignment #6 Due: Wednesday, 5/04/2022
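A sketch of the Part II filtering pattern. It assumes the location variable is the `island` column; the twelve rows are made-up stand-ins. Both checkbox groups feed one `reactive()` subset, which the plot and the summary table reuse.

```r
library(shiny)

# Made-up stand-in for the cleaned penguin data (island = location)
penguins <- data.frame(
  species = rep(c("Adelie", "Gentoo", "Chinstrap"), each = 4),
  island = c("Torgersen", "Biscoe", "Dream", "Torgersen",
             "Biscoe", "Biscoe", "Biscoe", "Biscoe",
             "Dream", "Dream", "Dream", "Dream"),
  bill_length_mm = c(39.1, 37.8, 39.5, 40.3, 46.1, 50.0, 48.7, 47.3, 46.5, 50.9, 49.5, 45.4),
  flipper_length_mm = c(181, 174, 186, 195, 211, 230, 210, 214, 192, 196, 198, 188)
)
species_levels <- sort(unique(penguins$species))

ui <- fluidPage(
  titlePanel("Penguin explorer"),
  sidebarLayout(
    sidebarPanel(
      checkboxGroupInput("species", "Species", choices = species_levels,
                         selected = species_levels),
      checkboxGroupInput("island", "Island", choices = sort(unique(penguins$island)),
                         selected = unique(penguins$island))
    ),
    mainPanel(plotOutput("scatter"), tableOutput("summary"))
  )
)

server <- function(input, output, session) {
  # Rows matching both filters; recomputed when either input changes
  filtered <- reactive({
    penguins[penguins$species %in% input$species & penguins$island %in% input$island, ]
  })

  output$scatter <- renderPlot({
    d <- filtered()
    validate(need(nrow(d) > 0, "No penguins match the selected filters."))
    # Fixed species-to-color mapping so colors do not shift when filtering
    cols <- as.integer(factor(d$species, levels = species_levels)) + 1
    plot(d$bill_length_mm, d$flipper_length_mm, pch = 19, col = cols,
         xlab = "Bill length (mm)", ylab = "Flipper length (mm)")
    legend("topleft", legend = species_levels, col = seq_along(species_levels) + 1,
           pch = 19, bty = "n")
  })

  output$summary <- renderTable({
    d <- filtered()
    req(nrow(d) > 0)
    aggregate(cbind(bill_length_mm, flipper_length_mm) ~ species, data = d, FUN = mean)
  })
}

app <- shinyApp(ui, server)
```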

Week 13: Creating R Shiny Dashboards


TOPICS: The following topics will be covered this week.
ASSIGNMENTS: Assignment #6: R Shiny Apps.
  • Part I: Getting Started with Shiny - Simulation
    Pick a commonly used probability distribution, use it to generate a set of random numbers, and then make an appropriate visualization to show the distribution. Depending on the type of distribution you choose, the visualization can be any basic chart, such as a histogram, a boxplot, or a probability histogram. The Shiny app should be similar to Case Study 1 in this week's class notes.

  • Part II: Graphical Analysis of Penguin Data with the R Shiny App
    Perform an analysis of the penguin data similar to the one I did for the iris data. The penguin data contain two categorical variables, species and location; you may use either or both of them to define filters.

  • Part III: Graphical Analysis of Penguin Data with the R Shiny Dashboard
    Create a Shiny dashboard based on the penguin data. There are no restrictions on the dashboard design.


  • Assignment #6 Due: Wednesday, 5/04/2022
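Since Part III has no design restrictions, the following is only one possible starting layout with the shinydashboard package: a sidebar filter, two value boxes, and a plot box. The nine rows are made up, and the input IDs, colors, and icon are arbitrary choices.

```r
library(shiny)
library(shinydashboard)

# Made-up stand-in rows for the penguin data
penguins <- data.frame(
  species = rep(c("Adelie", "Gentoo", "Chinstrap"), each = 3),
  bill_length_mm = c(39.1, 37.8, 40.3, 46.1, 50.0, 47.3, 46.5, 50.9, 45.4),
  flipper_length_mm = c(181, 174, 195, 211, 230, 214, 192, 196, 188),
  body_mass_g = c(3750, 3400, 3650, 4500, 5700, 4950, 3500, 3900, 3525)
)

ui <- dashboardPage(
  dashboardHeader(title = "Penguin Dashboard"),
  dashboardSidebar(
    selectInput("species", "Species", choices = c("All", sort(unique(penguins$species))))
  ),
  dashboardBody(
    fluidRow(valueBoxOutput("n_box"), valueBoxOutput("mass_box")),
    fluidRow(box(title = "Flipper length vs. bill length", width = 12,
                 plotOutput("scatter")))
  )
)

server <- function(input, output, session) {
  # Whole data set or a single species, depending on the sidebar choice
  selected <- reactive({
    if (input$species == "All") penguins else penguins[penguins$species == input$species, ]
  })
  output$n_box <- renderValueBox({
    valueBox(nrow(selected()), "Penguins selected", icon = icon("list"), color = "blue")
  })
  output$mass_box <- renderValueBox({
    valueBox(round(mean(selected()$body_mass_g)), "Mean body mass (g)", color = "green")
  })
  output$scatter <- renderPlot({
    d <- selected()
    plot(d$bill_length_mm, d$flipper_length_mm, pch = 19,
         xlab = "Bill length (mm)", ylab = "Flipper length (mm)")
  })
}

app <- shinyApp(ui, server)
```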

Week 14: Shinydashboard and Multipage Flexdashboard


The following topics are optional. The two brief notes below are templates.
  • The iris dashboard uses the shinydashboard library. The RMD document illustrates how to design the UI and write the server-side code. (RMD)
  • A flexdashboard template with multiple pages and a global sidebar for input controls. (RMD)

Week 15: Information about the Final Project


    Minimum Requirements
    • Choose a publicly available data set that has at least two numerical and two categorical variables.
    • The sample size must be at least 50 times the number of variables you use in the visualizations.
    • Create a Shiny app, similar to the one we created this week, with the following components (using a tabset design):
      • a tab with an interactive plot displaying the relationship between two variables, using any of the libraries covered this semester. Information about an individual record should be displayed in a hover box.
      • a tab with an interactive chart comparing the distributions of numerical or categorical variables, such as histograms, bar charts, boxplots, or pie charts.
      • a tab that displays the regression results.
      • a tab that summarizes the inferential statistics in the regression model.
      • (optional) If your data set has geographic information, you can create an interactive map to show spatial patterns in your data.
    • You can develop the Shiny app either in the Shiny IDE or in R Markdown.
      • If you create your Shiny app in the Shiny IDE (single-file or two-file version), you need to publish the app on shinyapps.io and provide the link to it.
      • If you use R Markdown to create the app, you should also publish the app and the RMD on shinyapps.io.
      • You need to submit the code to the D2L drop box.
    • The data set must be available in your GitHub repository so that I can run your code on my computer without modifying it.
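The skeleton below sketches the required tabset structure. The built-in `iris` data is only a placeholder: it has a single categorical variable, so it would not meet the project requirement. `meets_sample_size()` is a hypothetical helper that encodes the sample-size rule (n ≥ 50 × number of variables). The hover box here uses base Shiny's `plotOutput(hover = ...)` with `nearPoints()`. plotly hover text is an equally valid choice.

```r
library(shiny)

# Hypothetical helper for the rule: n >= 50 * (number of variables used)
meets_sample_size <- function(data, vars) nrow(data) >= 50 * length(vars)

dat <- iris                                          # placeholder data set
vars <- c("Sepal.Length", "Petal.Length", "Species")
stopifnot(meets_sample_size(dat, vars))              # 150 >= 50 * 3

ui <- fluidPage(
  titlePanel("Final project skeleton"),
  tabsetPanel(
    tabPanel("Relationship",
             plotOutput("scatter", hover = "plot_hover"),
             tableOutput("hover_info")),
    tabPanel("Distributions", plotOutput("box")),
    tabPanel("Regression", verbatimTextOutput("fit_summary")),
    tabPanel("Inference", tableOutput("ci_table"))
  )
)

server <- function(input, output, session) {
  fit <- lm(Petal.Length ~ Sepal.Length + Species, data = dat)

  output$scatter <- renderPlot(
    plot(dat$Sepal.Length, dat$Petal.Length, pch = 19, col = as.integer(dat$Species),
         xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
  )
  # nearPoints() returns the record closest to the mouse;
  # xvar/yvar are required for base graphics
  output$hover_info <- renderTable(
    nearPoints(dat, input$plot_hover, xvar = "Sepal.Length", yvar = "Petal.Length",
               maxpoints = 1)
  )
  output$box <- renderPlot(
    boxplot(Petal.Length ~ Species, data = dat, ylab = "Petal length (cm)")
  )
  # Full model summary on one tab, estimates with p-values and 95% CIs on another
  output$fit_summary <- renderPrint(summary(fit))
  output$ci_table <- renderTable({
    est <- coef(summary(fit))
    ci <- confint(fit)
    data.frame(term = rownames(est), estimate = est[, "Estimate"],
               p_value = est[, "Pr(>|t|)"], lower_95 = ci[, 1], upper_95 = ci[, 2])
  })
}

app <- shinyApp(ui, server)
```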
    Grading Rubrics
    • Overall Design - the principles of data visualization
      • titles and labels
      • effective use of marks and channels
      • use of color schemes
      • high-dimensional information: hover text and pop-ups
    • Functionalities
      • interactivity
      • responsiveness
      • reactivity (bonus)
      • user-friendly input controls
    • Aesthetic Appearance
      • simplicity and intuitiveness
      • consistency and clarity in using graphical features
      • color coding and highlighting
      • effective use of font size and color in the texts, labels, titles, etc.
      • scales of axes: avoiding distortion, disinformation, and misinformation.
    Bonus for Experienced Visual Designers
    You can develop a Shiny dashboard that includes the components in the minimum requirements, and add as many useful features to the dashboard as you like.

    Project Due: Friday, 5/13/2022
  • Please note that no late submission will be accepted!

  • STA 533 Shiny Gallery
  • The gallery will go live on Saturday, 5/14/2022. The link will be posted here once the site is available.