4. Data Processing for Visualization

Topics and Notes
  1. Data management tasks for data visualization
    • a. Merging and joining relational data sets
    • b. Subsetting a data set by selecting rows and columns
  2. Important base R commands for data management
    • merge(), which(), and R object accessors
  3. dplyr commands
    • Mutating joins and mutate()
  4. Tidy data and code: joins and the pipe operator (%>%)
  5. [Class Notes] Data Management for DataViz (HTML, PDF, RMD source)
  6. [Class Notes] Text Data Processing for DataViz (HTML, RMD source)
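The base R and dplyr commands listed above can be tried on two tiny tables. This is a minimal sketch, not taken from the class notes: it assumes two hypothetical data frames, obs and sites, with made-up values that share the key column island ("Atlantis" is a deliberately fictional key with no match). The dplyr part runs only if the dplyr package is installed.

```r
# Two small hypothetical tables (made-up values) sharing the key "island"
obs <- data.frame(
  id     = 1:5,
  island = c("Biscoe", "Dream", "Torgersen", "Biscoe", "Atlantis"),
  mass_g = c(3800, 4200, 3450, 5100, 3900)
)
sites <- data.frame(
  island    = c("Biscoe", "Dream", "Torgersen"),
  site_code = c("B", "D", "T")
)

# merge(): an inner join keeps only keys found in both tables (4 rows here)
inner <- merge(obs, sites, by = "island")
# all.x = TRUE gives a left join: every row of obs is kept, unmatched -> NA
left <- merge(obs, sites, by = "island", all.x = TRUE)

# which(): positions where a logical condition is TRUE
heavy_rows <- which(obs$mass_g > 4000)

# Accessors: $ and [[ ]] extract one column; [rows, cols] subsets both
masses    <- obs$mass_g
islands   <- obs[["island"]]
heavy_obs <- obs[heavy_rows, c("id", "mass_g")]
print(heavy_obs)

# dplyr version of the left join plus a derived column, if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  joined <- obs %>%
    left_join(sites, by = "island") %>%   # mutating join adds site_code
    mutate(mass_kg = mass_g / 1000)       # mutate() adds a new column
  print(joined)
}
```

Note that merge() sorts the result by the key column by default, while dplyr's left_join() keeps the row order of the left table.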
Assignments
  1. Use the principles of data visualization introduced in week #2 to complete this week's assignment. The elements you are expected to consider are:
    • a. Marks and channels: point shapes, sizes, colors, etc.
    • b. Legends and annotations (text and images): font size and face, colors, positions, etc.
    • c. Title and tick mark labels: font size and face, colors
  2. Create a subset of the penguin data set (the one used in the last assignment) that satisfies the following conditions:
    • a. Delete all records with at least one missing value.
    • b. Include only Adelie and Gentoo penguins from Biscoe and Torgersen islands.
    • c. Include only penguins with body_mass_g less than 5000 grams but more than 3500 grams.
    • d. Rescale body_mass_g by dividing it by 4000 and name the result BMI, i.e., BMI = body_mass_g / 4000.
    • e. Exclude the variables X (observation ID), sex, year, and body_mass_g from the resulting subset.
    • f. Using the resulting data set:
      • 1). Create a scatter plot of bill_length_mm and flipper_length_mm.
      • 2). Use different colors to indicate the species of penguins.
      • 3). Make sure the point size is proportional to BMI (the rescaled body mass defined in d).
      • 4). (Optional) Add a regression line for each species, drawn in the same color as that species' points.
      • 5). Write a paragraph comparing the relationship between the two variables across species.
  3. Assignment Due: Wednesday, 11:30 PM
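The subsetting and plotting steps in assignment 2 can be sketched in base R. This is one possible approach, not the official solution: it assumes the data frame uses the palmerpenguins column names (species, island, bill_length_mm, flipper_length_mm, body_mass_g, sex, year) plus the X ID column, puts bill length on the x-axis, and runs on a small hypothetical toy data frame with made-up values instead of the real file. The helper names subset_penguins() and plot_subset() and the two colors are hypothetical choices. The same subsetting steps can also be written with dplyr's filter(), mutate(), and select().

```r
# Hypothetical toy data with made-up values (NOT real penguin measurements),
# using the palmerpenguins column names plus an observation ID column X
toy <- data.frame(
  X                 = 1:8,
  species           = c("Adelie", "Adelie", "Gentoo", "Gentoo",
                        "Gentoo", "Chinstrap", "Adelie", "Adelie"),
  island            = c("Biscoe", "Torgersen", "Biscoe", "Biscoe",
                        "Biscoe", "Dream", "Dream", "Biscoe"),
  bill_length_mm    = c(38.1, 39.5, 47.2, NA, 46.1, 49.0, 37.8, 40.2),
  flipper_length_mm = c(188, 186, 214, 215, 211, 195, 190, 193),
  body_mass_g       = c(3700, 3800, 4900, 4700, 4600, 3650, 3600, 3400),
  sex               = c("female", "male", "female", "male",
                        "female", "female", "male", "female"),
  year              = c(2007, 2008, 2009, 2007, 2008, 2008, 2009, 2007)
)

# Hypothetical helper implementing steps a-e
subset_penguins <- function(df) {
  df <- na.omit(df)                                        # a. drop rows with any NA
  keep <- df$species %in% c("Adelie", "Gentoo") &          # b. species
          df$island %in% c("Biscoe", "Torgersen") &        # b. islands
          df$body_mass_g > 3500 & df$body_mass_g < 5000    # c. mass range
  df <- df[keep, ]
  df$BMI <- df$body_mass_g / 4000                          # d. rescaled mass
  drop_vars <- c("X", "sex", "year", "body_mass_g")        # e. excluded variables
  df[, setdiff(names(df), drop_vars), drop = FALSE]
}

# Hypothetical helper for step f: color by species, size by BMI,
# plus an optional per-species regression line in the matching color
plot_subset <- function(d) {
  pal <- c(Adelie = "darkorange", Gentoo = "darkcyan")
  plot(d$bill_length_mm, d$flipper_length_mm,
       col = pal[as.character(d$species)], pch = 19,
       cex = 1.5 * d$BMI,                      # symbol size proportional to BMI
       xlab = "Bill length (mm)", ylab = "Flipper length (mm)",
       main = "Bill length vs. flipper length")
  for (s in names(pal)) {
    ds <- d[d$species == s, ]
    if (nrow(ds) >= 2) {                       # a line needs at least 2 points
      abline(lm(flipper_length_mm ~ bill_length_mm, data = ds), col = pal[[s]])
    }
  }
  legend("topleft", legend = names(pal), col = pal, pch = 19, bty = "n")
}

sub <- subset_penguins(toy)
print(sub)
pdf(file = tempfile(fileext = ".pdf"))  # draw to a temporary PDF file
plot_subset(sub)
invisible(dev.off())
```

To apply the sketch to the real data, pass the data frame from the last assignment to subset_penguins(). The size multiplier, point shape, and colors are arbitrary and should be adjusted according to the week #2 principles.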

Copyright © 2019- C. Peng. Last updated: