5. GGPLOT Revisited

Topics and Notes
  1. Fundamentals of ggplot HTML RMD
    • a. Components of ggplot()
    • b. Basic Statistical graphics with ggplot()
    • c. Charts for single variable
    • d. Charts for two variables
  2. Guidelines of Univariate and Bivariate Plots HTML RMD SOURCE
Assignments
  1. Review the notes for this week [both ggplot() and gganimate() functions]
  2. Prepare a single data set from the data sets listed under C. World Life Expectancy Data on the Project Data Set Page
    • a. Reshape the Income Per Person data set into longitudinal (long) format so that the resulting data set has three columns: country, year, and income.
    • b. Do the same for Life Expectancy in Years so that the resulting data set has three columns: country, year, and life expectancy.
    • c. Merge/join the above two longitudinal data sets into a new data set named LifeExpIncom with the variables country, year, lifeExp, and income.
    • d. Merge LifeExpIncom with country region so that the final data set has information about income, life expectancy, and country region.
    • e. Merge the previous resulting data set with population size so that the final data set has information about income, life expectancy, population size, and country region. The correct data set should be similar to this data set CSV
    • f. Upload a copy of the merged data to your GitHub repository.
  3. Submission requirements: Prepare an HTML report using R Markdown (RMD) and my suggested YAML header. The report should contain the following components:
    • a. Documenting the above data preparation steps.
    • b. Summarizing the analytic data sets.
      • 1). Number of variables and observations in the resulting data set.
      • 2). Create a subset of the above resulting longitudinal data set that contains only the data for the year 2000 and name it data2000.
    • c. A ggplot: Consider the four variables income, life expectancy, population size, and region in data2000, and create a ggplot that meets the following requirements:
      • 1). Choose any two of the three numerical variables to make a scatter plot.
      • 2). Use the third numerical variable to adjust the sizes of the points (the point size is proportional to the value of the third numerical variable).
      • 3). Use the variable region to color the points (using one of the three colorblind-friendly palettes introduced in the lecture notes).
    • d. A narrative that describes the patterns observed in the above plot.
  4. Assignment Due: Wednesday, 11:30 PM
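
The data preparation steps in item 2 (reshape, join, subset) can be sketched roughly as follows. This is only an illustration on made-up toy tables, not the actual project files: the column name `geo`, the helper `to_long()`, the object names other than LifeExpIncom and data2000, and all values are assumptions, and the real files may need different column names or join keys.

```r
library(dplyr)
library(tidyr)

# Toy wide-format tables standing in for the project files.
# Assumption: one row per country, one column per year; all values are made up.
income_wide <- tibble(
  geo    = c("A", "B"),
  `1999` = c(1000, 2000),
  `2000` = c(1100, 2100)
)
lifeexp_wide <- tibble(
  geo    = c("A", "B"),
  `1999` = c(60, 70),
  `2000` = c(61, 71)
)
pop_wide <- tibble(
  geo    = c("A", "B"),
  `1999` = c(5.0e6, 8.0e6),
  `2000` = c(5.1e6, 8.1e6)
)
region_tbl <- tibble(country = c("A", "B"), region = c("Asia", "Europe"))

# Hypothetical helper: wide -> long with columns country, year, <value_name>.
to_long <- function(wide, value_name) {
  wide %>%
    pivot_longer(cols = -geo, names_to = "year", values_to = value_name) %>%
    rename(country = geo) %>%
    mutate(year = as.integer(year))
}

income_long  <- to_long(income_wide, "income")
lifeexp_long <- to_long(lifeexp_wide, "lifeExp")
pop_long     <- to_long(pop_wide, "population")

# Join life expectancy and income on country and year.
LifeExpIncom <- inner_join(lifeexp_long, income_long, by = c("country", "year"))

# Add region (keyed by country) and population (keyed by country and year).
final_data <- LifeExpIncom %>%
  left_join(region_tbl, by = "country") %>%
  left_join(pop_long, by = c("country", "year"))

# Number of observations and variables, then the year-2000 subset.
dim(final_data)
data2000 <- filter(final_data, year == 2000)
```

With real data it is worth checking for countries that appear in one table but not another (for example with `anti_join()`) before deciding between inner and left joins.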

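The plot described in item 3.c could look something like the sketch below. It uses a small made-up stand-in for data2000, so every value is an assumption. The choice of income and life expectancy for the axes, the log scale on income, and the viridis palette are also assumptions; substitute whichever colorblind-friendly palette the lecture notes introduce.

```r
library(ggplot2)

# Made-up stand-in for data2000; the real subset comes from the merged data.
data2000 <- data.frame(
  country    = c("A", "B", "C", "D"),
  income     = c(1100, 2100, 15000, 40000),
  lifeExp    = c(55, 61, 72, 79),
  population = c(5.1e6, 8.1e6, 6.0e7, 3.0e8),
  region     = c("Africa", "Asia", "Americas", "Europe")
)

p <- ggplot(data2000,
            aes(x = income, y = lifeExp, size = population, color = region)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +                    # optional: income is usually right-skewed
  scale_size_area(max_size = 12) +     # point area proportional to population
  scale_color_viridis_d() +            # assumed palette, colorblind-friendly
  labs(
    x = "Income per person (log scale)",
    y = "Life expectancy (years)",
    size = "Population",
    color = "Region",
    title = "Life expectancy vs. income, 2000"
  ) +
  theme_minimal()

p
```

`scale_size_area()` is used here instead of the default size scale because it maps a value of zero to zero area, which matches the "proportional to the value" requirement more closely.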
Copyright © 2019- C. Peng. Last updated: