Topic 1 Introduction

This E-coursepack (a.k.a. E-Pack) is a self-contained, homegrown e-book that contains all topics covered in the current STA321 course at WCU.

1.1 Why This E-Pack?

This is an advanced-topic course that covers three major topics: linear regression modeling, generalized linear regression modeling, and time series modeling. These topics are typically covered in three different textbooks. This E-Pack contains all three topics, which are delivered using parametric and non-parametric methods.

1.2 Components of Statistical Reports

This is a project-based modeling class. The assignments are building blocks of about three projects that cover linear regression, generalized linear regression, and time series. Everyone will use data sets that are real-world or close to real-world data for all projects. All statistical reports must have the following key components.

A. Introduction

Provide some background on the problem. This includes the motivations and objectives of the analysis.

B. Materials

Some information about the data should be described here, for example, methods of data collection, variable names and definitions, potential data challenges, etc. You could use subsections to organize your work.

C. Methodology

Describe all the methods (including justifications for using the methods) you used to gather and analyze the data here. You need to provide extensive details so that anyone can replicate your results.

D. Results and Conclusions

Show your audience all your results and conclusions with justifications. Write this section in a way that enables a non-statistician to understand the content. Be very specific.

E. General Discussion

Talk about results (with justifications) and link them to real-world implications. Pay attention to whether the research questions were well addressed.

F. References (if any)

List everything you used in the analysis, including notes and blogs from the internet, textbooks, journal articles, etc.

G. Appendices (if any)

Additional output tables and graphics that are important but not fundamental to the report should be placed in this section.

1.3 Basics of RStudio

This brief note introduces the basics of RStudio, R Markdown, and R.

  • RStudio is an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management.

  • R Markdown is a file format for making dynamic documents with R. An R Markdown document is written in markdown (an easy-to-write plain text format) and contains chunks of embedded R code and the output generated from the R code. This note is written in R Markdown. This is also a tutorial showing how to use R Markdown to write an R Markdown report. – RStudio documentation.

  • R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

1.3.1 RStudio GUI

The RStudio interface consists of several windows. The following figure shows a typical RStudio GUI.

Figure 1.1: The GUI of RStudio

1.3.1.1 Console

We can type commands directly into the console, or write them in a text file and then send them to the console. It is convenient to use the console if your task involves one line of code. Otherwise, we should use an editor to write the code and then run it in the console.

1.3.1.2 Source Editor

Generally, we will want to write programs longer than a few lines. The Source Editor can help you open, edit and execute these programs.

1.3.1.3 Environment Window

The Environment window shows the objects (i.e., data frames, arrays, values, and functions) in the environment (workspace). We can see descriptive information, such as the types and dimensions of the objects in the environment. We can also choose a data source from the environment to view in the source window like a spreadsheet.
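
The information shown in the Environment window can also be retrieved from the console. The following is a minimal sketch; the object names `scores` and `grades` and their values are made up for illustration and are not part of the course materials.

```r
# Illustrative objects; the names and values are made up for this sketch.
scores <- c(88, 92, 75)
grades <- data.frame(id = 1:3, score = scores)

ls()            # names of the objects in the current workspace
class(grades)   # type of the object
dim(grades)     # number of rows and columns
str(grades)     # compact summary of each column

# View(grades) opens the spreadsheet-like viewer in RStudio (interactive use only).
```

Running these lines one at a time in the console and watching the Environment window update is a quick way to connect the two views.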

1.3.1.4 System and Graphic files

The Files tab has a navigable file manager, just like the file system on your operating system. The Plots tab is where the graphics you create will appear. The Packages tab shows you the packages that are installed and those that can be installed. The Help tab allows you to search the R documentation for help and is where the help appears when you ask for it from the console.
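
Most of what these tabs show can also be requested from the console. The sketch below uses base R functions only; the package name `ggplot2` in the commented line is just an example of a package one might install, not a course requirement stated here.

```r
getwd()                # working directory, the starting point of the Files tab
head(list.files())     # first few files in that directory

# Names of installed packages, as listed in the Packages tab
pkgs <- rownames(installed.packages())
"stats" %in% pkgs      # the 'stats' package ships with every R installation

# install.packages("ggplot2")  # installs a package from CRAN (needs internet)
# help("mean")                 # opens the documentation in the Help tab
```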

1.3.2 RMarkdown

An R Markdown document is a text-based file format that allows you to include descriptive text, code blocks, and code output. It can be converted to other types of files, such as PDF, HTML, and Word, that can include the code, plots, and outputs generated from the code chunks.

1.3.2.1 Code Chunk

In R Markdown, we can embed R code in a code chunk opened by the symbol ```{} and closed by ```. The symbol `, also called a backquote or backtick, can be found in the top-left corner of a standard keyboard, as shown below.

Figure 1.2: The location of backquote on the standard keyboard

There are two types of code chunks: executable and non-executable. The following code chunk is non-executable since there is no argument specified in the {}.

Figure 1.3: Non-executable code chunk.

This is a code chunk

To write a code chunk that will be executed, we can simply put the letter r inside the curly brackets. If the code chunk is executable, you will see the green arrow in the top-right corner of the chunk.

Figure 1.4: Executable code chunk.

We can define R objects with and without any outputs. In the above R code chunk, we define an R object under the name x and assign value 5 to x (the first line of the code). We also request an output that prints the value of x. The above executable code chunk gives output [1] 5 in the Markdown document. The same output in the knit output files is in a box with a transparent background in the form ## [1] 5.

x = 5
x
## [1] 5

We can also use an argument in the code chunk to control the output. For example, the following code chunk will not be evaluated when knitting to other file formats. However, we can still click the green arrow inside the code chunk to evaluate the code.

x = 5
x
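
The text does not name the chunk option used above; a common choice for this behavior is `eval=FALSE`, and that is an assumption in the sketch below. The sketch writes a small R Markdown file containing one ordinary executable chunk and one `eval=FALSE` chunk. The file name and title are illustrative.

```r
# Three backticks, built with strrep() so the chunk fences are easy to read here.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  'title: "Chunk demo"',
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),               # evaluated when knitting
  "x = 5",
  "x",
  fence,
  "",
  paste0(fence, "{r, eval=FALSE}"),   # assumed option: code is shown but not run
  "x = 5",
  "x",
  fence
)

# Write the file to a temporary folder (illustrative location).
rmd_file <- file.path(tempdir(), "chunk_demo.Rmd")
writeLines(rmd_lines, rmd_file)

# Knitting requires the rmarkdown package and pandoc:
# rmarkdown::render(rmd_file)
```

Opening the generated file in RStudio shows both chunks with a green arrow, while the knitted HTML would contain output only for the first one.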

1.3.2.2 Graphics Generated from R Code Chunks

In the previous sub-sections, we included images from external image files. In fact, we can use R functions to generate graphics in the markdown file and include them when knitting. For instance, we can generate the following image from R and include it in the Markdown document and the knitted output files.

When we include an external image through an R code chunk, we use chunk options such as out.width="80%" or out.height="60%" to specify the dimensions of the displayed image. Graphics generated from R need different options to specify the dimensions. The dimensions of the following graph are specified by {r, fig.align="center", fig.height=5, fig.width=5, fig.cap= "R Generated Graph"}.

data(iris)
plot(iris[,-5])

Figure 1.5: R Generated Graph
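
The chunk options fig.height and fig.width are given in inches. Outside of R Markdown, a comparable size can be requested when opening a graphics device directly. The following is a minimal sketch that uses the base pdf() device; the output file name is illustrative.

```r
# Write the same scatterplot matrix to a 5 x 5 inch PDF file.
# The output location and file name are illustrative choices.
pdf_file <- file.path(tempdir(), "iris_pairs.pdf")

pdf(pdf_file, width = 5, height = 5)   # width and height are in inches
plot(iris[, -5])                       # drop the fifth column (Species)
dev.off()                              # close the device so the file is written

file.exists(pdf_file)
```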

1.4 Collaborative Platforms

There are many platforms and technologies available for applied statisticians and data scientists. We will use RPubs (https://rpubs.com/) and GitHub Repository (https://github.com/) in this class.

1.4.1 RPubs

RPubs is a free web server provided by RStudio (recently renamed Posit) that you can use to publish your analytic reports and code and share them with your peers and friends worldwide.

To use this resource, you need to sign up for an account with RPubs first. Once you set up your RPubs account, you can create reports via R Markdown and publish them on RPubs in HTML format. You can share your work with people by providing them with the hyperlink.

In this class, all project reports are required to be published on RPubs so I can read your work directly from RPubs. You need to submit the links to your reports via the D2L dropbox.

1.4.2 GitHub Repository

GitHub is an online software development platform. It’s used for storing, tracking, and collaborating on software projects.

To use it, you need to create an account. After you set up your GitHub account, you can upload your files (text, code, photos, videos, etc.) to the repository. You can also use GitHub to host your personal (static) web page.

In this class, all data sets you are going to use in your assignments and projects are required to be uploaded to your own repository so you can read your data sets directly from the GitHub repository.
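
Reading a data set directly from GitHub can be sketched as follows. Files in a public repository are served as plain text from raw.githubusercontent.com. The helper name `github_raw_url`, the account name, the repository name, the file path, and the default branch name "main" are all hypothetical placeholders for this sketch.

```r
# Build the raw-file URL for a file stored in a public GitHub repository.
# 'github_raw_url' is a hypothetical helper written for this sketch.
github_raw_url <- function(user, repo, path, branch = "main") {
  sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s",
          user, repo, branch, path)
}

# Hypothetical account, repository, and file names.
data_url <- github_raw_url("your-username", "sta321", "data/mydata.csv")
data_url

# Reading requires an internet connection and an existing public file:
# mydata <- read.csv(data_url)
```

Replace the placeholders with your own account, repository, and file path before running the commented read.csv() line.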

1.5 GitHub

1.5.1 What is GitHub?

GitHub is a social networking site for programmers to share their code. Many companies and organizations use it to facilitate project management and collaboration. It is the most prominent source code host, with over 60 million new repositories.

Most importantly, it is free. We can also use this resource to host web pages. Many images and data sets that I used are stored on GitHub. You need to register a GitHub account (https://github.com/login) to create GitHub repositories, and you need to download and install Git for version control (version control is not required for this course, but it is extremely important in practice). Figure 1.6 shows the GitHub front page.

Figure 1.6: GitHub front page.

1.5.2 Getting Started with GitHub

We will use screenshots to demonstrate how to create a repository, folders, and files.

  1. After you log in to your account, click the “continue for free” button located at the bottom of the following page (screenshot, Figure 1.7).

Figure 1.7: The first page after logging-on.

  2. Now you see your GitHub front page. Click the green button “create repository” on the left panel. Our first repository is called “sta553” (Figure 1.8).

Figure 1.8: Starting creating repository.

  3. To organize files in the repository sta553, we want folders for different types of files. To create a folder under sta553, click the hyperlink “creating a new file” (Figure 1.9).

Figure 1.9: Creating new folders to organize your files.

  4. The first folder to create is the data folder, which will be used to store data files. After you type “data/”, a new box appears under the “data” folder; type the first file name, readme, and the content of the file (see the screenshot). Finally, click the green button “Commit new file” to complete the creation of the first folder, data, in the repository (Figure 1.10).

Figure 1.10: Creating new files in a folder created earlier.

  5. To upload a data file to the data folder, click the drop-down menu in the top-right corner and select upload files (Figure 1.11).

Figure 1.11: Creating another new folder.

  6. To create other folders under sta553, click Creating New File; a new folder named image can be created in the same way (Figure 1.12).

Figure 1.12: Creating new folders for specialized files such as image files.

  7. To create a new repository, click the drop-down menu in the top-right corner and select New repository (Figure 1.13).

Figure 1.13: Creating new repositories for different projects