STA 321 E-pack: Advanced Statistics
West Chester University
Topic 1 Introduction
This E-coursepack (a.k.a. E-Pack) is a self-contained, homegrown ebook that contains all topics covered in the current STA 321 course at WCU.
1.1 Why This E-Pack?
This is an advanced-topic course that covers three major topics: linear regression modeling, generalized linear regression modeling, and time series modeling. These topics are typically covered in three different textbooks. This E-pack contains all three topics, which are delivered using parametric and non-parametric methods.
1.2 Components of Statistical Reports
This is a project-based modeling class. The assignments are building blocks of about three projects that cover linear regression, generalized linear regression, and time series. Every project will use data sets that are real-world data or close to real-world data. All statistical reports must have the following key components.
A. Introduction
Provide some background on the problem. This includes the motivations and objectives of the analysis.
B. Materials
Some information about the data should be described here, for example, methods of data collection, variable names and definitions, potential data challenges, etc. You could use subsections to organize your work.
C. Methodology
Describe all the methods (including justifications for using the methods) you used to gather and analyze the data here. You need to provide extensive details so that anyone can replicate your results.
D. Results and Conclusions
Show your audience all your results and conclusions with justifications. Write this section in a way that enables a non-statistician to understand the content. Be very specific.
E. General Discussion
Talk about results (with justifications) and link them to real-world implications. Pay attention to whether the research questions were well addressed.
1.3 Basics of RStudio
This brief note introduces the basics of RStudio, R Markdown, and R.
RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management.
R Markdown is a file format for making dynamic documents with R. An R Markdown document is written in markdown (an easy-to-write plain text format) and contains chunks of embedded R code and the output generated from the R code. This note is written in R Markdown. This is also a tutorial showing how to use R Markdown to write an R Markdown report. – RStudio documentation.
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
1.3.1 RStudio GUI
The RStudio interface consists of several windows. The following image shows a typical RStudio GUI.
1.3.1.1 Console
We can type commands directly into the console, or write in a text file, and then send the command to the console. It is convenient to use the console if your task involves one line of code. Otherwise, we should always use an editor to write code and then run the code in the Console.
1.3.1.2 Source Editor
Generally, we will want to write programs longer than a few lines. The Source Editor can help you open, edit and execute these programs.
1.3.1.3 Environment Window
The Environment window shows the objects (i.e., data frames, arrays, values, and functions) in the environment (workspace). We can see descriptive information, such as the types and dimensions of the objects in the environment. We can also choose a data source from the environment to view in the source window like a spreadsheet.
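The information shown in the Environment window can also be requested from the Console. The following is a minimal sketch; the object names `scores` and `n_obs` are hypothetical and are used only for illustration.

```r
# Hypothetical objects created only for this illustration
scores <- data.frame(id = 1:3, score = c(88, 92, 79))
n_obs <- nrow(scores)

ls()            # names of the objects currently in the workspace
class(scores)   # type of the object
dim(scores)     # number of rows and columns
str(scores)     # compact summary of the structure

# View(scores)  # opens the spreadsheet-like viewer (interactive RStudio session only)
```

After running these lines in RStudio, `scores` and `n_obs` should also appear in the Environment window.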
1.3.1.4 System and Graphic files
The Files tab has a navigable file manager, just like the file system on your operating system. The Plots tab is where the graphics you create will appear. The Packages tab shows you the packages that are installed and those that can be installed (more on this shortly). The Help tab allows you to search the R documentation for help and is where the help appears when you ask for it from the Console.
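The Packages and Help tabs have Console counterparts as well. The sketch below uses only base R functions; the package named in the commented-out `install.packages()` call is just an example and is not required by the text.

```r
# List the installed packages (the same information the Packages tab displays)
pkgs <- rownames(installed.packages())
head(pkgs)

# Packages that ship with R, such as stats, are always in the list
"stats" %in% pkgs

# install.packages("ggplot2")  # example only: would install a package from CRAN

# Ask for documentation from the Console; in RStudio it opens in the Help tab
help("mean")
```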
1.3.2 R Markdown
An R Markdown document is a text-based file format that allows you to include descriptive text, code blocks, and code output. It can be converted to other file types, such as PDF, HTML, and Word, that can include the code, plots, and outputs generated from the code chunks.
1.3.2.1 Code Chunk
In R Markdown, we can embed R code in a code chunk that is opened by the symbol ```{} and closed by ```. The symbol `, also called backquote or backtick, can be found on the top left corner of the standard keyboard as shown in the following.
There are two types of code chunks: executable and non-executable. The following code chunk is non-executable since no argument is specified in the {}.
This is a code chunk
To write a code chunk that will be executed, we can simply put the letter r inside the curly brackets. If the code chunk is executable, you will see a green arrow in the top-right corner of the chunk.
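The executable chunk discussed in the next paragraph is not reproduced in this text. A minimal sketch that is consistent with its description (an object x assigned the value 5 and then printed) is given below; in the .Rmd file these two lines would sit inside a chunk whose header is {r}.

```r
# Body of an executable chunk; the chunk header in the .Rmd file is {r}
x <- 5   # define the object x and assign the value 5 to it
x        # typing the name alone requests the printed value of x
```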
We can define R objects with and without any outputs. In the above R code chunk, we define an R object under the name x and assign the value 5 to x (the first line of the code). We also request an output that prints the value of x. The above executable code chunk gives the output [1] 5 in the Markdown document. The same output in the knitted output files is in a box with a transparent background in the form ## [1] 5.
## [1] 5
We can also use an argument in the code chunk to control the output. For example, the following code chunk will not be evaluated when knitting to other file formats, but we can still click the green arrow inside the code chunk to evaluate the code.
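The chunk option used for this example is not shown in the text. One knitr option that skips evaluation during knitting is eval=FALSE, so the sketch below assumes the chunk header {r, eval=FALSE}; the object y is hypothetical.

```r
# Assumed chunk header in the .Rmd file: {r, eval=FALSE}
# With eval=FALSE, knitr displays the code in the knitted file but does not run it.
# The same lines still run when sent to the Console with the green arrow.
y <- 10   # hypothetical object
y^2       # squared value of y
```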
1.3.2.2 Graphics Generated from R Code Chunks
In the previous sub-sections, we included images from external image files. In fact, we can also use R functions to generate graphics (other than interactive plots, etc.) in the Markdown file and in the knitted output. For instance, we can generate the following image from R and include it in the Markdown document and the knitted output files.
When including an external image in an R code chunk, we use the chunk options out.width="80%" and/or out.height="60%" to specify the dimensions of the displayed image. Graphics generated from R need different options to specify the dimensions. The dimensions of the following graph are specified by {r, fig.align="center", fig.height=5, fig.width=5, fig.cap="R Generated Graph"}.
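The graph itself is not reproduced in this text. As a hedged illustration, the chunk body below could sit under the chunk header quoted above; the simulated data, the seed, and the variable names are assumptions made only for this sketch.

```r
# Chunk header from the text:
# {r, fig.align="center", fig.height=5, fig.width=5, fig.cap="R Generated Graph"}

set.seed(321)                              # make the simulated data reproducible
x <- rnorm(100)                            # hypothetical predictor
y <- 2 + 0.5 * x + rnorm(100, sd = 0.3)    # hypothetical response
fit <- lm(y ~ x)                           # simple linear regression fit

# Base R scatter plot with the fitted line added
plot(x, y, main = "Simulated data", xlab = "x", ylab = "y")
abline(fit, col = "blue")
```

When knitted, fig.height and fig.width set the size of the figure in inches, and fig.cap supplies the caption.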
1.4 Collaborative Platforms
There are many platforms and technologies available for applied statisticians and data scientists. We will use RPubs (https://rpubs.com/) and GitHub Repository (https://github.com/) in this class.
1.4.1 RPubs
RPubs is a free web server provided by RStudio (recently renamed Posit) that you can use to publish your analytic reports and code and share them with your peers and friends worldwide.
To use this resource, you need to sign up for an account with RPubs first. Once you set up your RPubs account, you can create reports via R Markdown and publish them on RPubs in HTML format. You can share your work with people by providing them with the hyperlink.
In this class, all project reports are required to be published on RPubs so I can read your work directly from RPubs. You need to submit the links to your reports via the D2L dropbox.
1.4.2 GitHub Repository
GitHub is an online software development platform. It’s used for storing, tracking, and collaborating on software projects.
To use it, you need to create an account. After you set up your GitHub account, you can upload your files (text, code, photos, videos, etc.) to the repository. You can also use GitHub to host your personal (static) web page.
In this class, all data sets you are going to use in your assignments and projects are required to be uploaded to your specific repository so you can read your data sets directly from the GitHub repository.
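Reading a data set directly from a GitHub repository usually means pointing R at the file's raw URL. The sketch below builds such a URL; the helper `github_raw_url()`, the user name, the repository name, the branch, and the file path are all hypothetical placeholders, and the `read.csv()` call is commented out because that file does not exist.

```r
# Hypothetical helper: build the raw-file URL for a file in a public repository
github_raw_url <- function(user, repo, path, branch = "main") {
  paste("https://raw.githubusercontent.com", user, repo, branch, path, sep = "/")
}

# Placeholder values; replace them with your own account, repository, and file
data_url <- github_raw_url("your-username", "your-repo", "data/example.csv")
data_url

# dat <- read.csv(data_url)   # reads the CSV over the internet once the file exists
# head(dat)
```

Keeping the URL construction in one place makes it easy to switch files or repositories between assignments.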
1.5 GitHub
1.5.1 What is GitHub?
GitHub is a social networking site for programmers to share their code. Many companies and organizations use it to facilitate project management and collaboration. It is the most prominent source code host, with over 60 million new repositories.
Most importantly, it is free. We can also use this resource to host web pages. Many images and data sets that I use are stored on GitHub. You need to register a GitHub account (https://github.com/login) to create GitHub repositories, and you can download and install Git for version control (version control is not required for this course, but it is extremely important in practice). Figure 1.5 shows the GitHub front page.
1.5.2 Getting Started with GitHub
We will use screenshots to demonstrate how to create a repository, folders, and files.
- After you log into your account, click the “continue for free” button located at the bottom of the following page (screenshot, Figure 1.6).
- Now you see your GitHub front page. Click the green button “create repository” on the left panel. Our first repository is called “sta553” (Figure 1.7).
- To organize files in the repository sta553, we want folders for different files. To create a folder under sta553, click the hyperlink “creating a new file” (Figure 1.8).
- The first folder to create is called the data folder, which will be used to store data files. After typing “data/”, a new box appears under the “data” folder. Type the first file name (readme) and the content of the file (see the screenshot). In the end, click the green button “Commit new file” to complete the creation of the first folder, data, in the repository (Figure 1.9).
- To load the data file to the data folder, we click the drop-down menu on the top right corner and select “Upload files” (Figure 1.10).
- To create other folders under sta553, we click “Creating New File”, and we can create a new folder, image, similarly (Figure 1.11).
- To create a new repository, click the drop-down menu on the top right corner and select “New repository” (Figure 1.12).