This note introduces software programs and platforms that could be used in this course.
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
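R is an integrated suite of software facilities for data manipulation, calculation, and graphical display. It includes an effective data handling and storage facility; a suite of operators for calculations on arrays, in particular matrices; a large, coherent, integrated collection of intermediate tools for data analysis; graphical facilities for data analysis and display either on-screen or on hardcopy; and a well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.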
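The short sketch below illustrates those three facilities with nothing but base R and the built-in `mtcars` data set. The variables and summaries are arbitrary choices made for illustration; they are not part of the course material.

```r
# Built-in data set: mtcars (fuel consumption and design of 32 cars)
data(mtcars)

# Data manipulation: keep only the cars with 4 cylinders
four_cyl <- subset(mtcars, cyl == 4)

# Calculation: mean miles per gallon for each number of cylinders
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(round(mpg_by_cyl, 2))

# Graphical display: base-R scatter plot of mpg against weight
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```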
RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management.
There are two versions of RStudio: RStudio Desktop and RStudio Server. Both versions have free open-source and commercial editions. We use the free open-source edition of RStudio Desktop.
R and RStudio are two distinctly different applications that serve different purposes. R is a programming language for statistical computing, while RStudio is an environment that uses the R language to develop statistical programs.
R and RStudio are not separate versions of the same program and cannot be substituted for one another. R may be used without RStudio, but RStudio may not be used without R.
Tableau is a powerful and fast-growing data visualization tool used in the business intelligence industry. A major advantage of Tableau is that it requires no programming skills to operate. The Tableau suite includes several products, which can confuse new users.
Tableau Desktop has a rich feature set and allows you to code and customize reports. It is not free (actually pretty expensive)!
Tableau Public creates workbooks that cannot be saved locally; instead, they must be saved to Tableau’s public server in the cloud, where anyone can view and access them. You need to download Tableau Public and install it on your computer to design workbooks offline, and then save them to Tableau’s public server. It is totally free!
Tableau Server is used specifically to share workbooks and visualizations created in the Tableau Desktop application across an organization. It is NOT free! However, the public server is free.
Tableau Online has functionality similar to that of Tableau Public, but the data is stored on cloud servers maintained by Tableau. That means you design workbooks on Tableau’s public server. It is also free!
Tableau Reader is a free tool which allows you to view the workbooks and visualizations created using Tableau Desktop or Tableau Public.
First of all, you need to sign up for an account with RPubs if you don’t have one. Otherwise, sign in to your existing RPubs account. The following two hyperlink buttons will bring you to the appropriate website.
You need to install recent versions of R itself, RStudio, and the knitr package on your machine.
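For the knitr step, the minimal sketch below can be run from the RStudio console. It installs knitr from CRAN only when the package is missing. The CRAN mirror URL is an assumption, and any CRAN mirror works.

```r
# Install knitr from CRAN only if it is not already installed
# (installing requires an internet connection)
if (!requireNamespace("knitr", quietly = TRUE)) {
  install.packages("knitr", repos = "https://cloud.r-project.org")
}

# Show the installed version
packageVersion("knitr")
```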
In RStudio, create a new R Markdown document by choosing File | New | R Markdown.
Click the Knit HTML button in the doc toolbar to preview your document.
In the preview window, click the Publish button.
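To see roughly what knitting does, the sketch below writes a tiny R Markdown file and calls `knitr::knit()`. That function runs the R chunks and weaves their printed results into a Markdown file. The file name `hello.Rmd` is hypothetical. The sketch covers only the chunk-evaluation step: the Knit button in RStudio also converts the result to HTML, which this sketch does not do.

```r
# Minimal sketch, assuming the knitr package is installed
library(knitr)

rmd_file <- file.path(tempdir(), "hello.Rmd")  # hypothetical file name
md_file  <- file.path(tempdir(), "hello.md")

fence <- strrep("`", 3)  # three backticks that open/close an R chunk
rmd_lines <- c(
  "---",
  'title: "My first RPubs document"',
  "---",
  "",
  "The mean of the numbers 1 to 10:",
  "",
  paste0(fence, "{r}"),
  "mean(1:10)",
  fence
)
writeLines(rmd_lines, rmd_file)

# Run the R chunk and write the code plus its output to Markdown
knit(input = rmd_file, output = md_file, quiet = TRUE)
md_lines <- readLines(md_file)
```

In the resulting Markdown, the chunk's printed result appears as a line prefixed with `##`, which is knitr's default comment marker for output.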
GitHub is a social networking site for programmers to share their code. Many companies and organizations use it to facilitate project management and collaboration. It is the most prominent source code host, with over 60 million new repositories.
Most importantly, it is free. We can also use this resource to host web pages. Many images and data sets that I use are stored on GitHub.
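Files in a public GitHub repository can be fetched over plain HTTPS, so R can read a data set directly from GitHub. The helper `github_raw_url()` below is hypothetical and does not come from any package. It only assembles the `raw.githubusercontent.com` address. The user name, the file path, and the default branch `main` are placeholders to adjust. Older repositories often use `master` as the default branch.

```r
# Hypothetical helper: build the "raw" URL of a file stored in a
# public GitHub repository so that R can read the file directly
github_raw_url <- function(user, repo, path, branch = "main") {
  paste("https://raw.githubusercontent.com", user, repo, branch, path,
        sep = "/")
}

# Placeholder user name and file path
url <- github_raw_url("your-user-name", "sta553", "data/example.csv")
print(url)

# Reading the file requires an internet connection and an existing file:
# dat <- read.csv(url)
```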
You can use the following two buttons to sign up for an account with GitHub or sign in to an existing GitHub account.
We will use screenshots to demonstrate how to create a repository, folders, and files.
First, click New repository to create a new repository named sta553.
Within sta553, we want folders for different files. To create a folder under sta553, click the hyperlink “creating a new file” and create a data folder, which will be used to store data files. After you type “data/”, a new box appears under the “data” folder. Type the first file name, readme, and the content of the file (see the screenshot). Finally, click the green button “Commit new file” to complete the creation of the first folder, data, in the repository.
To upload files to the data folder, we click the drop-down menu in the top right corner and select upload files.
To create another folder under sta553, we click Creating New File, and we can create a new folder, image, in the same way.
SAS OnDemand provides free data management and data analysis tools. The advantage of SAS OnDemand is that it does not require any installation. It runs in the cloud and processes data by connecting to the SAS server over the internet. In other words, your computer serves only as a display, since the computation does not use your computer’s memory or CPU.
Click Access to enter the SAS OnDemand login page.
If you have already created your SAS Profile, use the email or user ID and the password to log into the SAS OnDemand page.
If you don’t have a SAS profile, click the link Don't have a SAS profile?, and a pop-up dialog box appears. Click Create profile, and a sign-up page pops up. Then follow the directions to create your SAS profile.
After you log into the OnDemand page with your profile information, you will see the link to the SAS Studio user interface as well as your account information.
Once you have created a SAS profile, you have 5 GB of free storage.
In the Applications tab, click SAS Studio, and the SAS Studio user interface opens on a separate page (it may take a little time to initialize your account the first time you use it).
The above screenshot was taken from my SAS course webpage. If you learned SAS using the classical SAS interface, you will find SAS Studio much more convenient and easier to use.
SAS Studio (Academic OnDemand) is installed on SAS servers hosted in the Microsoft Azure Cloud. Although SAS states that your assigned storage is private and secure, you should avoid uploading sensitive data to your private storage on the SAS server, since SAS does not disclose the level of security of that storage.
The following libraries will be used throughout this class.
ggplot2 is a system for creating charts based on the Grammar of Graphics. It has proved to be one of the most powerful R libraries for visualization.
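Here is a minimal ggplot2 sketch that uses the built-in `mtcars` data. In Grammar of Graphics terms, the data set and an aesthetic mapping (`aes()`) are combined with geometric layers that are added with `+`. The variables are illustrative choices only.

```r
library(ggplot2)

# Data + aesthetic mapping (weight on x, fuel economy on y)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                              # one point per car
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # least-squares line
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```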
plotly is a package for data visualization in R (also available for Python). It creates interactive web-based plots using the plotly.js JavaScript library. Plotly lets users interact with graphs, change their scale, and point out individual records, and it also supports hover text. Moreover, Plotly graphs can easily be embedded in knitr/R Markdown documents or Shiny apps.
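Here is a minimal sketch of an interactive plotly scatter plot that uses the same `mtcars` variables. When you hover over a point, the plot shows the car's name. The choice of hover fields is illustrative.

```r
library(plotly)

fig <- plot_ly(
  data = mtcars,
  x = ~wt, y = ~mpg,
  type = "scatter", mode = "markers",
  text = rownames(mtcars),     # car names, shown when hovering
  hoverinfo = "text+x+y"
)

# Printing `fig` in RStudio, or leaving it as the last line of an
# R Markdown chunk, displays the interactive plot.
```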
Leaflet is a well-known package based on the Leaflet JavaScript library for interactive maps. It is widely used for creating, customizing, and designing interactive maps. In addition, Leaflet makes these maps mobile-friendly.
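Here is a minimal leaflet sketch. The map is built by piping a base map through layer functions. The coordinates and popup text are arbitrary placeholders and do not refer to any location used in the course.

```r
library(leaflet)  # leaflet also provides the %>% pipe

m <- leaflet() %>%
  addTiles() %>%                                   # default OpenStreetMap base tiles
  setView(lng = -75.6, lat = 39.95, zoom = 10) %>% # placeholder center and zoom
  addMarkers(lng = -75.6, lat = 39.95, popup = "An example marker")

# Printing `m` in RStudio or an R Markdown chunk displays the map.
```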
ggmap, map, dygraph,