```{r setup, include=FALSE}
options(repos = list(CRAN="http://cran.rstudio.com/"))
if (!require("tidyverse")) {
install.packages("tidyverse")
library(tidyverse)
}
if (!require("knitr")) {
install.packages("knitr")
library(knitr)
}
if (!require("cowplot")) {
install.packages("cowplot")
library(cowplot)
}
if (!require("latex2exp")) {
install.packages("latex2exp")
library(latex2exp)
}
if (!require("plotly")) {
install.packages("plotly")
library(plotly)
}
if (!require("gapminder")) {
install.packages("gapminder")
library(gapminder)
}
if (!require("png")) {
install.packages("png")
library("png")
}
if (!require("RCurl")) {
install.packages("RCurl")
library("RCurl")
}
if (!require("colourpicker")) {
install.packages("colourpicker")
library("colourpicker")
}
if (!require("gganimate")) {
install.packages("gganimate")
library("gganimate")
}
if (!require("gifski")) {
install.packages("gifski")
library("gifski")
}
if (!require("magick")) {
install.packages("magick")
library("magick")
}
if (!require("grDevices")) {
install.packages("grDevices")
library("grDevices")
}
if (!require("jpeg")) {
install.packages("jpeg")
library("jpeg")
}
if (!require("ggridges")) {
install.packages("ggridges")
library("ggridges")
}
if (!require("plyr")) {
install.packages("plyr")
library("plyr")
}
if (!require("ggiraph")) {
install.packages("ggiraph")
library("ggiraph")
}
if (!require("highcharter")) {
install.packages("highcharter")
library("highcharter")
}
if (!require("forecast")) {
install.packages("forecast")
library("forecast")
}
##
knitr::opts_chunk$set(echo = TRUE,
warning = FALSE,
results = "markup",
message = FALSE,
comment = NA)
```
\
\
# {.tabset .tabset-fade .tabset-pills}
## plotly
There are two main ways to create a `plotly` object: either by transforming a ggplot2 object into a plotly object via `ggplotly()` or by directly initializing a plotly object with `plot_ly()`/`plot_geo()`/`plot_mapbox()`. Plotly has a rich and complex set of features. The most common features are:
* Tooltip “hover” info
* Zoom in and out of graphs
* Users can export graphs as an image
* Integrating multiple graphs
* Template hover info
* Animations and moving graphics
A `ggplot` object can be rendered with `plotly` by passing it to `ggplotly()`, which converts it into a `plotly` object. Compared to the base R plotting function `plot()`, `plot_ly()` is more technical and its documentation is harder to navigate. However, the following factors may make `plotly` the best option:
* graphs presented in a digital/online format
* users interact with the graph
* more customizable than ggplot
* rendering graphics in a higher resolution
In this note, we introduce the basic statistical graphics using the `plotly` package. `plotly` graphics automatically contain interactive elements that allow users to modify, explore, and experience the visualized data in new ways.
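As a minimal sketch of the two routes described above, the same iris scatter plot can be built either by converting a ggplot or by calling `plot_ly()` directly. The object names `p_gg`, `fig_from_gg`, and `fig_direct` are illustrative only.

```{r}
library(ggplot2)
library(plotly)

# Route 1: build a static ggplot, then convert it with ggplotly()
p_gg <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
fig_from_gg <- ggplotly(p_gg)

# Route 2: initialize the plotly object directly with plot_ly()
fig_direct <- plot_ly(
  data = iris,
  x = ~Sepal.Length,
  y = ~Sepal.Width,
  type = "scatter",
  mode = "markers"
)

# Both objects are htmlwidgets of class "plotly"; printing one renders it
fig_direct
```

Either object can be customized further with `layout()`, so the choice between the two routes is mostly a matter of which syntax is more convenient for the plot at hand.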
The coding effort is similar to that of SAS ODS graphics. To use `plot_ly()`, we need to install (if not done) and load the `plotly` package. We use the well-known iris data set in the following plots. A nice `plotly` cheat sheet can be found at
```{r}
plot_ly(
  data = iris,
  x = ~Sepal.Length,         # Horizontal axis
  y = ~Sepal.Width,          # Vertical axis
  color = ~factor(Species),  # a categorical variable: one color per species
  # Show the petal width and the species in the hover text
  text = ~paste("Petal Width: ", Petal.Width,
                "<br>Species: ", Species),
  ## Use the following hover template to add the two numerical
  ## variables to the hover text; <br> starts a new line.
  hovertemplate = paste('Sepal Width: %{y}',
                        '<br>Sepal Length: %{x}',
                        '<br>%{text}'),
alpha = 0.6,
marker = list(size = ~Petal.Length, sizeref = .05, sizemode = 'area' ),
type = "scatter",
mode = "markers",
## graphic size
width = 700,
height = 500
) %>%
layout(
### Title
title =list(text = "Sepal Length vs Sepal Width",
font = list(family = "Times New Roman", # HTML font family
size = 18,
color = "red")),
### legend
legend = list(title = list(text = 'species',
font = list(family = "Courier New",
size = 14,
color = "green")),
bgcolor = "ivory",
bordercolor = "navy",
groupclick = "togglegroup", # one of "toggleitem" or "togglegroup".
orientation = "v" # Sets the orientation of the legend.
),
## margin of the plot
margin = list(
b = 100,
l = 100,
t = 100,
r = 50
),
## Background
plot_bgcolor ='#f7f7f7',
## Axes labels
xaxis = list(
title=list(text = 'Sepal Length',
font = list(family = 'Arial')),
zerolinecolor = 'red',
zerolinewidth = 2,
gridcolor = 'white'),
yaxis = list(
title=list(text = 'Sepal Width',
font = list(family = 'Arial')),
zerolinecolor = 'purple',
zerolinewidth = 2,
gridcolor = 'white'),
## annotations
annotations = list(
x = 0.7, # between 0 and 1. 0 = left, 1 = right
y = 0.9, # between 0 and 1, 0 = bottom, 1 = top
font = list(size = 12,
color = "darkred"),
text = "The point size is proportional to the petal length",
xref = "paper", # "container" spans the entire `width` of the plot.
# "paper" refers to the width of the plotting area only.
yref = "paper", # same as xref
xanchor = "center", # horizontal alignment with respect to its x position
yanchor = "bottom", # similar to xanchor
showarrow = FALSE)
)
```
We can also wrap the layout settings in a reusable function, much like the custom theme we wrote for regular ggplots. The following is an example.
```{r}
myPlotlyLayout <- function(anyObjName){ # anyObjName is the required first argument:
# the plotly object to lay out (the argument name itself is arbitrary).
layout(anyObjName,
### Title
title =list(text = "Sepal Length vs Sepal Width",
font = list(family = "Times New Roman", # HTML font family
size = 18,
color = "red")),
### legend
legend = list(title = list(text = 'species',
font = list(family = "Courier New",
size = 14,
color = "green")),
bgcolor = "ivory",
bordercolor = "navy",
groupclick = "togglegroup", # one of "toggleitem" or "togglegroup".
orientation = "v" # Sets the orientation of the legend.
),
## margin of the plot
margin = list(
b = 120,
l = 50,
t = 120,
r = 50
),
## Background
plot_bgcolor ='#f7f7f7',
## Axes labels
xaxis = list(
title=list(text = 'Sepal Length',
font = list(family = 'Arial')),
zerolinecolor = 'red',
zerolinewidth = 2,
gridcolor = 'white'),
yaxis = list(
title=list(text = 'Sepal Width',
font = list(family = 'Arial')),
zerolinecolor = 'purple',
zerolinewidth = 2,
gridcolor = 'white'),
## annotations
annotations = list(
x = 0.7, # between 0 and 1. 0 = left, 1 = right
y = 0.9, # between 0 and 1, 0 = bottom, 1 = top
font = list(size = 12,
color = "darkred"),
text = "The point size is proportional to the petal length",
xref = "paper", # "container" spans the entire `width` of the plot.
# "paper" refers to the width of the plotting area only.
yref = "paper", # same as xref
xanchor = "center", # horizontal alignment with respect to its x position
yanchor = "bottom", # similar to xanchor
showarrow = FALSE
)
)
}
```
```{r, fig.align='center', fig.width=8, fig.height=8}
plot_ly(
data = iris,
x = ~Sepal.Length, # Horizontal axis
y = ~Sepal.Width, # Vertical axis
color = ~factor(Species), # a categorical variable: one color per species
text = ~Species, # show the species in the hover text
## Use the hovertemplate argument to add the two numerical
## variables to the hover text; <br> starts a new line.
hovertemplate = paste('Sepal Width: %{y}',
                      '<br>Sepal Length: %{x}',
                      '<br>%{text}'),
alpha = 0.9,
marker = list(size = ~Petal.Length, sizeref = .05, sizemode = 'area' ),
type = "scatter",
mode = "markers",
## graphic size
width = 700,
height = 500) %>% myPlotlyLayout()
```
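The layout function above hard-codes the title and axis labels, so it only fits this one plot. As a hedged sketch, a variant that takes the labels as arguments could look like the following; `layoutWithLabels` and its argument names are hypothetical and not part of the original note.

```{r}
library(plotly)

# Hypothetical helper: same idea as the layout function above, but the
# title and axis labels are arguments instead of hard-coded strings.
layoutWithLabels <- function(p, title_text, x_title, y_title) {
  layout(
    p,
    title = list(text = title_text, font = list(size = 18)),
    xaxis = list(title = list(text = x_title), gridcolor = "white"),
    yaxis = list(title = list(text = y_title), gridcolor = "white"),
    plot_bgcolor = "#f7f7f7",
    margin = list(b = 100, l = 100, t = 100, r = 50)
  )
}

# Reuse the helper on a different pair of iris variables
fig_petal <- plot_ly(iris, x = ~Petal.Length, y = ~Petal.Width,
                     color = ~Species, type = "scatter", mode = "markers") %>%
  layoutWithLabels("Petal Length vs Petal Width",
                   "Petal Length", "Petal Width")
fig_petal
```

Keeping the plotly object as the first argument is what allows the helper to sit at the end of a `%>%` pipeline.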
### External Images for plotly Charts
As we did with base R and ggplot, we illustrate two ways to add images to plotly charts: inserting an image and setting an image background.
**Inserting Images to `plotly` Charts**
The following example shows how to use the `layout()` function to insert an external image into a plotly scatter plot. Compared with the corresponding steps in base R and ggplot, the task is relatively straightforward and flexible in plotly. See the comments in the code for placing the image in an appropriate location.
```{r}
plot_ly(
data = iris,
x = ~Sepal.Length, # Horizontal axis
y = ~Sepal.Width, # Vertical axis
customdata = ~Petal.Width,
color = ~factor(Species), # a categorical variable: one color per species
hovertext = ~Species, # show the species in the hover text
####
marker = list(size = ~Petal.Length, sizeref = .05, sizemode = 'area'),
#
alpha = 0.9,
type = "scatter",
mode = "markers",
## Use the hovertemplate argument to add the numerical variables
## to the hover text; <br> starts a new line.
hovertemplate = paste('Sepal Width: %{y}',
                      '<br>Sepal Length: %{x}',
                      '<br>Petal Length: %{marker.size:,}',
                      '<br>Petal Width: %{customdata}',
                      '<br>Species: %{hovertext}',
                      "<extra></extra>")
)
## The opening of the next call is missing from the source. The data set and
## the x and y mappings below are assumptions: gapminder, with GDP per capita
## on the log-scaled horizontal axis and life expectancy on the vertical axis.
fig <- plot_ly(
  data = gapminder,
  x = ~gdpPercap,
  y = ~lifeExp,
  text = ~paste("Continent:", continent,
                "<br>Year:", year,
                "<br>LifeExp:", lifeExp,
                "<br>Pop:", pop,
                "<br>gdpPerCap:", gdpPercap),
  hoverinfo = "text",
  type = 'scatter',
  mode = 'markers'
)
fig <- fig %>% layout(
xaxis = list(
type = "log"
)
)
fig
```
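The hover templates in this section combine placeholders such as `%{x}`, `%{y}`, `%{text}`, and `%{customdata}` with the HTML tag `<br>` for line breaks. The following is a minimal sketch of how these pieces fit together; the object name `fig_hover` is illustrative only.

```{r}
library(plotly)

fig_hover <- plot_ly(
  data = iris,
  x = ~Sepal.Length,
  y = ~Sepal.Width,
  customdata = ~Petal.Width,   # extra column, available as %{customdata}
  text = ~Species,             # available as %{text}
  type = "scatter",
  mode = "markers",
  hovertemplate = paste0(
    "Sepal Length: %{x}",
    "<br>Sepal Width: %{y}",
    "<br>Petal Width: %{customdata}",
    "<br>Species: %{text}",
    "<extra></extra>"          # hides the secondary box with the trace name
  )
)
fig_hover
```

Using `paste0()` rather than `paste()` avoids the extra space that `paste()` inserts between the pieces of the template.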
### Rendering A GGPLOT with `ggplotly`
We can also render a ggplot using `ggplotly()` to bring interactivity to the plot. The following is a custom theme for laying out ggplots.
```{r}
myplot.theme_new <- function() {
theme(
#ggplot margins
plot.margin = margin(t = 50, # Top margin
r = 30, # Right margin
b = 30, # Bottom margin
l = 30), # Left margin
## ggplot titles
plot.title = element_text(face = "bold",
size = 12,
family = "sans",
color = "navy",
hjust = 0.5,
margin=margin(0,0,30,0)), # left(0),right(1)
# add border 1)
panel.border = element_rect(colour = NA,
fill = NA,
linetype = 2),
# color background 2)
panel.background = element_rect(fill = "#f6f6f6"),
# modify grid 3)
panel.grid.major.x = element_line(colour = 'white',
linetype = 3,
size = 0.5),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour = 'white',
linetype = 3,
size = 0.5),
panel.grid.minor.y = element_blank(),
# modify text, axis, and color 4) and 5)
axis.text = element_text(colour = "navy",
#face = "italic",
size = 7,
#family = "Times New Roman"
),
axis.title = element_text(colour = "navy",
size = 7,
#family = "Times New Roman"
),
axis.ticks = element_line(colour = "navy"),
# legend at the bottom 6)
legend.position = "bottom",
legend.key.size = unit(0.6, 'cm'), #change legend key size
legend.key.height = unit(0.6, 'cm'), #change legend key height
legend.key.width = unit(0.6, 'cm'), #change legend key width
#legend.title = element_text(size=8), #change legend title font size
legend.title=element_blank(), # remove all legend titles
legend.key = element_rect(fill = "white"),
#####
legend.text = element_text(size=8)) #change legend text font size
}
```
The following plot uses the above theme and passes the correlation coefficient to the annotated text.
```{r, fig.align='center', fig.width=6, fig.height=5, message = FALSE, warning = FALSE}
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
#aes(color = factor(Species)) +
aes(label = Species, label1 = Petal.Length, label2 = Petal.Width) +
## The labels in the above aes() will be part of the hover text.
geom_point(size = iris$Petal.Length, alpha = 0.7) +
stat_smooth(method = lm, se=FALSE, size = 0.5) + # add a linear regression line
#scale_color_manual(values=c("dodgerblue4", "darkolivegreen4", "darkred")) +
labs(
x = "Sepal Length",
y = "Sepal Width",
title = "Association between Sepal Length and Width") +
myplot.theme_new() +
annotate(geom="text" ,
x=6.8,
y=2,
label=paste("The Pearson correlation coefficient r = ",
round(cor(iris$Sepal.Length, iris$Sepal.Width),3)),
size = 2,
color = "navy") +
coord_fixed(1) ## This changes the aspect ratio of the graph
ggplotly(p)
```
```{r, fig.align='center', fig.width=6, fig.height=5, message = FALSE, warning = FALSE}
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
# aes(color = factor(Species)) +
# to add more information about the variables in the data set
# use labels to denote the variable names inside the function aes()
aes(label=Species, label2=Petal.Length, label3=Petal.Width) +
geom_point(size = iris$Petal.Width, alpha = 0.7) +
stat_smooth(method = lm, se=FALSE, size = 0.3) +
#scale_color_manual(values=c("dodgerblue4", "darkolivegreen4", "darkred")) +
labs(
x = "Sepal Length",
y = "Sepal Width",
title = "Association between Sepal Length and Width") +
myplot.theme_new() +
annotate(geom="text" ,
x=6.8,
y=2,
label=paste("The Pearson correlation coefficient r = ",
round(cor(iris$Sepal.Length, iris$Sepal.Width),3)),
size = 2,
color = "navy") +
coord_fixed(1) ## This changes the aspect ratio of the graph
ggplotly(p)
```
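The extra `label` aesthetics in the plots above end up in the hover text because `ggplotly()` shows all mapped aesthetics by default. As a hedged sketch, the `tooltip` argument of `ggplotly()` can restrict and order what is shown; the object names `p_tip` and `fig_tip` are illustrative only.

```{r}
library(ggplot2)
library(plotly)

p_tip <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                          label = Species)) +
  geom_point(alpha = 0.7)

# Show only the listed aesthetics in the hover text, in this order
fig_tip <- ggplotly(p_tip, tooltip = c("label", "x", "y"))
fig_tip
```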
**Remark**: It turns out that, at the time of writing, `ggplotly` does not display the colors in these plots, apparently because of recent updates to the package. Hopefully, this issue will be fixed soon.
## Bar & Pie Chart
### Barplot
We will create a summarized data set to make bar plots. We define a data set that stores the mean of each of the four iris measurements by species, using the base R function `aggregate()`.
```{r}
barplotdata = aggregate(iris[,1:4], by = list(iris$Species), FUN = mean)
kable(head(barplotdata))
```
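For comparison, the same group means can be computed with `dplyr`. This is a sketch under the assumption that `dplyr` version 1.0.0 or later is installed, since it relies on `across()`; the name `means_by_species` is illustrative only.

```{r}
library(dplyr)

# The dplyr:: prefixes matter when plyr is attached after dplyr, because
# plyr::summarise() then masks dplyr::summarise() and ignores the grouping.
means_by_species <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(dplyr::across(Sepal.Length:Petal.Width, mean))

means_by_species
```

The result has one row per species and keeps the grouping column under its original name, `Species`, whereas `aggregate()` names it `Group.1`.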
Next, we draw a grouped bar chart.
```{r}
plot_ly(
data = barplotdata,
x = ~Group.1,
y = ~Sepal.Length,
type = "bar",
name = "sepal.length.avg",
## graphic size
width = 700,
height = 400) %>%
add_trace(y=~Sepal.Width, name = "sepal.width.avg") %>%
add_trace(y=~Petal.Length, name = "petal.length.avg") %>%
add_trace(y=~Petal.Width, name = "petal.width.avg") %>%
layout( yaxis = list(title ="Mean"),
xaxis = list(title = "Species"),
title = "Group Means of Iris attributes",
## margin of the plot
margin = list(
b = 50,
l = 100,
t = 120,
r = 50
))
```
### Pie Chart
We first define a subset of the iris data by keeping only the observations with a sepal length greater than 5. The pie chart shows the distribution of species in this subset. Keep in mind that a pie chart is constructed from a frequency table.
```{r}
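# define a working data set: species labels of flowers with sepal length > 5
subiris <- iris[iris$Sepal.Length > 5, 5]
## Create a frequency table in the form of a data frame.
## Taking the labels from the table itself keeps each label aligned with its count.
freqtab <- table(subiris)
piedata <- data.frame(cate = names(freqtab),
                      freq = as.vector(freqtab))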
# define a color vector
colors <- c('rgb(211,94,96)', 'rgb(128,133,133)', 'rgb(144,103,167)')
# make a pie chart
plot_ly(piedata,
labels = ~cate,
values = ~freq,
type = 'pie',
textposition = 'inside',
textinfo = 'label+percent',
insidetextfont = list(color = '#FFFFFF'),
#hoverinfo = 'text',
marker = list(colors = colors,
line = list(color = '#FFFFFF', width = 1)),
#The 'pull' attribute can also be used to create space between the sectors
showlegend = TRUE) %>%
layout(title = 'Distribution of Species',
xaxis = list(showgrid = FALSE, zeroline = FALSE,
showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE,
showticklabels = FALSE),
## margin of the plot
margin = list(
b = 50,
l = 100,
t = 120,
r = 50
))
```
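Since the pie chart is built from a frequency table, it can help to inspect that table on its own. The following sketch builds it in one step with base R; the names `species_kept` and `freq_table` are illustrative only.

```{r}
# Keep the species labels of flowers whose sepal length exceeds 5
species_kept <- iris$Species[iris$Sepal.Length > 5]

# as.data.frame() on a table keeps each label next to its own count
freq_table <- as.data.frame(table(cate = species_kept),
                            responseName = "freq")
freq_table
```

Because the labels and counts come from the same table object, they cannot get out of order, which is a risk when they are computed separately.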
## Histogram & Density
Histograms and density curves are used to display the distribution of numerical random variables. When comparing the distributions of different random variables, we can overlay the histograms or density curves.
### Comparing Distributions Using Histograms
We can overlay histograms to compare the distributions of multiple random variables.
```{r}
plot_ly(
data = iris,
x = ~ Sepal.Length,
type = "histogram",
nbinsx = 10,
name = "sepal.length",
alpha = .5,
marker = list(line = list(color = "darkgray", width = 2)) ) %>%
## Adding additional histograms; they are overlaid via barmode below
add_histogram(x = ~Sepal.Width,
name = "sepal.width", nbinsx = 10, alpha = 0.5,
marker = list(line = list(color = "darkgray", width = 2))) %>%
add_histogram(x = ~Petal.Length,
name = "petal.length",nbinsx = 10, alpha = 0.5,
marker = list(line = list(color = "darkgray", width = 2))) %>%
add_histogram(x = ~Petal.Width,
name = "petal.width",nbinsx = 10, alpha = 0.5,
marker = list(line = list(color = "darkgray", width = 2))) %>%
layout(barmode = "overlay",
title = "Histogram of Iris Attribute",
xaxis = list(title = "Iris Attributes",
zeroline = TRUE),
yaxis = list(title = "Count",
zeroline =TRUE),
## margin of the plot
margin = list(
b = 50,
l = 100,
t = 120,
r = 50
))
```
In general, overlaid histograms become hard to distinguish when more than two distributions are compared. Ridgeline histograms, which offset each distribution vertically, can help. The following is an example.
```{r fig.align='center', fig.width=5, fig.height=5}
ggplot(iris, aes(x = Sepal.Length, y = Species, group = Species, fill = Species)) +
geom_density_ridges(stat = "binline", bins = 20, scale = 2.2) +
scale_y_discrete(expand = c(0, 0)) +
scale_x_continuous(expand = c(0, 0)) +
coord_cartesian(clip = "off") +
theme_ridges()
```
### Density Curve
It is relatively easy to use density curves to compare multiple distributions. Assume that we want to compare the distributions of the sepal length of the three iris species. One way to do this comparison is to plot the three estimated density curves.
```{r}
# define three densities
sepal.len.setosa <- iris[which(iris$Species == "setosa"),]
setosa <- density(sepal.len.setosa$Sepal.Length)
sepal.len.versicolor <- iris[which(iris$Species == "versicolor"),]
versicolor <- density(sepal.len.versicolor$Sepal.Length)
sepal.len.virginica <- iris[which(iris$Species == "virginica"),]
virginica <- density(sepal.len.virginica$Sepal.Length)
# plot density curves
fig <- plot_ly(x = ~virginica$x,
y = ~virginica$y,
type = 'scatter', #A character string specifying the trace type
mode = 'lines',
name = 'virginica',
fill = 'tozeroy') %>%
# adding more density curves
add_trace(x = ~versicolor$x,
y = ~versicolor$y,
name = 'versicolor',
fill = 'tozeroy') %>%
add_trace(x = ~setosa$x,
y = ~setosa$y,
name = 'setosa',
fill = 'tozeroy') %>%
layout(xaxis = list(title = 'Sepal Length'),
yaxis = list(title = 'Density'))
fig
```
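The three density estimates above are computed one species at a time. As a hedged sketch of a more compact alternative, the densities can be estimated for all species at once with `split()` and `lapply()`, and the traces added in a loop; the names `dens_by_species` and `fig_dens` are illustrative only.

```{r}
library(plotly)

# One kernel density estimate of sepal length per species
dens_by_species <- lapply(split(iris$Sepal.Length, iris$Species), density)

# Start from an empty plotly object and add one filled line per species
fig_dens <- plot_ly()
for (sp in names(dens_by_species)) {
  fig_dens <- add_trace(fig_dens,
                        x = dens_by_species[[sp]]$x,
                        y = dens_by_species[[sp]]$y,
                        type = "scatter",
                        mode = "lines",
                        name = sp,
                        fill = "tozeroy")
}
fig_dens <- layout(fig_dens,
                   xaxis = list(title = "Sepal Length"),
                   yaxis = list(title = "Density"))
fig_dens
```

This version scales to any number of groups without repeating the `density()` and `add_trace()` calls by hand.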
The above overlaid density plots (with a certain level of transparency) are relatively easy to read. Ridgeline density plots are an alternative that separates the curves by species.
```{r fig.align='center', fig.width=5, fig.height=3}
ridgeDensity <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges() +
geom_density_interactive(aes(tooltip = interaction(Sepal.Length, Species),
data_id = interaction(Sepal.Length, Species)),
size = 1, hover_nearest = TRUE)
ridgeDensity
# girafe(ggobj = ridgeDensity)
```
**Note**: Ridgeline plots do not work well with `ggplotly`, so it is hard to bring interactivity to them. There are some workarounds, but none is good enough for a professional presentation.
## Boxplot
Drawing a boxplot is straightforward in `plotly`.
```{r}
plot_ly(
data = iris,
y = ~ Sepal.Length,
x = ~Species,
type = "box",
color = ~Species,
boxpoints = "all",
boxmean = TRUE,
showlegend = FALSE ) %>%
layout(title = "Boxplot of Sepal Length by Species",
xaxis = list(title = "Species",
zeroline = TRUE),
yaxis = list(title = "Sepal Length",
zeroline =TRUE))
```
The non-interactive ggplot boxplot is given by
```{r}
summarized.iris = iris %>% select(-Species) %>%
pivot_longer(everything())
g.iris = ggplot(summarized.iris, aes(x=name,y=value, fill=name)) +
geom_boxplot() +
labs(
x = "Measure Types",
y = "Numerical Measures",
title = "Distribution of Iris Measurements") +
myplot.theme_new()
###
g.iris
```
`ggplotly` adds interactivity to the plot, but cannot add colors at the moment.
```{r}
summarized.iris = iris %>% select(-Species) %>%
pivot_longer(everything())
g.iris = ggplot(summarized.iris, aes(x = name, y = value)) +
geom_boxplot() +
labs(
x = "Measure Types",
y = "Numerical Measures",
title = "Distribution of Iris Measurements") +
myplot.theme_new()
###
ggplotly(g.iris)
```
## Serial Plot
Visualizing a time series is relatively easy since the objective is to inspect patterns such as trend, seasonality, and level shifts. These patterns assist in model identification, for example, determining how much history to use for forecasting, the type of exponential smoothing, and the orders of differencing, MA, and AR terms in the ARIMA framework.
```{r fig.align='center', fig.width=6, fig.height=4}
stock <- read.csv('https://raw.githubusercontent.com/pengdsci/sta553.html/main/data/finance-charts-apple.csv')
##
fig <- plot_ly(stock, type = 'scatter', mode = 'lines') %>%
add_trace(x = ~Date, y = ~AAPL.High) %>%
layout(showlegend = F,
title='Time Series with Rangeslider',
xaxis = list(rangeslider = list(visible = T))) %>%
layout(xaxis = list(zerolinecolor = 'blue',
zerolinewidth = 2,
gridcolor = '#ffffff'),
yaxis = list(zerolinecolor = '#ffffff',
zerolinewidth = 2,
gridcolor = '#fff'),
plot_bgcolor='#e5ecf6', width = 800, height = 400)
fig
```
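The same range slider pattern works for any series with a date column. The following sketch uses the `AirPassengers` data set that ships with R instead of the stock data, so it runs without a network connection; the names `air` and `fig_air` are illustrative only.

```{r}
library(plotly)

# AirPassengers is a monthly series shipped with R (Jan 1949 to Dec 1960)
air <- data.frame(
  date = seq(as.Date("1949-01-01"), by = "month",
             length.out = length(AirPassengers)),
  passengers = as.numeric(AirPassengers)
)

fig_air <- plot_ly(air, x = ~date, y = ~passengers,
                   type = "scatter", mode = "lines") %>%
  layout(title = "Monthly Airline Passengers",
         xaxis = list(title = "Date",
                      rangeslider = list(visible = TRUE)),
         yaxis = list(title = "Passengers (thousands)"))
fig_air
```

Storing the dates as class `Date` rather than as character strings makes the horizontal axis an explicit date axis.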
There are also other libraries one can use to produce interactive serial plots.
```{r}
# This plot uses the functions hchart() and hcaes() in the library `highcharter`
hc <-stock %>%
hchart(
"line",
hcaes(x = Date, y = AAPL.High)
)
hc
```
The following interactive serial plot also includes the forecast values and the forecast prediction bands.
```{r}
appl.high = stock$AAPL.High
# n= length(appl.high)
# plot(1:n, appl.high, type = 'l')
x <- forecast(ets(appl.high), h = 48)
hc <- hchart(x)
hc
```
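To see what the forecast object passed to `hchart()` contains, it can be unpacked into a plain data frame. The sketch below uses the built-in `AirPassengers` series rather than the stock data, and relies on the default 80% and 95% interval levels of `forecast()`; the names `fit`, `fc`, and `fc_table` are illustrative only.

```{r}
library(forecast)

fit <- ets(AirPassengers)      # automatically selected exponential smoothing model
fc  <- forecast(fit, h = 12)   # forecast 12 months ahead

# Point forecasts with the 95% bounds (second column of lower and upper)
fc_table <- data.frame(
  point   = as.numeric(fc$mean),
  lower95 = as.numeric(fc$lower[, 2]),
  upper95 = as.numeric(fc$upper[, 2])
)
head(fc_table)
```

The band drawn in the interactive plot corresponds to the `lower` and `upper` components, and the central line to `mean`.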
## Plotly Maps
Several map libraries are available in R. In this example, we use the `plot_geo()` function from `plotly` to plot on a map.
```{r}
## preparing data
poc <- read_csv("https://raw.githubusercontent.com/pengdsci/sta553.html/main/data/POC.csv")[,c(7,8,9, 17)]
poc.site <- poc[poc$POC == 1,]
# geo styling
geostyle <- list(scope = 'usa',
projection = list(type = 'albers usa'),
showland = TRUE,
landcolor = toRGB("lightblue"),
subunitcolor = toRGB("purple"),
countrycolor = toRGB("navy"),
countrywidth = 0.75,
subunitwidth = 0.5
)
## plotting map
fig <- plot_geo(poc.site, lat = ~ycoord, lon = ~xcoord) %>%
add_markers(text = ~ SITE_DESCRIPTION,
color = I("red"), # I() uses the literal color instead of mapping a data value
symbol = I("circle"),
size = I(10),
hoverinfo = "text" ) %>%
layout( title = 'POC Risk Sites', geo = geostyle)
fig
```
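As a self-contained sketch of the same pattern that does not need the POC file, the following example places three markers on a US map. The data frame `cities` holds approximate coordinates of three well-known cities and is used only for illustration; it is not part of the original note.

```{r}
library(plotly)

# Approximate coordinates of three US cities, for illustration only
cities <- data.frame(
  name = c("New York", "Chicago", "Los Angeles"),
  lat  = c(40.71, 41.88, 34.05),
  lon  = c(-74.01, -87.63, -118.24)
)

fig_cities <- plot_geo(cities, lat = ~lat, lon = ~lon) %>%
  add_markers(text = ~name,
              hoverinfo = "text",
              color = I("red"),   # I() = use this literal color as is
              size = I(10)) %>%
  layout(title = "Illustrative City Markers",
         geo = list(scope = "usa",
                    projection = list(type = "albers usa")))
fig_cities
```

Wrapping a value in `I()` tells `plotly` to use it as given, rather than treating it as a data value to be mapped onto a color or size scale.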
## Conclusion
This note focuses on using the `plotly` library and its dependencies to create various interactive plots. However, `plotly` is only one of several libraries that can produce interactive graphics; `highcharter` and `ggiraph`, both used above, are others with different strengths. Whichever library is used, an effective visualization project generally involves the following steps.
**Data integration**. Collect raw data and turn it into clean, analytics-ready information by performing data replication, ingestion, and transformation. Then store it in a data lake or data warehouse.
**Goal definition**. Define the business objective you’re trying to achieve and the data insights you seek. For example, are you trying to optimize a production process or track the ROI of your marketing efforts?
**Visualization design**. Design begins with selecting KPIs and types of graphs, charts, and maps that best tell your story. Keeping your visualizations clean and simple will help users understand and work with the data.
**Collaboration and sharing**. Allow all approved users to explore the data freely to uncover their own insights. Your software should allow users to embed your visualizations in other applications and to engage with them on their mobile devices.