Topic 10 Interactive Statistics Graphics
This note is all about interactive plots. Interactive plots cannot be rendered in the PDF and EPUB versions of this eBook, so those versions include static screenshots instead. The HTML version keeps the interactivity of all graphics. The code that generated each plot is included in all versions of this eBook.
10.1 Plotly
Plotly has a rich and complex set of features. The most common features are:
- Tooltip “hover” information
- Zooming in and out of graphs
- Exporting graphs as images
- Combining multiple graphs
- Templated hover information
- Animations and moving graphics
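Some of these features, such as image export, are controlled through plotly's config() function rather than through the plot itself. The following minimal sketch assumes plotly is installed. It sets the format and size of the image produced by the mode bar's download button. The file name iris_scatter is hypothetical.

```r
library(plotly)

# A basic interactive scatter plot of the iris data
p <- plot_ly(iris, x = ~Sepal.Length, y = ~Sepal.Width,
             type = "scatter", mode = "markers")

# config() passes options to plotly.js: hide the plotly logo and make the
# mode bar's download button save an 800 x 600 SVG named iris_scatter
p <- config(p,
            displaylogo = FALSE,
            toImageButtonOptions = list(format = "svg",
                                        filename = "iris_scatter",
                                        width = 800,
                                        height = 600))
p
```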
You can also pass a ggplot object to plotly (via ggplotly()) and render it as an interactive plotly graph. Compared to the base R plotting function plot(), plot_ly() is more technical and less thoroughly documented. However, the following factors may make plotly the best option:
- Graphs are presented in a digital/online format
- Users need to interact with the graph
- plotly is more customizable than ggplot
- Graphics can be rendered at a higher resolution
In this note, we introduce basic statistical graphics using the plotly package. plotly graphics automatically contain interactive elements that allow users to modify, explore, and experience the visualized data in new ways.
The coding effort is similar to that of SAS ODS graphics. To use plot_ly(), we need to install (if not already installed) and load the plotly package. We use the well-known iris data set in the following plots. A helpful plotly cheat sheet is available at https://github.com/pengdsci/sta553/blob/main/ref/r_plotly_cheat_sheet.pdf
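The examples in this note also call functions from ggplot2, dplyr, knitr, and readr. The note's own setup chunk is not shown, so the following is a minimal sketch of the packages we assume are loaded before the examples run.

```r
# Run once if the packages are not yet installed:
# install.packages(c("plotly", "ggplot2", "dplyr", "knitr", "readr"))

library(plotly)   # plot_ly(), ggplotly(), plot_geo(); re-exports the %>% pipe
library(ggplot2)  # ggplot(), theme(), and friends
library(dplyr)    # group_by(), summarize()
library(knitr)    # kable()
library(readr)    # read_csv()

# iris ships with base R (datasets package): 150 flowers, 5 columns
str(iris)
```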
10.2 Scatter Plot
10.2.1 The Default Plot
First, we make a simple interactive scatter plot of sepal width against sepal length. Hovering over a point shows the values of the two variables and the species (color group) of the point. The axis labels, legend title, and legend labels are plotly's defaults.
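The code for this default plot is not shown here. The following minimal sketch should produce it. It assumes only the data, the two axes, and a species color mapping are supplied.

```r
library(plotly)

# Default interactive scatter plot: axis titles and the legend
# are left to plotly's defaults
p <- plot_ly(data = iris,
             x = ~Sepal.Length,
             y = ~Sepal.Width,
             color = ~Species,   # one color (and legend entry) per species
             type = "scatter",
             mode = "markers")
p
```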
10.2.2 Adding Information Through hovertemplate
We can also add information to the plot to enhance its interactivity. For example, we can (1) scale the point size by the value of a numerical variable; (2) use the text option to add the class label to the hover text; and (3) format the hover text with the hovertemplate option.
plot_ly(
data = iris,
x = ~Sepal.Length, # Horizontal axis
y = ~Sepal.Width, # Vertical axis
color = ~factor(Species), # map species (a factor) to color
text = ~Species, # show the species in the hover text
## hovertemplate formats the hover text: %{x}, %{y}, and %{text}
## are replaced by each point's x, y, and text values
hovertemplate = paste('<i><b>Sepal Width</b></i>: %{y}',
'<br><b>Sepal Length</b>: %{x}',
'<br><b>%{text}</b>'),
alpha = 0.9,
size = ~Sepal.Length,
type = "scatter",
mode = "markers")
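The hover template is an ordinary character string, so you can build it once and reuse it. The sketch below uses paste0(), which (unlike paste()) adds no separating spaces. It ends the template with an empty <extra></extra> tag, which plotly.js uses to hide the secondary box that otherwise shows the trace name. The object names tmpl and p are illustrative.

```r
library(plotly)

# %{x}, %{y}, and %{text} are filled in with each point's values on hover
tmpl <- paste0("<i><b>Sepal Width</b></i>: %{y}",
               "<br><b>Sepal Length</b>: %{x}",
               "<br><b>%{text}</b>",
               "<extra></extra>")   # empty <extra> hides the trace-name box

p <- plot_ly(iris,
             x = ~Sepal.Length, y = ~Sepal.Width,
             color = ~Species, text = ~Species,
             hovertemplate = tmpl,
             type = "scatter", mode = "markers")
p
```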
10.2.3 Enhancing the Plot with the layout() Function
Titles and axis labels are important in any visualization. To add a meaningful title, informative labels, and annotations to a plotly plot, we use the layout() function. The following code offers some design ideas for enhancing your plotly charts. The full list of layout options is on plotly's reference page at https://plotly.com/r/reference/layout/
plot_ly(
data = iris,
x = ~Sepal.Length, # Horizontal axis
y = ~Sepal.Width, # Vertical axis
color = ~factor(Species), # map species (a factor) to color
text = ~Species, # show the species in the hover text
## hovertemplate formats the hover text: %{x}, %{y}, and %{text}
## are replaced by each point's x, y, and text values
hovertemplate = paste('<i><b>Sepal Width</b></i>: %{y}',
'<br><b>Sepal Length</b>: %{x}',
'<br><b>%{text}</b>'),
alpha = 0.9,
size = ~Sepal.Length,
type = "scatter",
mode = "markers",
## graphic size: set in plot_ly(), since plotly warns that
## width/height in layout() is deprecated
width = 700,
height = 700
) %>%
layout(
### Title
title =list(text = "Sepal Length vs Sepal Width",
font = list(family = "Times New Roman", # HTML font family
size = 18,
color = "red")),
### legend
legend = list(title = list(text = 'species',
font = list(family = "Courier New",
size = 14,
color = "green")),
bgcolor = "ivory",
bordercolor = "navy",
groupclick = "togglegroup", # either "toggleitem" or "togglegroup"
orientation = "v" # Sets the orientation of the legend.
),
## margin of the plot
margin = list(
b = 120,
l = 50,
t = 120,
r = 50
),
## Background
plot_bgcolor ='#f7f7f7',
## Axes labels
xaxis = list(
title=list(text = 'Sepal Length',
font = list(family = 'Arial')),
zerolinecolor = 'red',
zerolinewidth = 2,
gridcolor = 'white'),
yaxis = list(
title=list(text = 'Sepal Width',
font = list(family = 'Arial')),
zerolinecolor = 'purple',
zerolinewidth = 2,
gridcolor = 'white'),
## annotations
annotations = list(
x = 0.7, # between 0 and 1. 0 = left, 1 = right
y = 0.9, # between 0 and 1, 0 = bottom, 1 = top
font = list(size = 12,
color = "darkred"),
text = "The point size is proportional to the sepal length",
xref = "paper", # x is a fraction of the plotting area's width
yref = "paper", # same as xref
xanchor = "center", # horizontal alignment with respect to its x position
yanchor = "bottom", # similar to xanchor
showarrow = FALSE
)
)
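The annotations option also accepts a list of annotation lists, one per note, so you can place several annotations on the same plot. A minimal sketch, with annotation texts chosen for illustration:

```r
library(plotly)

p <- plot_ly(iris, x = ~Sepal.Length, y = ~Sepal.Width,
             color = ~Species, type = "scatter", mode = "markers")

# Each inner list is one annotation. "paper" coordinates put (0, 0) at the
# bottom-left and (1, 1) at the top-right corner of the plotting area.
notes <- list(
  list(x = 0.02, y = 0.98, xref = "paper", yref = "paper",
       xanchor = "left", yanchor = "top", showarrow = FALSE,
       text = "Source: iris data set"),
  list(x = 0.98, y = 0.02, xref = "paper", yref = "paper",
       xanchor = "right", yanchor = "bottom", showarrow = FALSE,
       text = "150 flowers, 3 species")
)

p <- layout(p, annotations = notes)
p
```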
To reuse these layout settings in other plots, we can wrap them in a function. The function takes a plotly object p as its first argument and passes it to layout(), so it can be added at the end of a pipe.
myPlotlyLayout <- function(p){
layout(
p,
### Title
title =list(text = "Sepal Length vs Sepal Width",
font = list(family = "Times New Roman", # HTML font family
size = 18,
color = "red")),
### legend
legend = list(title = list(text = 'species',
font = list(family = "Courier New",
size = 14,
color = "green")),
bgcolor = "ivory",
bordercolor = "navy",
groupclick = "togglegroup", # either "toggleitem" or "togglegroup"
orientation = "v" # Sets the orientation of the legend.
),
## margin of the plot
margin = list(
b = 120,
l = 50,
t = 120,
r = 50
),
## Background
plot_bgcolor ='#f7f7f7',
## Axes labels
xaxis = list(
title=list(text = 'Sepal Length',
font = list(family = 'Arial')),
zerolinecolor = 'red',
zerolinewidth = 2,
gridcolor = 'white'),
yaxis = list(
title=list(text = 'Sepal Width',
font = list(family = 'Arial')),
zerolinecolor = 'purple',
zerolinewidth = 2,
gridcolor = 'white'),
## annotations
annotations = list(
x = 0.7, # between 0 and 1. 0 = left, 1 = right
y = 0.9, # between 0 and 1, 0 = bottom, 1 = top
font = list(size = 12,
color = "darkred"),
text = "The point size is proportional to the sepal length",
xref = "paper", # x is a fraction of the plotting area's width
yref = "paper", # same as xref
xanchor = "center", # horizontal alignment with respect to its x position
yanchor = "bottom", # similar to xanchor
showarrow = FALSE
)
)
}
## apply the reusable layout by piping the plot into myPlotlyLayout()
plot_ly(
data = iris,
x = ~Sepal.Length, # Horizontal axis
y = ~Sepal.Width, # Vertical axis
color = ~factor(Species), # map species (a factor) to color
text = ~Species, # show the species in the hover text
## hovertemplate formats the hover text: %{x}, %{y}, and %{text}
## are replaced by each point's x, y, and text values
hovertemplate = paste('<i><b>Sepal Width</b></i>: %{y}',
'<br><b>Sepal Length</b>: %{x}',
'<br><b>%{text}</b>'),
alpha = 0.9,
size = ~Sepal.Length,
type = "scatter",
mode = "markers",
width = 700, # graphic size
height = 700
) %>%
myPlotlyLayout()
10.2.4 Rendering a ggplot with ggplotly()
We can also pass a ggplot object to ggplotly() to make it interactive. The custom theme defined below is applied to the ggplot before it is converted.
myplot.theme_new <- function() {
theme(
#ggplot margins
plot.margin = margin(t = 50, # Top margin
r = 30, # Right margin
b = 30, # Bottom margin
l = 30), # Left margin
## ggplot titles
plot.title = element_text(face = "bold",
size = 12,
family = "sans",
color = "navy",
hjust = 0.5, # 0 = left, 0.5 = center, 1 = right
margin = margin(0, 0, 30, 0)), # 30 pt of space below the title
# panel border: colour = NA draws no border; set a colour to show one
panel.border = element_rect(colour = NA,
fill = NA,
linetype = 2),
# panel background color
panel.background = element_rect(fill = "#f6f6f6"),
# dotted white major grid lines, no minor grid lines
# (linewidth replaces size for lines in ggplot2 >= 3.4.0)
panel.grid.major.x = element_line(colour = 'white',
linetype = 3,
linewidth = 0.5),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour = 'white',
linetype = 3,
linewidth = 0.5),
panel.grid.minor.y = element_blank(),
# axis text, titles, and ticks in navy
# (add face = "italic" or family = "Times New Roman" to customize further)
axis.text = element_text(colour = "navy",
size = 7),
axis.title = element_text(colour = "navy",
size = 7),
axis.ticks = element_line(colour = "navy"),
# legend at the bottom
legend.position = "bottom",
legend.key.size = unit(0.6, 'cm'), # legend key size
legend.key.height = unit(0.6, 'cm'), # legend key height
legend.key.width = unit(0.6, 'cm'), # legend key width
#legend.title = element_text(size = 8), # alternative: set legend title font size
legend.title = element_blank(), # remove all legend titles
legend.key = element_rect(fill = "white"),
legend.text = element_text(size = 8)) # legend text font size
}
# Scatter plot with a least-squares line for each species
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
color = factor(Species))) +
geom_point(size = 2, alpha = 0.7) +
stat_smooth(method = lm, formula = y ~ x, se = FALSE, linewidth = 0.3) +
scale_color_manual(values=c("dodgerblue4", "darkolivegreen4", "darkorchid3")) +
labs(
x = "Sepal Length",
y = "Sepal Width",
title = "Association between Sepal Length and Width") +
myplot.theme_new() +
annotate(geom="text" ,
x=6.8,
y=2,
label = paste0("The Pearson correlation coefficient r = ",
round(cor(iris$Sepal.Length, iris$Sepal.Width), 3)),
size = 2,
color = "navy") +
coord_fixed(1) ## one unit on the x-axis has the same length as one unit on the y-axis
ggplotly(p)
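To show more variables in the hover text, we can map them to extra aesthetics (label, label2, label3) inside aes(). geom_point() does not use these aesthetics, but ggplotly() includes every mapped aesthetic in the tooltip by default.
# Same plot, with extra variables added to the hover text
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
color = factor(Species))) +
# map extra variables to otherwise unused aesthetics so that
# ggplotly() lists them in the hover text
aes(label = Species, label2 = Petal.Length, label3 = Petal.Width) +
geom_point(size = 2, alpha = 0.7) +
stat_smooth(method = lm, formula = y ~ x, se = FALSE, linewidth = 0.3) +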
scale_color_manual(values=c("dodgerblue4", "darkolivegreen4", "darkorchid3")) +
labs(
x = "Sepal Length",
y = "Sepal Width",
title = "Association between Sepal Length and Width") +
myplot.theme_new() +
annotate(geom="text" ,
x=6.8,
y=2,
label = paste0("The Pearson correlation coefficient r = ",
round(cor(iris$Sepal.Length, iris$Sepal.Width), 3)),
size = 2,
color = "navy") +
coord_fixed(1) ## one unit on the x-axis has the same length as one unit on the y-axis
ggplotly(p)
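By default, ggplotly() shows every mapped aesthetic in the hover text. Its tooltip argument limits the hover text to the aesthetics you list. The following minimal sketch keeps the axes and the three extra labels; the choice of aesthetics is illustrative.

```r
library(ggplot2)
library(plotly)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  aes(label = Species, label2 = Petal.Length, label3 = Petal.Width) +
  geom_point(size = 2, alpha = 0.7)

# Only x, y, and the three label aesthetics appear on hover;
# the colour mapping is left out of the tooltip
g <- ggplotly(p, tooltip = c("x", "y", "label", "label2", "label3"))
g
```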
10.3 Barplot
To make bar plots, we first use dplyr to create a summarized data set that stores the means of sepal length, sepal width, petal length, and petal width by species.
barplotdata <- iris %>%
group_by(Species) %>%
summarize(sepal.l.avg = mean(Sepal.Length),
sepal.w.avg = mean(Sepal.Width),
petal.l.avg = mean(Petal.Length),
petal.w.avg = mean(Petal.Width))
kable(head(barplotdata))
Species | sepal.l.avg | sepal.w.avg | petal.l.avg | petal.w.avg |
---|---|---|---|---|
setosa | 5.006 | 3.428 | 1.462 | 0.246 |
versicolor | 5.936 | 2.770 | 4.260 | 1.326 |
virginica | 6.588 | 2.974 | 5.552 | 2.026 |
Next, we draw a grouped bar chart of the four means by species.
plot_ly(
data = barplotdata,
x = ~Species,
y = ~sepal.l.avg,
type = "bar",
name = "sepal.len.avg" ) %>%
add_trace(y=~sepal.w.avg, name = "sepal.wid.avg") %>%
add_trace(y=~petal.l.avg, name = "petal.len.avg") %>%
add_trace(y=~petal.w.avg, name = "petal.wid.avg") %>%
layout(yaxis = list(title = "Mean"),
barmode = "group", # side-by-side bars within each species
title = "Mean of Iris Attributes by Species")
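You can replace the four add_trace() calls with a single plot_ly() call if you first reshape the summary table to long format, with one row per species and attribute. This sketch assumes tidyr is installed for pivot_longer(). The names longdata, attribute, and mean are illustrative.

```r
library(dplyr)
library(tidyr)
library(plotly)

# Same per-species means as barplotdata, recomputed so this block runs alone
barplotdata <- iris %>%
  group_by(Species) %>%
  summarize(sepal.l.avg = mean(Sepal.Length),
            sepal.w.avg = mean(Sepal.Width),
            petal.l.avg = mean(Petal.Length),
            petal.w.avg = mean(Petal.Width))

# Long format: one row per (species, attribute) pair
longdata <- barplotdata %>%
  pivot_longer(cols = -Species, names_to = "attribute", values_to = "mean")

# color splits the bars into one trace per attribute
p <- plot_ly(longdata, x = ~Species, y = ~mean, color = ~attribute,
             type = "bar") %>%
  layout(barmode = "group", yaxis = list(title = "Mean"))
p
```

Changing barmode to "stack" stacks the four means within each species instead.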
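10.4 Histogram
The following code overlays the histograms of the four numerical iris attributes. Each histogram uses at most 10 bins (nbinsx = 10) and semi-transparent bars (alpha = 0.5), so overlapping bins remain visible.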
plot_ly(
data = iris,
x = ~ Sepal.Length,
type = "histogram",
nbinsx = 10,
name = "sepal.length",
alpha = .5,
marker = list(line = list(color = "darkgray", width = 2)) ) %>%
## add the other three attributes; barmode = "overlay" below draws them on top of each other
add_histogram(x = ~Sepal.Width,
name = "sepal.width", nbinsx = 10, alpha = 0.5,
marker = list(line = list(color = "darkgray", width = 2))) %>%
add_histogram(x = ~Petal.Length,
name = "petal.length",nbinsx = 10, alpha = 0.5,
marker = list(line = list(color = "darkgray", width = 2))) %>%
add_histogram(x = ~Petal.Width,
name = "petal.width", nbinsx = 10, alpha = 0.5,
marker = list(line = list(color = "darkgray", width = 2))) %>%
layout(barmode = "overlay",
title = "Histograms of Iris Attributes",
xaxis = list(title = "Iris Attributes",
zeroline = TRUE),
yaxis = list(title = "Count",
zeroline =TRUE))
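Overlaid histograms can be hard to read when the ranges overlap. As an alternative to the approach used above, plotly's subplot() can place each attribute's histogram in its own panel. A minimal sketch:

```r
library(plotly)

# One histogram per attribute, each with at most 10 bins
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
panels <- lapply(vars, function(v) {
  plot_ly(x = iris[[v]], type = "histogram", nbinsx = 10, name = v)
})

# Arrange the four panels in a 2 x 2 grid
fig <- subplot(panels, nrows = 2) %>%
  layout(title = "Histograms of Iris Attributes")
fig
```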
10.6 Pie Chart
We first define a subset of the iris data that keeps only observations with a sepal length greater than 5. The pie chart then shows the distribution of species in this subset. Keep in mind that a pie chart is constructed from a frequency table.
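# define a working data set: species of flowers with sepal length > 5
subiris <- iris$Species[iris$Sepal.Length > 5]
## create a frequency table in the form of a data frame;
## names(tab) and tab share the factor-level order, so labels match counts
tab <- table(subiris)
piedata <- data.frame(cate = names(tab),
freq = as.vector(tab))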
# define a color vector
colors <- c('rgb(211,94,96)', 'rgb(128,133,133)', 'rgb(144,103,167)')
# make a pie chart
plot_ly(piedata, labels = ~cate, values = ~freq, type = 'pie',
textposition = 'inside',
textinfo = 'label+percent', # flags joined by '+' with no spaces
insidetextfont = list(color = '#FFFFFF'),
hoverinfo = 'label+value+percent', # hover shows species, count, and percent
marker = list(colors = colors,
line = list(color = '#FFFFFF', width = 1)),
#The 'pull' attribute can also be used to create space between the sectors
showlegend = FALSE) %>%
layout(title = 'Distribution of Species',
xaxis = list(showgrid = FALSE, zeroline = FALSE,
showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE,
showticklabels = FALSE))
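The pull attribute mentioned in the code comment offsets slices from the center, and hole turns the pie into a donut chart. The following minimal sketch pulls out the first slice. The offset 0.1 and hole size 0.4 are arbitrary illustrative values.

```r
library(plotly)

# Frequency table of species among flowers with sepal length > 5
tab <- table(iris$Species[iris$Sepal.Length > 5])
piedata <- data.frame(cate = names(tab), freq = as.vector(tab))

# pull: offset of each slice as a fraction of the radius
# hole: fraction of the radius cut out of the center (donut chart)
fig <- plot_ly(piedata, labels = ~cate, values = ~freq, type = "pie",
               pull = c(0.1, 0, 0),
               hole = 0.4,
               textinfo = "label+percent")
fig
```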
10.7 Density Curve
Assume that we want to compare the distributions of sepal length among the three iris species. One way to make this comparison is to plot the three estimated density curves.
# kernel density estimates of sepal length for each species
setosa <- density(iris$Sepal.Length[iris$Species == "setosa"])
versicolor <- density(iris$Sepal.Length[iris$Species == "versicolor"])
virginica <- density(iris$Sepal.Length[iris$Species == "virginica"])
# plot density curves
fig <- plot_ly(x = ~virginica$x, y = ~virginica$y,
type = 'scatter', mode = 'lines',
name = 'virginica',
fill = 'tozeroy') %>%
# adding more density curves
add_trace(x = ~versicolor$x, y = ~versicolor$y,
name = 'versicolor', fill = 'tozeroy') %>%
add_trace(x = ~setosa$x, y = ~setosa$y,
name = 'setosa', fill = 'tozeroy') %>%
layout(xaxis = list(title = 'Sepal Length'),
yaxis = list(title = 'Density'))
fig
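The three density estimates above follow the same pattern, so you can also compute and add them in a loop. This sketch of that alternative assumes the default kernel and bandwidth of density().

```r
library(plotly)

# One kernel density estimate of sepal length per species
dens <- lapply(split(iris$Sepal.Length, iris$Species), density)

# Start from an empty plot and add one filled curve per species
fig <- plot_ly()
for (sp in names(dens)) {
  fig <- add_trace(fig, x = dens[[sp]]$x, y = dens[[sp]]$y,
                   type = "scatter", mode = "lines",
                   fill = "tozeroy", name = sp)
}
fig <- layout(fig, xaxis = list(title = "Sepal Length"),
              yaxis = list(title = "Density"))
fig
```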
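10.8 Serial Plot
The following serial (time series) plot shows the high price of Apple stock (column AAPL.High) over time. The range slider below the x-axis lets users choose which time window to display.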
stock <- read.csv('https://raw.githubusercontent.com/pengdsci/sta553.html/main/data/finance-charts-apple.csv')
## line plot of AAPL.High over time, with a range slider under the x-axis
fig <- plot_ly(stock, x = ~Date, y = ~AAPL.High,
type = 'scatter', mode = 'lines',
width = 900) %>%
layout(showlegend = FALSE,
title = 'Time Series with Rangeslider',
xaxis = list(rangeslider = list(visible = TRUE),
zerolinecolor = '#ffffff',
zerolinewidth = 2,
gridcolor = '#ffffff'),
yaxis = list(zerolinecolor = '#ffffff',
zerolinewidth = 2,
gridcolor = '#ffffff'),
plot_bgcolor = '#e5ecf6')
fig
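Besides the range slider, a plotly date axis can show range-selector buttons that jump to a fixed time window. So that it runs without a download, the sketch below uses the economics data set shipped with ggplot2 (monthly US data; unemploy is the number unemployed, in thousands) rather than the stock data. The button choices are illustrative.

```r
library(ggplot2)   # provides the economics data set
library(plotly)

fig <- plot_ly(economics, x = ~date, y = ~unemploy,
               type = "scatter", mode = "lines") %>%
  layout(xaxis = list(
           # buttons that show the last 5 or 10 years, or the full range
           rangeselector = list(buttons = list(
             list(count = 5,  label = "5y",  step = "year", stepmode = "backward"),
             list(count = 10, label = "10y", step = "year", stepmode = "backward"),
             list(step = "all", label = "All")
           )),
           rangeslider = list(visible = TRUE)
         ),
         yaxis = list(title = "Unemployed (thousands)"))
fig
```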
10.9 Plotly Maps
Several map libraries are available in R. In this example, we use the plot_geo() function from plotly to plot site locations on a map.
## preparing data
poc <- read_csv("https://raw.githubusercontent.com/pengdsci/sta553.html/main/data/POC.csv")[,c(7,8,9, 17)]
poc.site <- poc[poc$POC == 1,]
# geo styling
geostyle <- list(scope = 'usa',
projection = list(type = 'albers usa'),
showland = TRUE,
landcolor = toRGB("gray95"),
subunitcolor = toRGB("gray85"),
countrycolor = toRGB("gray85"),
countrywidth = 0.5,
subunitwidth = 0.5
)
## plotting map
fig <- plot_geo(poc.site, lat = ~ycoord, lon = ~xcoord) %>%
add_markers(text = ~ SITE_DESCRIPTION,
color = I("red"), # I() uses the literal color instead of mapping it to a scale
symbol = I("circle"),
size = I(8),
hoverinfo = "text" ) %>%
layout( title = 'POC Risk Sites', geo = geostyle)
fig
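plot_geo() can also draw choropleth maps that shade each state by a value. The following minimal sketch uses base R's state data: state.abb, state.name, and the 1975 population estimates (in thousands) in state.x77.

```r
library(plotly)

# Base R state data; rows of state.x77 follow the order of state.abb/state.name
states <- data.frame(code = state.abb,
                     name = state.name,
                     population = state.x77[, "Population"])

# locationmode = "USA-states" lets two-letter codes identify the states
fig <- plot_geo(states, locationmode = "USA-states") %>%
  add_trace(type = "choropleth",
            locations = ~code,
            z = ~population,
            text = ~name,
            color = ~population,
            colors = "Blues") %>%
  layout(title = "State Population (1975 estimates, thousands)",
         geo = list(scope = "usa", projection = list(type = "albers usa")))
fig
```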