Topic 10 Interactive Statistics Graphics

This note is all about interactive plots. However, interactive plots cannot be rendered in the PDF and EPUB. Screenshots will be included in the PDF and EPUB versions of this eBook. The HTML version of this eBook will keep the interactivity of all graphics! The code that generated the corresponding plots is still included in all versions of this eBook.

10.1 Plotly

Plotly has a rich and complex set of features. The most common features are:

  • Tooltip “hover” info
  • Zoom in and out of graphs
  • Users can export graphs as an image
  • Integrating multiple graphs
  • Template hover info
  • Animations and moving graphics

One can feed a ggplot to plotly to render ggplot via plotly. Compared to the base R plotting function plot(), plot_ly() is more technical and poorly documented. However, the following factors may make plotly the best option:

  • Graphs presented in a digital/online format
  • Users interact with the graph
  • more customizable than ggplot
  • rendering graphics in a higher resolution

In this note, we introduce the basic statistical graphics using the plotly package. plotly graphics automatically contain interactive elements that allow users to modify, explore, and experience the visualized data in new ways.

The coding effort is similar to that of SAS ODS graphics. To use plot_ly(), we need to install (if not done) and load the plotly package. We use the well-known iris data set in the following plots. A nice plotly cheat sheet can be found at https://github.com/pengdsci/sta553/blob/main/ref/r_plotly_cheat_sheet.pdf

10.2 ScatterPlot

10.2.1 The Default Plot

First, we make a simple interactive scatter plot using sepal length and width. We can view the information about the variables and color coding information in the hover text. The labels of axes and legend titles and labels are default.

plot_ly(
    data = iris,
    x = ~Sepal.Length,  # Horizontal axis 
    y = ~Sepal.Width,   # Vertical axis 
    color = ~factor(Species),  # must be a numeric factor
     type = "scatter",
     mode = "markers")

10.2.2 Addiing Additional Information Through hovertemplate

We can also add additional information to the plot to enhance the interactivity of the plot. For example, we can (1) modify the point size using the value of a numerical variable; (2) add text to the hover text using the text option to show the class label; (3) formulate the hover text using hovertemplate option.

plot_ly(
    data = iris,
    x = ~Sepal.Length,  # Horizontal axis 
    y = ~Sepal.Width,   # Vertical axis 
    color = ~factor(Species),  # must be a numeric factor
     text = ~Species,     # show the species in the hover text
     ## using the following hovertemplate() to add the information of the
     ## two numerical variable to the hover text.
     hovertemplate = paste('<i><b>Sepal Width<b></i>: %{y}',
                           '<br><b>Sepal Length</b>:  %{x}',
                           '<br><b>%{text}</b>'),
     alpha  = 0.9,
     size = ~Sepal.Length,
     type = "scatter",
     mode = "markers")

10.2.3 Enhancing the Plot with Layout() Function

Titles and axis labels are important in any visualization, to include a meaningful title, informative labels, and annotations to the plotly plot, we can use the layout() function. The following code only gives you some design ideas you can use to enhance your plotly charts. The detailed list of configurations can be found on plotly’s reference page at https://plotly.com/r/reference/layout/

plot_ly(
    data = iris,
    x = ~Sepal.Length,  # Horizontal axis 
    y = ~Sepal.Width,   # Vertical axis 
    color = ~factor(Species),  # must be a numeric factor
     text = ~Species,     # show the species in the hover text
     ## using the following hovertemplate() to add the information of the
     ## two numerical variable to the hover text.
     hovertemplate = paste('<i><b>Sepal Width<b></i>: %{y}',
                           '<br><b>Sepal Length</b>:  %{x}',
                           '<br><b>%{text}</b>'),
     alpha  = 0.9,
     size = ~Sepal.Length,
     type = "scatter",
     mode = "markers"
   ) %>%
    layout(  
      ## graphic size
      with = 700,
      height = 700,
      ### Title 
      title =list(text = "Sepal Length vs Sepal Width", 
                          font = list(family = "Times New Roman",  # HTML font family  
                                        size = 18,
                                       color = "red")), 
      ### legend
      legend = list(title = list(text = 'species',
                                 font = list(family = "Courier New",
                                               size = 14,
                                              color = "green")),
                    bgcolor = "ivory",
                    bordercolor = "navy",
                    groupclick = "togglegroup",  # one of  "toggleitem" AND "togglegroup".
                    orientation = "v"  # Sets the orientation of the legend.
                    
                    ),
      ## margin of the plot
      margin = list(
              b = 120,
              l = 50,
              t = 120,
              r = 50
      ),
      ## Background
      plot_bgcolor ='#f7f7f7', 
      ## Axes labels
             xaxis = list( 
                    title=list(text = 'Sepal Length',
                               font = list(family = 'Arial')),
                    zerolinecolor = 'red', 
                    zerolinewidth = 2, 
                    gridcolor = 'white'), 
            yaxis = list( 
                    title=list(text = 'Sepal Width',
                               font = list(family = 'Arial')),
                    zerolinecolor = 'purple', 
                    zerolinewidth = 2, 
                    gridcolor = 'white'),
       ## annotations
       annotations = list(  
                     x = 0.7,   # between 0 and 1. 0 = left, 1 = right
                     y = 0.9,   # between 0 and 1, 0 = bottom, 1 = top
                  font = list(size = 12,
                              color = "darkred"),   
                  text = "The point size is proportional to the sepal length",   
                  xref = "paper",  # "container" spans the entire `width` of the plot. 
                                   # "paper" refers to the width of the plotting area only.  
                  yref = "paper",  #  same as xref
               xanchor = "center", #  horizontal alignment with respect to its x position
               yanchor = "bottom", #  similar to xanchor  
             showarrow = FALSE  
           )
    )
myPlotlyLayout <- function(){
  layout(  
      ## graphic size
      with = 700,
      height = 700,
      ### Title 
      title =list(text = "Sepal Length vs Sepal Width", 
                          font = list(family = "Times New Roman",  # HTML font family  
                                        size = 18,
                                       color = "red")), 
      ### legend
      legend = list(title = list(text = 'species',
                                 font = list(family = "Courier New",
                                               size = 14,
                                              color = "green")),
                    bgcolor = "ivory",
                    bordercolor = "navy",
                    groupclick = "togglegroup",  # one of  "toggleitem" AND "togglegroup".
                    orientation = "v"  # Sets the orientation of the legend.
                    
                    ),
      ## margin of the plot
      margin = list(
              b = 120,
              l = 50,
              t = 120,
              r = 50
      ),
      ## Background
      plot_bgcolor ='#f7f7f7', 
      ## Axes labels
             xaxis = list( 
                    title=list(text = 'Sepal Length',
                               font = list(family = 'Arial')),
                    zerolinecolor = 'red', 
                    zerolinewidth = 2, 
                    gridcolor = 'white'), 
            yaxis = list( 
                    title=list(text = 'Sepal Width',
                               font = list(family = 'Arial')),
                    zerolinecolor = 'purple', 
                    zerolinewidth = 2, 
                    gridcolor = 'white'),
       ## annotations
       annotations = list(  
                     x = 0.7,   # between 0 and 1. 0 = left, 1 = right
                     y = 0.9,   # between 0 and 1, 0 = bottom, 1 = top
                  font = list(size = 12,
                              color = "darkred"),   
                  text = "The point size is proportional to the sepal length",   
                  xref = "paper",  # "container" spans the entire `width` of the plot. 
                                   # "paper" refers to the width of the plotting area only.  
                  yref = "paper",  #  same as xref
               xanchor = "center", #  horizontal alignment with respect to its x position
               yanchor = "bottom", #  similar to xanchor  
             showarrow = FALSE  
           )
  )
       }
plot_ly(
    data = iris,
    x = ~Sepal.Length,  # Horizontal axis 
    y = ~Sepal.Width,   # Vertical axis 
    color = ~factor(Species),  # must be a numeric factor
     text = ~Species,     # show the species in the hover text
     ## using the following hovertemplate() to add the information of the
     ## two numerical variable to the hover text.
     hovertemplate = paste('<i><b>Sepal Width<b></i>: %{y}',
                           '<br><b>Sepal Length</b>:  %{x}',
                           '<br><b>%{text}</b>'),
     alpha  = 0.9,
     size = ~Sepal.Length,
     type = "scatter",
     mode = "markers"
   ) 

10.2.4 Rendering A GGPLOT with ggplotly

We can also render a ggplot in using ggplotly to bring interactivity to the plot.

myplot.theme_new <- function() {
  theme(
    #ggplot margins
     plot.margin = margin(t = 50,  # Top margin
                          r = 30,  # Right margin
                          b = 30,  # Bottom margin
                          l = 30), # Left margin
    ## ggplot titles
    plot.title = element_text(face = "bold", 
                              size = 12,
                              family = "sans", 
                              color = "navy",
                              hjust = 0.5,
                              margin=margin(0,0,30,0)), # left(0),right(1)
    # add border 1)
    panel.border = element_rect(colour = NA, 
                                fill = NA, 
                                linetype = 2),
    # color background 2)
    panel.background = element_rect(fill = "#f6f6f6"),
    # modify grid 3)
    panel.grid.major.x = element_line(colour = 'white', 
                                      linetype = 3, 
                                      size = 0.5),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y =  element_line(colour = 'white', 
                                       linetype = 3, 
                                       size = 0.5),
    panel.grid.minor.y = element_blank(),
    # modify text, axis and colour 4) and 5)
    axis.text = element_text(colour = "navy", 
                             #face = "italic", 
                             size = 7,
                             #family = "Times New Roman"
                             ),
    axis.title = element_text(colour = "navy", 
                              size = 7,
                              #family = "Times New Roman"
                              ),
    axis.ticks = element_line(colour = "navy"),
    # legend at the bottom 6)
    legend.position = "bottom",
    legend.key.size = unit(0.6, 'cm'), #change legend key size
    legend.key.height = unit(0.6, 'cm'), #change legend key height
    legend.key.width = unit(0.6, 'cm'), #change legend key width
    #legend.title = element_text(size=8), #change legend title font size
    legend.title=element_blank(),  # remove all legend titles
    legend.key = element_rect(fill = "white"),
    #####
    legend.text = element_text(size=8)) #change legend text font size
}
# Change histogram plot line colors by groups
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                              color = factor(Species)), linetype = Species) +
             geom_point(size = 2, alpha = 0.7) +
             stat_smooth(method = lm, se=FALSE, size = 0.3) +
             scale_color_manual(values=c("dodgerblue4", "darkolivegreen4", "darkorchid3")) +
             labs(
                 x = "Sepal Length",
                 y = "Sepal Width",
                 title = "Association between Sepal Length and Width") +
             myplot.theme_new() + 
              annotate(geom="text" , 
                       x=6.8, 
                       y=2,
                       label=paste("The Pearson correlation coefficient r = ",                          
                                   round(cor(iris$Sepal.Length, iris$Sepal.Width),3)), 
                          size = 2,
                          color = "navy") + 
               coord_fixed(1)    ## This changes the aspect ratio of the graph
ggplotly(p)
# Change histogram plot line colors by groups
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                      color = factor(Species)), linetype = Species) +
             # to add more information about the variables in the data set
             # use labels to denote the variable names inside the function aes()
             aes(label=Species, label2=Petal.Length, label3=Petal.Width) +
             geom_point(size = 2, alpha = 0.7) +
             stat_smooth(method = lm, se=FALSE, size = 0.3) +
             scale_color_manual(values=c("dodgerblue4", "darkolivegreen4", "darkorchid3")) +
             labs(
                 x = "Sepal Length",
                 y = "Sepal Width",
                 title = "Association between Sepal Length and Width") +
             myplot.theme_new() + 
              annotate(geom="text" , 
                       x=6.8, 
                       y=2,
                       label=paste("The Pearson correlation coefficient r = ",                          
                                   round(cor(iris$Sepal.Length, iris$Sepal.Width),3)), 
                          size = 2,
                          color = "navy") + 
               coord_fixed(1)    ## This changes the aspect ratio of the graph
ggplotly(p)

10.3 Barplot

We will create a summarized data set to make bar plots. We define a data set to store the mean of sepal length and sepal width by species using the dyplr and tidyr approaches.

barplotdata <- iris %>%
  group_by(Species) %>%
  summarize(sepal.l.avg = mean(Sepal.Length),
            sepal.w.avg = mean(Sepal.Width),
            petal.l.avg = mean(Petal.Length),
            petal.w.avg = mean(Petal.Width))
kable(head(barplotdata))
Species sepal.l.avg sepal.w.avg petal.l.avg petal.w.avg
setosa 5.006 3.428 1.462 0.246
versicolor 5.936 2.770 4.260 1.326
virginica 6.588 2.974 5.552 2.026

Next, we draw a group bar chart.

plot_ly(
  data = barplotdata,
   x = ~Species,
   y = ~sepal.l.avg,
   type = "bar",
   name = "sepal.len.avg" ) %>%
    add_trace(y=~sepal.w.avg, name = "sepal.wid.avg") %>%
    add_trace(y=~petal.l.avg, name = "petal.len.avg") %>%
    add_trace(y=~petal.w.avg, name = "petal.wid.avg") %>%
    layout( yaxis = list(title ="Mean"),
            title = "Frequency distribution of Iris attributes")

10.4 Histogram

plot_ly(
  data = iris,
   x = ~ Sepal.Length,
   type = "histogram",
   nbinsx = 10, 
   name = "sepal.length",
   alpha = .5,
   marker = list(line = list(color = "darkgray",  width = 2)) ) %>%
   ## adding additional histograms and stack them
    add_histogram(x = ~Sepal.Width,
                name = "sepal.width", nbinsx = 10, alpha = 0.5,
                marker = list(line = list(color = "darkgray",  width = 2))) %>%
    add_histogram(x = ~Petal.Length,
                name = "petal.length",nbinsx = 10, alpha = 0.5,
                marker = list(line = list(color = "darkgray",  width = 2))) %>%
    add_histogram(x = ~Petal.Width,
                name = "petal.lwidth",nbinsx = 10, alpha = 0.5,
                marker = list(line = list(color = "darkgray",  width = 2))) %>%
  layout(barmode = "overlay",
         title = "Histogram of Iris Attribute",
         xaxis = list(title = "Iris Attributes",
                      zeroline = TRUE),
         yaxis = list(title = "Count",
                      zeroline =TRUE))

10.5 Boxplot

Drawing a boxplot is straightforward in plotly.

plot_ly(
  data = iris,
  y = ~ Sepal.Length,
  x = ~Species,
  type = "box",
  color = ~Species,
  boxpoints = "all",
  boxmean = TRUE,
  showlegend = FALSE ) %>%
   layout(title = "Histogram of Iris Attribute",
         xaxis = list(title = "Species",
                      zeroline = TRUE),
         yaxis = list(title = "Sepal Length",
                      zeroline =TRUE))

10.6 Pie Chart

We first define a subset from the iris data by filtering out observations with a sepal length of less than 5. The pie chart will be created to see the distribution of species in the subset of the iris data. Keep in mind that the pie chart is constructed based on a frequency table.

# define a working data set
subiris <- iris[iris$Sepal.Length > 5,5]
## create a frequency table in the form of data frame.
piedata = data.frame(cate =as.vector(unique(subiris)), 
                     freq = as.vector(table(subiris)))
# define a color vector
colors <- c('rgb(211,94,96)', 'rgb(128,133,133)', 'rgb(144,103,167)')
# make a pie chart
plot_ly(piedata, labels = ~cate, values = ~freq, type = 'pie',
        textposition = 'inside',
        textinfo = 'label + percent',
        insidetextfont = list(color = '#FFFFFF'),
        hoverinfo = 'text',
        marker = list(colors = colors,
                      line = list(color = '#FFFFFF', width = 1)),
                      #The 'pull' attribute can also be used to create space between the sectors
        showlegend = FALSE) %>% 
         layout(title = 'Distribution of Species',
                xaxis = list(showgrid = FALSE, zeroline = FALSE, 
                             showticklabels = FALSE),
                yaxis = list(showgrid = FALSE, zeroline = FALSE, 
                             showticklabels = FALSE))

10.7 Density Curve

Assume that we want to compare the distribution of the sepal length of the tree iris flowers. One way to do this comparison is to plot the three estimated density curves.

# define three densities
sepal.len.setosa <- iris[which(iris$Species == "setosa"),]
setosa <- density(sepal.len.setosa$Sepal.Length)
sepal.len.versicolor <- iris[which(iris$Species == "versicolor"),]
versicolor <- density(sepal.len.versicolor$Sepal.Length)
sepal.len.virginica <- iris[which(iris$Species == "virginica"),]
virginica <- density(sepal.len.virginica$Sepal.Length)
# plot density curves
fig <- plot_ly(x = ~virginica$x, y = ~virginica$y, 
               type = 'scatter', mode = 'lines', 
               name = 'virginica', 
               fill = 'tozeroy')  %>% 
           # adding more density curves
       add_trace(x = ~versicolor$x, y = ~versicolor$y, 
                 name = 'versicolor', fill = 'tozeroy')  %>% 
       add_trace(x = ~setosa$x, y = ~setosa$y, 
                 name = 'setosa', fill = 'tozeroy')  %>%   
       layout(xaxis = list(title = 'Sepal Length'),
              yaxis = list(title = 'Density'))
fig

10.8 Serial Plot

stock <- read.csv('https://raw.githubusercontent.com/pengdsci/sta553.html/main/data/finance-charts-apple.csv')
##
fig <- plot_ly(stock, type = 'scatter', mode = 'lines')    %>%
       add_trace(x = ~Date, y = ~AAPL.High)    %>%
       layout(showlegend = F, 
              title='Time Series with Rangeslider',
              xaxis = list(rangeslider = list(visible = T)))  %>%
       layout(xaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff'),
              yaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff'),
              plot_bgcolor='#e5ecf6', width = 900)
fig

10.9 Plotly Maps

Several map libraries are available in R. In this example, we use the plot_geo() function from plotly to plot on a map.

## preparing data
poc <- read_csv("https://raw.githubusercontent.com/pengdsci/sta553.html/main/data/POC.csv")[,c(7,8,9, 17)]
poc.site <- poc[poc$POC == 1,]
# geo styling
geostyle <- list(scope = 'usa',
                 projection = list(type = 'albers usa'),
                 showland = TRUE,
                 landcolor = toRGB("gray95"),
                 subunitcolor = toRGB("gray85"),
                 countrycolor = toRGB("gray85"),
                 countrywidth = 0.5,
                 subunitwidth = 0.5
               )
## plotting map
fig <- plot_geo(poc.site, lat = ~ycoord, lon = ~xcoord) %>%
       add_markers(text = ~ SITE_DESCRIPTION, 
                   color = "red", 
                   symbol = I("circle"), 
                   size = I(8), 
                   hoverinfo = "text" )   %>%
        layout( title = 'POC Risk Sites', geo = geostyle)
fig