Topic 4 R Functions and Flow Controls

This chapter reviews the fundamentals of R programming: functions and control flow.

4.1 Control flow

There are two primary tools of control flow: choices and loops.

Choices, like if statements and switch() calls, allow us to run different code depending on the input.

Loops, like for and while, allow us to repeatedly run code, typically with changing options.

What is the difference between if and ifelse()?

4.1.1 Choices

The basic form of an if statement in R is as follows:

if (condition) true_action
if (condition) true_action else false_action

If the condition is TRUE, true_action is evaluated; if the condition is FALSE, the optional false_action is evaluated.

Typically the actions are compound statements contained within braces ({ }):

grade <- function(x) {
  if (x > 90) {
    "A"
  } else if (x > 80) {
    "B"
  } else if (x > 50) {
    "C"
  } else {
    "F"
  }
}
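The function above is defined but never called. A short usage sketch follows; it repeats the definition so that it runs on its own, and the scores vector is made up for illustration. Note that the comparisons are strict, so boundary values fall into the lower grade.

```r
# grade() is the function defined above, repeated so this block is self-contained.
grade <- function(x) {
  if (x > 90) {
    "A"
  } else if (x > 80) {
    "B"
  } else if (x > 50) {
    "C"
  } else {
    "F"
  }
}

grade(95)  # "A"
grade(90)  # "B": 90 is not strictly greater than 90
grade(50)  # "F": 50 is not strictly greater than 50

# grade() expects a single number, so apply it element by element for a vector.
scores <- c(95, 85, 60, 10)  # illustrative data
grades <- vapply(scores, grade, character(1))
grades     # "A" "B" "C" "F"
```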

if returns a value so that you can assign the results:

x1 <- if (TRUE) 1 else 2
x2 <- if (FALSE) 1 else 2
c(x1, x2)

## [1] 1 2

When we use the single argument form without an else statement, it invisibly returns NULL if the condition is FALSE. Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms:

greet <- function(name, birthday = FALSE) {
  paste0(
    "Hi ", name,
    if (birthday) " and HAPPY BIRTHDAY"
  )
}
greet("Maria", FALSE)

## [1] "Hi Maria"

greet("Jaime", TRUE)

## [1] "Hi Jaime and HAPPY BIRTHDAY"
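The same idiom works with c(). A minimal sketch, where flag and the other variable names are illustrative only:

```r
# An if without else returns NULL when the condition is FALSE ...
missing_branch <- if (FALSE) "never used"
is.null(missing_branch)        # TRUE

# ... and c() silently drops NULL elements.
flag <- FALSE
values <- c(1, if (flag) 2, 3)
values                         # 1 3
```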

Invalid inputs

The condition should evaluate to a single TRUE or FALSE. Most other inputs will generate an error:

#if ("x") 1
#> Error in if ("x") 1: argument is not interpretable as logical
#if (NA) 1
#> Error in if (NA) 1: missing value where TRUE/FALSE needed
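One way to see these errors without stopping a script is to capture them with tryCatch(). When a condition might be NA or not logical at all, wrapping it in isTRUE() is one possible guard. This is a sketch; the helper check() is hypothetical, and the exact wording of the error message depends on the R locale.

```r
# Capture the error message instead of stopping the script.
msg <- tryCatch(
  if (NA) "yes" else "no",
  error = function(e) conditionMessage(e)
)
msg  # in an English locale: "missing value where TRUE/FALSE needed"

# isTRUE() is TRUE only for a single, non-missing TRUE, so it never errors here.
check <- function(cond) if (isTRUE(cond)) "yes" else "no"
check(TRUE)   # "yes"
check(NA)     # "no"
check("x")    # "no"
```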

Vectorized if

Given that if only works with a single TRUE or FALSE, you might wonder what to do if you have a vector of logical values. Handling vectors of values is the job of ifelse(): a vectorized function with test, yes, and no vectors (that will be recycled to the same length):

x <- 1:10
ifelse(x %% 5 == 0, "XXX", as.character(x))

##  [1] "1"   "2"   "3"   "4"   "XXX" "6"   "7"   "8"   "9"   "XXX"

ifelse(x %% 2 == 0, "even", "odd")

##  [1] "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even"

Note that missing values will be propagated into the output.
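A small sketch of this propagation, using a made-up vector y. Unlike if, which raises an error on NA, ifelse() passes the NA through to the result; an explicit is.na() test is one way to replace it.

```r
y <- c(1, NA, 3)  # illustrative data

# The NA in the test produces an NA in the output.
res <- ifelse(y > 2, "big", "small")
res   # "small" NA "big"

# Handle the missing value explicitly if NA is not wanted in the result.
res2 <- ifelse(is.na(y), "unknown", ifelse(y > 2, "big", "small"))
res2  # "small" "unknown" "big"
```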

Another vectorized equivalent is the more general dplyr::case_when(). It uses a special syntax to allow any number of condition-vector pairs:

dplyr::case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  is.na(x) ~ "???",
  TRUE ~ as.character(x)
)

##  [1] "1"    "2"    "3"    "4"    "fizz" "6"    "buzz" "8"    "9"    "fizz"

9 %/% 5     ## quotient

## [1] 1

9 %% 5     ## remainder

## [1] 4

switch() statement

Closely related to if is the switch() statement. It is a compact, special-purpose equivalent that lets you replace code like:

x_option <- function(x) {
  if (x == "a") {
    "option 1"
  } else if (x == "b") {
    "option 2" 
  } else if (x == "c") {
    "option 3"
  } else {
    stop("Invalid `x` value")
  }
}

with the more succinct:

x_option <- function(x) {
  switch(x,
    a = "option 1",
    b = "option 2",
    c = "option 3",
    stop("Invalid `x` value")
  )
}

The last component of a switch() should always throw an error; otherwise, unmatched inputs will invisibly return NULL:

(switch("c", a = 1, b = 2))

## NULL

If multiple inputs have the same output, you can leave the right-hand side of = empty and the input will “fall through” to the next value. This mimics the behavior of C’s switch statement:

legs <- function(x) {
  switch(x,
    cow = ,
    horse = ,
    dog = 4,
    human = ,
    chicken = 2,
    plant = 0,
    stop("Unknown input")
  )
}
legs("cow")

## [1] 4

legs("dog")

## [1] 4

It is also possible to use switch() with a numeric x, but it is harder to read and has undesirable failure modes if x is not a whole number.
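The failure modes of a numeric switch() can be sketched as follows. The helper pick() is hypothetical; the behaviour shown relies on switch() coercing a non-character first argument to an integer.

```r
# With a numeric first argument, switch() selects by position.
pick <- function(i) switch(i, "first", "second", "third")

pick(2)            # "second"
pick(2.9)          # "second": a non-integer index is truncated, not rounded
is.null(pick(4))   # TRUE: an out-of-range position returns NULL, with no error
```

Neither the truncation nor the silent NULL is signalled to the caller, which is why the character form is usually easier to reason about.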

4.1.2 Loops

for loops are used to iterate over items in a vector. They have the following basic form:

for (item in vector) perform_action

For each item in vector, perform_action is called once, with the value of item updated each time.

for (i in 1:3) {
  print(i)
}

## [1] 1
## [1] 2
## [1] 3

for assigns the item to the current environment, overwriting any existing variable with the same name:

i <- 100
for (i in 1:3) {}
i

## [1] 3

There are two ways to terminate a for loop early:

next exits the current iteration.
break exits the entire for loop.

for (i in 1:10) {
  if (i < 3) 
    next

  print(i)
  
  if (i >= 5)
    break
}

## [1] 3
## [1] 4
## [1] 5
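The examples above all use for, which iterates over a fixed vector. The while loop mentioned at the start of this section instead repeats as long as a condition holds. A minimal sketch, with n and steps as illustrative variables:

```r
# Halve n until it drops below 1, counting the number of steps taken.
n <- 20
steps <- 0
while (n >= 1) {
  n <- n / 2
  steps <- steps + 1
}
steps  # 5
n      # 0.625
```

next and break work inside a while loop in the same way as inside a for loop.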

4.2 Functions

We have already created R functions and know how to use them to reduce duplication in our code. In this note, we’ll learn how to turn that informal, working knowledge into a more rigorous understanding of R functions.

4.2.1 Function fundamentals

To understand functions in R we need to internalize two important ideas:

Functions can be broken down into three components: arguments, body, and environment.
Functions are objects, just as vectors are objects.

4.2.1.1 Function components

A function has three parts:

The formals(), the list of arguments that control how you call the function.
The body(), the code inside the function.
The environment(), the data structure that determines how the function finds the values associated with the names.

While the formals and body are specified explicitly when you create a function, the environment is specified implicitly, based on where you defined the function. The function environment always exists, but it is only printed when the function isn’t defined in the global environment.

f02 <- function(x, y) {
  x + y
}

formals(f02)

## $x
## 
## 
## $y

body(f02)

## {
##     x + y
## }

environment(f02)

## <environment: R_GlobalEnv>
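For contrast, here is a sketch of a function whose environment is not the global environment. The names make_adder and add2 are hypothetical. Because the inner function is created inside a call to make_adder(), its environment is that call's execution environment, which is where it finds n.

```r
make_adder <- function(n) {
  # The returned function is created inside make_adder(), so its
  # environment is the execution environment of this call.
  function(x) x + n
}

add2 <- make_adder(2)
add2(3)                                     # 5
identical(environment(add2), globalenv())   # FALSE
get("n", envir = environment(add2))         # 2
```

Printing add2 at the console would also show an <environment: ...> line, since the function was not defined in the global environment.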

4.2.1.2 First-class functions

It’s very important to understand that R functions are objects in their own right, a language property often called “first-class functions”. Unlike in many other languages, there is no special syntax for defining and naming a function: we simply create a function object (with function) and bind it to a name with <-:

f01 <- function(x) {
  sin(1 / x ^ 2)
}

While you almost always create a function and then bind it to a name, the binding step is not compulsory. If you choose not to give a function a name, you get an anonymous function. This is useful when it’s not worth the effort to figure out a name:

lapply(mtcars, function(x) length(unique(x)))

## $mpg
## [1] 25
## 
## $cyl
## [1] 3
## 
## $disp
## [1] 27
## 
## $hp
## [1] 22
## 
## $drat
## [1] 22
## 
## $wt
## [1] 29
## 
## $qsec
## [1] 30
## 
## $vs
## [1] 2
## 
## $am
## [1] 2
## 
## $gear
## [1] 3
## 
## $carb
## [1] 6

Filter(function(x) !is.numeric(x), mtcars)

## data frame with 0 columns and 32 rows

integrate(function(x) sin(x) ^ 2, 0, pi)$value

## [1] 1.570796

names(iris)

## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

iris$Species

##   [1] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa    
##  [11] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa    
##  [21] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa    
##  [31] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa    
##  [41] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa    
##  [51] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
##  [61] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
##  [71] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
##  [81] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
##  [91] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
## [101] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica 
## [111] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica 
## [121] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica 
## [131] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica 
## [141] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica 
## Levels: setosa versicolor virginica

A final option is to put functions in a list:

funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)

funs$double(10)

## [1] 20

funs$half(10)

## [1] 5
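Since the list elements are ordinary function objects, we can also loop over them. A brief sketch that repeats the funs list so it runs on its own:

```r
funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)

# Call each stored function with the same argument; names are kept.
results <- vapply(funs, function(f) f(10), numeric(1))
results
```

Here results is a named numeric vector with half equal to 5 and double equal to 20.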

4.2.2 Function Composition

Base R provides two ways to compose multiple function calls. For example, imagine we want to compute the population standard deviation using sqrt() and mean() as building blocks:

square <- function(x) x^2
square(7)

## [1] 49

## An equivalent definition of square(), written with = and a braced body
square0 = function(x){
  x^2
}
##
square0(7)

## [1] 49

deviation <- function(x) x - mean(x)

We either nest the function calls:

x <- runif(100)
sqrt(mean(square(deviation(x))))

## [1] 0.2889636

Or we save the intermediate results as variables:

out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out

## [1] 0.2889636

The magrittr package provides a third option: the binary operator %>%, which is called the pipe and is pronounced as “and then”.

library(magrittr)

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract

square <- function(x) x^2
deviation <- function(x) x - mean(x)
x <- 1:50
##
x %>%
  deviation() %>%  # deviation() is the function defined above
  square() %>%
  mean() %>%
  sqrt()

## [1] 14.43087

The pipe operator %>%: x %>% f() is equivalent to f(x); x %>% f(y) is equivalent to f(x, y).
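These two equivalences can be checked directly. The sketch below assumes the magrittr package is installed; the vector x is illustrative.

```r
library(magrittr)

x <- c(1, 4, 9)

# x %>% f() is f(x)
same_one_arg <- identical(x %>% sqrt(), sqrt(x))
same_one_arg   # TRUE

# x %>% f(y) is f(x, y): the left-hand side becomes the first argument
same_two_arg <- identical(x %>% round(digits = 1), round(x, digits = 1))
same_two_arg   # TRUE

rounded <- pi %>% round(2)
rounded        # 3.14
```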

4.2.2.1 Infix functions

Infix functions get their name from the fact that the function name comes in between its arguments; an infix function therefore takes two arguments. R comes with a number of built-in infix operators:

: - an operator that generates a patterned sequence. It is also used to indicate an interaction of two variables.
:: - an operator to access an object in a known package. For example, stats::sd.
::: - an operator to access an internal (non-exported) object in a package; it is rarely needed.
$ - extracts elements by name from a named list.
^ - exponential operator (to-the-power-of)
* - multiplication (operator)
/ - division (operator)
+ - addition (operator)
- - subtraction (operator)
> - logical and numerical GREATER THAN
>= - logical and numerical GREATER OR EQUAL TO
< - logical and numerical LESS THAN
<= - logical and numerical LESS THAN OR EQUAL TO
== - logical EQUAL TO
!= - logical and numerical NOT EQUAL TO
! - logical negation (NOT)
& - logical AND (element-wise).
&& - logical AND.
| - logical OR (element-wise).
|| - logical OR.
~ - operator used in the formation of a model
<- - leftwards assignment
<<- - leftwards assignment (used for assigning to variables in the parent environments)
-> - rightwards assignment
->> - rightwards assignment (used for assigning to variables in the parent environments)
%% - modulus (Remainder from division)
%/% - integer Division
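Each of the operators listed above is itself a function, which is why the term "infix function" applies. Quoting an operator with backticks lets us call it in the usual prefix form, as this short sketch shows:

```r
# Backticks let us refer to an operator by name and call it like any function.
`+`(1, 2)     # 3, same as 1 + 2
`%%`(9, 5)    # 4, same as 9 %% 5

# Because operators are functions, they can be passed to other functions.
sums <- sapply(1:3, `+`, 10)
sums          # 11 12 13
```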

We can also create our own infix functions that start and end with %. Base R uses this pattern to define %%, %*%, %/%, %in%, %o%, and %x%.

Defining our own infix function is simple. We create a two-argument function and bind it to a name that starts and ends with %. For example,

`%+%` <- function(a, b) paste0(a, b)
"new " %+% "string"

## [1] "new string"

The names of infix functions are more flexible than regular R functions: they can contain any sequence of characters except for %. You will need to escape any special characters in the string used to define the function, but not when you call it:

`% %` <- function(a, b) paste(a, b)
`%/\\%` <- function(a, b) paste(a, b)

"a" % % "b"

## [1] "a b"

"a" %/\% "b"

## [1] "a b"

R’s default precedence rules mean that infix operators are composed left to right.

`%-%` <- function(a, b) paste0("(", a, " %-% ", b, ")")
"a" %-% "b" %-% "c"

## [1] "((a %-% b) %-% c)"
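User-defined %...% operators also have a fixed place relative to the built-in operators: according to R's operator precedence table (see ?Syntax), they bind more tightly than * and /, but less tightly than the sequence operator :. A small sketch with a hypothetical operator %plus%:

```r
# A hypothetical infix operator that simply adds its arguments.
`%plus%` <- function(a, b) a + b

# %any% operators bind more tightly than * and /
p1 <- 2 * 3 %plus% 4
p1   # 14, parsed as 2 * (3 %plus% 4)

# ... but less tightly than the sequence operator :
p2 <- 1:3 %plus% 1
p2   # 2 3 4, parsed as (1:3) %plus% 1
```

When in doubt, adding parentheses makes the intended grouping explicit.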