There are two primary tools of control flow: choices and loops.
Choices, like if statements and switch() calls, allow us to run different code depending on the input.
Loops, like for and while, allow us to repeatedly run code, typically with changing options.
What is the difference between if and ifelse()?
The basic form of an if statement in R is as follows:
if (condition) true_action
if (condition) true_action else false_action
If the condition is TRUE, true_action is evaluated; if the condition is FALSE, the optional false_action is evaluated.
Typically the actions are compound statements contained within {:
grade <- function(x) {
  if (x > 90) {
    "A"
  } else if (x > 80) {
    "B"
  } else if (x > 50) {
    "C"
  } else {
    "F"
  }
}
if returns a value so that you can assign the results:
x1 <- if (TRUE) 1 else 2
x2 <- if (FALSE) 1 else 2
c(x1, x2)
[1] 1 2
When we use the single argument form without an else statement, if invisibly returns NULL if the condition is FALSE. Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms:
greet <- function(name, birthday = FALSE) {
  paste0(
    "Hi ", name,
    if (birthday) " and HAPPY BIRTHDAY"
  )
}
greet("Maria", FALSE)
[1] "Hi Maria"
greet("Jaime", TRUE)
[1] "Hi Jaime and HAPPY BIRTHDAY"
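The NULL-dropping behaviour that greet() relies on can be checked in isolation. The following is a minimal sketch using only base R; the variable name missing_branch is made up for illustration.

```r
# A one-armed `if` whose condition is FALSE evaluates to NULL.
missing_branch <- if (FALSE) "never used"
is.null(missing_branch)
#> [1] TRUE

# c() drops NULL elements, so only two elements remain.
kept <- c("a", if (FALSE) "b", "c")
kept
#> [1] "a" "c"

# paste0() ignores the NULL when the other arguments are non-empty.
greeting <- paste0("Hi ", "Maria", NULL)
greeting
#> [1] "Hi Maria"
```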
Invalid inputs
The condition should evaluate to a single TRUE or FALSE. Most other inputs will generate an error:
# if ("x") 1
# > Error in if ("x") 1: argument is not interpretable as logical
# if (NA) 1
# > Error in if (NA) 1: missing value where TRUE/FALSE needed
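One common way to avoid the NA error is to wrap the condition in isTRUE(), which returns TRUE only for a single, non-missing TRUE. The sketch below assumes that treating a missing comparison as "not true" is acceptable for your use case; describe_sign() is a hypothetical helper, not part of the original notes.

```r
# Hypothetical helper: classify a number without erroring on NA.
describe_sign <- function(x) {
  # isTRUE(NA > 0) is FALSE, so the else branch handles missing values.
  if (isTRUE(x > 0)) {
    "positive"
  } else {
    "not positive (or unknown)"
  }
}

describe_sign(5)
#> [1] "positive"
describe_sign(NA)
#> [1] "not positive (or unknown)"

# Without the guard, the same condition raises an error, captured here.
raw_result <- tryCatch(if (NA > 0) "positive", error = function(e) "error")
raw_result
#> [1] "error"
```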
Vectorized if
Given that if only works with a single TRUE or FALSE, you might wonder what to do if you have a vector of logical values. Handling vectors of values is the job of ifelse(): a vectorized function with test, yes, and no vectors (that will be recycled to the same length):
x <- 1:10
ifelse(x %% 5 == 0, "XXX", as.character(x))
[1] "1" "2" "3" "4" "XXX" "6" "7" "8" "9" "XXX"
ifelse(x %% 2 == 0, "even", "odd")
[1] "odd" "even" "odd" "even" "odd" "even" "odd" "even" "odd" "even"
Note that missing values will be propagated into the output.
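A short sketch of that propagation, with one possible way to label missing values explicitly. The vector y and the labels are illustrative only.

```r
y <- c(1, NA, 3)

# The NA in the test produces an NA in the result.
sizes <- ifelse(y > 2, "big", "small")
sizes
#> [1] "small" NA      "big"

# Handling the missing case first gives it an explicit label.
labelled <- ifelse(is.na(y), "unknown", ifelse(y > 2, "big", "small"))
labelled
#> [1] "small"   "unknown" "big"
```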
Another vectorized equivalent is the more general dplyr::case_when(). It uses a special syntax to allow any number of condition-vector pairs:
dplyr::case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  is.na(x) ~ "???",
  TRUE ~ as.character(x)
)
[1] "1" "2" "3" "4" "fizz" "6" "buzz" "8" "9" "fizz"
For reference, %/% performs integer division and %% returns the remainder:
9 %/% 5 ## quotient
[1] 1
9 %% 5 ## remainder
[1] 4
switch() statement
Closely related to if is the switch() statement. It’s a compact, special-purpose equivalent that lets you replace code like:
x_option <- function(x) {
  if (x == "a") {
    "option 1"
  } else if (x == "b") {
    "option 2"
  } else if (x == "c") {
    "option 3"
  } else {
    stop("Invalid `x` value")
  }
}
with the more succinct:
x_option <- function(x) {
  switch(x,
    a = "option 1",
    b = "option 2",
    c = "option 3",
    stop("Invalid `x` value")
  )
}
The last component of a switch() should always throw an error, otherwise unmatched inputs will invisibly return NULL:
(switch("c", a = 1, b = 2))
NULL
If multiple inputs have the same output, you can leave the right hand side of = empty and the input will “fall through” to the next value. This mimics the behavior of C’s switch statement:
legs <- function(x) {
  switch(x,
    cow = ,
    horse = ,
    dog = 4,
    human = ,
    chicken = 2,
    plant = 0,
    stop("Unknown input")
  )
}
legs("cow")
[1] 4
legs("dog")
[1] 4
It is also possible to use switch() with a numeric x, but it is harder to read, and has undesirable failure modes if x is not a whole number.
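The failure modes of a numeric switch() can be seen in a small sketch. pick() is a hypothetical helper; the behaviour shown relies on switch() coercing a numeric first argument to an integer, as described in its help page.

```r
# Hypothetical helper: select an alternative by position.
pick <- function(i) switch(i, "first", "second", "third")

pick(2)
#> [1] "second"

# A non-whole number is silently truncated to 2 rather than rejected.
pick(2.9)
#> [1] "second"

# An out-of-range position returns NULL invisibly instead of erroring.
is.null(pick(4))
#> [1] TRUE
```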
for loops are used to iterate over items in a vector.
They have the following basic form:
for (item in vector) perform_action
For each item in vector, perform_action is called once, with the value of item updated each time.
for (i in 1:3) {
  print(i)
}
[1] 1
[1] 2
[1] 3
for assigns the item to the current environment, overwriting any existing variable with the same name:
i <- 100
for (i in 1:3) {}
i
[1] 3
There are two ways to terminate a for loop early:
next exits the current iteration.
break exits the entire for loop.
for (i in 1:10) {
  if (i < 3)
    next
  print(i)
  if (i >= 5)
    break
}
[1] 3
[1] 4
[1] 5
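next and break work the same way inside a while loop, the other loop construct mentioned at the start of these notes. The sketch below mirrors the for loop above; the variable names are illustrative, and the values are collected in a vector instead of printed so they are easy to inspect.

```r
i <- 0
visited <- numeric()

while (i < 10) {
  i <- i + 1
  if (i < 3) next            # skip the rest of this iteration
  visited <- c(visited, i)   # record the value instead of printing it
  if (i >= 5) break          # leave the loop entirely
}

visited
#> [1] 3 4 5
```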
We have already created R functions and know how to use them to reduce duplication in our code. In this note, we’ll learn how to turn that informal, working knowledge into a more rigorous understanding of R functions.
To understand functions in R we need to internalize two important ideas:
Functions can be broken down into three components: arguments, body, and environment.
Functions are objects, just as vectors are objects.
A function has three parts:
The formals(), the list of arguments that control how you call the function.
The body(), the code inside the function.
The environment(), the data structure that determines how the function finds the values associated with the names.
While the formals and body are specified explicitly when you create a function, the environment is specified implicitly, based on where you defined the function. The function environment always exists, but it is only printed when the function isn’t defined in the global environment.
f02 <- function(x, y) {
  x + y
}
formals(f02)
$x
$y
body(f02)
{
x + y
}
environment(f02)
<environment: R_GlobalEnv>
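To see that the environment really is determined by where a function is defined, consider a function created inside another function. This is a minimal sketch; make_adder() and add2() are hypothetical names used only for illustration.

```r
# Hypothetical function factory: the inner function is defined inside
# the execution environment of make_adder(), not in the global environment.
make_adder <- function(n) {
  function(x) x + n
}

add2 <- make_adder(2)
add2(10)
#> [1] 12

# The function's environment is not the global environment ...
identical(environment(add2), globalenv())
#> [1] FALSE

# ... and it is where the function finds the value of `n`.
get("n", envir = environment(add2))
#> [1] 2

# The formals and body are still available as usual.
names(formals(add2))
#> [1] "x"
```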
It’s very important to understand that R functions are objects in their own right, a language property often called “first-class functions”. Unlike in many other languages, there is no special syntax for defining and naming a function: we simply create a function object (with function) and bind it to a name with <-:
f01 <- function(x) {
  sin(1 / x ^ 2)
}
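Because a function is an ordinary object, it can be bound to a second name or passed to another function like any other value. A small sketch, where f_alias and apply_twice() are hypothetical names:

```r
f01 <- function(x) {
  sin(1 / x ^ 2)
}

# Binding the same function object to a second name.
f_alias <- f01
identical(f_alias(2), f01(2))
#> [1] TRUE

# A function has a class, like any other object.
class(f01)
#> [1] "function"

# Hypothetical helper that takes a function as an argument.
apply_twice <- function(f, x) f(f(x))
apply_twice(sqrt, 16)
#> [1] 2
```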
While you almost always create a function and then bind it to a name, the binding step is not compulsory. If you choose not to give a function a name, you get an anonymous function. This is useful when it’s not worth the effort to figure out a name:
lapply(mtcars, function(x) length(unique(x)))
$mpg
[1] 25
$cyl
[1] 3
$disp
[1] 27
$hp
[1] 22
$drat
[1] 22
$wt
[1] 29
$qsec
[1] 30
$vs
[1] 2
$am
[1] 2
$gear
[1] 3
$carb
[1] 6
Filter(function(x) !is.numeric(x), mtcars)
data frame with 0 columns and 32 rows
integrate(function(x) sin(x) ^ 2, 0, pi)$value
[1] 1.570796
names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
iris$Species
[1] setosa setosa setosa setosa setosa setosa
[7] setosa setosa setosa setosa setosa setosa
[13] setosa setosa setosa setosa setosa setosa
[19] setosa setosa setosa setosa setosa setosa
[25] setosa setosa setosa setosa setosa setosa
[31] setosa setosa setosa setosa setosa setosa
[37] setosa setosa setosa setosa setosa setosa
[43] setosa setosa setosa setosa setosa setosa
[49] setosa setosa versicolor versicolor versicolor versicolor
[55] versicolor versicolor versicolor versicolor versicolor versicolor
[61] versicolor versicolor versicolor versicolor versicolor versicolor
[67] versicolor versicolor versicolor versicolor versicolor versicolor
[73] versicolor versicolor versicolor versicolor versicolor versicolor
[79] versicolor versicolor versicolor versicolor versicolor versicolor
[85] versicolor versicolor versicolor versicolor versicolor versicolor
[91] versicolor versicolor versicolor versicolor versicolor versicolor
[97] versicolor versicolor versicolor versicolor virginica virginica
[103] virginica virginica virginica virginica virginica virginica
[109] virginica virginica virginica virginica virginica virginica
[115] virginica virginica virginica virginica virginica virginica
[121] virginica virginica virginica virginica virginica virginica
[127] virginica virginica virginica virginica virginica virginica
[133] virginica virginica virginica virginica virginica virginica
[139] virginica virginica virginica virginica virginica virginica
[145] virginica virginica virginica virginica virginica virginica
Levels: setosa versicolor virginica
A final option is to put functions in a list:
funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)
funs$double(10)
[1] 20
funs$half(10)
[1] 5
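One reason to keep functions in a list is that you can loop over them. The sketch below applies every function in funs to the same input with vapply(); the name results is illustrative.

```r
funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)

# Call each stored function on 10; the list names carry over to the result.
results <- vapply(funs, function(f) f(10), numeric(1))
results
#>   half double 
#>      5     20
```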
Base R provides two ways to compose multiple function calls. For example, imagine we want to compute the population standard deviation using sqrt() and mean() as building blocks:
square <- function(x) x^2
square(7)
[1] 49
## the same function written with = and an explicit body
square0 = function(x){
  x^2
}
##
square0(7)
[1] 49
deviation <- function(x) x - mean(x)
We can either nest the function calls:
x <- runif(100)
sqrt(mean(square(deviation(x))))
[1] 0.2669384
Or we save the intermediate results as variables:
out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out
[1] 0.2669384
The magrittr package provides a third option: the binary operator %>%, which is called the pipe and is pronounced as “and then”.
library(magrittr)
square <- function(x) x^2
deviation <- function(x) x - mean(x)
x <- 1:50
##
x %>%
  deviation() %>% # deviation() is the function defined above
  square() %>%
  mean() %>%
  sqrt()
[1] 14.43087
With the pipe operator %>%, x %>% f() is equivalent to f(x), and x %>% f(y) is equivalent to f(x, y).
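As a side note, R 4.1.0 and later also include a native pipe, |>, which follows the same two equivalences for simple calls. The sketch below assumes R 4.1.0 or newer and does not need magrittr; it re-defines square() and deviation() so it can run on its own.

```r
square <- function(x) x^2
deviation <- function(x) x - mean(x)
x <- 1:50

# x |> f() is rewritten by the parser to f(x), so both forms agree exactly.
piped <- x |> deviation() |> square() |> mean() |> sqrt()
nested <- sqrt(mean(square(deviation(x))))
identical(piped, nested)
#> [1] TRUE

# x |> f(y) is equivalent to f(x, y).
avg <- c(1, NA, 3) |> mean(na.rm = TRUE)
avg
#> [1] 2
```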
Infix functions get their name from the fact that the function name comes in between its arguments, and hence they take two arguments. R comes with a number of built-in infix operators:
: - an operator that generates a patterned sequence. It is also used in model formulas to indicate an interaction of two variables.
:: - an operator to access an object in a known package. For example, stats::sd.
::: - an operator to access an internal (non-exported) object in a package; it is rarely used.
$ - extracts elements by name from a named list.
^ - exponentiation operator (to-the-power-of)
* - multiplication operator
/ - division operator
+ - addition operator
- - subtraction operator
> - logical and numerical GREATER THAN
>= - logical and numerical GREATER THAN OR EQUAL TO
< - logical and numerical LESS THAN
<= - logical and numerical LESS THAN OR EQUAL TO
== - logical and numerical EQUAL TO
!= - logical and numerical NOT EQUAL TO
! - logical negation (NOT); a unary prefix operator, listed here for completeness
& - logical AND (element-wise)
&& - logical AND (single values, short-circuiting)
| - logical OR (element-wise)
|| - logical OR (single values, short-circuiting)
~ - operator used to construct model formulas
<- - leftwards assignment
<<- - leftwards assignment (used for assigning to variables in the parent environments)
-> - rightwards assignment
->> - rightwards assignment (used for assigning to variables in the parent environments)
%% - modulus (remainder from division)
%/% - integer division
You can also create your own infix functions that start and end with %. Base R uses this pattern to define %%, %*%, %/%, %in%, %o%, and %x%.
Defining our own infix function is simple. We create a two-argument function and bind it to a name that starts and ends with %. For example,
`%+%` <- function(a, b) paste0(a, b)
"new " %+% "string"
[1] "new string"
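An infix operator is still an ordinary function, so it can also be called in prefix form by quoting its name with backticks. This small sketch shows that both forms give the same result, for the user-defined %+% and for the built-in +.

```r
`%+%` <- function(a, b) paste0(a, b)

# Infix and prefix calls are interchangeable.
infix_result <- "new " %+% "string"
prefix_result <- `%+%`("new ", "string")
identical(infix_result, prefix_result)
#> [1] TRUE

# The same holds for built-in operators.
`+`(1, 2)
#> [1] 3
```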
The names of infix functions are more flexible than regular R functions: they can contain any sequence of characters except for %. You will need to escape any special characters in the string used to define the function, but not when you call it:
`% %` <- function(a, b) paste(a, b)
`%/\\%` <- function(a, b) paste(a, b)
"a" % % "b"
[1] "a b"
"a" %/\% "b"
[1] "a b"
R’s default precedence rules mean that infix operators are composed left to right.
`%-%` <- function(a, b) paste0("(", a, " %-% ", b, ")")
"a" %-% "b" %-% "c"
[1] "((a %-% b) %-% c)"
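The left-to-right rule matters most for operations that are not associative. The sketch below uses a hypothetical %minus% operator to show the grouping, and also illustrates that, per R's documented operator precedence, %...% operators bind more tightly than * and /.

```r
# Hypothetical infix operator used only to expose grouping.
`%minus%` <- function(a, b) a - b

# Left to right: (10 - 3) - 2
left_to_right <- 10 %minus% 3 %minus% 2
left_to_right
#> [1] 5

# Parentheses force the other grouping: 10 - (3 - 2)
forced_right <- 10 %minus% (3 %minus% 2)
forced_right
#> [1] 9

# %minus% is evaluated before *, so this is 2 * (3 - 1).
mixed <- 2 * 3 %minus% 1
mixed
#> [1] 4
```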