24 LECTURE: Error Handling and Generation
The learning objectives of this section are:
- Implement exception handling routines in R functions
24.1 What is an error?
Errors most often occur when code is used in a way that it was not intended to be used. For example, adding two strings together produces the following error:
"hello" + "world"
## Error in "hello" + "world": non-numeric argument to binary operator
The + operator is essentially a function that takes two numbers as arguments and finds their sum. Since neither "hello" nor "world" is a number, the R interpreter produces an error. Errors will stop the execution of your program, and they will (hopefully) print an error message to the R console.
In R there are two other constructs which are related to errors: warnings and messages. Warnings are meant to indicate that something seems to have gone wrong in your program that should be inspected. Here’s a simple example of a warning being generated:
as.numeric(c("5", "6", "seven"))
## Warning: NAs introduced by coercion
## [1] 5 6 NA
The as.numeric() function attempts to convert each string in c("5", "6", "seven") into a number. It is impossible to convert "seven", so a warning is generated. Execution of the code is not halted, and an NA is produced for "seven" instead of a number.
Messages simply print to the R console, though they are generated by an underlying mechanism that is similar to how errors and warnings are generated. Here’s a small function that will generate a message:
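A minimal sketch of such a function is given here. The function name say_message is hypothetical; only the message text is taken from the output that follows:

```r
# Hypothetical helper: signals a message condition when called.
say_message <- function(){
  message("This is a message.")
}

say_message()
```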
## This is a message.
24.2 Generating Errors
There are a few essential functions for generating errors, warnings, and messages in R. The stop() function will generate an error. Let’s generate an error:
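As a minimal sketch, the call below passes an error message to stop(). The message text is an assumption chosen for illustration, and the call is wrapped in try() only so that a script containing it keeps running after the error:

```r
# stop() signals an error; try() catches it so the script can continue.
result <- try(stop("Something went wrong."), silent = TRUE)

# try() returns an object of class "try-error" when an error occurred.
inherits(result, "try-error")
```

Calling stop() directly at the console, without try(), halts execution and prints the message after the prefix Error:.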
If an error occurs inside of a function then the name of that function will appear in the error message:
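A sketch of a function consistent with the output shown next, assuming it takes no arguments and does nothing but call stop():

```r
# Assumed definition: the function does nothing except signal an error.
name_of_function <- function(){
  stop("Something bad happened.")
}

# try() is used only so that a script containing this call keeps running.
try(name_of_function())
```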
## Error in name_of_function(): Something bad happened.
The stopifnot() function takes a series of logical expressions as arguments, and if any of them is false an error is generated specifying which expression is false. Let’s take a look at an example:
error_if_n_is_greater_than_zero <- function(n){
  stopifnot(n <= 0)
  n
}
error_if_n_is_greater_than_zero(5)
## Error in error_if_n_is_greater_than_zero(5): n <= 0 is not TRUE
The warning() function creates a warning, and the function itself is very similar to the stop() function. Remember that a warning does not stop the execution of a program (unlike an error).
warning("Consider yourself warned!")
## Warning: Consider yourself warned!
Just like errors, a warning generated inside of a function will include the name of the function in which it was generated:
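A sketch of a function that would produce output like that shown next. The parameter name x and the function body are assumptions; only the function name, the argument "Sodium", and the warning text come from the output:

```r
# Assumed definition: warn, then return NA regardless of the input.
make_NA <- function(x){
  warning("Generating an NA.")
  NA
}

make_NA("Sodium")
```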
## Warning in make_NA("Sodium"): Generating an NA.
## [1] NA
Messages are simpler than errors or warnings; they just print strings to the R console. You can issue a message with the message() function:
message("In a bottle.")
## In a bottle.
24.3 When to generate errors or warnings
Stopping the execution of your program with stop() should only happen in the event of a catastrophe, meaning only if it is impossible for your program to continue. If there are conditions that you can anticipate that would cause your program to create an error, then you should document those conditions so whoever uses your software is aware. Common failure conditions like providing invalid arguments to a function should be checked at the beginning of your program so that the user can quickly realize something has gone wrong. Checking function inputs is a typical use of the stopifnot() function.
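As a minimal sketch of input checking with stopifnot(), consider the hypothetical function rescale_to_unit() below, which is not part of the original text. The checks sit at the top of the function so that invalid input fails immediately:

```r
# Hypothetical function: rescale a numeric vector to the range [0, 1].
rescale_to_unit <- function(x){
  # Validate inputs before doing any work.
  stopifnot(is.numeric(x), length(x) > 1, !anyNA(x), max(x) > min(x))
  (x - min(x)) / (max(x) - min(x))
}

rescale_to_unit(c(2, 4, 6))

# Invalid input fails at the first check; try() lets the script continue.
try(rescale_to_unit(c("2", "4", "6")))
```

Listing the conditions in one stopifnot() call keeps the contract visible in a single place, and the error message names the first condition that failed.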
You can think of a function as a kind of contract between you and the user: if the user provides specified arguments, your program will provide predictable results. Of course it’s impossible for you to anticipate all of the potential uses of your program, so the results of executing a function can only be predictable with regard to the type of the result. It’s appropriate to create a warning when this contract between you and the user is violated. A perfect example of this situation is the result of as.numeric(c("5", "6", "seven")), which we saw before. The user expects a vector of numbers to be returned as the result of as.numeric(), but "seven" is coerced into being NA, which is not completely intuitive.
R has largely been developed according to the Unix Philosophy (which is further discussed in Chapter 3), which generally discourages printing text to the console unless something unexpected has occurred. Languages that commonly run on Unix systems, like C, C++, and Go, are rarely used interactively, meaning that they usually underpin computer infrastructure (computers “talking” to other computers). Messages printed to the console are therefore not very useful, since nobody will ever read them and it’s not straightforward for other programs to capture and interpret them. In contrast, R code is frequently executed by human beings in the R console, which serves as an interactive environment between the computer and the person at the keyboard. If you think your program should produce a message, make sure that the output of the message is primarily meant for a human to read. You should avoid signaling a condition or the result of your program to another program by creating a message.
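The distinction can be sketched as follows, using a hypothetical function count_rows() and a hypothetical verbose argument that are not part of the original text. The message is meant for a person at the console, while the return value is what another program or function should consume:

```r
# Hypothetical function: the message is for humans, the return value for code.
count_rows <- function(df, verbose = TRUE){
  n <- nrow(df)
  if (verbose) message("Counted ", n, " rows.")
  n
}

small_df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Interactive use: the message is shown and the count is returned.
n_rows <- count_rows(small_df)

# Programmatic use: callers can silence messages and rely on the value alone.
n_quiet <- suppressMessages(count_rows(small_df))
```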
24.4 How should errors be handled?
Imagine writing a program that will take a long time to complete because of a complex calculation or because you’re handling a large amount of data. If an error occurs during this computation then you’re liable to lose all of the results that were calculated before the error, or your program may not finish a critical task that a program further down your pipeline is depending on. If you anticipate the possibility of errors occurring during the execution of your program then you can design your program to handle them appropriately.
The tryCatch() function is the workhorse of handling errors and warnings in R. The first argument of this function is any R expression, followed by handler functions which specify what to do when an error or a warning occurs. The last argument, finally, specifies an expression that will be evaluated after the first expression no matter what, even in the event of an error or a warning.
Let’s construct a simple function I’m going to call beera that catches errors and warnings gracefully.
beera <- function(expr){
  tryCatch(expr,
           error = function(e){
             message("An error occurred:\n", e)
           },
           warning = function(w){
             message("A warning occured:\n", w)
           },
           finally = {
             message("Finally done!")
           })
}
This function takes an expression as an argument and tries to evaluate it. If the expression can be evaluated without any errors or warnings then the result of the expression is returned and the message Finally done! is printed to the R console. If an error or warning is generated then the function provided to the error or warning argument is called and the message it generates is printed, followed by the Finally done! message. Let’s try this function out with a few examples.
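The calls below are a sketch of inputs that would produce output like that shown next. The exact expressions are assumptions inferred from the output, and the definition of beera is repeated only so the snippet runs on its own:

```r
# Same definition as above, repeated so this snippet is self-contained.
beera <- function(expr){
  tryCatch(expr,
           error = function(e){
             message("An error occurred:\n", e)
           },
           warning = function(w){
             message("A warning occured:\n", w)
           },
           finally = {
             message("Finally done!")
           })
}

# No error or warning: the value of the expression is returned.
beera({
  2 + 2
})

# Adding a string to a number signals an error.
beera({
  "two" + 2
})

# Coercing a non-numeric string signals a warning.
beera({
  as.numeric(c(1, "two", 3))
})
```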
## Finally done!
## [1] 4
## An error occurred:
## Error in "two" + 2: non-numeric argument to binary operator
##
## Finally done!
## A warning occured:
## simpleWarning in doTryCatch(return(expr), name, parentenv, handler): NAs introduced by coercion
##
## Finally done!
Notice that we’ve effectively transformed errors and warnings into messages.
Now that you know the basics of generating and catching errors you’ll need to decide when your program should generate an error. My advice to you is to limit the number of errors your program generates as much as possible. Even if you design your program so that it’s able to catch and handle errors, the error handling process can slow down your program substantially (by roughly a factor of 15 in the benchmark later in this section). Imagine you wanted to write a simple function that checks if an argument is an even number. You might write the following:
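A sketch of such a function, consistent with the output shown next. The function name is_even and the example arguments are assumptions:

```r
# Assumed definition: TRUE when n is divisible by 2.
is_even <- function(n){
  n %% 2 == 0
}

is_even(768)

# A string argument makes %% fail; try() lets the script continue.
try(is_even("two"))
```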
## [1] TRUE
## Error in n%%2: non-numeric argument to binary operator
You can see that providing a string causes this function to raise an error. You could imagine though that you want to use this function across a list of different data types, and you only want to know which elements of that list are even numbers. You might think to write the following:
is_even_error <- function(n){
  tryCatch(n %% 2 == 0,
           error = function(e){
             FALSE
           })
}
is_even_error(714)
## [1] TRUE
## [1] FALSE
This appears to be working the way you intended. However, when applied to more data this function will be considerably slower than the alternatives. For example, I could check that n is numeric before treating n like a number:
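A sketch of that check, consistent with the description and output that follow. The name is_even_check appears in the benchmark results later in this section; the example arguments are assumptions:

```r
# Check the type first; n %% 2 is only evaluated when n is numeric.
is_even_check <- function(n){
  is.numeric(n) && n %% 2 == 0
}

is_even_check(1876)
is_even_check("twelve")
```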
## [1] TRUE
## [1] FALSE
Notice that by using is.numeric() before the “AND” operator (&&), the expression n %% 2 == 0 is never evaluated when n is not numeric. This is a programming language design feature called “short circuiting.” The expression can never evaluate to TRUE if the left hand side of && evaluates to FALSE, so the right hand side is ignored.
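Short circuiting can be seen directly in a small sketch. The stop() calls below are illustrative only; they would raise an error if they were ever evaluated:

```r
# The right-hand side would raise an error if it were evaluated,
# but && stops as soon as the left-hand side is FALSE.
result_and <- FALSE && stop("this is never evaluated")

# The || operator short-circuits in the same way when the left-hand side is TRUE.
result_or <- TRUE || stop("this is never evaluated")
```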
To demonstrate the difference in the speed of the code we’ll use the microbenchmark package to measure how long it takes for each function to be applied to the same data.
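A sketch of the benchmark calls, inferred from the expr column of the results that follow. It assumes the microbenchmark package is installed and repeats both function definitions so the snippet runs on its own; timings will differ from machine to machine:

```r
# Definitions repeated so this snippet is self-contained.
is_even_check <- function(n){
  is.numeric(n) && n %% 2 == 0
}

is_even_error <- function(n){
  tryCatch(n %% 2 == 0,
           error = function(e){
             FALSE
           })
}

# Run the benchmark only if the microbenchmark package is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(sapply(letters, is_even_check)))
  print(microbenchmark::microbenchmark(sapply(letters, is_even_error)))
}
```

Every element of letters is a string, so is_even_error takes the error path on each call while is_even_check returns after the is.numeric() test.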
Unit: microseconds
                           expr    min      lq     mean  median      uq     max neval
 sapply(letters, is_even_check) 46.224 47.7975 61.43616 48.6445 58.4755 167.091   100
Unit: microseconds
                           expr     min       lq     mean   median       uq      max neval
 sapply(letters, is_even_error) 640.067 678.0285 906.3037 784.4315 1044.501 2308.931   100
The error catching approach is roughly 15 times slower when comparing mean run times!
Proper error handling is an essential tool for any software developer so that you can design programs that are error tolerant. Creating clear and informative error messages is essential for building quality software. One closing tip I recommend is to put documentation for your software online, including the meaning of the errors that your software can potentially throw. Often a user’s first instinct when encountering an error is to search online for that error message, which should lead them to your documentation!
24.5 Summary
Errors, warnings, and messages can be generated within R code using the functions stop, stopifnot, warning, and message. Catching errors, and providing useful error messaging, can improve user experience with functions but can also slow down code substantially.