24 LECTURE: Error Handling and Generation

The learning objectives of this section are:

  • Implement exception handling routines in R functions

24.1 What is an error?

Errors most often occur when code is used in a way that it is not intended to be used. For example, adding two strings together produces the following error:

"hello" + "world"
## Error in "hello" + "world": non-numeric argument to binary operator

The + operator is essentially a function that takes two numbers as arguments and finds their sum. Since neither "hello" nor "world" is a number, the R interpreter produces an error. Errors will stop the execution of your program, and they will (hopefully) print an error message to the R console.

In R there are two other constructs which are related to errors: warnings and messages. Warnings are meant to indicate that something seems to have gone wrong in your program that should be inspected. Here’s a simple example of a warning being generated:

as.numeric(c("5", "6", "seven"))
## Warning: NAs introduced by coercion
## [1]  5  6 NA

The as.numeric() function attempts to convert each string in c("5", "6", "seven") into a number. However, it is impossible to convert "seven", so a warning is generated. Execution of the code is not halted, and an NA is produced for "seven" instead of a number.
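Because the warning does not say which element failed, it can help to locate the NA values yourself. The sketch below is a small illustration that does this. The variable names x, y, and bad exist only for this example. suppressWarnings() is used so that the coercion warning does not clutter the output, and the conversion itself is unchanged.

```r
# Strings to convert; "seven" cannot be parsed as a number
x <- c("5", "6", "seven")

# Convert while silencing the "NAs introduced by coercion" warning
y <- suppressWarnings(as.numeric(x))

# Keep only the inputs whose conversion produced NA
bad <- x[is.na(y)]
print(bad)
## [1] "seven"
```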

Messages simply print to the R console, though they are generated by an underlying mechanism that is similar to how errors and warnings are generated. Here’s a small function that will generate a message:

f <- function(){
  message("This is a message.")
}

f()
## This is a message.

24.2 Generating Errors

There are a few essential functions for generating errors, warnings, and messages in R. The stop() function will generate an error. Let’s generate an error:

stop("Something erroneous has occurred!")
## Error: Something erroneous has occurred!

If an error occurs inside of a function then the name of that function will appear in the error message:

name_of_function <- function(){
  stop("Something bad happened.")
}

name_of_function()
## Error in name_of_function(): Something bad happened.

The stopifnot() function takes a series of logical expressions as arguments and if any of them are false an error is generated specifying which expression is false. Let’s take a look at an example:

error_if_n_is_greater_than_zero <- function(n){
  stopifnot(n <= 0)
  n
}

error_if_n_is_greater_than_zero(5)
## Error in error_if_n_is_greater_than_zero(5): n <= 0 is not TRUE
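The default "... is not TRUE" wording can be cryptic for users. As a sketch, assuming R 4.0.0 or later, you can name an argument to stopifnot(), and that name is then used as the error message. The function error_if_positive() is a hypothetical variant of the example above.

```r
# Hypothetical variant: the argument name becomes the error message
# (named arguments to stopifnot() require R >= 4.0.0)
error_if_positive <- function(n){
  stopifnot("n must be less than or equal to zero" = n <= 0)
  n
}

error_if_positive(-3)
## [1] -3

# Catch the error to look at its message
msg <- tryCatch(error_if_positive(5), error = function(e) conditionMessage(e))
msg
## [1] "n must be less than or equal to zero"
```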

The warning() function creates a warning, and the function itself is very similar to the stop() function. Remember that a warning does not stop the execution of a program (unlike an error).

warning("Consider yourself warned!")
## Warning: Consider yourself warned!

Just like errors, a warning generated inside of a function will include the name of the function in which it was generated:

make_NA <- function(x){
  warning("Generating an NA.")
  NA
}

make_NA("Sodium")
## Warning in make_NA("Sodium"): Generating an NA.
## [1] NA

Messages are simpler than errors or warnings; they just print strings to the R console. You can issue a message with the message() function:

message("In a bottle.")
## In a bottle.

24.3 When to generate errors or warnings

Stopping the execution of your program with stop() should only happen in the event of a catastrophe, meaning only if it is impossible for your program to continue. If you can anticipate conditions that would cause your program to create an error, then you should document those conditions so that whoever uses your software is aware of them. Common failure conditions, like providing invalid arguments to a function, should be checked at the beginning of your function so that the user can quickly realize something has gone wrong. Checking function inputs is a typical use of the stopifnot() function.
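Here is a minimal sketch of checking inputs up front. It uses a hypothetical function, scale_by(), that validates all of its arguments with a single stopifnot() call before doing any work. stopifnot() evaluates its expressions in order and reports the first one that is not TRUE.

```r
# Hypothetical function: every argument check happens before any computation
scale_by <- function(x, factor){
  stopifnot(is.numeric(x), is.numeric(factor), length(factor) == 1, factor != 0)
  x / factor
}

scale_by(c(2, 4, 6), 2)
## [1] 1 2 3

# A bad argument is reported immediately, naming the check that failed
msg <- tryCatch(scale_by(c(2, 4, 6), "2"), error = function(e) conditionMessage(e))
msg
## [1] "is.numeric(factor) is not TRUE"
```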

You can think of a function as a kind of contract between you and the user: if the user provides the specified arguments, your program will provide predictable results. Of course, it’s impossible for you to anticipate all of the potential uses of your program, so the results of executing a function can only be predictable with regard to the type of the result. It’s appropriate to create a warning when this contract between you and the user is violated. A perfect example of this situation is the result of as.numeric(c("5", "6", "seven")), which we saw before. The user expects a vector of numbers to be returned as the result of as.numeric(), but "seven" is coerced into being NA, which is not completely intuitive.
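To make the "contract" idea concrete, here is a sketch with a hypothetical function, parse_counts(). Its contract is assumed to be one non-negative integer count per input element. When an element cannot honor that contract, the function still returns a result of the promised type and length, puts NA in that position, and uses a warning to tell the user that something unexpected happened.

```r
# Hypothetical function: contract is one non-negative integer count per input
parse_counts <- function(x){
  # Silence as.integer()'s generic coercion warning; a clearer one is issued below
  counts <- suppressWarnings(as.integer(x))
  # Unparseable (NA) or negative values break the contract
  bad <- is.na(counts) | counts < 0
  counts[bad] <- NA_integer_
  if (any(bad)) {
    warning(sum(bad), " value(s) could not be read as counts and were set to NA.")
  }
  counts
}

# A warning is issued, but the result still has 3 integer elements
counts <- parse_counts(c("3", "-1", "three"))
counts
## [1]  3 NA NA
```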

R has largely been developed according to the Unix Philosophy (which is further discussed in Chapter 3), which generally discourages printing text to the console unless something unexpected has occurred. Languages that commonly run on Unix systems, like C, C++, and Go, are rarely used interactively, meaning that they usually underpin computer infrastructure (computers “talking” to other computers). Messages printed to the console are therefore not very useful, since nobody will ever read them and it’s not straightforward for other programs to capture and interpret them. In contrast, R code is frequently executed by human beings in the R console, which serves as an interactive environment between the computer and the person at the keyboard. If you think your program should produce a message, make sure that the output of the message is primarily meant for a human to read. You should avoid signaling a condition or the result of your program to another program by creating a message.
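One way to follow this advice is to return results as values, which other code can use, and to reserve message() for notes aimed at the person at the console. The sketch below shows this split with a hypothetical function, sum_of_squares(). It also shows that code calling the function can silence the human-oriented output with suppressMessages().

```r
# Hypothetical function: the result is returned; message() is only for humans
sum_of_squares <- function(x){
  message("Squaring ", length(x), " values...")
  sum(x^2)
}

total <- sum_of_squares(1:3)
## Squaring 3 values...
total
## [1] 14

# Another program can discard the message and still get the value
quiet_total <- suppressMessages(sum_of_squares(1:3))
```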

24.4 How should errors be handled?

Imagine writing a program that will take a long time to complete because of a complex calculation or because you’re handling a large amount of data. If an error occurs during this computation then you’re liable to lose all of the results that were calculated before the error, or your program may not finish a critical task that a program further down your pipeline is depending on. If you anticipate the possibility of errors occurring during the execution of your program then you can design your program to handle them appropriately.
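As a small sketch of that design, you can evaluate each piece of work separately and catch errors per piece, so that one bad input does not throw away the results already computed. The inputs list and the choice of NA as a placeholder for failed items are assumptions made for this illustration. tryCatch() is explained in more detail next.

```r
# Hypothetical batch of inputs with one bad element in the middle
inputs <- list(4, 9, "sixteen", 25)

# Each element gets its own tryCatch(), so a failure only affects that element
results <- lapply(inputs, function(x){
  tryCatch(sqrt(x), error = function(e) NA_real_)
})

unlist(results)
## [1]  2  3 NA  5
```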

The tryCatch() function is the workhorse of handling errors and warnings in R. The first argument of this function is any R expression. It is followed by named handler functions (such as error and warning) that specify what to do if an error or a warning is signaled while the expression is evaluated. The last argument, finally, specifies an expression that will be evaluated just before tryCatch() returns, no matter what, even in the event of an error or a warning.
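Before building a larger example, here is a minimal sketch of these three pieces working together. log(-1) returns NaN and signals the warning "NaNs produced". The warning handler's return value becomes the value of the whole tryCatch() call. Replacing the result with NA is an arbitrary choice for this illustration.

```r
value <- tryCatch(
  log(-1),
  warning = function(w){
    # Whatever the handler returns is what tryCatch() returns
    NA_real_
  },
  finally = {
    # Runs whether or not a condition was signaled
    message("Cleanup runs either way.")
  }
)
## Cleanup runs either way.
value
## [1] NA
```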

Let’s construct a simple function I’m going to call beera that catches errors and warnings gracefully.

beera <- function(expr){
  tryCatch(expr,
           error = function(e){
             message("An error occurred:\n", e)
           },
           warning = function(w){
             message("A warning occurred:\n", w)
           },
           finally = {
             message("Finally done!")
           })
}

This function takes an expression as an argument and tries to evaluate it. If the expression can be evaluated without any errors or warnings, then the result of the expression is returned and the message Finally done! is printed to the R console. If an error or warning is generated, then the function provided to the error or warning argument is called with the condition object, and it prints a message describing what went wrong. Let’s try this function out with a few examples.

beera({
  2 + 2
})
## Finally done!
## [1] 4
beera({
  "two" + 2
})
## An error occurred:
## Error in "two" + 2: non-numeric argument to binary operator
## 
## Finally done!
beera({
  as.numeric(c(1, "two", 3))
})
## A warning occurred:
## simpleWarning in doTryCatch(return(expr), name, parentenv, handler): NAs introduced by coercion
## 
## Finally done!

Notice that we’ve effectively transformed errors and warnings into messages.
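The messages above include R's internal call (for example doTryCatch(...)), which is rarely useful to a reader. Below is a sketch of a hypothetical variant, beera_short(), that passes each condition object to conditionMessage() so that only its text is reported. Returning invisible(NULL) from the handlers is an assumption about what a failed evaluation should yield.

```r
# Hypothetical variant of beera() that reports only the condition's text
beera_short <- function(expr){
  tryCatch(expr,
           error = function(e){
             message("An error occurred: ", conditionMessage(e))
             invisible(NULL)
           },
           warning = function(w){
             message("A warning occurred: ", conditionMessage(w))
             invisible(NULL)
           })
}

ok <- beera_short(2 + 2)
ok
## [1] 4

out <- beera_short("two" + 2)
## An error occurred: non-numeric argument to binary operator
```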

Now that you know the basics of generating and catching errors, you’ll need to decide when your program should generate an error. My advice to you is to limit the number of errors your program generates as much as possible. Even if you design your program so that it’s able to catch and handle errors, handling an error is much slower than avoiding it in the first place, and this can slow down your program considerably. Imagine you wanted to write a simple function that checks if an argument is an even number. You might write the following:

is_even <- function(n){
  n %% 2 == 0
}

is_even(768)
## [1] TRUE
is_even("two")
## Error in n%%2: non-numeric argument to binary operator

You can see that providing a string causes this function to raise an error. You could imagine though that you want to use this function across a list of different data types, and you only want to know which elements of that list are even numbers. You might think to write the following:

is_even_error <- function(n){
  tryCatch(n %% 2 == 0,
           error = function(e){
             FALSE
           })
}

is_even_error(714)
## [1] TRUE
is_even_error("eight")
## [1] FALSE

This appears to work the way you intended. However, when applied to more data, this function will be much slower than the alternatives. For example, I could check that n is numeric before treating n like a number:

is_even_check <- function(n){
  is.numeric(n) && n %% 2 == 0
}

is_even_check(1876)
## [1] TRUE
is_even_check("twelve")
## [1] FALSE

Notice that by using is.numeric() before the “AND” operator (&&), the expression n %% 2 == 0 is never evaluated when n is not numeric. This is a programming language design feature called “short circuiting.” The expression can never evaluate to TRUE if the left-hand side of && evaluates to FALSE, so the right-hand side is ignored.
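A quick way to convince yourself that the right-hand side is really skipped is to put a call to stop() there. Because stop() is never reached, no error is raised. The same applies to || once its left-hand side is TRUE. This sketch only uses base R.

```r
# && stops as soon as the left side is FALSE, so stop() is never called
FALSE && stop("this is never evaluated")
## [1] FALSE

# || skips its right side once the left side is TRUE
TRUE || stop("this is never evaluated either")
## [1] TRUE

# The check from is_even_check(): the modulo is skipped for a string
n <- "twelve"
is.numeric(n) && n %% 2 == 0
## [1] FALSE
```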

To demonstrate the difference in the speed of the code we’ll use the microbenchmark package to measure how long it takes for each function to be applied to the same data.

library(microbenchmark)
microbenchmark(sapply(letters, is_even_check))
## Unit: microseconds
##                            expr    min      lq     mean  median      uq     max neval
##  sapply(letters, is_even_check) 46.224 47.7975 61.43616 48.6445 58.4755 167.091   100
microbenchmark(sapply(letters, is_even_error))
## Unit: microseconds
##                            expr     min       lq     mean   median       uq      max neval
##  sapply(letters, is_even_error) 640.067 678.0285 906.3037 784.4315 1044.501 2308.931   100

Comparing mean times, the error catching approach is nearly 15 times slower!

Proper error handling is an essential tool for any software developer so that you can design programs that are error tolerant. Creating clear and informative error messages is essential for building quality software. One closing tip I recommend is to put documentation for your software online, including the meaning of the errors that your software can potentially throw. Often a user’s first instinct when encountering an error is to search online for that error message, which should lead them to your documentation!

24.5 Summary

  • Errors, warnings, and messages can be generated within R code using the functions stop(), stopifnot(), warning(), and message().

  • Catching errors, and providing useful error messaging, can improve user experience with functions but can also slow down code substantially.