3.4 Documentation

The objectives of this section are:

  • Create R function documentation using roxygen2
  • Create vignettes using knitr and R Markdown

There are two main types of documentation you may want to include with packages:

  • Longer documents that give tutorials or overviews for the whole package
  • Shorter, function-specific help files for each function or group of related functions

You can create the first type of document using package vignettes, README files, or both. For the function-specific help files, the easiest way to create these is with the roxygen2 package.

In this section, we’ll cover why and how to create this documentation. In addition, vignette / README documentation can be done using knitr to create R Markdown documents that mix R code and text, so we’ll include more details on that process.

3.4.1 Vignettes and README files

You will likely want to create a document that walks users through the basics of how to use your package. You can do this through two formats:

  • Vignette: This document is bundled with your R package, so it becomes locally available to a user once they install your package from CRAN. They will also have it available if they install the package from GitHub, as long as they use the build_vignettes = TRUE option when running install_github.
  • README file: If you have your package on GitHub, this document will show up on the main page of the repository.

A package likely only needs a README file if you are posting the package to GitHub. For any GitHub repository, if there is a README.md file in the top directory of the repository, it will be rendered on the main GitHub repository page below the listed repository content. For an example, visit https://github.com/geanders/countytimezones and scroll down. You’ll see a list of all the files and subdirectories included in the package repository and below that is the content in the package’s README.md file, which gives a tutorial on using the package.

If the README file does not need to include R code, you can write it directly as an .md file, using Markdown syntax, which is explained in more detail in the next section. If you want to include R code, you should start with a README.Rmd file, which you can then render to Markdown using knitr. You can use the devtools package to add either a README.md or README.Rmd file to a package directory using use_readme_md or use_readme_rmd, respectively (in recent versions, these and the other use_* helper functions described in this section are provided by the usethis package rather than devtools). These functions will add the appropriate file to the top level of the package directory and will also add the file name to “.Rbuildignore,” since having one of these files in the top level of the package directory could otherwise cause some problems when building the package.

The README file is a useful way to give GitHub users information about your package, but it will not be included in builds of the package or be available through CRAN for packages that are posted there. Instead, if you want to create tutorials or overview documents that are included in a package build, you should do that by adding one or more package vignettes. Vignettes are stored in a vignettes subdirectory within the package directory.

To add a vignette file, saved within this subdirectory (which will be created if you do not already have it), use the use_vignette function from devtools. This function takes as arguments the file name of the vignette you’d like to create and the package for which you’d like to create it (the default is the package in the current working directory). For example, if you are currently working in your package’s top-level directory and you would like to add a vignette called “model_details,” you can do that with the code:

```r
use_vignette("model_details")
```

You can have more than one vignette per package, which can be useful if you want to include one vignette that gives a more general overview of the package as well as a few vignettes that go into greater detail about particular aspects or applications.

T> Once you create a vignette with use_vignette, be sure to update the Vignette Index Entry in the vignette’s YAML (the code at the top of an R Markdown document). Replace “Vignette Title” there with the actual title you use for the vignette.

3.4.2 Knitr / Markdown

Both vignettes and README files can be written as R Markdown files, which will allow you to include R code examples and results from your package. One of the most exciting tools in R is the knitr system for combining code and text to create a reproducible document. In terms of the power you get for time invested in learning a tool, knitr probably can’t be beat. Everything you need to know to create and “knit” a reproducible document can be learned in about 20 minutes, and while there is a lot more you can do to customize this process if you want to, probably 80% of what you’ll ever want to do with knitr you’ll learn in those first 20 minutes.

R Markdown files are mostly written using Markdown. To write R Markdown files, you need to understand what markup languages like Markdown are and how they work. In Word and other word processing programs you have used, you can add formatting using buttons and keyboard shortcuts (e.g., “Ctrl-B” for bold). The file saves the words you type. It also saves the formatting, but you see the final output, rather than the formatting markup, when you edit the file (WYSIWYG: what you see is what you get). In markup languages, on the other hand, you mark up the document directly to show what formatting the final version should have (e.g., you type **bold** in the file to end up with a document with bold). Examples of markup languages include:

  • HTML (HyperText Markup Language)
  • LaTeX
  • Markdown (a “lightweight” markup language)

3.4.3 Common Markdown formatting elements

To write a file in Markdown, you’ll need to learn the conventions for creating formatting. This table shows what you would need to write in a flat file for some common formatting choices:

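| Code | Rendering | Explanation |
|------|-----------|-------------|
| `**text**` | **text** | boldface |
| `*text*` | *text* | italicized |
| `[text](https://www.google.com)` | [text](https://www.google.com) | hyperlink |
| `# text` | text | first-level header |
| `## text` | text | second-level header |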

Some other simple things you can do in Markdown include:

  • Lists (ordered or bulleted)
  • Equations
  • Tables
  • Figures from files
  • Block quotes
  • Superscripts

The start of an R Markdown file gives some metadata for the file (authors, title, format) in a language called YAML. For example, the YAML section of a package vignette might look like this:

```yaml
---
title: "Model Details for example_package"
author: "Jane Doe"
date: "2020-12-20"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model Details for example_package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```

When creating R Markdown documents using the RStudio toolbar, much of this YAML will be automatically generated based on your specifications when opening the initial file. However, this is not the case with package vignettes, for which you’ll need to go into the YAML and add the authors and title yourself. Leave the vignette engine, vignette encoding, output, and date as their default values.

For more Markdown conventions, see RStudio’s R Markdown Reference Guide (link also available through “Help” in RStudio).

R Markdown files work a lot like Markdown files, but add the ability to include R code that will be run before rendering the final document. This functionality is based on literate programming, an idea developed by Donald Knuth, to mix executable code with regular text. The files you create can then be “knitted,” to run any embedded code. The final output will have results from your code and the regular text.

The basic steps of opening and rendering an R Markdown file in RStudio are:

  • To open a new R Markdown file, go to “File” -> “New File” -> “R Markdown…”. To start, choose a “Document” in “HTML” format.
  • This will open a new R Markdown file in RStudio. The file extension for R Markdown files is “.Rmd.”
  • The new file comes with some example code and text. You can run the file as-is to try out the example. You will ultimately delete this example code and text and replace it with your own.
  • Once you “knit” the R Markdown file, R will render an HTML file with the output. This is automatically saved in the same directory where you saved your .Rmd file.
  • Write everything besides R code using Markdown syntax.

The knit function from the knitr package works by taking a document in R Markdown format (among a few possible formats) and reading through it for any markers of the start of R code. It runs the code between each “start” marker and the marker showing a return to regular Markdown, and it writes the relevant results from that code into the document in Markdown format. The entire document is then passed to software that can render from Markdown to the desired output format (for example, compile a pdf, Word, or HTML document); when you knit an R Markdown file in RStudio, the rmarkdown package handles this last step using Pandoc.

This means that all a user needs to do to include R code within a document is to properly separate it from other parts of the document through the appropriate markers. To indicate R code in an R Markdown document, you need to separate off the code chunk using the following syntax:

```{r}
my_vec <- 1:10
```

This syntax tells R how to find the start and end of pieces of R code (code chunks) when the file is rendered. R will walk through the file, find each piece of R code, run it, and create output (printed output or figures, for example), and then pass the file along to another program to complete rendering (e.g., LaTeX for pdf files).
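To see this process without creating a file, you can hand knitr a few lines of R Markdown as a character vector. The sketch below assumes the knitr package is installed; when knit() is given its text argument, it returns the knitted Markdown instead of writing an output file, and quiet = TRUE suppresses the progress messages. The chunk contents are made up for illustration.

```r
# Assumes the knitr package is installed.
# A tiny R Markdown document: one line of text and one code chunk.
rmd_lines <- c(
  "Some regular Markdown text.",
  "",
  "```{r}",
  "my_vec <- 1:10",
  "sum(my_vec)",
  "```"
)

# knit() finds the chunk, runs its code, and writes the code and its printed
# result back into the document as plain Markdown.
md_lines <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md_lines, sep = "\n")
```

In the result, the chunk has become an ordinary fenced code block followed by its output, with each output line prefixed by ## (knitr's default for the comment chunk option).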

You can specify a name for each chunk, if you’d like, by including it after “r” when you begin your chunk. For example, to give the name load_mtcars to a code chunk that loads the mtcars dataset, specify that name in the start of the code chunk:

```{r load_mtcars}
data(mtcars)
```

T> Here are a couple of tips for naming code chunks:
T>
T> - Chunk names must be unique across a document.
T> - Any chunks you don’t name are given ordered numbers by knitr.

You do not have to name each chunk. However, there are some advantages:

  • It will be easier to find any errors.
  • You can use chunk labels to create figure labels for cross-referencing.
  • You can reference chunks later by name.

3.4.4 Common knitr chunk options

You can also add options when you start a chunk. Many of these options can be set as TRUE / FALSE and include:

| Option | Action |
|--------|--------|
| `echo` | Print out the R code? |
| `eval` | Run the R code? |
| `message` | Print out messages? |
| `warning` | Print out warnings? |
| `include` | If `FALSE`, run the code, but don’t print the code or results |

Other chunk options take values other than TRUE / FALSE. Some you might want to include are:

| Option | Action |
|--------|--------|
| `results` | How to print results (e.g., `results = "hide"` runs the code but doesn’t print the results) |
| `fig.width` | Width to print your figure, in inches (e.g., `fig.width = 4`) |
| `fig.height` | Height to print your figure, in inches |

To include any of these options, add the option and value in the opening brackets and separate multiple options with commas:

```{r message = FALSE, echo = FALSE}
mtcars[1, 1:3]
```

You can set “global” options at the beginning of the document. This will create new defaults for all of the chunks in the document. For example, if you want echo, warning, and message to be FALSE by default in all code chunks, you can run:

```{r  global_options}
knitr::opts_chunk$set(echo = FALSE, message = FALSE,
  warning = FALSE)
```

If you set both global and local chunk options, the options you set specifically for a chunk will take precedence over the global options. For example, running a document with:

```{r  global_options}
knitr::opts_chunk$set(echo = FALSE, message = FALSE,
  warning = FALSE)
```


```{r  check_mtcars, echo = TRUE}
head(mtcars, 1)
```

would print the code for the check_mtcars chunk, because the option specified for that specific chunk (echo = TRUE) would override the global option (echo = FALSE).
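You can check this precedence rule by knitting a small document held in a character vector. This is a sketch assuming the knitr package is installed; the chunk labels hidden_code and shown_code are made up for illustration.

```r
# Assumes the knitr package is installed.
rmd_lines <- c(
  "```{r global_options, include = FALSE}",
  "knitr::opts_chunk$set(echo = FALSE)",   # new default for later chunks
  "```",
  "",
  "```{r hidden_code}",                     # inherits echo = FALSE
  "nrow(mtcars)",
  "```",
  "",
  "```{r shown_code, echo = TRUE}",         # local option overrides the global one
  "ncol(mtcars)",
  "```"
)

md_lines <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md_lines, sep = "\n")
```

Both results appear in the knitted output, but only the code from the shown_code chunk is printed.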

You can also include R output directly in your text (“inline”) by wrapping the code in single backticks, starting with r:

“There are `r nrow(mtcars)` observations in the mtcars data set. The average miles per gallon is `r mean(mtcars$mpg, na.rm = TRUE)`.”

Once the file is rendered, this gives:

“There are 32 observations in the mtcars data set. The average miles per gallon is 20.090625.”

T> Here are some tips that will help you diagnose some problems rendering R Markdown files:
T>
T> - Be sure to save your R Markdown file before you run it.
T> - All the code in the file will run “from scratch,” as if you had just opened a new R session.
T> - The code will run with the directory where you saved the R Markdown file as its working directory.
T> - To use the latest version of functions in a package you are developing in an R Markdown document, rebuild the package before knitting the document. You can build a package using the “Build” tab in one of the RStudio panes.

You’ll want to try out pieces of your code as you write an R Markdown document. There are a few ways you can do that:

  • You can run code in chunks just like you can run code from a script (Ctrl-Return or the “Run” button).
  • You can run all the code in a chunk (or all the code in all chunks) using the different options under the “Run” button in RStudio.
  • All the “Run” options have keyboard shortcuts, so you can use those.

I> Two excellent books for learning more about creating reproducible documents with R are Dynamic Documents with R and knitr by Yihui Xie (the creator of knitr) and Reproducible Research with R and RStudio by Christopher Gandrud. The first goes into the technical details of how knitr and related code works, which gives you the tools to extensively customize a document. The second provides an extensive view of how to use tools from R and other open source software to conduct, write up, and present research in a reproducible and efficient way. RStudio’s R Markdown Cheatsheet is another very useful reference.

3.4.5 Help files and roxygen2

In addition to writing tutorials that give an overview of your whole package, you should also write specific documentation showing users how to use and interpret any functions you expect users to directly call.

These help files will ultimately go in the man subdirectory of your package, in an R documentation format (.Rd file extension) that is fairly similar to LaTeX. You used to have to write each of these files by hand. However, the roxygen2 package lets you put all of the help information directly in the code where you define each function. Further, roxygen2 documentation allows you to include tags (@export, @importFrom) that will automate writing the package NAMESPACE file, so you don’t need to edit that file by hand.

With roxygen2, you add the help file information directly above the code where you define each function, in the R scripts saved in the R subdirectory of the package directory. You start each line of the roxygen2 documentation with #' (the second character is an apostrophe, not a backtick). The first line of the documentation should give a short title for the function, and the next block of documentation should be a longer description. After that, you will use tags that start with @ to define each element you’re including. You should leave an empty line between sections of documentation, and you can use indentation for second and later lines of elements to make the code easier to read.

Here is a basic example of how this roxygen2 documentation would look for a simple “Hello world” function:

```r
#' Print "Hello world"
#'
#' This is a simple function that, by default, prints "Hello world". You can
#' customize the text to print (using the \code{to_print} argument) and add
#' an exclamation point (\code{excited = TRUE}).
#'
#' @param to_print A character string giving the text the function will print
#' @param excited Logical value specifying whether to include an exclamation
#'    point after the text
#'
#' @return This function returns a phrase to print, with or without an
#'    exclamation point added. As a side effect, this function also prints out
#'    the phrase.
#'
#' @examples
#' hello_world()
#' hello_world(excited = TRUE)
#' hello_world(to_print = "Hi world")
#'
#' @export
hello_world <- function(to_print = "Hello world", excited = FALSE){
    if(excited) to_print <- paste0(to_print, "!")
    print(to_print)
}
```

You can run the document function from the devtools package at any time to render the latest version of these roxygen2 comments for each of your functions. This will create function-specific help files in the package’s “man” subdirectory as well as update the package’s NAMESPACE file.
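The files that document() writes to man/ are plain-text Rd files. The sketch below hand-writes a simplified Rd file of the kind roxygen2 would generate from the hello_world comments above (an approximation, not roxygen2's exact output) and uses base R's tools package to parse it and render it as console help text. Nothing beyond base R is needed.

```r
# Simplified, hand-written approximation of man/hello_world.Rd.
# roxygen2's real output differs in details (e.g., a header comment
# saying the file was generated and should not be edited by hand).
rd_lines <- c(
  "\\name{hello_world}",
  "\\alias{hello_world}",
  "\\title{Print \"Hello world\"}",
  "\\usage{",
  "hello_world(to_print = \"Hello world\", excited = FALSE)",
  "}",
  "\\arguments{",
  "\\item{to_print}{A character string giving the text the function will print}",
  "\\item{excited}{Logical value specifying whether to include an",
  "exclamation point after the text}",
  "}",
  "\\value{",
  "The phrase, with or without an exclamation point added.",
  "}",
  "\\description{",
  "This is a simple function that, by default, prints \"Hello world\".",
  "}",
  "\\examples{",
  "hello_world(excited = TRUE)",
  "}"
)

# Write the Rd text to a temporary file and parse it with base R's tools package
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)
rd <- tools::parse_Rd(rd_file)

# Render it roughly the way ?hello_world would look in a text console
help_text <- capture.output(tools::Rd2txt(rd))
cat(help_text, sep = "\n")
```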

3.4.6 Common roxygen2 tags

Here are some of the common roxygen2 tags to use in creating this documentation:

| Tag | Meaning |
|-----|---------|
| `@return` | A description of the object returned by the function |
| `@param` | Explanation of a function parameter |
| `@inheritParams` | Name of a function from which to get parameter definitions |
| `@examples` | Example code showing how to use the function |
| `@details` | Add more details on how the function works (for example, specifics of the algorithm being used) |
| `@note` | Add notes on the function or its use |
| `@source` | Add any details on the source of the code or ideas for the function |
| `@references` | Add any references relevant to the function |
| `@importFrom` | Import a function from another package to use in this function (this is especially useful for infix operators like `%>%` and `%within%`) |
| `@export` | Export the function, so users will have direct access to it when they load the package |

Here are a few things to keep in mind when writing help files using roxygen2:

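  • The tags @example and @examples do different things: @example (singular) expects the path to a file containing example code, while @examples (plural) takes the example code itself. For example code written directly in your roxygen2 comments, always use the @examples tag, or you will get errors when you build the documentation.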
  • The @inheritParams tag can save you a lot of time: if you use the same parameters in multiple functions in your package, you can write and edit those parameter descriptions in just one place. However, keep in mind that you must point @inheritParams to the function where you originally define the parameters using @param, not to another function that uses the parameters but gets their definitions through an @inheritParams pointer.
  • If you want users to be able to directly use the function, you must include @export in your roxygen2 documentation. If you have written a function but then find it isn’t being found when you try to compile a README file or vignette, a common culprit is that you have forgotten to export the function.
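As a sketch of @inheritParams in practice, here is the hello_world function again alongside a hypothetical second function, hello_repeat(), that reuses its parameter descriptions. Running document() should copy the to_print and excited descriptions from hello_world into hello_repeat's help file, while times is documented locally with @param. The roxygen2 comments are ordinary R comments, so the code also runs as-is.

```r
#' Print "Hello world"
#'
#' @param to_print A character string giving the text the function will print
#' @param excited Logical value specifying whether to include an exclamation
#'    point after the text
#'
#' @return The phrase, invisibly, after printing it
#'
#' @examples
#' hello_world(excited = TRUE)
#'
#' @export
hello_world <- function(to_print = "Hello world", excited = FALSE) {
  if (excited) to_print <- paste0(to_print, "!")
  print(to_print)
}

#' Print a phrase several times
#'
#' @inheritParams hello_world
#' @param times Number of times to print the phrase
#'
#' @return The phrase, invisibly
#'
#' @examples
#' hello_repeat(times = 2)
#'
#' @export
hello_repeat <- function(to_print = "Hello world", excited = FALSE, times = 3) {
  # Build the phrase once, then print it `times` times
  phrase <- if (excited) paste0(to_print, "!") else to_print
  for (i in seq_len(times)) print(phrase)
  invisible(phrase)
}
```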

3.4.7 Common roxygen2 formatting tags

You can include formatting (lists, etc.) and equations in the roxygen2 documentation. Here are some of the common formatting tags you might want to use:

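| Tag | Meaning |
|-----|---------|
| `\code{}` | Format in a typeface to look like code |
| `\dontrun{}` | Use with examples, to avoid running the example code during package builds and testing |
| `\link{}` | Link to another R function |
| `\eqn{}{}` | Include an inline equation (a LaTeX version, then an optional plain-text version) |
| `\deqn{}{}` | Include a display equation (i.e., shown on its own line) |
| `\itemize{}` | Create an itemized list |
| `\url{}` | Include a web link, displaying the web address itself |
| `\href{}{}` | Include a web link with your own anchor text |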

Some tips on using the R documentation format:

  • Usually, you’ll want to use the \link tag only in combination with the \code tag, since you’re linking to another R function. Make sure \code wraps \link (\code{\link{other_function}}), not the other way around, or you’ll get an error.
  • Some of the equation formatting, including superscripts and subscripts, won’t render in plain-text help pages (but will in pdf-based documentation). With the \eqn and \deqn tags, you can include two versions of an equation: one with full LaTeX formatting, which will be fully compiled in pdf-based documentation, and a reduced form that looks better in plain-text documentation (for example, \deqn{ \frac{X^2}{Y} }{ X^2 / Y }).
  • For any examples in help files that take a while to run, you’ll want to wrap the example code in the \dontrun tag.
  • The tags \url and \href both include a web link. The difference between the two is that \url prints out the web address itself in the help documentation, while \href lets you use text other than the web address as the anchor text of the link. For example: "For more information, see \url{https://www.google.com}."; "For more information, \href{https://www.google.com}{Google it}.".
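To see how several of these tags render, the sketch below builds a small Rd file for a hypothetical function, ratio_sq(), and renders it as console help text with base R's tools package. It is written as raw Rd so that it runs without roxygen2; in roxygen2 comments you would write the same tags (for example, \code{\link{Arithmetic}}) after #'.

```r
# Minimal Rd file for a hypothetical function ratio_sq(), used only to show
# how several formatting tags from the table above are rendered.
rd_lines <- c(
  "\\name{ratio_sq}",
  "\\alias{ratio_sq}",
  "\\title{Squared ratio}",
  "\\usage{ratio_sq(x, y)}",
  "\\arguments{",
  "\\item{x}{Numeric numerator.}",
  "\\item{y}{Numeric denominator.}",
  "}",
  "\\description{",
  "Computes the quantity below using the \\code{\\link{Arithmetic}} operators.",
  "\\deqn{\\frac{x^2}{y}}{x^2 / y}",
  "For more on R, see \\url{https://www.r-project.org} or",
  "\\href{https://cran.r-project.org}{CRAN}.",
  "}",
  "\\examples{",
  "\\dontrun{",
  "ratio_sq(3, 2)",
  "}",
  "}"
)

rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)
help_text <- capture.output(tools::Rd2txt(tools::parse_Rd(rd_file)))

# The text version uses the plain-text form of the equation (x^2 / y),
# prints the \url address itself, and marks the \dontrun example as not run
cat(help_text, sep = "\n")
```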

In addition to documenting functions, you should also document any data that comes with your package. To do that, create a file called “data.R” in the R folder of the package to hold the documentation for all of the package’s datasets. You can use roxygen2 to document each dataset, ending each documentation block with the name of the dataset in quotation marks. There are more details on documenting package data using roxygen2 in the next section.

I> As you prepare a package for sharing with others, you may want to create a pdf manual, which provides a more user-friendly format for proofreading all the package help files. You can create one with the R CMD Rd2pdf shell command. To use this, open a shell and navigate to the parent directory of your R package directory (an easy way to do this is to open a shell using the “Shell” option for the gear button in the Git pane in RStudio and then run cd .. to move up one directory). Then, from the shell, run R CMD Rd2pdf followed by your package’s name (e.g., for a package named “examplepackage,” run R CMD Rd2pdf examplepackage). This command collects the text of your package’s help files into a pdf and opens it. Check out this StackOverflow thread for more.


3.4.8 Summary

You should include documentation to help others use your package, both longer-form documentation through vignettes or README files and function-specific help files. Longer-form documentation can be written using R Markdown files, which can include executable R code examples, while function-specific help files can be written using `roxygen2` comments within the R scripts where each function is defined.

3.5 Data Within a Package

The objective of this section is:

  • Create an R package that contains data (and associated documentation)


Many R packages are designed to manipulate, visualize, and model data, so it may
be a good idea for you to include some data in your package. The primary reason
most developers include data in their package is to demonstrate how to use the
package's functions with the included data. Creating a package as a means to
distribute data is also a method that is gaining popularity. Additionally, you
may want to include data that your package uses internally but that is not
available to people using your package. When including data in your package,
keep in mind that your compressed package file should be smaller than 5MB,
which is the largest package size that CRAN allows. If your package is larger
than 5MB, make sure to inform users in the instructions for downloading and
installing your package.

3.5.1 Data for Demos

3.5.1.1 Data Objects

Including data in your package is easy thanks to the `devtools` package. To
include datasets in a package, first create the objects that you would like to
include in your package inside of the global environment. You can include any
R object in a package, not just data frames. Then make sure you're in your
package directory and use the `use_data()` function, listing each object that
you want to include in your package. The names of the objects that you pass as
arguments to `use_data()` will be the names of the objects when a user loads the
package, so make sure you like the variable names that you're using.
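use_data() saves each object you pass it to its own .rda file in the package's data/ directory, named after the object. The base R sketch below approximates that step in a temporary, hypothetical package directory so you can see why the object name matters; the city_temps data frame is made up, and in a real package you would simply call use_data(city_temps).

```r
# Hypothetical package directory, created under tempdir() for illustration
pkg_dir <- file.path(tempdir(), "examplepackage")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# A made-up object to ship with the package
city_temps <- data.frame(
  city   = c("Halifax", "Regina"),
  temp_c = c(21.5, 18.0)
)

# One compressed .rda file per object, named after the object
rda_path <- file.path(pkg_dir, "data", "city_temps.rda")
save(city_temps, file = rda_path, compress = "bzip2")

# Loading the file into a fresh environment restores an object with the
# same name, which is the name package users will see
check_env <- new.env()
restored <- load(rda_path, envir = check_env)
print(restored)
```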

You should then document each data object that you're including in the package.
This way package users can use common R help syntax like `?dataset` to find out
more information about the included data set. You should create one R file
called `data.R` in the `R/` directory of your package. You can write the data
documentation in the `data.R` file. Let's take a look at some documentation
examples from the `minimap` package. First we'll look at the documentation for
a data frame called `maple`:


```r
#' Production and farm value of maple products in Canada
#'
#' @source Statistics Canada. Table 001-0008 - Production and farm value of
#'  maple products, annual. \url{http://www5.statcan.gc.ca/cansim/}
#' @format A data frame with columns:
#' \describe{
#'  \item{Year}{A value between 1924 and 2015.}
#'  \item{Syrup}{Maple products expressed as syrup, total in thousands of gallons.}
#'  \item{CAD}{Gross value of maple products in thousands of Canadian dollars.}
#'  \item{Region}{Postal code abbreviation for territory or province.}
#' }
#' @examples
#' \dontrun{
#'  maple
#' }
"maple"
```

Data frames that you include in your package should follow the general schema above where the documentation page has the following attributes:

  • An informative title describing the object.
  • A @source tag describing where the data was found.
  • A @format tag which describes the data in each column of the data frame.
  • Finally, a string with the name of the object.

The minimap package also includes a few vectors. Let’s look at the documentation for mexico_abb:

```r
#' Postal Abbreviations for Mexico
#'
#' @examples
#' \dontrun{
#'  mexico_abb
#' }
"mexico_abb"
```

You should always include a title when documenting a vector or any other object. If you need to elaborate on the details of a vector, you can include a description in the documentation or a @source tag. Just like with data frames, the documentation for a vector should end with a string containing the name of the object.

3.5.1.2 Raw Data

A common task for R packages is to take raw data from files and import them into R objects so that they can be analyzed. You might want to include some sample raw data files so you can show different methods and options for importing the data. To include raw data files in your package, put them in an inst/extdata directory in your R package. If you stored a data file called response.json in inst/extdata and your package is named mypackage, then a user could access the path to this file with system.file("extdata", "response.json", package = "mypackage"). Include that line of code in your package’s documentation so that your users know how to access the raw data file.
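The sketch below shows how the package's own code, or a user following your documentation, might locate that file using only base R. raw_data_path() is a hypothetical helper, and mypackage and response.json are the placeholder names from the paragraph above. Setting system.file()'s mustWork argument to TRUE turns a missing file into an error instead of a silent empty string.

```r
# Hypothetical helper for a package named "mypackage": return the installed
# path of a file that was shipped in inst/extdata. When a package is
# installed, the contents of inst/ are copied to its top level, so the file
# ends up under extdata/.
raw_data_path <- function(file, package = "mypackage") {
  # mustWork = TRUE raises an error if the file can't be found,
  # instead of returning an empty string
  system.file("extdata", file, package = package, mustWork = TRUE)
}

# Without mustWork, a missing file (or package) gives "" rather than an error
print(system.file("extdata", "response.json", package = "mypackage"))
```

Once the path is found, you would pass it to whatever reader suits the file, for example jsonlite::fromJSON() for a JSON file.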

3.5.2 Internal Data

Functions in your package may need to have access to data that you don’t want your users to be able to access. For example, the swirl package contains translations of menu items into languages other than English; that data has nothing to do with the purpose of the swirl package, so it’s hidden from the user. To add internal data to your package, you can use the use_data() function from devtools, but you must specify the internal = TRUE argument. All of the objects you pass to use_data(..., internal = TRUE) can be referenced by the same name within your R package, and all of them are saved to one file, R/sysdata.rda.
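To see what use_data(..., internal = TRUE) produces, the base R sketch below writes two made-up internal objects into a single R/sysdata.rda file inside a temporary, hypothetical package directory. It only approximates the file layout; in a real package you would call use_data(menu_labels_fr, default_palette, internal = TRUE) instead. Once the package is installed, its functions can refer to these objects by name, but the objects are not exported to users.

```r
# Hypothetical package directory, created under tempdir() for illustration
pkg_dir <- file.path(tempdir(), "examplepackage")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# Made-up internal objects: a lookup table and a default palette
menu_labels_fr <- c(yes = "oui", no = "non", quit = "quitter")
default_palette <- c("#1b9e77", "#d95f02", "#7570b3")

# All internal objects go into a single R/sysdata.rda file
sysdata_path <- file.path(pkg_dir, "R", "sysdata.rda")
save(menu_labels_fr, default_palette, file = sysdata_path, compress = "bzip2")

# Check which objects the file contains
internal_env <- new.env()
print(load(sysdata_path, envir = internal_env))
```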

3.5.3 Data Packages

There are several packages that were created for the sole purpose of distributing data, including janeaustenr, gapminder, babynames, and lego. Using an R package as a means of distributing data has advantages and disadvantages. On one hand, the data is extremely easy to load into R, as a user only needs to install and load the package. This can be useful for teaching folks who are new to R and may not be familiar with importing and cleaning data. Data packages also allow you to document datasets using roxygen2, which provides a much cleaner and more programmer-friendly kind of code book compared to including a file that describes the data. On the other hand, data in a data package is not accessible to people who are not using R, though there’s nothing stopping you from distributing the data in multiple ways.

If you decide to create a data package, you should document the process that you used to obtain, clean, and save the data. One approach to doing this is to use the use_data_raw() function from devtools. This will create a directory inside of your package called data-raw. Inside of this directory, you should include any raw files that the data objects in your package are derived from. You should also include one or more R scripts that import, clean, and save those data objects in your R package. In principle, if you needed to update the data package with new data files, you should be able to just run these scripts again in order to rebuild your package.
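A data-raw script might look something like the sketch below. Everything in it is hypothetical: the syrup_production dataset, its made-up values, and the file names. To stay self-contained, it writes a stand-in raw CSV into a temporary package directory and saves the cleaned result with base R's save(); in a real package you would keep the raw file in data-raw/ and finish the script with use_data().

```r
# Sketch of a data-raw/syrup_production.R script, run against temporary
# files. All names and values here are made up for illustration.
pkg_dir <- file.path(tempdir(), "examplepackage")
dir.create(file.path(pkg_dir, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "data"), showWarnings = FALSE)

# Stand-in for a raw file you downloaded and stored in data-raw/
raw_file <- file.path(pkg_dir, "data-raw", "syrup_raw.csv")
writeLines(c("year,syrup,region",
             "2014, 12.5 ,qc",
             "2015,NA,on"), raw_file)

# Import: strip stray whitespace around fields
syrup_production <- read.csv(raw_file, strip.white = TRUE,
                             stringsAsFactors = FALSE)

# Clean: tidy column names, standardize region codes, drop missing values
names(syrup_production) <- c("Year", "Syrup", "Region")
syrup_production$Region <- toupper(syrup_production$Region)
syrup_production <- syrup_production[!is.na(syrup_production$Syrup), ]

# Save into data/. Inside a real package you would instead call
# usethis::use_data(syrup_production, overwrite = TRUE)
save(syrup_production,
     file = file.path(pkg_dir, "data", "syrup_production.rda"),
     compress = "bzip2")
```

Because every step from raw file to saved object lives in the script, rerunning it after the raw file changes regenerates the packaged data.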

3.5.4 Summary

Including data in a package is useful for showing new users how to use your package, using data internally, and sharing and documenting datasets. The devtools package includes several useful functions to help you add data to your package including use_data() and use_data_raw(). You can document data within your package just like you would document a function.