3.4 Documentation
The objectives of this section are:
- Create R function documentation using roxygen2
- Create vignettes using knitr and R Markdown
There are two main types of documentation you may want to include with packages:
- Longer documents that give tutorials or overviews for the whole package
- Shorter, function-specific help files for each function or group of related functions
You can create the first type of document using package vignettes, README files, or both. For the function-specific help files, the easiest way to create these is with the roxygen2 package.
In this section, we’ll cover why and how to create this documentation. In addition, vignette / README documentation can be written using knitr to create R Markdown documents that mix R code and text, so we’ll include more details on that process.
3.4.1 Vignettes and README files
You will likely want to create a document that walks users through the basics of how to use your package. You can do this through two formats:
- Vignette: This document is bundled with your R package, so it becomes locally available to a user once they install your package from CRAN. They will also have it available if they install the package from GitHub, as long as they use the build_vignettes = TRUE option when running install_github.
- README file: If you have your package on GitHub, this document will show up on the main page of the repository.
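As a minimal sketch of the GitHub route, the helper below wraps the install call and then opens the list of installed vignettes. The function name install_with_vignettes is made up for this illustration, and the sketch assumes that devtools is installed, that you have network access, and that the repository name matches the package name.

```r
# Hypothetical helper: install a package from GitHub with its vignettes built,
# then list those vignettes in the browser.
install_with_vignettes <- function(repo) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("Please install the devtools package first.")
  }
  # build_vignettes = TRUE asks for the vignettes to be built during install
  devtools::install_github(repo, build_vignettes = TRUE)
  # Assumption: the package is named after the repository ("user/pkg" -> "pkg")
  pkg <- basename(repo)
  utils::browseVignettes(package = pkg)
}

# Example call (not run here, since it installs a package):
# install_with_vignettes("geanders/countytimezones")
```

If a package ships no vignettes, browseVignettes() simply reports that none were found.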
A package likely only needs a README file if you are posting the package to GitHub. For any GitHub repository, if there is a README.md file in the top directory of the repository, it will be rendered on the main GitHub repository page below the listed repository content. For an example, visit https://github.com/geanders/countytimezones and scroll down. You’ll see a list of all the files and subdirectories included in the package repository and below that is the content in the package’s README.md file, which gives a tutorial on using the package.
If the README file does not need to include R code, you can write it directly as an .md file, using Markdown syntax, which is explained in more detail in the next section. If you want to include R code, you should start with a README.Rmd file, which you can then render to Markdown using knitr. You can use the devtools package to add either a README.md or README.Rmd file to a package directory using use_readme_md or use_readme_rmd, respectively. These functions will add the appropriate file to the top level of the package directory and will also add the file name to “.Rbuildignore,” since having one of these files in the top level of the package directory could otherwise cause some problems when building the package.
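A rough sketch of that README workflow follows. It assumes a recent toolchain, in which the use_readme_* helpers live in the usethis package (which devtools builds on) and devtools::build_readme() renders README.Rmd to README.md. The wrapper name setup_readme is invented for this example, and it must be run from the top level of a package directory.

```r
# Hypothetical wrapper: add a README to the package in the working directory.
setup_readme <- function(with_r_code = TRUE) {
  if (!file.exists("DESCRIPTION")) {
    stop("Run this from the top level of a package directory.")
  }
  if (with_r_code) {
    # Creates README.Rmd from a template
    usethis::use_readme_rmd(open = FALSE)
    # Renders README.Rmd to README.md so GitHub can display it
    devtools::build_readme()
  } else {
    # Creates a plain Markdown README.md
    usethis::use_readme_md(open = FALSE)
  }
  invisible(TRUE)
}
```

After editing README.Rmd, you would re-render it before committing so that README.md stays in sync.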
The README file is a useful way to give GitHub users information about your package, but it will not be included in builds of the package or be available through CRAN for packages that are posted there. Instead, if you want to create tutorials or overview documents that are included in a package build, you should do that by adding one or more package vignettes. Vignettes are stored in a vignettes subdirectory within the package directory.
To add a vignette file to this subdirectory (which will be created if you do not already have it), use the use_vignette function from devtools. This function takes as arguments the file name of the vignette you’d like to create and the package for which you’d like to create it (the default is the package in the current working directory). For example, if you are currently working in your package’s top-level directory and you would like to add a vignette called “model_details,” you can do that with the code:
use_vignette("model_details")
You can have more than one vignette per package, which can be useful if you want to include one vignette that gives a more general overview of the package as well as a few vignettes that go into greater detail about particular aspects or applications.
T> Once you create a vignette with use_vignette, be sure to update the Vignette Index Entry in the vignette’s YAML (the code at the top of an R Markdown document). Replace “Vignette Title” there with the actual title you use for the vignette.
3.4.2 Knitr / Markdown
Both vignettes and README files can be written as R Markdown files, which will allow you to include R code examples and results from your package. One of the most exciting tools in R is the knitr system for combining code and text to create a reproducible document. In terms of the power you get for time invested in learning a tool, knitr probably can’t be beat. Everything you need to know to create and “knit” a reproducible document can be learned in about 20 minutes, and while there is a lot more you can do to customize this process if you want to, probably 80% of what you’ll ever want to do with knitr you’ll learn in those first 20 minutes.
R Markdown files are mostly written using Markdown. To write R Markdown files, you need to understand what markup languages like Markdown are and how they work. In Word and other word processing programs you have used, you can add formatting using buttons and keyboard shortcuts (e.g., “Ctrl-B” for bold). The file saves the words you type. It also saves the formatting, but you see the final output, rather than the formatting markup, when you edit the file (WYSIWYG – what you see is what you get). In markup languages, on the other hand, you mark up the document directly to show what formatting the final version should have (e.g., you type **bold** in the file to end up with a document with bold text). Examples of markup languages include:
- HTML (HyperText Markup Language)
- LaTeX
- Markdown (a “lightweight” markup language)
3.4.3 Common Markdown formatting elements
To write a file in Markdown, you’ll need to learn the conventions for creating formatting. This table shows what you would need to write in a flat file for some common formatting choices:
Code | Rendering | Explanation |
---|---|---|
**text** | text | boldface |
*text* | text | italicized |
[text](www.google.com) | text | hyperlink |
# text | | first-level header |
## text | | second-level header |
Some other simple things you can do in Markdown include:
- Lists (ordered or bulleted)
- Equations
- Tables
- Figures from files
- Block quotes
- Superscripts
The start of an R Markdown file gives some metadata for the file (authors, title, format) in a language called YAML. For example, the YAML section of a package vignette might look like this:
---
title: "Model Details for example_package"
author: "Jane Doe"
date: "2020-12-20"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model Details for example_package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
When creating R Markdown documents using the RStudio toolbar, much of this YAML will be automatically generated based on your specifications when opening the initial file. However, this is not the case with package vignettes, for which you’ll need to go into the YAML and add the authors and title yourself. Leave the vignette engine, vignette encoding, output, and date as their default values.
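Because it is easy to forget to replace the placeholder index entry, a small check can help. The function below is only an illustrative sketch in base R: the name find_placeholder_titles is invented here, and it assumes the template placeholder is exactly “Vignette Title” as described above. It is demonstrated on two throwaway files in a temporary directory rather than on a real package.

```r
# Hypothetical helper: list vignette files whose index entry still uses the
# template placeholder "Vignette Title".
find_placeholder_titles <- function(vignette_dir = "vignettes") {
  rmd_files <- list.files(vignette_dir, pattern = "\\.Rmd$", full.names = TRUE)
  placeholder <- "VignetteIndexEntry{Vignette Title}"
  has_placeholder <- vapply(
    rmd_files,
    function(path) {
      any(grepl(placeholder, readLines(path, warn = FALSE), fixed = TRUE))
    },
    logical(1)
  )
  basename(rmd_files[has_placeholder])
}

# Demonstration on two small files written to a temporary directory
demo_dir <- file.path(tempdir(), "vignettes_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  c("---", "vignette: >", "  %\\VignetteIndexEntry{Vignette Title}", "---"),
  file.path(demo_dir, "untouched.Rmd")
)
writeLines(
  c("---", "vignette: >",
    "  %\\VignetteIndexEntry{Model Details for example_package}", "---"),
  file.path(demo_dir, "model_details.Rmd")
)

needs_update <- find_placeholder_titles(demo_dir)
print(needs_update)
```

In a real package you would call the function with its default argument from the package’s top-level directory.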
For more Markdown conventions, see RStudio’s R Markdown Reference Guide (link also available through “Help” in RStudio).
R Markdown files work a lot like Markdown files, but add the ability to include R code that will be run before rendering the final document. This functionality is based on literate programming, an idea developed by Donald Knuth, to mix executable code with regular text. The files you create can then be “knitted,” to run any embedded code. The final output will have results from your code and the regular text.
The basic steps of opening and rendering an R Markdown file in RStudio are:
- To open a new R Markdown file, go to “File” -> “New File” -> “RMarkdown….” To start, choose a “Document” in “HTML” format.
- This will open a new R Markdown file in RStudio. The file extension for R Markdown files is “.Rmd.”
- The new file comes with some example code and text. You can run the file as-is to try out the example. You will ultimately delete this example code and text and replace it with your own.
- Once you “knit” the R Markdown file, R will render an HTML file with the output. This is automatically saved in the same directory where you saved your .Rmd file.
- Write everything besides R code using Markdown syntax.
The knit function from the knitr package works by taking a document in R Markdown format (among a few possible formats), reading through it for any markers of the start of R code, running any of the code between that “start” marker and a marker showing a return to regular Markdown, writing any of the relevant results from R code into the Markdown file in Markdown format, and then passing the entire document to software that can render from Markdown to the desired output format (for example, compile a PDF, Word, or HTML document).
This means that all a user needs to do to include R code within a document is to properly separate it from other parts of the document through the appropriate markers. To indicate R code in an RMarkdown document, you need to separate off the code chunk using the following syntax:
```{r}
my_vec <- 1:10
```
This syntax tells R how to find the start and end of pieces of R code (code chunks) when the file is rendered. R will walk through, find each piece of R code, run it and create output (printed output or figures, for example), and then pass the file along to another program to complete rendering (e.g., TeX for PDF files).
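To see the first half of that process in isolation, you can knit a tiny document held in memory instead of a file. This is a sketch rather than a typical workflow: it relies on the text argument of knitr::knit(), and the names fence and rmd_lines are just illustrative. The chunk delimiters are built with strrep() only so that this example can itself be shown inside a code block.

```r
library(knitr)

# Three backticks, built programmatically
fence <- strrep("`", 3)

# A miniature R Markdown document: text, one code chunk, more text
rmd_lines <- c(
  "Some regular Markdown text.",
  "",
  paste0(fence, "{r}"),
  "my_vec <- 1:10",
  "sum(my_vec)",
  fence,
  "",
  "More text after the chunk."
)

# knit() runs the chunk and returns the resulting Markdown as a string;
# no conversion to HTML, PDF, or Word happens at this stage
md <- knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

The returned Markdown contains the original text, the chunk’s code, and the printed result of sum(my_vec), which is what would then be handed to the rendering software.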
You can specify a name for each chunk, if you’d like, by including it after “r” when you begin your chunk. For example, to give the name load_mtcars to a code chunk that loads the mtcars dataset, specify that name in the start of the code chunk:
```{r load_mtcars}
data(mtcars)
```
T> Here are a couple of tips for naming code chunks:
T>
T> - Chunk names must be unique across a document.
T> - Any chunks you don’t name are given ordered numbers by knitr.
You do not have to name each chunk. However, there are some advantages:
- It will be easier to find any errors.
- You can use chunk labels when cross-referencing figures.
- You can reference chunks later by name.
3.4.4 Common knitr chunk options
You can also add options when you start a chunk. Many of these options can be set as TRUE / FALSE and include:
Option | Action |
---|---|
echo | Print out the R code? |
eval | Run the R code? |
message | Print out messages? |
warning | Print out warnings? |
include | If FALSE, run code, but don’t print code or results |
Other chunk options take values other than TRUE / FALSE. Some you might want to include are:
Option | Action |
---|---|
results | How to print results (e.g., hide runs the code, but doesn’t print the results) |
fig.width | Width to print your figure, in inches (e.g., fig.width = 4) |
fig.height | Height to print your figure |
To include any of these options, add the option and value in the opening brackets and separate multiple options with commas:
```{r message = FALSE, echo = FALSE}
mtcars[1, 1:3]
```
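The difference between echo and eval is easiest to see side by side. The sketch below knits a small in-memory document with knitr::knit(text = ...); the chunk names code_only and result_only and the helper variable fence are made up for the illustration.

```r
library(knitr)

# Three backticks, built programmatically so this example nests cleanly
fence <- strrep("`", 3)

rmd_lines <- c(
  # eval = FALSE: the code is displayed but never run
  paste0(fence, "{r code_only, eval = FALSE}"),
  "x <- 10 * 2",
  fence,
  "",
  # echo = FALSE: the code is run but not displayed; only its result appears
  paste0(fence, "{r result_only, echo = FALSE}"),
  "6 * 7",
  fence
)

md <- knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

In the output, the assignment from code_only appears as code with no result, while result_only contributes only the printed value.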
You can set “global” options at the beginning of the document. This will create new defaults for all of the chunks in the document. For example, if you want echo, warning, and message to be FALSE by default in all code chunks, you can run:
```{r global_options}
knitr::opts_chunk$set(echo = FALSE, message = FALSE,
warning = FALSE)
```
If you set both global and local chunk options, the options that you set specifically for a chunk will take precedence over the global options. For example, running a document with:
```{r global_options}
knitr::opts_chunk$set(echo = FALSE, message = FALSE,
warning = FALSE)
```
```{r check_mtcars, echo = TRUE}
head(mtcars, 1)
```
would print the code for the check_mtcars chunk, because the option specified for that specific chunk (echo = TRUE) would override the global option (echo = FALSE).
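One way to convince yourself of this precedence rule is to knit a small in-memory document and check which chunks had their code printed. This is only a sketch: the chunk name uses_global and the variables fence, rmd_lines, and code_shown are invented for the example, and the setup chunk is given include = FALSE so that it does not print itself.

```r
library(knitr)

# Three backticks, built programmatically so this example nests cleanly
fence <- strrep("`", 3)

rmd_lines <- c(
  # Global default for every later chunk: do not print code
  paste0(fence, "{r global_options, include = FALSE}"),
  "knitr::opts_chunk$set(echo = FALSE)",
  fence,
  "",
  # No local option, so the global echo = FALSE applies
  paste0(fence, "{r uses_global}"),
  "nrow(mtcars)",
  fence,
  "",
  # Local echo = TRUE overrides the global setting
  paste0(fence, "{r check_mtcars, echo = TRUE}"),
  "ncol(mtcars)",
  fence
)

md <- knit(text = rmd_lines, quiet = TRUE)

# Was each chunk's code written to the output?
code_shown <- c(
  uses_global = grepl("nrow(mtcars)", md, fixed = TRUE),
  check_mtcars = grepl("ncol(mtcars)", md, fixed = TRUE)
)
print(code_shown)
```

Only check_mtcars should report TRUE, although the results of both chunks are still present in the knitted text.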
You can also include R output directly in your text (“inline”) using backticks:
“There are `r nrow(mtcars)` observations in the mtcars data set. The average miles per gallon is `r mean(mtcars$mpg, na.rm = TRUE)`.”
Once the file is rendered, this gives:
“There are 32 observations in the mtcars data set. The average miles per gallon is 20.090625.”
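Inline values are printed at whatever precision R chooses, so in practice you will often want to round them before they reach the text. The lines below are a small sketch of the values behind the sentence above; the variable names are illustrative, and sprintf() stands in for what the inline expressions would produce.

```r
# Values computed by the two inline expressions
n_obs <- nrow(mtcars)
avg_mpg <- mean(mtcars$mpg, na.rm = TRUE)

# Rounding to one decimal place gives a more readable sentence
avg_mpg_rounded <- round(avg_mpg, 1)

sentence <- sprintf(
  "There are %d observations in the mtcars data set. The average miles per gallon is %.1f.",
  n_obs, avg_mpg
)
print(sentence)
```

Within an R Markdown document, the same effect comes from wrapping the inline expression in round(), for example round(mean(mtcars$mpg, na.rm = TRUE), 1).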
T> Here are some tips that will help you diagnose some problems rendering R Markdown files:
T>
T> - Be sure to save your R Markdown file before you run it.
T> - All the code in the file will run “from scratch,” as if you just opened a new R session.
T> - The code will run using, as a working directory, the directory where you saved the R Markdown file.
T> - To use the latest version of functions in a package you are developing in an R Markdown document, rebuild the package before knitting the document. You can build a package using the “Build” tab in one of the RStudio panes.
You’ll want to try out pieces of your code as you write an R Markdown document. There are a few ways you can do that:
- You can run code in chunks just like you can run code from a script (Ctrl-Return or the “Run” button).
- You can run all the code in a chunk (or all the code in all chunks) using the different options under the “Run” button in RStudio.
- All the “Run” options have keyboard shortcuts, so you can use those.
I> Two excellent books for learning more about creating reproducible documents with R are Dynamic Documents with R and knitr by Yihui Xie (the creator of knitr) and Reproducible Research with R and RStudio by Christopher Gandrud. The first goes into the technical details of how knitr and related code works, which gives you the tools to extensively customize a document. The second provides an extensive view of how to use tools from R and other open source software to conduct, write up, and present research in a reproducible and efficient way. RStudio’s R Markdown Cheatsheet is another very useful reference.
3.4.5 Help files and roxygen2
In addition to writing tutorials that give an overview of your whole package, you should also write specific documentation showing users how to use and interpret any functions you expect users to directly call.
These help files will ultimately go in a folder called /man in your package, in an R documentation format (.Rd file extensions) that is fairly similar to LaTeX. You used to have to write all of these files as separate files. However, the roxygen2 package lets you put all of the help information directly in the code where you define each function. Further, roxygen2 documentation allows you to include tags (@export, @importFrom) that will automate writing the package NAMESPACE file, so you don’t need to edit that file by hand.
With roxygen2, you add the help file information directly above the code where you define each function, in the R scripts saved in the R subdirectory of the package directory. You start each line of the roxygen2 documentation with #' (the second character is an apostrophe, not a backtick). The first line of the documentation should give a short title for the function, and the next block of documentation should be a longer description. After that, you will use tags that start with @ to define each element you’re including. You should leave an empty line between each section of documentation, and you can use indentation for second and later lines of elements to make the code easier to read.
Here is a basic example of how this roxygen2 documentation would look for a simple “Hello world” function:
#' Print "Hello world"
#'
#' This is a simple function that, by default, prints "Hello world". You can
#' customize the text to print (using the \code{to_print} argument) and add
#' an exclamation point (\code{excited = TRUE}).
#'
#' @param to_print A character string giving the text the function will print
#' @param excited Logical value specifying whether to include an exclamation
#'    point after the text
#'
#' @return This function returns a phrase to print, with or without an
#'    exclamation point added. As a side effect, this function also prints out
#'    the phrase.
#'
#' @examples
#' hello_world()
#' hello_world(excited = TRUE)
#' hello_world(to_print = "Hi world")
#'
#' @export
hello_world <- function(to_print = "Hello world", excited = FALSE){
    if(excited) to_print <- paste0(to_print, "!")
    print(to_print)
}
You can run the document function from the devtools package at any time to render the latest version of these roxygen2 comments for each of your functions. This will create function-specific help files in the package’s “man” subdirectory as well as update the package’s NAMESPACE file.
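A minimal sketch of that step is shown below. The wrapper name update_docs is invented for this illustration; it assumes devtools is installed and that pkg_dir points at the top level of a package that uses roxygen2 comments.

```r
# Hypothetical wrapper: regenerate help files and NAMESPACE from roxygen2
# comments, then report which .Rd files now exist.
update_docs <- function(pkg_dir = ".") {
  if (!file.exists(file.path(pkg_dir, "DESCRIPTION"))) {
    stop("'pkg_dir' does not look like the top level of a package.")
  }
  # Renders the roxygen2 comments into man/*.Rd and rewrites NAMESPACE
  devtools::document(pkg_dir)
  list.files(file.path(pkg_dir, "man"), pattern = "\\.Rd$")
}

# Typical use from the package's top-level directory (not run here):
# update_docs()
# devtools::load_all()
# help("hello_world")
```

For the example function above, the returned vector would be expected to include hello_world.Rd.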
3.4.8 Internal Data
Functions in your package may need to have access to data that you don’t want
your users to be able to access. For example, the swirl package contains
translations for menu items into languages other than English. That data has
nothing to do with the purpose of the swirl package, so it’s hidden from the
user. To add internal data to your package you can use the use_data() function
from devtools, but you must specify the internal = TRUE argument. All of the
objects you pass to use_data(..., internal = TRUE) can be referenced by the same
name within your R package. All of these objects will be saved to one file
called R/sysdata.rda.
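As a hedged illustration, the snippet below creates a small lookup table, stores it as internal data when run from a package’s top-level directory, and shows how a package function would then refer to it by name. The object menu_labels and the function label_for are invented for this sketch, and the call uses usethis::use_data(), which is where this function lives in recent versions of the devtools toolchain.

```r
# Hypothetical internal lookup table of menu labels by language
menu_labels <- data.frame(
  language = c("en", "es"),
  quit = c("Quit", "Salir"),
  stringsAsFactors = FALSE
)

# Only attempt to save when run from the top level of a package directory.
# Note: use_data() refuses to replace an existing R/sysdata.rda unless you
# also pass overwrite = TRUE.
if (file.exists("DESCRIPTION") && requireNamespace("usethis", quietly = TRUE)) {
  usethis::use_data(menu_labels, internal = TRUE)
}

# Inside the package, functions refer to the object simply by its name
label_for <- function(lang) {
  menu_labels$quit[menu_labels$language == lang]
}

print(label_for("es"))
```

Because every object passed in a single internal call ends up in the same R/sysdata.rda file, it is common to pass all internal objects together in one call.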
3.4.9 Data Packages
There are several packages which were created for the sole purpose of
distributing data, including janeaustenr, gapminder, babynames, and lego.
Using an R package as a means of distributing data has advantages and
disadvantages. On one hand the data is extremely easy to load into R, as a user
only needs to install and load the package. This can be useful for teaching
folks who are new to R and may not be familiar with importing and cleaning data.
Data packages also allow you to document datasets using roxygen2, which provides
a much cleaner and more programmer-friendly kind of code book compared to
including a file that describes the data. On the other hand data in a data
package is not accessible to people who are not using R, though there’s nothing
stopping you from distributing the data in multiple ways.
If you decide to create a data package you should document the process that you
used to obtain, clean, and save the data. One approach to doing this is to use
the use_data_raw() function from devtools. This will create a directory inside
of your package called data-raw. Inside of this directory you should
include any raw files that the data objects in your package are derived from.
You should also include one or
more R scripts which import, clean, and save those data objects in your R
package. Theoretically if you needed to update the data package with new data
files you should be able to just run these scripts again in order to rebuild
your package.
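A script kept in that directory might look roughly like the sketch below. Everything specific here is hypothetical: the dataset name site_counts, the cleaning function, and the columns are invented, and a temporary CSV file stands in for a raw file that would normally sit next to the script. The final save step uses usethis::use_data() and only runs from a package’s top-level directory.

```r
# Import and clean one raw file: lower-case the column names and drop rows
# with a missing count.
clean_site_counts <- function(raw_path) {
  raw <- read.csv(raw_path, stringsAsFactors = FALSE)
  names(raw) <- tolower(names(raw))
  raw[!is.na(raw$count), , drop = FALSE]
}

# Stand-in for a raw file stored alongside this script
raw_path <- tempfile(fileext = ".csv")
writeLines(c("Site,Count", "A,3", "B,NA", "C,7"), raw_path)

site_counts <- clean_site_counts(raw_path)
print(site_counts)

# Save the cleaned object into the package's data/ directory so that
# rerunning this script rebuilds the packaged dataset
if (file.exists("DESCRIPTION") && requireNamespace("usethis", quietly = TRUE)) {
  usethis::use_data(site_counts, overwrite = TRUE)
}
```

Keeping the import, cleaning, and saving steps in one script means that updating the package with new raw files is a matter of rerunning it.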
3.4.10 Summary
Including data in a package is useful for showing new users how to use your
package, using data internally, and sharing and documenting datasets. The
devtools package includes several useful functions to help you add data to your
package, including use_data() and use_data_raw(). You can document data within
your package just like you would document a function.
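For instance, a dataset is commonly documented by placing a roxygen2 block above the dataset’s name written as a quoted string, in a script in the package’s R subdirectory. The dataset site_counts, its columns, and its source line below are hypothetical and only meant to show the shape of such a block.

```r
#' Counts of observations by site
#'
#' A small example dataset giving the number of observations recorded at
#' each site.
#'
#' @format A data frame with two columns:
#' \describe{
#'   \item{site}{Character. Site identifier.}
#'   \item{count}{Integer. Number of observations recorded at the site.}
#' }
#'
#' @source Hypothetical example data created for illustration.
"site_counts"
```

Running the document function then produces a help file for the dataset in the same way it does for functions.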