Chapter 1 The R Programming Environment

This chapter provides a rigorous introduction to the R programming language, with a particular focus on using R for software development in a data science setting. Whether you are part of a data science team or working individually within a community of developers, this chapter will give you the knowledge of R needed to make useful contributions in those settings.

As the first chapter in this book, it provides the essential foundation in R needed for the chapters that follow. We cover basic R concepts and language fundamentals, key concepts like tidy data and related “tidyverse” tools, processing and manipulation of complex and large datasets, handling textual data, and basic data science tasks. Upon finishing this chapter, you will have fluency at the R console and will be able to create tidy datasets from a wide range of possible data sources.

The learning objectives for this chapter are to:

  • Develop fluency in using R at the console
  • Execute basic arithmetic operations
  • Subset and index R objects
  • Remove missing values from an R object
  • Modify object attributes and metadata
  • Describe the differences between R classes and data types
  • Read tabular data into R and read in web data via web scraping tools and APIs
  • Define tidy data and transform non-tidy data into tidy data
  • Manipulate and transform a variety of data types, including dates, times, and text data
  • Describe how memory is used in R sessions to store R objects
  • Read and manipulate large datasets
  • Describe how to diagnose programming problems and look up answers on the web or in forums