This lesson is being piloted (Beta version)

R Data Skills for Bioinformatics: Glossary

Key Points

Getting Started with R and RStudio
  • Use RStudio to create and manage projects with a consistent layout.

  • Treat raw data as read-only.

  • Treat generated output as disposable.

  • Separate function definition and application.

  • Use version control.
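These practices can be sketched in a few lines of base R. This is only an illustration, assuming a hypothetical project with data/raw, data/processed, R, and output folders created under a temporary directory. The folder names, the counts.csv file, and the normalize() helper are invented for the example and are not a layout the lesson prescribes. Version control (for example Git) is not shown because it happens outside the R code.

```r
# Hypothetical project root under a temporary directory
project <- file.path(tempdir(), "my_project")
dirs <- file.path(project, c("data/raw", "data/processed", "R", "output"))
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Raw data: written once, then marked read-only (Sys.chmod has limited effect on Windows)
raw_file <- file.path(project, "data/raw", "counts.csv")
if (!file.exists(raw_file)) {
  write.csv(data.frame(gene = c("g1", "g2"), count = c(10, 3)),
            raw_file, row.names = FALSE)
  Sys.chmod(raw_file, mode = "0444")
}

# Function definition lives in its own file...
fun_file <- file.path(project, "R", "functions.R")
writeLines("normalize <- function(x) x / sum(x)", fun_file)

# ...and is applied from the analysis script after loading it with source()
source(fun_file)
counts <- read.csv(raw_file)
counts$prop <- normalize(counts$count)

# Generated output can be deleted and regenerated at any time
write.csv(counts, file.path(project, "output", "proportions.csv"), row.names = FALSE)
```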

R Language Basics
  • R has the usual arithmetic operators and mathematical functions.

  • Use <- to assign values to variables.

  • Use ls() to list the variables in the current environment.

  • Use rm() to delete objects from the current environment.

  • Use install.packages() to install packages (libraries).

  • Use library(packagename) to make a package available for use.

  • Use help() to get online help in R.
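A short console session illustrating these points. The variable names are arbitrary. The install.packages() and library() calls are commented out because installing needs network access, and ggplot2 appears only as an example package name; help() is commented out because it opens a documentation page.

```r
# Arithmetic operators and mathematical functions
x <- 7 %/% 2          # integer division: 3
rem <- 7 %% 2         # remainder (modulo): 1
z <- sqrt(16) * 2^3   # 4 * 8 = 32

ls()                  # names of objects in the current environment
rm(rem)               # delete rem
exists("rem")         # FALSE once rem has been removed

# Installing and attaching a package
# install.packages("ggplot2")   # once per machine
# library(ggplot2)              # once per session that uses it

# Getting help
# help(sqrt)                    # equivalently: ?sqrt
```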

Data Structures
  • Atomic vectors are usually created with c(), short for combine.

  • Lists are constructed by using list().

  • Data frames are created with data.frame(), which takes named vectors as input.

  • The basic data types in R are double, integer, complex, logical, and character.

  • All objects can have arbitrary additional attributes, used to store metadata about the object.

  • Adding a dim() attribute to an atomic vector creates a multi-dimensional array.
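The constructors and type rules above can be checked directly in the console. This sketch uses made-up values; the "units" attribute name is an arbitrary example of user-defined metadata, not something R itself interprets.

```r
v <- c(1.5, 2, 3)                 # atomic vector built with c()
typeof(v)                         # "double"
typeof(c(1L, 2L))                 # "integer"
typeof(c(1 + 2i))                 # "complex"
typeof(c(TRUE, FALSE))            # "logical"
typeof(c("a", "b"))               # "character"

# A list can hold elements of different types and lengths
l <- list(id = 1L, name = "g1", scores = c(0.2, 0.9))

# A data frame from named vectors of equal length
df <- data.frame(gene = c("g1", "g2"), count = c(10L, 3L))

# Arbitrary attributes store metadata
attr(v, "units") <- "mm"
attributes(v)                     # a list with one element, units = "mm"

# Adding a dim attribute turns an atomic vector into a matrix (a 2-D array)
m <- 1:6
dim(m) <- c(2, 3)
m
```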

Data Subsetting
  • Access individual values by location using [].

  • Access slices of data using [low:high].

  • Access arbitrary sets of data using [c(...)].

  • Use which() to select subsets of data based on value.
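With a small numeric vector (values invented for illustration), the four indexing styles look like this. R indexes from 1, and [low:high] includes both ends.

```r
x <- c(5, 12, 7, 20, 3)
x[2]              # 12: a single value by position
x[2:4]            # 12 7 20: a slice
x[c(1, 5)]        # 5 3: an arbitrary set of positions
which(x > 6)      # 2 3 4: positions where the condition is TRUE
x[which(x > 6)]   # 12 7 20: the values at those positions
```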

Data Transformation
  • Read in a CSV file using read_csv().

  • View a data frame with View().

  • Use filter() to pick observations by their values.

  • Use arrange() to order the rows.

  • Use select() to pick variables by their names.

  • Use mutate() to create new variables with functions of existing variables.

  • Use summarize() to collapse many values down to a single summary.
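A minimal pipeline chaining these verbs, assuming the readr and dplyr packages are installed. The sample and gene names and the counts are made up, and the data are written to a temporary file so that read_csv() has something to read. The %>% pipe passes each result on to the next verb.

```r
library(readr)
library(dplyr)

# Hypothetical count table written to a temporary CSV file
path <- tempfile(fileext = ".csv")
writeLines(c("sample,gene,count",
             "s1,g1,10",
             "s1,g2,0",
             "s2,g1,25",
             "s2,g2,5"), path)

counts <- read_csv(path)
# View(counts)   # spreadsheet-style viewer; interactive sessions only

expressed <- counts %>%
  filter(count > 0) %>%             # keep rows with a non-zero count
  arrange(desc(count)) %>%          # largest counts first
  select(sample, count) %>%         # keep two columns
  mutate(log_count = log2(count))   # new column derived from count

# Collapse the whole table to one row of summaries
totals <- counts %>%
  summarize(total = sum(count), mean_count = mean(count))
```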

Visualizing Data
  • Use ggplot2 to create plots.

  • Think about graphics in layers: aesthetics, geometry, statistics, scale transformation, and grouping.
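A sketch mapping each layer named above to a ggplot2 call, using invented dose-response values and assuming ggplot2 is installed. The column names and the choice of a linear smoother are assumptions for illustration. Grouping happens implicitly here: mapping strain to colour splits both the points and the fitted lines by strain.

```r
library(ggplot2)

# Hypothetical measurements for two strains
df <- data.frame(
  dose     = rep(c(1, 2, 4, 8), times = 2),
  response = c(2.1, 3.9, 6.2, 8.1, 1.8, 3.2, 4.9, 6.7),
  strain   = rep(c("A", "B"), each = 4)
)

p <- ggplot(df, aes(x = dose, y = response, colour = strain)) +  # aesthetics (colour also groups)
  geom_point() +                                                 # geometry
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +      # statistics: linear fit per group
  scale_x_log10()                                                # scale transformation

print(p)
```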

Writing and Applying Functions to Data
  • Use function to define a new function in R.

  • Use parameters to pass values into functions.

  • Load functions into programs using source().
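A sketch of defining, applying, and loading functions. The cpm() helper (counts rescaled to a fixed total, per million by default), its scale parameter, and the pct() function are hypothetical. The second function is written to a temporary file only so that source() has a file to load.

```r
# Define: rescale counts so they sum to `scale` (default one million)
cpm <- function(counts, scale = 1e6) {
  counts / sum(counts) * scale
}

# Apply: the default parameter value, then an explicit one
cpm(c(10, 30, 60))                # counts per million
cpm(c(10, 30, 60), scale = 100)   # percentages: 10 30 60

# Load: save a definition to a file and read it back with source()
fn_file <- tempfile(fileext = ".R")
writeLines("pct <- function(counts) counts / sum(counts) * 100", fn_file)
source(fn_file)
pct(c(1, 3))                      # 25 75
```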

Developing Workflows with R Scripts
  • Use if and else to make choices.

  • Use for to repeat operations.
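Combining a for loop with if and else to label each value. The counts, thresholds, and labels are arbitrary choices for illustration; seq_along() is used so the loop also behaves correctly on an empty vector.

```r
counts <- c(g1 = 12, g2 = 0, g3 = 45)    # hypothetical counts
status <- character(length(counts))      # pre-allocate the result

for (i in seq_along(counts)) {
  if (counts[i] == 0) {
    status[i] <- "absent"
  } else if (counts[i] < 20) {
    status[i] <- "low"
  } else {
    status[i] <- "high"
  }
}

names(status) <- names(counts)
status                                   # g1 "low", g2 "absent", g3 "high"
```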

Data Import
  • Use read_csv() to read in CSV files.

  • Use read_tsv() to read in TSV files.

  • Use write_csv() and write_tsv() to write such files.

  • Supply col_types to the read functions in your scripts to ensure consistent column types.

  • Use guess_encoding() to guess the encoding of strings in older documents.
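A round trip through both formats with readr, assuming readr is installed. The data are invented and written to temporary files. Supplying col_types through cols() fixes each column's type instead of relying on readr's automatic guessing. guess_encoding() relies on the stringi package, so that call is guarded.

```r
library(readr)

df <- data.frame(gene = c("g1", "g2"), count = c(10L, 3L))  # hypothetical data

csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
write_csv(df, csv_path)
write_tsv(df, tsv_path)

# Explicit column types: gene as character, count as integer
spec <- cols(gene = col_character(), count = col_integer())
from_csv <- read_csv(csv_path, col_types = spec)
from_tsv <- read_tsv(tsv_path, col_types = spec)

# Candidate encodings with a confidence score for each
if (requireNamespace("stringi", quietly = TRUE)) {
  print(guess_encoding(csv_path))
}
```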

Glossary

FIXME