R Data Skills for Bioinformatics

Timing

Leave about 30 minutes at the start of each workshop and another 15 minutes at the start of each session for technical difficulties such as WiFi problems and software installation. Allow this time even if you asked students to install in advance, and allow longer if you did not.

Setting up Git in RStudio

Linking Git to RStudio can be difficult, depending on the operating system and its version. To check that Git is properly installed and configured, learners should open the Options window in RStudio (Tools > Global Options, then the Git/SVN pane).

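To save learners from re-entering their password every time they push a commit to GitHub, run the following command from a bash prompt. It caches the credentials in memory, so the password only needs to be entered once until the timeout (given in seconds; 10,000,000 seconds is roughly 115 days) expires or the cache daemon stops, for example when the machine restarts: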

$ git config --global credential.helper 'cache --timeout=10000000'

Pulling in Data

The easiest way to get the data used in this lesson during a workshop is to have attendees run the following:

$ git remote add data https://github.com/resbaz/r-novice-gapminder-files
$ git pull data master

If Git is not being taught as part of the workshop, the raw data can be downloaded from gapminder-FiveYearData and gapminder-FiveYearData-Wide.

Attendees can use the File > Save As dialog in their browser to save each file.

Overall

Make sure to emphasize good practices: put code in scripts, and keep those scripts under version control. Encourage students to create script files for challenges.

If you’re working in a cloud environment, have learners upload the gapminder data after the second lesson.

Make sure to emphasize that matrices are vectors under the hood and data frames are lists under the hood: this explains a lot of the esoteric behaviour encountered in basic operations.
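
A short console demonstration can back this point up. The following is only a sketch using made-up objects (the names m, v and df are illustrative and not part of the lesson data); it shows that a matrix is an atomic vector with a dim attribute and that a data frame is a list of equal-length columns.

```r
# A matrix is an atomic vector that carries a "dim" attribute.
m <- matrix(1:6, nrow = 2)
is.atomic(m)        # TRUE
attributes(m)       # only $dim, here 2 3
length(m)           # 6: the length of the underlying vector
m[4]                # 4: single-index subsetting walks the vector column by column

# Removing the dim attribute leaves the plain vector behind.
v <- m
dim(v) <- NULL
identical(v, 1:6)   # TRUE

# A data frame is a list whose elements are equal-length column vectors.
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
is.list(df)         # TRUE
typeof(df)          # "list"
length(df)          # 2: the number of columns, not rows
df[["name"]]        # list-style extraction of one column
```

This is why length() on a data frame returns the number of columns, and why a single index into a matrix returns one element instead of a row.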

Vector recycling and function stacks are probably best explained with diagrams on a whiteboard.
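
If a whiteboard is not available, a few lines at the console can stand in for the recycling diagram. This is a minimal sketch with illustrative variable names (x, short, recycled, explicit and uneven are not from the lesson); it compares implicit recycling with an explicit rep() call and captures the warning R raises when the lengths do not divide evenly.

```r
x <- 1:6
short <- c(10, 20)

# The shorter vector is reused from its start until it matches the longer one.
recycled <- x + short
recycled                        # 11 22 13 24 15 26

# Recycling is equivalent to repeating the shorter vector explicitly.
explicit <- x + rep(short, length.out = length(x))
identical(recycled, explicit)   # TRUE

# When the longer length is not a multiple of the shorter one, R still
# recycles but issues a warning; here the warning message is captured as text.
uneven <- tryCatch(
  1:5 + c(10, 20),
  warning = function(w) conditionMessage(w)
)
uneven
```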

Be sure to actually go through examples of an R help page: help files can be intimidating at first, but knowing how to read them is tremendously useful.
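
One possible way to walk through a help page is sketched below. The choice of read.csv and mean is only an assumption made for illustration; any function the learners have already met works equally well. The idea is to connect the Usage section of the page to what R reports about the function itself.

```r
# Open the help page and run its Examples section. Both are guarded so that
# running this file as a non-interactive script skips them.
if (interactive()) {
  help("read.csv")   # equivalent to typing ?read.csv
  example(mean)      # runs the code from the Examples section of ?mean
}

# The Usage section lists arguments and their defaults;
# args() prints the same signature at the console.
args(read.csv)

# The defaults shown in Usage can also be inspected programmatically.
read_csv_defaults <- formals(read.csv)
read_csv_defaults$header   # TRUE
read_csv_defaults$sep      # ","
```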

Be sure to show the CRAN task views, and look at one of the topics.

There’s a lot of content, so move quickly through the earlier lessons. Their breadth is mostly there for learning by osmosis: learners will remember having seen something similar when they later encounter a problem or some esoteric behaviour.

Key lessons to take time on:

Don’t worry about being correct or knowing the material back-to-front. Use mistakes as teaching moments: the most vital skill you can impart is how to debug and recover from unexpected errors.
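
When a mistake does happen while live coding, it can help to show how R reports the error and how to recover from it. The sketch below is only one possible demonstration; the function risky and the variable result are hypothetical names invented for this example.

```r
# A deliberate mistake: adding a number to a character string.
risky <- function(x) x + 1

# tryCatch() intercepts the error, reports its message, and returns a
# fallback value so that the rest of the script can keep running.
result <- tryCatch(
  risky("a"),
  error = function(e) {
    message("Caught: ", conditionMessage(e))
    NA_real_
  }
)
result   # NA

# In an interactive session, calling traceback() straight after an uncaught
# error prints the chain of function calls that led to it.
```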