The materials we will work through are a sample of the lessons created for Data Carpentry and Software Carpentry. The goal of this section is to demonstrate the utility of Python for working with biological data.
Python is a general-purpose programming language that is relatively easy to learn and easy to read. If you work in bioinformatics, you will almost certainly need it: many tools for dealing with genomic data are written in Python, and knowing how to program in Python allows you to modify these tools and assemble them into cohesive pipelines. Ethan White outlines some good reasons for using Python on his Programming for Biologists site, where he also references an xkcd comic.
Most of the lessons we will use for this course were written for the Data Carpentry series of workshops. In particular, we are using the Data Analysis & Visualization in Python: Python for Ecologists set of lessons.
The data we are using for this lesson are from the Portal Project Teaching Database, available on FigShare. More details about the files we’ll use and where to download them are available on the Setup page.
Prerequisites
Learners need to understand the concepts of files and directories (including the working directory) and know how to start a Python interpreter before tackling this lesson. This lesson refers to the Jupyter notebook, although it can be taught through any Python interpreter. The commands in this lesson pertain to Python 3.
To get started with installing Python, follow the directions given in the Python section of the course Software page. In addition to installing Python on your own computer, you will also need to download the data files used in the tutorials. Details for doing this are on the Setup page.
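Once Python is installed, a quick check like the one below can confirm that you are running Python 3 and show which working directory the interpreter started in. This is a minimal sketch using only the standard library; the helper name `describe_environment` is made up for illustration and is not part of the lesson materials.

```python
import os
import sys


def describe_environment():
    """Return the Python major version and the current working directory.

    Hypothetical helper for illustration only.
    """
    return sys.version_info.major, os.getcwd()


major, cwd = describe_environment()

# The lessons assume Python 3, so the major version should be 3.
print(f"Python major version: {major}")

# Relative paths to data files are resolved against this directory.
print(f"Working directory: {cwd}")
```

If the working directory is not the folder that holds your downloaded data files, either start the interpreter from that folder or change into it with `os.chdir` before loading data.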
These lessons will provide command line text and code in specific formats.
All commands that are intended to be executed in your Unix terminal will be shown with the $ prompt. For example:
$ cd my_directory
$ pwd
All output from any execution will be shown with a black bar on the side:
/home/my_directory
All Python code will be given in boxes with a purple bar on the side and in purple text, with no prompt:
import numpy as np
a = 12
| Introduction to Python | What is Python? How is Python different from R? How do I use Python? |
| Introduction to Python Datatypes and Packages | What are the basic datatypes I can use in Python? How do I define a function? How do I write documentation for my Python code? How do I install and manage packages? |
| Working With Pandas DataFrames in Python | How do you import data into Python? How do you create a DataFrame and access its contents? How do you create simple plots? |
| Indexing, Slicing, Subsetting, and Iterating DataFrames in Python | How do you extract data from columns and rows? How do you select subsets of the DataFrame? How do you reassign values within the DataFrame? |
| Visualizing Data in Python | How do you create appealing plots in Python? How do you compare distributions of data? How do different plotting libraries work? |
| Introduction to Biopython | What does Biopython do? How does Biopython handle sequences? How can I access sequences and data from Genbank? |
| Additional Exercises | Practice your Python. |