Overview
Questions
What are the basic datatypes I can use in Python?
How do I define a function?
How do I write documentation for my Python code?
How do I install and manage packages?
Objectives
Brief overview of basic datatypes like lists, tuples, & dictionaries.
Syntax for defining a function.
Recommendations for proper code documentation.
Installing, updating, and importing packages.
Verify that everyone’s Python environment is ready.
The most basic data types in Python are strings, integers and floats:
text = "Go Cyclones"
number = 42
pi_value = 3.1415
Here we’ve assigned data to variables, namely text, number and pi_value, using the assignment operator =. The variable called text is a string, which means it can contain letters and numbers. We could reassign the variable text to an integer too - but be careful reassigning variables as this can get confusing.
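As a small sketch of why reassignment can get confusing, the snippet below rebinds the same name to a value of a different type and records the type before and after. The helper names first_type and second_type are only for illustration.

```python
# A name can be rebound to a value of a different type.
text = "Go Cyclones"
first_type = type(text).__name__   # type name while text holds a string

text = 42                          # the same name now refers to an integer
second_type = type(text).__name__  # type name after reassignment

print(first_type, second_type)
```

Code further down a script that still expects text to be a string would now fail, which is why reusing names this way deserves care.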
To print out the value stored in a variable we can simply type the name of the variable into the interpreter:
text
'Go Cyclones'
However, in scripts we must use the print function:
# Comments start with #
# Next line will print out text
print(text)
Go Cyclones
Lists are a common data structure to hold an ordered sequence of elements. Each element can be accessed by an index. Remember that Python indices start with 0 instead of 1:
numbers = [1,2,3]
numbers[0]
1
A for loop can be used to access the elements in a list or other Python data structure one at a time:
for num in numbers:
    print(num)
1
2
3
Remember indentation is very important in Python. Note that the second line in the example above is indented. This is Python’s way of marking a block of code.
To add elements to the end of a list, we can use the append method:
numbers.append(4)
print(numbers)
[1, 2, 3, 4]
Methods are a way to interact with an object (a list, for example). We can invoke a method using the dot . followed by the method name and a list of arguments in parentheses.
To find out what methods are available for an object, we can use the built-in help command (you can exit the help page by typing q):
help(numbers)
Help on list object:
class list(object)
| list(iterable=(), /)
|
| Built-in mutable sequence.
|
| If no argument is given, the constructor creates a new empty list.
| The argument must be an iterable if specified.
|
| Methods defined here:
|
| __add__(self, value, /)
| Return self+value.
...
We can also access a list of methods using dir. Some method names are surrounded by double underscores. Those methods are called “special”, and usually we access them in a different way. For example, the __add__ method is responsible for the + operator.
dir(numbers)
['__add__', '__class__', '__contains__', ...]
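To make the link between __add__ and + concrete, here is a minimal sketch that builds the same list both ways. Calling the special method directly is shown only for illustration; in everyday code the operator is the usual choice.

```python
numbers = [1, 2, 3]

# The + operator on lists is implemented by the special method __add__,
# so these two expressions build the same new list.
with_operator = numbers + [4]
with_method = numbers.__add__([4])

print(with_operator)
print(with_method)
```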
A tuple is similar to a list in that it’s an ordered sequence of elements. However,
tuples cannot be changed once created (they are “immutable”). Tuples are
created by placing comma-separated values inside parentheses ().
# tuples use parentheses
my_tuple = (0,1,2)
colors = ('red','blue','green')
# Note: lists use square brackets
my_list = [0,1,2]
What happens when you try to change a list versus a tuple?
Change the value of a single element in my_list and my_tuple.
Solution: List
my_list[1] = 5
print("my_list =", my_list)
my_list = [0, 5, 2]
Solution: Tuple
my_tuple[1] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
Once created, tuples cannot be changed in any way. This means you cannot add new values to your tuple container. As a result, Python stores these variables differently from lists since there is no need to allocate any additional memory for potential changes in size. This, in turn, makes tuples more memory efficient.
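One rough way to see the memory difference is sys.getsizeof, which reports the size of the container object itself (not of the elements it refers to). This is a sketch that assumes the standard CPython interpreter; the exact byte counts vary between Python versions and platforms, so none are quoted here.

```python
import sys

my_list = [0, 1, 2]
my_tuple = (0, 1, 2)

# getsizeof measures only the container, not the integers stored in it.
list_size = sys.getsizeof(my_list)
tuple_size = sys.getsizeof(my_tuple)

print("list :", list_size, "bytes")
print("tuple:", tuple_size, "bytes")
```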
One very handy feature in Python is that the individual elements of a list or tuple can be assigned to individual variable names. For example you can create a list:
my_list = ['Cyclones','Go','!']
Then assign the individual values of that list to new separate variables just by separating the names with commas:
v1, v2, v3 = my_list
print(v2, v1, v3)
Go Cyclones !
This works the same for a tuple:
my_tup = ('U','I','S')
c, a, b = my_tup
print(a, b, c)
I S U
In Python, it is also possible to mix types in a list or tuple. This makes it easy to store data of variable type. Thus, you can have a list that contains integers, floating point numbers, strings, lists, and tuples:
my_data = [('Manidae', 'Phataginus', 'tricuspis', 1821), 23.56, 30, [10.8,8.45,9.33]]
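Elements of a mixed container are reached by chaining indices. The sketch below reuses my_data from above; the names genus, first_measure and kinds are only illustrative.

```python
my_data = [('Manidae', 'Phataginus', 'tricuspis', 1821), 23.56, 30, [10.8, 8.45, 9.33]]

genus = my_data[0][1]          # second item of the tuple stored first
first_measure = my_data[3][0]  # first item of the nested list

# Record the type name of each top-level element.
kinds = [type(item).__name__ for item in my_data]

print(genus, first_measure)
print(kinds)
```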
A dictionary is a container that holds pairs of objects - keys and values. This datatype is one of the most useful types in Python.
numbers = {"one" : 1, "two" : 2}
numbers["one"]
1
Dictionaries work a lot like lists, except that you index them with keys. You can think of a key as a name or unique identifier for a value in the dictionary. Keys can only have particular types: they have to be “hashable”, which in practice means they don’t change. Strings, numeric types, and tuples are acceptable, but lists are not.
Here is a dictionary with integers as keys. You access the value associated with a key by providing the key in []:
numbers2 = {1 : "one", 2 : "two"}
numbers2[1]
'one'
You can also use a tuple as a key, but only if the tuple contains hashable elements (strings, numbers, tuples):
my_dict2 = {(1,2,3) : 3}
my_dict2
{(1, 2, 3): 3}
However, you cannot create a dictionary with a list as a key:
bad_dictionary = {[1,2,3] : 3}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
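The condition on tuple keys can be checked directly. In this sketch a tuple of integers works as a key, while a tuple that contains a list is rejected; the error is caught so the script keeps running. The exact wording of the message depends on the Python version.

```python
# A tuple of hashable elements can be used as a dictionary key.
good_key = (1, 2, 3)
lookup = {good_key: "ok"}

# A tuple that contains a list is itself unhashable, so using it as a key fails.
bad_key = (1, [2, 3])
try:
    lookup[bad_key] = "not ok"
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```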
To add an item to the dictionary we assign a value to a new key:
numbers2 = {1 : 'one', 2 : 'two'}
numbers2[3] = 'three'
numbers2
{1: 'one', 2: 'two', 3: 'three'}
It is often necessary to check if the dictionary has a key before adding it, otherwise you will overwrite the value associated with that key:
if 4 not in numbers2:
    numbers2[4] = 'four'
numbers2
{1: 'one', 2: 'two', 3: 'three', 4: 'four'}
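The built-in dictionary methods setdefault() and get() offer a compact alternative to the explicit membership test. This is a sketch of that option, not a replacement for the pattern above.

```python
numbers2 = {1: 'one', 2: 'two', 3: 'three'}

# setdefault() adds the key only when it is missing.
numbers2.setdefault(4, 'four')            # key 4 is new, so it is added
numbers2.setdefault(2, 'spam and eggs')   # key 2 exists, so 'two' is kept

# get() looks a key up without raising KeyError when it is absent.
missing = numbers2.get(5, 'not found')

print(numbers2)
print(missing)
```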
Using for loops with dictionaries is a little more complicated. We can do this in two ways.
We can use the method .items() to return the key-value pairs (this is the recommended way of iterating over a dictionary in Python):
for key, value in numbers2.items():
    print(key, "->", value)
1 -> one
2 -> two
3 -> three
4 -> four
Alternatively, you can iterate over the keys. When you loop over the dictionary in this way, Python assigns each key in turn to the loop variable:
for key in numbers2:
    print(key, "->", numbers2[key])
1 -> one
2 -> two
3 -> three
4 -> four
Can you reassign the value of an element in a dictionary?
Try to reassign the second value of numbers2 (in the key value pair) so that it no longer reads “two” but instead reads “spam and eggs”. What does your dictionary look like after you make this change?
Solution
numbers2[2] = 'spam and eggs'
numbers2
{1: 'one', 2: 'spam and eggs', 3: 'three', 4: 'four'}
It is important to note that dictionaries preserve the sequence of their items (i.e., the order in which key:value pairs were added to the dictionary). This is guaranteed by the language from Python 3.7 onward; in CPython 3.6 it was already true, but only as an implementation detail. In earlier versions of Python, dictionaries are unordered, so the order in which items are returned from loops over dictionaries might appear random and can even change with time.
Regardless of the order in which they were entered, if the dictionary’s keys are of a single type that can be sorted (numbers or strings), then you can get a sorted list of the keys and iterate over it in a loop.
A loop over the values of an unsorted dictionary:
d = {3 : 'three', 2 : 'two', 6 : 'six', 5 : 'five'}
for value in d.values():
    print(value)
three
two
six
five
To get a list of sorted keys, you can use the sorted() function on the keys returned by the .keys() method:
d = {3 : 'three', 2 : 'two', 6 : 'six', 5 : 'five'}
for key in sorted(d.keys()):
    print(key, '=', d[key])
2 = two
3 = three
5 = five
6 = six
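Two closely related variations may be handy. Iterating over a dictionary yields its keys, so sorted(d) gives the same list as sorted(d.keys()), and sorted(d.items()) sorts the key-value pairs together. A minimal sketch:

```python
d = {3: 'three', 2: 'two', 6: 'six', 5: 'five'}

# Iterating over a dictionary yields its keys, so this matches sorted(d.keys()).
keys_sorted = sorted(d)

# sorted(d.items()) sorts the (key, value) tuples, comparing keys first.
pairs_sorted = sorted(d.items())

for key, value in pairs_sorted:
    print(key, '=', value)
```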
Defining part of a program in Python as a function is done using the def keyword. For example, a function that takes two arguments and returns their sum can be defined as:
def add_function(a, b):
    result = a + b
    return result
z = add_function(20, 22)
print(z)
42
The important members of a function are the input arguments and the return values. (Note that not all functions have these components.)
In the function called add_function(a, b) above, the arguments are a and b and the returned variable is result.
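As an illustration of a function that has neither arguments nor a return statement, consider the sketch below. The name greet is hypothetical. Python still hands back a value from such a call: the special object None.

```python
def greet():
    # No arguments and no return statement.
    print("Go Cyclones")

outcome = greet()   # calling the function prints the message
print(outcome)      # a function without a return statement returns None
```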
One very handy feature of Python is that a function can return multiple variables. This essentially returns the objects as a tuple.
def fun_function(lett_str, num_str):
    lett_up = lett_str.upper()   # string.upper() converts to uppercase
    int_num = int(num_str)       # int(string) converts to an integer
    return lett_up, int_num+1    # returns a tuple (increments num +1)
Thus, you can assign the multiple return values to different variables when calling the function.
a, b = fun_function("octothorpe", "1886")
print(a)
print(b)
OCTOTHORPE
1887
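To confirm that the multiple return values really travel as one tuple, the result can be captured in a single variable and inspected. This sketch repeats fun_function so that it runs on its own.

```python
def fun_function(lett_str, num_str):
    lett_up = lett_str.upper()
    int_num = int(num_str)
    return lett_up, int_num + 1

# Capturing the result in one name keeps the tuple intact.
result = fun_function("octothorpe", "1886")

print(type(result).__name__)
print(result[0], result[1])
```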
Fun fact: octothorpe is a word from the 1960s or 1970s that was used to describe a hash symbol: #.
When writing programs, code, scripts, etc., it is very important to provide good documentation of your work. This is because you want your science to be reproducible by anyone who evaluates your work or wishes to build on your findings. But, even more importantly, you need to properly document your code so that you know what your own programs and scripts do when you open them again after spending several months finishing some other project.
Here are some basic tips for writing code documentation using comments and/or Markdown formatted descriptions in a notebook.
It is good to always provide a description of code files. This is done at the very top of the file or notebook. The window below gives an example of header comments in a Jupyter notebook. This header provides a description of the contents of the file, lists any packages or files that it requires to execute without errors, and gives the names of the authors and date it was written.
When writing functions in your code, it is important to provide details about how the function works. You can do this as comments or Markdown formatted text in your notebook. When written above the function definition, this helps to make it clear that the code below is a function. An added benefit of using Jupyter notebooks is that you can create hyperlinks and other references in your documentation. This is a nice way to cite sources and provide clear information. See the example from a Jupyter notebook below.
While function comments and Markdown descriptions are great for making it easy to find functions in a file and read information about them, Python also has a way for you to provide information about a function that can be retrieved from the function itself. These are called Docstrings and are inserted right after the first line of the function definition.
def cat_two_strings(a, b):
    """Description: Concatenates two strings
    Arguments:
        a : the first string
        b : the second string
    Return: a string that is a + b
    Example of usage:
        >>> string_a = "howdy "
        >>> string_b = "world"
        >>> ab_string = cat_two_strings(string_a, string_b)
        >>> print(ab_string)
        Output:
        howdy world
    """
    return a + b
The Docstring is contained within triple quotes: """Docstring""". After a function is defined, the information from the Docstring is available using the help() function:
help(cat_two_strings)
Help on function cat_two_strings in module __main__:
cat_two_strings(a, b)
Description: Concatenates two strings
Arguments:
a : the first string
b : the second string
Return: a string that is a + b
Example of usage:
>>> string_a = "howdy "
>>> string_b = "world"
>>> ab_string = cat_two_strings(string_a, string_b)
>>> print(ab_string)
Output:
howdy world
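The text that help() displays is stored on the function itself, in its __doc__ attribute. The sketch below uses a shortened one-line docstring (an assumption made for brevity, not the full docstring above) to show that it can be read like any other string.

```python
def cat_two_strings(a, b):
    """Concatenates two strings and returns the result."""
    return a + b

# help() reads the docstring from the function's __doc__ attribute.
doc_text = cat_two_strings.__doc__
print(doc_text)
```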
You may also find it necessary to add comments within your code. In Python, comments are preceded by an octothorpe (i.e., #). It really is up to you to decide when and how to use such comments. Ultimately, this type of documentation is really for adding information or details in places where it isn’t obvious from the code what is going on. Another use for in-line comments is to add flags for debugging or “to-dos” for work that is currently in progress. In these cases, the comments should be written on the line above the code they refer to or just to the right on the same line.
def cat_two_strings(a, b):
    return a + b  # TODO: verify that a & b are strings
For experienced programmers, it is probably best to avoid too many superfluous comments in your code. This is because code with lots of comments may be difficult to read. And you should work to write code that can be understood without comments.
Look at the comments below. Are they really all that necessary? Do these in-line comments distract from the actual written code?
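The example these questions point to is not reproduced in this text. As a hypothetical stand-in (not the lesson's original), the short snippet below comments every single line, which is the style the questions ask you to weigh.

```python
# Hypothetical illustration of over-commenting; not the lesson's original example.
numbers = [1, 2, 3]      # make a list called numbers
total = 0                # set total to zero
for num in numbers:      # loop over numbers
    total = total + num  # add num to total
print(total)             # print total
```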
Nevertheless, even the comments above may be helpful to someone new to Python. And as you learn a new programming language, you may find it helpful to annotate your scripts this way or to add notes that help you remember important syntax details, etc.
There is an immense number of Python packages (also called libraries) out there that do a lot of different things. Python doesn’t have a heavily managed central resource like CRAN for R, but you can find a long and probably incomplete list of packages for Python online. Additionally, Anaconda provides easy installs of over 600 packages listed here.
Anaconda helps install many important Python packages. You can use conda to install others that do not come with the default install. One very useful Python library for bioinformatics is biopython. You can install it easily with conda from your terminal window.
$ conda install biopython
The prompt will ask you if you want to proceed with installing this package. Simply type y followed by the enter key and conda will manage the download and installation.
For packages that are not available via Anaconda, you may also be able to use the application pip to install them.
One such package, string-color, allows you to print strings to the screen in different colors.
$ pip install string-color
Anaconda makes it easy to update packages in conda. For any individual package, you can use update. The example below updates the package pandas, which allows us to deal with complex data structures.
$ conda update pandas
Alternatively, you can update all of the packages managed by Anaconda using:
$ conda update --all
For packages installed using pip, you can use the -U flag or --upgrade.
$ pip install string-color -U
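After installing or upgrading, one way to check which version Python actually sees is the standard-library module importlib.metadata (available in Python 3.8 and later). The helper installed_version below is a hypothetical name used only for this sketch; it returns None rather than raising an error when a package is missing.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Prints a version string if string-color is installed, otherwise None.
print(installed_version("string-color"))
```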
Once you have a package installed, you don’t have to install it again (although you may have to update it from time to time). However, you cannot use a package in any Python instance without importing it first. This is quite simple.
Open a Python interpreter and type:
import stringcolor
With this package, you can now try to print in color:
print(stringcolor.cs("PURPLE", "purple"))
For convenience, you can also give packages aliases in your scripts. That way you don’t have to type out the whole name each time. The example below imports the Python package numpy and uses the numpy.sum() function to add up the values in a list. Notice that np is an alias for numpy.
import numpy as np
np.sum([2,4,6])
12
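The same import syntax works for modules in Python's standard library, which need no installation. As a sketch, the snippet below aliases the statistics module (the alias stats is an arbitrary choice) and also shows the from ... import form, which brings a single name into your script.

```python
import statistics as stats   # alias for a standard-library module
from math import sqrt        # import a single name directly

values = [2, 4, 6]
mean_value = stats.mean(values)   # same as statistics.mean(values)
root = sqrt(16)

print(mean_value, root)
```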
import antigravity
And sage advice:
import this
Take-Home Challenge: Create a Documented Jupyter Notebook Exploring Python Strings and Lists
Create a new Jupyter notebook and document the following exercises with explanations about your solutions.
(1) Create a string a = 'Go Cyclones' and use Python to extract the character 'l' from that string and assign it to a new variable. What index is that character assigned in the string?
(2) Convert your string a so that it is spelled out backwards: 'senolcyC oG'. (Try using a for loop, or you can find examples of how to do this with Python subsetting techniques.)
(3) Create two lists l1 = [53, 16, 1, 86, 3, 33, 41, 76, 71, 4] and l2 = [30, 100, 94, 90, 17, 11, 39, 23, 90, 52]. Use Python to create a new list that contains only the odd numbers from both lists. The third list should look like this: [53, 1, 3, 33, 41, 71, 17, 11, 39, 23].
Solutions
(1) The index for character l is 6, so it can be extracted with a[6].
a = 'Go Cyclones'
b = a[6]
print(b)
l
(2) Use slice notation with a negative step to reverse the string.
print(a[::-1])
senolcyC oG
(3) Use the modulo operator % to determine if the list element is divisible by 2.
l1 = [53, 16, 1, 86, 3, 33, 41, 76, 71, 4]
l2 = [30, 100, 94, 90, 17, 11, 39, 23, 90, 52]
l3 = []
for n in l1+l2:
    if n % 2 != 0:
        l3.append(n)
print(l3)
[53, 1, 3, 33, 41, 71, 17, 11, 39, 23]
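Two alternative approaches are sketched here as one possibility among several: the for loop that exercise (2) hints at, and a list comprehension that condenses the loop from exercise (3) into a single expression. The name reversed_a is only illustrative.

```python
a = 'Go Cyclones'

# Reverse the string with a for loop by prepending each character.
reversed_a = ''
for char in a:
    reversed_a = char + reversed_a

l1 = [53, 16, 1, 86, 3, 33, 41, 76, 71, 4]
l2 = [30, 100, 94, 90, 17, 11, 39, 23, 90, 52]

# A list comprehension collects the odd numbers in one expression.
l3 = [n for n in l1 + l2 if n % 2 != 0]

print(reversed_a)
print(l3)
```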
Key Points
Are you flying yet?