class: center, middle
# Working with Range Data
## Buffalo Chapter 9
???
Notes for the _first_ slide!
---
# Introduction
--
## Chromosomes and Coordinate Systems:
--
* because chromosomes are linear strings of nucleotides, they can be represented with a coordinate system
--
* the coordinate system allows us to reference particular regions of a chromosome:
  * _a gene_
  * _an exon_
  * _a transposable element_
--
* once genomic datasets are represented as genomic ranges, a series of operations, particularly those involving overlap and proximity, can be carried out
---
# Components of a Genomic Range
--
## Three pieces of information are necessary to specify a genomic region:
--
* **Chromosome (or scaffold/contig) name**: naming conventions can vary (_e.g.,_ chr4, 4, scaffold_627)
--
* **Range**: the start and end site of a genomic region (_e.g.,_ 123,654,834 to 123,654,985)
--
* **Strand**: DNA is double-stranded, and features can be on the forward (positive) or reverse (negative) strand. Features such as protein-coding regions can be strand-specific.
--
## Important Note: Ranges are always tied to a particular version of a genome
---
## An example of genomic ranges:
--
---
## Flavors of range systems:
--
* 0-based systems: half-open intervals, `[start, end)`
--
* Range width = end - start
--
* 1-based systems: closed intervals, `[start, end]`
--
* Range width = end - start + 1
--
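To make the two width formulas concrete, here is a minimal base-R sketch that describes the same ten bases in both conventions. The variable names are illustrative only and do not come from any package.

```r
# The same 10 bases described in both coordinate conventions.

# 1-based, closed interval [4, 13]: both endpoints are included.
start_1based <- 4
end_1based <- 13
width_closed <- end_1based - start_1based + 1

# 0-based, half-open interval [3, 13): start included, end excluded.
start_0based <- start_1based - 1
end_0based <- end_1based
width_half_open <- end_0based - start_0based

print(c(closed = width_closed, half_open = width_half_open))
```

Converting from 1-based closed to 0-based half-open only shifts the start by one; the end value stays the same.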
---
## The Buffalo tutorials on genomic ranges use packages from the Bioconductor open source software project:
* IRanges
* GenomicRanges
--
### To Install Bioconductor's Primary Packages in R:
```
> install.packages("BiocManager")
> BiocManager::install()
```
--
### GenomicRanges can be installed with:
```
> BiocManager::install("GenomicRanges")
```
--
### Bioconductor packages have great documentation:
---
Let's begin working with ranges in the IRanges package, which was installed as a dependency of GenomicRanges:
```
> library(IRanges)
```
--
Ranges are created as IRanges objects by specifying start and end sites:
```
> rng <- IRanges(start=4, end=13)
> rng
IRanges of length 1
    start end width
[1]     4  13    10
```
--
Is this 1-based or 0-based?
--
Now try creating a range by specifying start site and width:
```
> rng2 <- IRanges(start=4, width=3)
> rng2
```
---
Ranges can also be created using vector arguments:
```
> x <- IRanges(start=c(4, 7, 2, 20), end=c(13, 7, 5, 23))
> x
```
--
And we can give these ranges names:
```
> names(x) <- letters[1:4]
> x
```
--
Note that the ranges we create are a special object with class IRanges:
```
> class(x)
```
--
Components of this object can be accessed with `start(x)`, `end(x)`, and `width(x)`
--
This can be handy when we want to increment one of these components:
```
> end(x) <- end(x) + 4
> x
```
---
The `range()` function returns the span covered by all ranges in an IRanges object, from the smallest start to the largest end:
```
> range(x)
```
--
We can also take subsets of ranges in an IRanges object using numeric, logical, and character indices
--
For example, try the following:
```
> x[2:3]
> start(x) < 5
> x[start(x) < 5]
> x[width(x) > 8]
```
--
Ranges can also be combined, just as we've done with vectors, using the `c()` function:
```
> a <- IRanges(start=7, width=4)
> b <- IRanges(start=2, end=5)
> c <- c(a, b)
> c
```
---
### Basic Range Operations:
--
IRanges objects can be grown or shrunk using arithmetic operations such as +, -, and * (division is not supported since it makes little sense with ranges)
--
For example, try:
```
> x <- IRanges(start=c(40, 80), end=c(67, 114))
> x + 4L
> x - 10L
```
--
What's going on here?
--
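One way to see what the arithmetic does is to reproduce it on plain start and end vectors. The sketch below uses a hypothetical helper, `grow()`, which is not part of IRanges. It assumes that adding `n` extends each range by `n` positions on both sides and that subtracting `n` shrinks it the same way.

```r
# grow() is a hypothetical helper for illustration, not an IRanges function.
# It moves the start left by n and the end right by n (a negative n shrinks).
grow <- function(start, end, n) {
  new_start <- start - n
  new_end <- end + n
  data.frame(start = new_start, end = new_end,
             width = new_end - new_start + 1)
}

starts <- c(40, 80)
ends <- c(67, 114)

grown <- grow(starts, ends, 4)     # mirrors x + 4L
shrunk <- grow(starts, ends, -10)  # mirrors x - 10L

print(grown)
print(shrunk)
```

Each width changes by twice the amount added or subtracted, because both ends move.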
---
Sometimes, rather than growing or shrinking ranges, we want to restrict them within particular bounds
--
In this situation, the function `restrict()` comes in handy:
```
> y <- IRanges(start=c(4, 6, 10, 12), width=13)
> y
> restrict(y, 5, 10)
```
--
What's happening here?
--
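The clipping can be worked through by hand on plain vectors. This is a base-R sketch with a hypothetical helper, `clip_ranges()`, and is not how IRanges implements `restrict()`. It assumes the default behavior, in which ranges lying entirely outside the bounds are dropped and the remaining ranges are trimmed to fit.

```r
# clip_ranges() is a hypothetical helper, not the IRanges implementation.
# It drops ranges lying entirely outside [lower, upper] and trims the rest.
clip_ranges <- function(start, end, lower, upper) {
  keep <- start <= upper & end >= lower
  data.frame(start = pmax(start[keep], lower),
             end = pmin(end[keep], upper))
}

starts <- c(4, 6, 10, 12)
ends <- starts + 13 - 1  # width 13 in a 1-based, closed system

clipped <- clip_ranges(starts, ends, 5, 10)
print(clipped)
```

The fourth range starts at 12, beyond the upper bound of 10, so under this assumption it does not appear in the result.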
---
Perhaps you would like to create entirely new ranges based on their position relative to ranges you already have
--
When might this be handy?
--
For this application, you would use the `flank()` function:
```
> flank(x, width=7)
> flank(x, width=7, start=FALSE)
> promoters <- flank(x, width=20)
```
--
And here's a visual representation of these new ranges:
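The flanking coordinates can also be checked by hand. The sketch below is base R with a hypothetical helper, `flank_sketch()`, and is not the IRanges implementation. It assumes `x` still holds the two ranges 40 to 67 and 80 to 114 from the basic range operations slide, and that the width is positive.

```r
# flank_sketch() is a hypothetical helper, not the IRanges implementation.
# It returns a range of the given width immediately before each start
# (upstream = TRUE) or immediately after each end (upstream = FALSE).
flank_sketch <- function(start, end, width, upstream = TRUE) {
  if (upstream) {
    data.frame(start = start - width, end = start - 1)
  } else {
    data.frame(start = end + 1, end = end + width)
  }
}

starts <- c(40, 80)
ends <- c(67, 114)

before <- flank_sketch(starts, ends, 7)                   # mirrors flank(x, width=7)
after <- flank_sketch(starts, ends, 7, upstream = FALSE)  # mirrors flank(x, width=7, start=FALSE)

print(before)
print(after)
```

The new ranges sit next to the originals without overlapping them, which is why flanking is a natural way to sketch promoter regions.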
---