Using Cloud Computing for RNA-seq: Tutorial
NYU Center for Health Informatics & Bioinformatics
The full tutorial is available in a PDF file. To view the PDF, click here.
RNA-seq Measures Gene Expression
- Takes advantage of the rapidly dropping cost of Next-Generation DNA sequencing
- Measures gene expression in true genome-wide fashion (all the RNA)
- Also enables detection of mutations (SNPs), alternative splicing, allele-specific expression, and fusion genes
- More accurate than microarrays, with a wider dynamic range
- Can be used to detect miRNA and other non-coding RNA (ncRNA)
RNA-seq Is Very Compute-Intensive
- Billions of reads
- Large file sizes (tens of GB)
- Alignment to complete reference genomes
- Spliced alignment
- Like most genomics research institutes, NYU has purchased substantial High Performance Computing (HPC) resources to support our NGS lab:
- Cluster of servers
- Machines with large amounts of RAM
- Data storage and backup system
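To make the file-size point above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes reads are stored as uncompressed FASTQ (four lines per read) with a header line of about 40 characters; the function name `fastq_bytes`, the header length, and the example run (100 million reads of 100 bases) are illustrative assumptions, not figures from the tutorial.

```python
def fastq_bytes(num_reads, read_length, header_length=40):
    """Estimate the uncompressed size of a FASTQ file in bytes.

    Each FASTQ record has four lines: header, bases, '+' separator,
    and a quality string the same length as the bases.
    header_length is an assumed average, not a measured value.
    """
    per_record = (
        header_length + 1   # header line plus newline
        + read_length + 1   # base calls plus newline
        + 1 + 1             # '+' separator plus newline
        + read_length + 1   # quality string plus newline
    )
    return num_reads * per_record


# Hypothetical run: 100 million reads of 100 bases each.
size = fastq_bytes(num_reads=100_000_000, read_length=100)
print(f"{size / 1e9:.1f} GB uncompressed")  # prints "24.5 GB uncompressed"
```

Under these assumptions a single sample already lands in the tens of GB, before alignment output is counted. Compression and different read lengths change the result considerably.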
Cloud = Renting Computers
- Instead of buying a High Performance Computing system, rent time on one from a vendor
- Amazon EC2 has simplified this process
- Scalable: Pay just for the computing you need, only when you need it.
- Also makes it easier to move and share data among many users at different institutions with different security policies
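The rent-versus-buy trade-off above can be sketched as simple arithmetic. The Python below is only an illustration: the function names, the hourly price, the instance count, and the purchase cost are all hypothetical placeholders, not Amazon EC2 prices or NYU figures, and the model ignores storage, data transfer, and staff costs.

```python
import math


def rental_cost(instances, hours_per_run, runs, price_per_hour):
    """Total cost of renting `instances` machines for `runs` analyses."""
    return instances * hours_per_run * runs * price_per_hour


def break_even_runs(purchase_cost, instances, hours_per_run, price_per_hour):
    """Number of runs at which renting costs as much as buying outright."""
    cost_per_run = instances * hours_per_run * price_per_hour
    return math.ceil(purchase_cost / cost_per_run)


# Hypothetical numbers, chosen only to show the calculation.
per_run = rental_cost(instances=4, hours_per_run=10, runs=1, price_per_hour=2.0)
print(f"Cost per analysis run: ${per_run:.2f}")

runs_needed = break_even_runs(
    purchase_cost=40_000, instances=4, hours_per_run=10, price_per_hour=2.0
)
print(f"Renting matches the purchase price after {runs_needed} runs")
```

A lab that runs only occasional analyses stays well below the break-even point, which is the case where paying only for the computing you use is most attractive.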