Cloud Computing for RNA-seq

Using Cloud Computing for RNA-seq: Tutorial

Stuart Brown
NYU Center for Health Informatics & Bioinformatics

The full tutorial is available in a PDF file. To view the PDF, click here.

RNA-seq Measures Gene Expression

  • Takes advantage of the rapidly dropping cost of Next-Generation DNA sequencing
  • Measures gene expression in true genome-wide fashion (all the RNA)
  • Also enables detection of mutations (SNPs), alternative splicing, allele-specific expression, and fusion genes
  • More accurate, with better dynamic range, than microarrays
  • Can be used to detect miRNA and other non-coding RNA (ncRNA)

RNA-seq is very compute intensive

  • Billions of reads
  • Large file sizes (tens of GB)
  • Alignment to complete reference genomes
  • Spliced alignment
  • Like most genomics research institutes, NYU has purchased substantial High Performance Computing (HPC) resources to support our NGS lab.
    • Cluster of servers
    • Machines with large amounts of RAM
    • Data storage and backup system
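
To see where file sizes in the tens of GB come from, here is a rough back-of-the-envelope sketch in Python. It assumes uncompressed FASTQ input (four lines per read) and a header line of about 50 characters. The function names, the header length, and the example run of 100 million reads of 100 bases are illustrative assumptions, not figures from the tutorial.

```python
def fastq_bytes_per_record(read_length, header_length=50):
    """Approximate size in bytes of one uncompressed FASTQ record.

    A FASTQ record has four lines: header, sequence, '+' separator,
    and quality string. Each line ends with a one-byte newline.
    header_length is an assumed average; real headers vary by instrument.
    """
    header_line = header_length + 1
    sequence_line = read_length + 1
    separator_line = 1 + 1          # the '+' character plus newline
    quality_line = read_length + 1  # one quality character per base
    return header_line + sequence_line + separator_line + quality_line


def estimate_fastq_gb(num_reads, read_length, header_length=50):
    """Approximate uncompressed FASTQ size in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_reads * fastq_bytes_per_record(read_length, header_length)
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical run: 100 million reads of 100 bases each.
    size_gb = estimate_fastq_gb(100_000_000, 100)
    print(f"Estimated uncompressed FASTQ size: {size_gb:.1f} GB")
```

Under these assumptions a single sample already occupies a few tens of GB before alignment, and alignment output adds more on top of that.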

Cloud = Renting Computers

  • Instead of buying a High Performance Computing system, rent time on one from a vendor
  • Amazon EC2 has simplified this process
  • Scalable: Pay just for the computing you need, only when you need it.
  • Also makes it easier to move and share data among many users at different institutions with different security policies
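
The rent-versus-buy trade-off can be sketched as simple arithmetic. The Python snippet below is a minimal illustration only: the hourly rate and purchase cost are made-up placeholder values, not actual Amazon EC2 prices, and the calculation ignores storage, data transfer, staff, and power costs.

```python
def rental_cost(hours_used, hourly_rate):
    """Total cost of renting a machine for the hours actually used."""
    return hours_used * hourly_rate


def break_even_hours(purchase_cost, hourly_rate):
    """Hours of rental at which renting costs as much as buying outright."""
    return purchase_cost / hourly_rate


if __name__ == "__main__":
    # Placeholder numbers chosen for illustration only.
    hourly_rate = 2.0        # hypothetical price per machine-hour
    purchase_cost = 50000.0  # hypothetical cost of buying a comparable server

    print("Cost of a 200-hour analysis:", rental_cost(200, hourly_rate))
    print("Break-even point in hours:", break_even_hours(purchase_cost, hourly_rate))
```

With these placeholder values, a lab that needs only a few hundred machine-hours per year stays far below the break-even point, which is the situation where renting is attractive.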

The rest of the tutorial is available in a PDF file. To view the PDF, click here.