Linux Introduction
Dr. Stratos Efstathiadis
efstra1os.efstathiadis@nyumc.org
Technical Director
High Performance Computing Facility
Center for Health Informatics and Bioinformatics
NYULMC

Bioinformatics Requires Powerful Computers

  • One definition of bioinformatics is:
    “The use of computers to analyze biological problems.”

  • As biological data sets have grown larger and biological problems have become more complex, the requirements for computing power have also grown.

  • Computers that can provide this power generally use the Unix operating system, so you must learn Unix

Connect to server

  • Most NGS bioinformatics work requires access to a more powerful computer than your desktop/laptop.

  • This might be a server in your lab, a central server for your university, or a rented Cloud server such as the Amazon Elastic Compute Cloud (EC2)

  • Typically, a Terminal program is used to communicate with the remote server

  • Macintosh computers have a built-in Terminal program (in the Utilities folder)

  • Windows users should install PuTTY
    http://www.chiark.greenend.org.uk/~sgtatham/putty/

Remote Login using Secure Shell (ssh)

  • Secure Shell: a set of tools (ssh, sshd, ssh-add, ssh-keygen, ssh-agent, etc.) that allow secure interaction with remote servers using two-factor authentication, based on:

    • What you know (a pass phrase)

    • What you have (a key created and stored on your local computer)

  • ssh is the de facto remote login mechanism in Linux/Unix.

  • rlogin, rsh, telnet, etc. use insecure protocols (they transmit passwords in clear text)

Unix Commands

  • Unix commands are short and cryptic like vi or rm.

  • Computer geeks like it that way; you will get used to it.

  • The command is not executed until you hit the Enter/Return key

  • Use the arrow and delete keys to edit commands

  • Most commands accept modifiers (options), which are generally single letters preceded by a hyphen: ls -l or cp -R

  • Unix is cAsE SenSITIve! Capital letters have different functions than lower case letters, often completely unrelated.

  • A command also generally requires an argument, meaning some file on which it will act:
    cat -n mygene.seq
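
As a minimal sketch of how a command, an option, and an argument fit together, the lines below create a small file and number its lines with cat -n. Only the file name mygene.seq comes from the slide; the scratch directory and the sequence text are invented for illustration.

```shell
# Work in a throwaway directory so nothing in your Home is touched
# (an assumption made for this sketch)
workdir=$(mktemp -d)
cd "$workdir"

# Create a two-line example file; the sequence text is made up
printf 'ATGGCCAAGT\nTTGACCGGTA\n' > mygene.seq

# command = cat, option = -n (number the output lines), argument = mygene.seq
numbered=$(cat -n mygene.seq)
echo "$numbered"
```

Leaving out the -n option prints the same two lines without line numbers.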

Your First Commands

  • bash-3.2$ date
          Wed Jan 2 13:28:49 EST 2013

  • bash-3.2$ pwd       (Print Working Directory)
          /home/username

  • bash-3.2$ touch myfile       (create a new, empty file)

  • bash-3.2$ ls       (list files)
          myfile

Working with Directories

  • Directories are a means of organizing your files on a Unix computer.
          – They are equivalent to folders on Windows and Macintosh computers

  • Directories contain files, executable programs, and sub‐directories

  • Understanding how to use directories is crucial to manipulating your files on a Unix system.

Macintosh directory tree

[image from: Keith Bradnam & Ian Korf, Unix and Perl Primer for Biologists]

Your Home Directory

  • When you log in to the server, you always start in your Home directory.

  • Create sub‐directories to store specific projects or groups of information, just as you would place folders in a filing cabinet.

  • Do not accumulate thousands of files with cryptic names in your Home directory

File & Directory Commands

  • This is a minimal list of Unix commands that you must know for file management:

    • ls (list)
    • mkdir (make directory)
    • cd (change directory)
    • rmdir (remove directory)
    • cp (copy)
    • pwd (print working directory)
    • mv (move)
    • more (view by page)
    • rm (remove)
    • cat (view entire file on screen)

  • All of these commands can be modified with many options. Learn to use the Unix ‘man’ pages for more information, for example: man ls
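
A short session using several of these commands might look like the sketch below. The scratch directory stands in for your Home directory, and the names project1 and notes.txt are made up for illustration.

```shell
# Scratch area standing in for your Home directory (assumption for this sketch)
home_dir=$(mktemp -d)
cd "$home_dir"

mkdir project1            # make a sub-directory
cd project1               # move into it
pwd                       # print the directory you are now in
echo "ACGT" > notes.txt   # create a small file to look at
ls                        # list the directory contents
cat notes.txt             # print the whole file on screen

rm notes.txt              # remove the file (rmdir only removes empty directories)
cd "$home_dir"            # go back to the starting directory
rmdir project1            # remove the now-empty directory
```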

Shortcuts

  • There are some important shortcuts in Unix for specifying directories and files

  • . (dot) means “the current directory”

  • .. means “the parent directory” ‐ the directory one level above the current directory, so cd .. will move you up one level

  • ~ (tilde) means your Home directory, so cd ~ will move you back to your Home.

    Just typing a plain cd will also bring you back to your home directory

  • * (asterisk) is a wildcard, which can substitute for any number of characters in a filename

  • [tab]: the tab key can be used to auto-complete long file names

  • [up arrow]: the up arrow key brings back past commands that you have typed, which can be edited and resubmitted
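
The dot, double-dot, tilde, and wildcard shortcuts can be tried out as in the sketch below. The directory and file names (project1, raw, a.seq, b.seq, notes.txt) are hypothetical and live in a scratch directory.

```shell
# Build a small directory tree in a scratch area (names are made up)
top=$(mktemp -d)
mkdir -p "$top/project1/raw"
cd "$top/project1/raw"
touch a.seq b.seq notes.txt

ls ./a.seq               # . is the current directory
seq_files=$(ls *.seq)    # * matches any run of characters, so this finds a.seq and b.seq
echo "$seq_files"

cd ..                    # .. is the parent directory, here project1
parent=$(pwd)
echo "$parent"

echo ~                   # the shell replaces ~ with the path of your Home directory
```

Because the shell expands ~ before the command runs, cd ~ and a plain cd both take you to the path that echo ~ prints.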

Copy & Move

  • cp lets you copy a file from any directory to any other directory, or create a copy of a file with a new name in one directory

    • cp filename.ext newfilename.ext

    • cp filename.ext subdir/newname.ext

    • cp /u/jdoe01/filename.ext ./subdir/newfilename.ext
  • mv allows you to move files to other directories, but it is also used to rename files.

    • Filename and directory syntax for mv is exactly the same as for the cp command.

        mv filename.ext subdir/newfilename.ext
    • NOTE: When you use mv to move a file into another directory, the file no longer exists in its original location.
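
The difference between copying, renaming, and moving can be seen in the sketch below. It reuses the placeholder names from the slide (filename.ext, subdir) inside a scratch directory; renamed.ext is an extra made-up name.

```shell
# Scratch directory with one file and one sub-directory (assumption for this sketch)
work=$(mktemp -d)
cd "$work"
mkdir subdir
echo "example" > filename.ext

cp filename.ext newfilename.ext      # copy under a new name in the same directory
cp filename.ext subdir/newname.ext   # copy into subdir under a new name
mv newfilename.ext renamed.ext       # rename: the old name disappears
mv filename.ext subdir/              # move into subdir, keeping the same name

ls . subdir                          # show what ended up where
```

After these steps the top level holds only renamed.ext and subdir, while subdir holds filename.ext and newname.ext.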

Delete

  • Use the command rm (remove) to delete files

  • There is no way to undo this command!!!

    • We have set the server to ask if you really want to remove each file before it is deleted.

    • You must answer “y” or else the file is not deleted.

      > ls
      af151074.gb_pr5 test.seq
      > rm test.seq
      rm: remove test.seq? y
      > ls
      af151074.gb_pr5
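
The confirmation prompt shown above is what rm prints when it runs with the -i (interactive) option. How this particular server enables it is not stated in the slides; an alias such as rm='rm -i' is one common way and is only an assumption here. The sketch below calls rm -i directly in a scratch directory and supplies the answers through a pipe instead of the keyboard.

```shell
# Scratch directory with the two example files from the slide
work=$(mktemp -d)
cd "$work"
touch af151074.gb_pr5 test.seq

# Answering "n" leaves the file in place
echo n | rm -i af151074.gb_pr5

# Answering "y" deletes the file; there is no undo
echo y | rm -i test.seq

ls    # only af151074.gb_pr5 should remain
```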


Text Files

  • Most bioinformatics work involves text files: sequence data, software, scripts, configuration files, analysis results

  • To read, write, and edit these text files you must get familiar with a Text Editor program

  • For this tutorial, use the emacs editor

    • it is available on most Linux systems (though not always pre-installed)

    • fairly easy to learn/use
    • has some power features (copy/paste, search)

Exercise: Working with Files and Directories

-bash-3.2$ mkdir project1 (Make Directory)
-bash-3.2$ cd project1 (Change Directory)
-bash-3.2$ pwd
/home/efstae01/project1
-bash-3.2$ cp /data/tutorial/chr19.fastq .
-bash-3.2$ ls -la
-bash-3.2$ head chr19.fastq
@HWUSI-EAS610_0001:3:1:4:1405#0/1
GATAGTTCAATTCCAGAGATCAGAGAGAGGTGAGTG
+
B;30;<4@7/5@=?5?7?1>A2?0<6?<<80>79##
@HWUSI-EAS610_0001:3:1:5:1490#0/1
GGGCTGGTGGAGTGATCCCAAGGGGTGGGGATGGGG
+
B@A?AAA1BB;A5B44>AA3'@AB>+>@AB94A?A?
@HWUSI-EAS610_0001:3:1:6:388#0/1
CAGAGTTCATGAAATAGGCCTCTAGTCTTCCTAGAC

emacs File Editor

  • To start emacs, at the command prompt, just type: emacs

  • To use Emacs to edit a file, type:
    emacs filename
    (where filename is the name of your file)

  • When emacs is launched, it opens either a blank text window or a window containing the text of an existing file.

Cut, Copy, and Paste

  • You can delete or move blocks of text.

    • First move the cursor to the beginning (or end) of the block of text.

    • Then set a mark with: Ctrl‐spacebar

    • Now move to the other end of the block of text and Delete or Copy the block:

    • Delete: Ctrl‐w

    • Copy: [Esc] w
  • To Paste a copied block, move to the new location and insert with: Ctrl-y

Save & Exit

  • To save a file as you are working on it, type:
    Ctrl-x then Ctrl-s

  • To exit emacs and return to the Unix shell, type: Ctrl-x then Ctrl-c
    If you have made any changes to the file, Emacs will ask you if you want to save:

    Save file /u/browns02/nrdc.msf? (y,n,!,.,q,C-r or C-h)

    • Type “y” to save your changes and exit

    • If you type “n”, then it will ask again:

      Modified buffers exist; exit anyway? (yes or no)

    • If you answer “no”, it will return you to the file; you must answer “yes” to exit without saving changes


Getting Help in Emacs

  • Emacs has a built in help feature

    • Just type: Ctrl‐h

    • To get help with a specific command, type: Ctrl‐h k keys
      (where “keys” are the command keys that you type for that command)
  • Emacs has a built in tutorial: Ctrl‐h t

  • Working through the tutorial will be an exercise for this week’s computer lab.

Just a Beginning

  • This tutorial is only a minimal introduction to the basic computing skills needed for NGS bioinformatics

  • An excellent and much more detailed tutorial is available free online:

    Unix and Perl Primer for Biologists by Keith Bradnam & Ian Korf,

    http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.html