GATA (a Graphic Alignment Tool for Comparative Sequence Analysis)
GATA is a Java application that graphically aligns DNA sequences using the
NCBI bl2seq/BLASTN program.
To use GATA, you may need to download and uncompress the BLAST package (e.g. blast-2.2.6-powerpc-macosx.tar.gz for Mac OS X) from
NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST/).
To align DNA sequences with GATA, first launch GATAligner to create and save the alignment objects.
Then launch GATAPlotter to display the alignment. GATAligner will prompt you for the following parameters.
All fields must be completed.
File Parameters:
- Reference Sequence - a FASTA formatted DNA sequence file representing the top sequence in a GATA plot (e.g. /Users/nix/seqs/Eve_Mel.fasta).
- Comparative Sequence - a FASTA formatted DNA sequence file representing the bottom sequence in a GATA plot (e.g. /Users/nix/seqs/Eve_Pse.fasta).
- bl2seq BLAST Program - the bl2seq program downloaded from NCBI (e.g. /seqprg/Blast/bl2seq).
- Select a folder for saving the alignment objects - location where you want to save the alignment objects (e.g. /Users/nix/Desktop/).
- Base name for the alignment objects - a unique name to be used to label the alignment objects (e.g. EveMelanoToPsuedo).
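Because both sequence fields expect FASTA formatted DNA, it can help to check the files before launching GATAligner. The following Python sketch is not part of GATA; the function name read_single_fasta is hypothetical, and it assumes one record per file and the IUPAC nucleotide alphabet.

```python
from pathlib import Path

# Assumption: IUPAC nucleotide codes (A, C, G, T, N plus ambiguity letters).
DNA_LETTERS = set("ACGTNRYKMSWBDHV")


def read_single_fasta(path):
    """Return (header, sequence) for a FASTA file holding one DNA record.

    Hypothetical helper for pre-checking input files; it is not part of GATA.
    """
    lines = Path(path).read_text().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError(f"{path}: first line must start with '>'")
    # Join the sequence lines and normalise to upper case.
    sequence = "".join(line.strip() for line in lines[1:]).upper()
    if not sequence:
        raise ValueError(f"{path}: no sequence found after the header")
    # A second '>' header or any non-nucleotide letter ends up here.
    unexpected = set(sequence) - DNA_LETTERS
    if unexpected:
        raise ValueError(f"{path}: unexpected characters {sorted(unexpected)}")
    return lines[0][1:].strip(), sequence
```

For example, read_single_fasta("/Users/nix/seqs/Eve_Mel.fasta") would return the header text and the upper-cased sequence, or raise ValueError if the file does not look like a single DNA record.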
BLASTN Parameters:
- Nt: Match - score assigned to a nucleotide match in an alignment.
- Nt: MisMatch - punitive score assigned to a nucleotide mismatch in an alignment.
- Gap: Creation - punitive score assigned for creating a gap in an alignment.
- Gap: Extension - punitive score assigned for each successive nt gap extension following the initial gap opening.
- Mask low complexity regions - an option for the BLASTN program to ignore low complexity regions when constructing the local alignments.
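To see how these fields might map onto a bl2seq run, here is a rough Python sketch that only builds a command line. It is an assumption, not GATAligner's documented invocation: the function build_bl2seq_command is hypothetical, the flags (-p, -i, -j, -o, -r, -q, -G, -E, -F) are those of the legacy NCBI bl2seq as best recalled, and the default values are illustrative only. Check the help output of your bl2seq binary before relying on them.

```python
def build_bl2seq_command(bl2seq, reference, comparative, output,
                         match=1, mismatch=-3,
                         gap_creation=5, gap_extension=2,
                         mask_low_complexity=True):
    """Build an argument list for a legacy bl2seq BLASTN run.

    Hypothetical helper; flag meanings are assumed from the legacy
    NCBI BLAST tools and should be verified against your installation.
    """
    return [
        bl2seq,
        "-p", "blastn",            # program: nucleotide vs nucleotide
        "-i", reference,           # first (reference) sequence file
        "-j", comparative,         # second (comparative) sequence file
        "-o", output,              # alignment output file
        "-r", str(match),          # Nt: Match
        "-q", str(mismatch),       # Nt: MisMatch (negative value)
        "-G", str(gap_creation),   # Gap: Creation cost
        "-E", str(gap_extension),  # Gap: Extension cost
        "-F", "T" if mask_low_complexity else "F",  # low complexity filter
    ]
```

The returned list could be passed to subprocess.run(command, check=True) from the Python standard library. Building the list separately makes it easy to print and inspect the command before anything is executed.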
GATAligner Parameters:
- Window Size - the size of a sliding window that is used to score and potentially eliminate "sub alignments" from the local alignments returned by BLASTN.
- Lower Cut Off Score - sub alignments falling below this score are eliminated. It is important to set this score relatively high to minimize the number of extraneous sub alignment objects. Sub alignments falling below 20 bits for a 24 bp window are mostly noise. Use the sliders in GATAPlotter to get a feel for what minimum score is appropriate for a given window size, then set this as the lower cut off score when running GATAligner.
- Start Position Reference Sequence - the first base position in the reference sequence. It is needed to keep the plot in register with any GFF annotation you provide.
- Start Position Comparative Sequence - the first base position in the comparative sequence.
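The interplay of window size, lower cut off score, and start position can be illustrated with a small Python sketch. This is not GATAligner's actual algorithm, which is not described here. The function score_windows is hypothetical, it works on raw scores rather than bits, and it charges one flat penalty per gapped column as a simplification.

```python
def score_windows(ref_aln, cmp_aln, window_size, cutoff,
                  match=1, mismatch=-3, gap=-5, ref_start=1):
    """Return (reference_position, raw_score) for windows scoring >= cutoff.

    Hypothetical illustration only: raw scores, not bits, and a flat
    per-column gap penalty instead of separate creation/extension costs.
    """
    if len(ref_aln) != len(cmp_aln):
        raise ValueError("aligned sequences must have equal length")

    # Score every alignment column independently.
    column_scores = []
    for a, b in zip(ref_aln, cmp_aln):
        if a == "-" or b == "-":
            column_scores.append(gap)
        elif a == b:
            column_scores.append(match)
        else:
            column_scores.append(mismatch)

    # Map each column to a reference coordinate, offset by ref_start.
    ref_positions = []
    position = ref_start
    for a in ref_aln:
        ref_positions.append(position)
        if a != "-":
            position += 1

    kept = []
    if len(column_scores) < window_size:
        return kept
    # Slide the window one column at a time, updating a running sum.
    total = sum(column_scores[:window_size])
    for start in range(len(column_scores) - window_size + 1):
        if start > 0:
            total += column_scores[start + window_size - 1] - column_scores[start - 1]
        if total >= cutoff:
            kept.append((ref_positions[start], total))
    return kept
```

For instance, score_windows("ACGTACGT", "ACGTTCGT", window_size=4, cutoff=4, ref_start=100) keeps only the first window, reported at reference position 100, because every other window overlaps the mismatch and drops below the cut off. Raising the cut off or shrinking the window changes how many windows survive, which is the trade-off the GATAPlotter sliders let you explore.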
For questions, comments, suggestions, or bug reports contact David Nix (nix@berkeley.edu) or the
Eisen Lab.