GATA (Graphic Alignment Tool for Comparative Sequence Analysis)


The GFF file format (http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml) is only a loose convention for storing biological annotation. GATA uses a BDGP Drosophila style GFF file format. Describe your gene annotation using the enclosed template, which is provided both as an Excel file (gffTemplate.xls) and as tab-delimited text (gffTemplate.txt). Then save the template as a tab-delimited text file and import it into GATAPlotter. Use the following information and the enclosed example files (gffExamples.xls for Excel, gffExamples.txt for tab-delimited text) as a guide.

A tab must separate each column in the GFF file.
Lines beginning with "##" are comment lines and are ignored by GATA.
Follow the examples.
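
The two rules above (tab-separated columns, "##" comment lines) can be sketched in a few lines of Python. This is only an illustration of the file layout, not GATA's own reader; the function name read_gff_rows and the sample values (2R, BDGP, 100, 250) are made up for the example, and skipping blank lines is an assumption.

```python
from typing import Iterable, Iterator, List


def read_gff_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated columns of each non-comment, non-blank line."""
    for line in lines:
        line = line.rstrip("\r\n")
        # "##" marks a comment line; blank lines are skipped too (assumption).
        if not line.strip() or line.startswith("##"):
            continue
        yield line.split("\t")


# Hypothetical sample: one comment line and one nine-column annotation line.
example = [
    "## this line is a comment",
    "2R\tBDGP\texon\t100\t250\t.\t+\t.\tname=CG12445;",
]
rows = list(read_gff_rows(example))
```

Each yielded row should hold nine fields, in the column order described below.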

Column headings:

Chromosome/SeqName Only used by GATA as name for the feature when a "name=value;" attribute is missing. Otherwise it is ignored.
Source Not used by GATA.
Feature Recognized features are incorporated into a gene group annotation and drawn as a unit, independent of other gene groups. A gene group typically contains one or more transgroups/splice forms (exons/translation/transcription) and a gene (DNA). Recognized features include: "exon", "transcript", "translation", "*gene*", "*rna*", "*transpos*", "*misc*", where "*" is a wildcard. For example, GATA will recognize "trna", "snRNA", "rRNA", "misc. non-coding RNA", etc. Matching is case insensitive. The order of elements within a gene group is important: list the exons first, then one translation (if present), then one transcript. Do this for each transgroup, and finally list the closing feature, typically just a "gene", or else a "*rna*", "*transpos*", or "*misc*" feature. Follow the examples.

Features not recognized by GATA are assigned to individual tracks. The Feature column is a good place to put hits to transcription factor binding motifs (e.g. "HunchBack") or cis-regulatory modules (e.g. "CRM"). Use a generic name in this column, not a specific one (e.g. "CRM" rather than "CRM1", "CRM27", etc.), since each distinct unrecognized feature name is assigned a separate track.
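
As a rough illustration of the recognition rule, the sketch below treats the listed names as shell-style patterns and compares them case-insensitively using Python's fnmatch module. How GATA performs the match internally is not described here, so the pattern semantics ("*" matching any run of characters, including none) and the helper names is_recognized and track_names are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Patterns copied from the Feature description above.
RECOGNIZED_PATTERNS = [
    "exon", "transcript", "translation",
    "*gene*", "*rna*", "*transpos*", "*misc*",
]


def is_recognized(feature: str) -> bool:
    """Return True if the feature name matches any recognized pattern."""
    name = feature.strip().lower()  # case-insensitive comparison
    return any(fnmatchcase(name, pattern) for pattern in RECOGNIZED_PATTERNS)


def track_names(features: Iterable[str]) -> List[str]:
    """Distinct unrecognized feature names in first-seen order, one per track."""
    seen: List[str] = []
    for feature in features:
        if not is_recognized(feature) and feature not in seen:
            seen.append(feature)
    return seen
```

Under these assumptions, the feature list exon, CRM, HunchBack, CRM yields two tracks (CRM and HunchBack), whereas CRM1 and CRM27 would each get a track of their own.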

Start Where a feature starts, an integer (e.g. given gggggATGCATTAGccccc, the translation would start at A, base 6). Start is always less than End.
End Where the feature ends, an integer, inclusive (e.g. the translation above would end on G, base 14). End is always greater than Start.
Score This number is not used in painting recognized features like exons, but can be used to shade unrecognized feature tracks such as hits to transcription factor binding motifs.
Strand Used to orientate the arrows associated with annotation features where "+" denotes left to right, "-" right to left.
Frame Not used by GATA.
Attributes/Comments Must contain at least one "key=value;" pair. This is a good place to put a unique identifier like "name=CG12445;" or "name=HunchBack hit 27;". If present, the name=value pair will be extracted by GATA and its value displayed as the feature's label. A ";" should be used to close each value.
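
The coordinate and label conventions above can be checked with a small Python sketch. It assumes that positions are counted from 1 (as in the gggggATGCATTAGccccc example) and that the "name" key is written in lower case; the function names feature_label and feature_length are hypothetical and are not part of GATA.

```python
import re


def feature_label(seq_name: str, attributes: str) -> str:
    """Return the value of name=...; or fall back to the SeqName column."""
    match = re.search(r"(?:^|;)\s*name=([^;]*)", attributes)
    if match and match.group(1).strip():
        return match.group(1).strip()
    return seq_name


def feature_length(start: int, end: int) -> int:
    """Length of a feature whose Start and End are both inclusive."""
    if start >= end:
        raise ValueError("Start must be less than End")
    return end - start + 1


# The translation in the Start/End example spans bases 6 to 14 inclusive.
sequence = "gggggATGCATTAGccccc"
start, end = 6, 14
translation = sequence[start - 1:end]  # convert to Python's 0-based slicing
```

Here translation is the nine-base string ATGCATTAG, which agrees with feature_length(6, 14).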


For questions, comments, suggestions, or bug reports contact David Nix (nix@berkeley.edu) or the Eisen Lab.