Using Vlad
Sample data is provided in the sampledata
subdirectory. After running the installation script, you should be able to run
the sample to see the kind of output that Vlad produces. The sampledata directory
contains the following files:
gene_ontology.obo, the three GO ontologies, as of April 12, 2005.
gene_association.mgi, the MGI mouse gene annotations, as of April 12, 2005.
dnaRepairGenes.mgi, a test set of MGI gene IDs, preselected
for being involved in DNA repair. There are also a couple of genes that
are not involved in DNA repair, and there are several duplicate entries.
These are intended to illustrate some of Vlad's output messages.
Sample, a parameter file for analyzing the sample data.
To run the sample, cd to the top-level Vlad directory and type:
% ./vlad Sample
Vlad will print status messages to your terminal window and will write a set of output files to the sampledata directory. The terminal output should look something like this:
Loading params...
Processing params...
Annotation Loader = org.jax.mgi.app.vlad.GOAnnotationLoader
Ontology Loader = org.jax.mgi.app.vlad.OboLoader
Loading annotations...
80168 annotations loaded.
Filtering annotations...
80040 annotations remain after filtering.
Loading ontologies...
loaded: process: 8641 nodes, 13883 edges, 1 root.
loaded: function: 6902 nodes, 8058 edges, 1 root.
loaded: component: 1350 nodes, 1734 edges, 1 root.
Attaching annots to ontology terms...
Computing scores...
Culling...
Rendering...
Generating output files...
/Users/jer/work/java/vlad/sampledata/VLADSAMPLE.html
/Users/jer/work/java/vlad/sampledata/VLADSAMPLE.process.jpg
/Users/jer/work/java/vlad/sampledata/VLADSAMPLE.function.jpg
/Users/jer/work/java/vlad/sampledata/VLADSAMPLE.component.jpg
The pathnames of the output files will vary, depending on where you have Vlad installed. To view the results, open the file VLADSAMPLE.html in your Web browser. Your results should look similar to the sample output included in the documentation. There may be differences if the data has changed or if you change the parameter settings.
When invoked, Vlad first reads the special configuration
file, Parameters.defaults, located in its installation directory.
Parameters that you "set and forget" are put here; edit this file
to set global defaults for parameters the way you want.
Next, Vlad considers each command line argument in turn. If an argument is of the form "-name=val", the parameter "name" is defined with value "val". Spaces are not allowed unless you use quotes:
% vlad -foo = this is an error
% vlad -foo=" this is OK "
% vlad "-foo = this is OK, too"
If the argument is not a parameter definition, it is taken to be the
name of a file containing parameter definitions, and its contents are loaded.
You can freely mix parameter files and command line definitions. The
arguments are processed left-to-right (except that Parameters.defaults is always
processed first). If a parameter is defined more than once, the last definition "wins".
For example,
% vlad MyParameters.txt -pThresh=5
To set parameters for the run, Vlad first processes all the definitions
in Parameters.defaults, then all the definitions in
MyParameters.txt, then the definition "-pThresh=5".
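The processing order described above can be sketched in Java. This is only an illustration of the "last definition wins" rule, not Vlad's actual code: the class name ParamOrderSketch, the resolve method, and the contents of MyParameters.txt are all hypothetical, parameter files are represented as in-memory maps rather than read from disk, and trimming of command line values is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParamOrderSketch {

    /**
     * Applies the defaults first, then each argument left-to-right.
     * Later definitions overwrite earlier ones.
     * 'files' maps a (hypothetical) file name to its already-parsed definitions.
     */
    public static Map<String, String> resolve(Map<String, String> defaults,
                                              List<String> args,
                                              Map<String, Map<String, String>> files) {
        Map<String, String> params = new LinkedHashMap<>(defaults);
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-") && eq > 1) {
                // "-name=val": define one parameter (whitespace trimmed, by assumption)
                String name = arg.substring(1, eq).trim();
                String value = arg.substring(eq + 1).trim();
                params.put(name, value);
            } else {
                // Anything else names a parameter file; apply all of its definitions
                Map<String, String> fileParams = files.get(arg);
                if (fileParams == null) {
                    throw new IllegalArgumentException("Unknown parameter file: " + arg);
                }
                params.putAll(fileParams);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical contents, chosen only to show the override order
        Map<String, String> defaults = Map.of("pThresh", "1", "cThresh", "10");
        Map<String, Map<String, String>> files =
                Map.of("MyParameters.txt", Map.of("pThresh", "3", "scoring", "percents"));

        Map<String, String> result =
                resolve(defaults, List.of("MyParameters.txt", "-pThresh=5"), files);

        System.out.println(result.get("pThresh")); // 5 (command line wins)
        System.out.println(result.get("scoring")); // percents (from the file)
        System.out.println(result.get("cThresh")); // 10 (from the defaults)
    }
}
```

Swapping the two arguments would leave pThresh at 3, because the file would then be processed after the command line definition.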
A parameter file uses the format read by the Java class java.util.Properties.
Basically, it is an ASCII file containing
blank lines, comment lines (which begin with '#' or '!') and parameter
definition lines. Blank lines and comment lines are ignored.
A parameter definition line has the form: " name = value ", which
defines a parameter named "name" whose value is "value". The value
includes everything to the right of the = sign, minus leading and
trailing whitespace.
Here's a sample from the Parameters.defaults file:
#
# Which scoring method to use.
# One of: percents, pvals.
# Default=pvals
#
scoring=pvals
#
# Pruning threshold. Nodes whose score falls below this
# threshold are removed from the dag.
#
pThresh = 1
#
# Collapsing threshold. Nodes whose score falls below this
# threshold and above pThresh are drawn small. Above cThresh
# nodes are drawn "expanded".
#
cThresh = 10
For full details, see the J2SE API Documentation for the class
java.util.Properties.
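As a rough illustration of the format, the following Java sketch reads parameter definitions with the standard java.util.Properties class. It is not taken from Vlad: the class name ParamFileSketch and the parse method are hypothetical. Note that Properties.load strips whitespace before a value but keeps whitespace after it, so the sketch trims each value explicitly to match the rule stated above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ParamFileSketch {

    /** Parses parameter-file text and returns the definitions with values trimmed. */
    public static Properties parse(String text) {
        Properties raw = new Properties();
        try {
            // load() skips blank lines and comment lines starting with '#' or '!'
            raw.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Properties trimmed = new Properties();
        for (String name : raw.stringPropertyNames()) {
            // load() keeps trailing whitespace in values, so trim it here
            trimmed.setProperty(name, raw.getProperty(name).trim());
        }
        return trimmed;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# Pruning threshold.",
                "pThresh = 1   ",
                "",
                "! Collapsing threshold.",
                "cThresh = 10");
        Properties p = parse(text);
        System.out.println(p.getProperty("pThresh")); // 1
        System.out.println(p.getProperty("cThresh")); // 10
    }
}
```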
Edges. Is-a edges are drawn as green arrows with a hollow tip. Part-of edges
are purple, with a solid, diamond-shaped tip.
Abridged edges. It can happen that a node that is pruned from the graph has descendants
that are not pruned. In this case the descendants must be reattached with edges that
actually represent two or more edges in the underlying ontology.
Such edges are called "abridged", and
are labelled in the display with "...". If an abridged edge represents all Is-a or all Part-of
edges, it is drawn in the same style as described above. If an abridged edge represents
a path of both Is-a and Part-of edges, it is drawn in black with a solid triangular tip.
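The drawing rules above amount to a small decision over the edge types that an edge stands for. The Java sketch below restates them; it is an illustration only, not Vlad's rendering code, and the names EdgeStyleSketch, EdgeType, and describe are hypothetical. It assumes an edge is given as the list of underlying ontology edges it represents, so a list of two or more entries means the edge is abridged.

```java
import java.util.EnumSet;
import java.util.List;

public class EdgeStyleSketch {

    public enum EdgeType { IS_A, PART_OF }

    /** Returns a plain-text description of how an edge covering 'path' is drawn. */
    public static String describe(List<EdgeType> path) {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("path must contain at least one edge");
        }
        EnumSet<EdgeType> kinds = EnumSet.copyOf(path);
        String style;
        if (kinds.size() > 1) {
            style = "black, solid triangular tip";   // mixed Is-a and Part-of path
        } else if (kinds.contains(EdgeType.IS_A)) {
            style = "green, hollow arrow tip";       // all Is-a
        } else {
            style = "purple, solid diamond tip";     // all Part-of
        }
        // Two or more underlying edges means the edge is abridged
        return path.size() > 1 ? style + ", labelled \"...\"" : style;
    }

    public static void main(String[] args) {
        System.out.println(describe(List.of(EdgeType.IS_A)));
        System.out.println(describe(List.of(EdgeType.PART_OF, EdgeType.PART_OF)));
        System.out.println(describe(List.of(EdgeType.IS_A, EdgeType.PART_OF)));
    }
}
```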
Setting outRptFormat to "text" will
cause Vlad to output the tables as tab-delimited
ASCII text files, suitable for downstream processing.
There will be one additional output file for each
ontology selected. It will have a ".txt" extension, e.g.,
VLAD42541.process.txt.
In addition, the "Genes" column of the tables will
contain IDs, rather than symbols. Finally, the "top level" file will
be a text report of the run instead of an HTML page. It will
also have a ".txt" extension, e.g., VLAD42541.txt.
Set outImgFormat
to any value understood by Graphviz. Other common values are: gif, png,
and ps. See the Graphviz documentation for details.
Vlad is similar to the GO Term Finder, developed by Gavin Sherlock at SGD. The P-value scoring formula was lifted from a copy of one of Gavin's talks. Gary Churchill supplied a crucial function for computing sums in log space, overcoming a nasty machine precision problem.
Vlad uses the Graphviz package from AT&T to do the actual graph layout and image rendering.