Using Vlad

Overview

Vlad is a tool for visualizing and analyzing GO annotation data for sets of genes. Given a set of annotation data (e.g., the fly gene-to-GO annotations from FlyBase) and the ids of a set of "interesting" genes (such as those differentially expressed in a microarray experiment), Vlad analyzes the data and displays a summary, in graphical and tabular forms, of the GO classifications "most relevant" to the query set. Vlad can be used to look for overrepresentation of processes, functions, and components represented by the query set relative to the database as a whole, or relative to a specified "universe". Vlad can also be used to simply summarize the query set, without comparison to anything.

Running the Sample

Vlad is distributed with sample data, contained in the sampledata subdirectory. After running the installation script, you should be able to run the sample to see the kind of output that Vlad produces. The Install script generates a parameter file, Sample, for analyzing the sample data. To run the sample, cd to the top-level Vlad directory and type:
	% ./vlad Sample
Vlad will print status messages to your terminal window and will write a set of output files to the sampledata directory. The terminal output should look something like this:
	Loading params...
	Processing params...
	Annotation Loader = org.jax.mgi.app.vlad.GOAnnotationLoader
	Ontology Loader = org.jax.mgi.app.vlad.OboLoader
	Loading annotations...
	    80168 annotations loaded.
	Filtering annotations...
	    80040 annotations remain after filtering.
	Loading ontologies...
	    loaded: process: 8641 nodes, 13883 edges, 1 root.
	    loaded: function: 6902 nodes, 8058 edges, 1 root.
	    loaded: component: 1350 nodes, 1734 edges, 1 root.
	Attaching annots to ontology terms...
	Computing scores...
	Culling...
	Rendering...
	Generating output files...
		/Users/jer/work/java/vlad/sampledata/VLADSAMPLE.html
		/Users/jer/work/java/vlad/sampledata/VLADSAMPLE.process.jpg
		/Users/jer/work/java/vlad/sampledata/VLADSAMPLE.function.jpg
		/Users/jer/work/java/vlad/sampledata/VLADSAMPLE.component.jpg

The pathnames of the output files will vary, depending on where you have Vlad installed. To view the results, open the file VLADSAMPLE.html in your Web browser. Your results should look similar to the sample output included in the documentation. There may be differences if the data have changed or if you change the parameter settings.

Setting Parameters

Vlad defines a large number of parameters that you can set to control its behavior. (See complete parameter list). Parameters can be set on the command line, in one or more configuration files, or a combination of both.

When invoked, Vlad first reads the special configuration file, Parameters.defaults, located in its installation directory. Parameters that you "set and forget" are put here; edit this file to set global defaults for parameters the way you want.

Next, Vlad considers each command line argument in turn. If an argument is of the form "-name=val", the parameter "name" is defined with value "val". Spaces are not allowed unless you use quotes:

    % vlad -foo = this is an error
    % vlad -foo="   this is OK  "
    % vlad "-foo = this is OK, too"
If the argument is not a parameter definition, it is taken to be the name of a file containing parameter definitions, and its contents are loaded. You can freely mix parameter files and command line definitions. The arguments are processed left-to-right (except that Parameters.defaults is always processed first). If a parameter is defined more than once, the last definition "wins". For example,
    % vlad MyParameters.txt -pThresh=5
To set parameters for the run, Vlad first processes all the definitions in Parameters.defaults, then all the definitions in MyParameters.txt, then the definition "-pThresh=5".
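
The precedence rule can be sketched in a few lines of Python. This is an illustration only, not Vlad's own code: the function name resolve_parameters, the load_file callback, and the test used to recognize a definition (a leading "-" plus an "=") are assumptions made for the example.

```python
def resolve_parameters(args, load_file, defaults_name="Parameters.defaults"):
    """Merge parameter sources in the order described above.

    args      -- command line arguments, in the order given
    load_file -- hypothetical callback: file name -> dict of definitions
    """
    # Parameters.defaults is always processed first.
    params = dict(load_file(defaults_name))
    for arg in args:
        if arg.startswith("-") and "=" in arg:
            # "-name=val" defines a single parameter.
            name, _, value = arg[1:].partition("=")
            params[name.strip()] = value.strip()
        else:
            # Anything else is taken to be a parameter file.
            params.update(load_file(arg))
    # Later definitions have overwritten earlier ones: the last one "wins".
    return params


if __name__ == "__main__":
    files = {
        "Parameters.defaults": {"pThresh": "1", "cThresh": "10"},
        "MyParameters.txt": {"pThresh": "3"},
    }
    print(resolve_parameters(["MyParameters.txt", "-pThresh=5"], files.__getitem__))
```

With these made-up file contents, pThresh ends up as "5" because the command line definition is processed last, while cThresh keeps its value from Parameters.defaults.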

Parameter File Format

Parameter definition files follow the format defined by the standard Java class java.util.Properties. Basically, it is an ASCII file containing blank lines, comment lines (which begin with '#' or '!'), and parameter definition lines. Blank lines and comment lines are ignored. A parameter definition line has the form " name = value ", which defines a parameter named "name" whose value is "value". The value includes everything to the right of the = sign, minus leading and trailing whitespace.

Here's a sample from the Parameters.defaults file:

	#
	# Which scoring method to use.
	# One of: percents, pvals.
	# Default=pvals
	#
	scoring=pvals

	#
	# Pruning threshold. Nodes whose score falls below this
	# threshold are removed from the dag.
	#
	pThresh = 1

	#
	# Collapsing threshold. Nodes whose score falls below this
	# threshold and above pThresh are drawn small. Above cThresh
	# nodes are drawn "expanded".
	# 
	cThresh = 10
For full details, see the J2SE API Documentation for the class java.util.Properties.
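
To make the format concrete, here is a minimal Python sketch of a reader for the subset described above: blank lines, comment lines, and "name = value" lines. It is a simplification written for illustration, not the parser Vlad uses. The full Properties format has further features (alternative separators, line continuations, escape sequences) that this sketch ignores. The function name parse_parameter_text is hypothetical.

```python
def parse_parameter_text(text):
    """Parse the simple subset of the parameter file format described above."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and comment lines ('#' or '!') are ignored.
        if not line or line[0] in "#!":
            continue
        name, sep, value = line.partition("=")
        if not sep:
            # Simplification: this sketch skips lines without '='.
            # java.util.Properties would treat such a line as a key
            # with an empty value.
            continue
        # The value is everything right of '=', minus surrounding whitespace.
        params[name.strip()] = value.strip()
    return params


if __name__ == "__main__":
    sample = """
    #
    # Which scoring method to use.
    #
    scoring=pvals

    # Pruning threshold.
    pThresh = 1
    """
    print(parse_parameter_text(sample))
```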

Vlad Output

By default, the output is in HTML format and contains the results for the three ontologies (process, function, component); each ontology's results are displayed in both graphical and tabular form. (You can select specific ontologies, and can enable/disable the graphical and tabular outputs.) Each ontology is treated independently, and the results are simply concatenated.

Graphical Display

Nodes. Depending on the thresholds set on the query form, the graphical display contains a collection of collapsed nodes, drawn as small circles, and expanded nodes, containing text. The collapsed nodes are those that meet the pruning threshold, but not the collapsing threshold, while the expanded nodes meet both thresholds. The exception to this rule is the root node, which is never pruned and is always expanded.
An expanded node displays the GO id, the full name of the GO term, and the computed scores at that node. Collapsed nodes display no text. However, mousing over a collapsed node displays its information on the browser status line.
Clicking on a node jumps to the corresponding row of the tabular display, which follows the image.
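
The pruned/collapsed/expanded rule can be written as a small decision function. The Python sketch below is illustrative only. It assumes that a node's score is a single number where larger means better, and that "meeting" a threshold means being greater than or equal to it; neither detail is spelled out in this document. The function name classify_node is hypothetical.

```python
def classify_node(score, p_thresh, c_thresh, is_root=False):
    """Return how a node would be drawn: 'pruned', 'collapsed', or 'expanded'."""
    if is_root:
        # The root is never pruned and is always expanded.
        return "expanded"
    if score < p_thresh:
        # Fails the pruning threshold: removed from the graph.
        return "pruned"
    if score < c_thresh:
        # Meets the pruning threshold but not the collapsing threshold.
        return "collapsed"
    # Meets both thresholds.
    return "expanded"


if __name__ == "__main__":
    for score in (0.5, 5, 12):
        print(score, classify_node(score, p_thresh=1, c_thresh=10))
```

With pThresh = 1 and cThresh = 10, as in the sample Parameters.defaults, these three scores would be pruned, collapsed, and expanded, respectively.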
Scores. If the P-value scoring method was selected, a node's scores are displayed as a triple (P, k, M), where P is the node's P-value, k is the number of genes in the query set annotated* to that node, and M is the number of genes in the database annotated* to that node. If the percentage method is selected, then the scores are displayed as (p, k), where p is the percentage and k is the count of genes in the query set annotated* to that node. (*Or to a descendant of the node.)
Color. Node color reflects node score: the darker and more vivid the color, the better the score. Note, however, that colors are scaled separately for each ontology result, so the same intensity in two different images does not necessarily reflect the same score. By definition, the root node has a P-value of 1, and so will always be colored white (minimum saturation).
Size. The size of an expanded node has no significance other than to contain its label and scores. Collapsed nodes are all the same size.
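
This page does not give the formula behind the P-value in the (P, k, M) triple. As a rough illustration only, the Python sketch below assumes a hypergeometric tail probability, the test used by GO Term Finder-style tools, and sums the terms in log space to avoid overflow and underflow for large gene counts. The function names, and the extra inputs n (size of the query set) and N (number of genes in the database or universe), are assumptions made for the example and may not match Vlad's implementation.

```python
import math


def log_comb(n, r):
    """Natural log of the binomial coefficient C(n, r), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def log_sum_exp(log_terms):
    """Compute log(sum(exp(t) for t in log_terms)) without leaving log space."""
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def enrichment_p_value(k, M, n, N):
    """Probability of seeing k or more annotated genes in a query set of n,
    when M of the N genes overall are annotated to the node (hypergeometric)."""
    lowest = max(k, n - (N - M))   # fewer hits than this is impossible
    highest = min(n, M)            # more hits than this is impossible
    if lowest > highest:
        return 0.0
    log_total = log_comb(N, n)
    log_terms = [
        log_comb(M, i) + log_comb(N - M, n - i) - log_total
        for i in range(lowest, highest + 1)
    ]
    # Clamp to 1.0 to absorb floating-point rounding.
    return min(1.0, math.exp(log_sum_exp(log_terms)))


if __name__ == "__main__":
    print(enrichment_p_value(k=2, M=4, n=3, N=10))
```

Under this assumption, the root node, to which every gene is annotated (k = n and M = N), gets a P-value of 1, which is consistent with the root always being drawn in white.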

Edges. Is-a edges are drawn as green arrows with a hollow tip. Part-of edges are purple, with a solid, diamond-shaped tip.
Abridged edges. It can happen that a node that is pruned from the graph has descendants that are not pruned. In this case the descendants must be reattached with edges that actually represent two or more edges in the underlying ontology. Such edges are called "abridged", and are labelled in the display with "...". If an abridged edge represents all Is-a or all Part-of edges, it is drawn in the same style as described above. If an abridged edge represents a path of both Is-a and Part-of edges, it is drawn in black with a solid triangular tip.
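
The reattachment step can be sketched as an upward walk from each retained node through its pruned ancestors. The Python code below is a simplified illustration under assumed data structures (a dict mapping each node to its (parent, edge type) pairs, and a set of retained nodes). The names visible_edges, "is_a", "part_of", and "mixed" are chosen for the example and are not taken from Vlad.

```python
def visible_edges(parents, kept):
    """Compute the edges to draw between retained nodes.

    parents -- dict: node -> list of (parent, kind), kind in {"is_a", "part_of"}
    kept    -- set of nodes that survived pruning
    Returns a set of (child, ancestor, style, abridged) tuples.
    """
    edges = set()
    for child in kept:
        # Each stack entry: (current node, edge kinds seen so far, path length).
        stack = [(child, frozenset(), 0)]
        while stack:
            node, kinds, length = stack.pop()
            for parent, kind in parents.get(node, []):
                path_kinds = kinds | {kind}
                if parent in kept:
                    # A single edge kind keeps its normal style; a path that
                    # combines Is-a and Part-of edges gets the "mixed" style.
                    style = next(iter(path_kinds)) if len(path_kinds) == 1 else "mixed"
                    # The edge is abridged if it stands for two or more edges.
                    edges.add((child, parent, style, length + 1 > 1))
                else:
                    # Parent was pruned: keep climbing toward a retained ancestor.
                    stack.append((parent, path_kinds, length + 1))
    return edges


if __name__ == "__main__":
    parents = {"c": [("b", "is_a")], "b": [("a", "part_of")], "d": [("a", "is_a")]}
    for edge in sorted(visible_edges(parents, kept={"a", "c", "d"})):
        print(edge)
```

In this toy graph, node b is pruned, so c is reattached to a with an abridged "mixed" edge, while the direct Is-a edge from d to a is drawn normally.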

Tabular Display

Below the image, the same node data are redisplayed in tabular form. There is one row per node in the image. The rows are sorted by score, most significant first. Each row contains the same data as displayed in an expanded node: GO ID, term, and scores. The GO ID is a link to the AmiGO browser's detail page for the node. In addition, each row lists all the genes in the query set that are annotated to the node. The genes are displayed as gene symbols, and are links to the appropriate database. If a gene has a direct annotation to the node, its symbol is preceded by a "*"; if the gene only has annotations to descendant nodes, then no asterisk is included.
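
As a small illustration of the row ordering and the asterisk convention, the Python sketch below builds table rows from a list of node records. The record layout, the function name tabular_rows, the alphabetical ordering of gene symbols, and the placeholder IDs and symbols are all assumptions made for the example. It also assumes P-value scoring, where the smallest P-value is the most significant.

```python
def tabular_rows(nodes):
    """Build (GO ID, term, P-value, gene list) rows, most significant first.

    nodes -- list of dicts with keys: go_id, term, p_value,
             direct (genes annotated to the node itself),
             inherited (genes annotated only to descendant nodes)
    """
    rows = []
    for node in sorted(nodes, key=lambda n: n["p_value"]):
        direct = set(node["direct"])
        symbols = sorted(direct | set(node["inherited"]))
        # Directly annotated genes are marked with a leading "*".
        genes = " ".join("*" + s if s in direct else s for s in symbols)
        rows.append((node["go_id"], node["term"], node["p_value"], genes))
    return rows


if __name__ == "__main__":
    # Placeholder records, not real GO terms or gene symbols.
    nodes = [
        {"go_id": "GO:B", "term": "term B", "p_value": 0.04,
         "direct": ["g2"], "inherited": ["g1"]},
        {"go_id": "GO:A", "term": "term A", "p_value": 0.001,
         "direct": [], "inherited": ["g3"]},
    ]
    for row in tabular_rows(nodes):
        print(row)
```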

Output Formats

By default, Vlad output consists of HTML pages with JPEG images. You can change both of these.

Getting Data

Vlad needs two basic sets of data: the GO ontology and a set of GO annotations. You can download the GO and the annotation data contributed by many databases from www.geneontology.org.

Gotchas, limitations and known problems

Credits

Vlad was developed by Joel Richardson, with support from the Mouse Genome Informatics project.

Vlad is similar to the GO Term Finder, developed by Gavin Sherlock at SGD. The P-value scoring formula was lifted from a copy of one of Gavin's talks. Gary Churchill supplied a crucial function for computing sums in log space, overcoming a nasty machine precision problem.

Vlad uses the GraphViz package from AT&T to do the actual graph layout and image rendering.

