©1996-2010 All
Rights Reserved. Online Journal of Bioinformatics. You may not store these
pages in any form except for your own personal use. All other usage or
distribution is illegal under international copyright treaties. Permission to
use any of these pages in any other way besides the before mentioned must be
gained in writing from the publisher. This article is exclusively copyrighted
in its entirety to OJVR publications. This article may be copied once but may
not be, reproduced or re-transmitted without the express permission of the
editors.
OJB™
Online Journal of Bioinformatics
Volume 1: 51-61*, 2002
A software package for drawing ideograms automatically
Stefan Böhringer*¹, René Gödde¹, Daniel Böhringer², Thorsten Schulte¹, Jörg T Epplen¹
¹Department of Molecular Human Genetics, Ruhr-Universität Bochum, Germany
²Department of Ophthalmology, University Clinics, Düsseldorf, Germany
*Corresponding author: stefan.boehringer@ruhr-uni-bochum.de, Phone: +49 234 3228101,
Fax: +49 234 32 14196, Molecular Human Genetics, 44801 Bochum, Universitätsstr. 150
ABSTRACT
Böhringer S, Gödde R, Böhringer D, Schulte T, Epplen JT. A software package
for drawing ideograms automatically. Online J Bioinformatics 1: 51-61, 2002.
The advent of genome-wide linkage disequilibrium scanning requires software for
conveniently visualizing the results of these screens. Such software is
presented in this paper. It outputs chromosomal ideograms and can highlight
arbitrary marker positions. Labels can be attached freely and are positioned
automatically to avoid overlap. To automate complex visualization tasks, a tool
for extracting named marker positions from the location database (ldb) is
provided. The software is highly configurable and can draw arbitrary
karyograms, banding patterns and chromosome groupings. Annotations may be
customized by using program options or by implementing new annotation
subclasses. Output is in PostScript format, which may be converted to other
graphic file formats or used directly for high-quality printing.
KEYWORDS: ideogram, automation, genome, annotation,
karyogram, chromosome
INTRODUCTION
Modern genetic analyses of complex human disorders continue to increase marker
density in the genome (cf. e.g. 1). Genome projects therefore have to visualize
information for thousands of markers, which is a challenging but indispensable
task. Data must be presented comprehensively, and the representation should
highlight important results clearly. We have developed a software package that
fulfils these demands while remaining as user-friendly as possible. Although
the package was developed with these applications in mind, it can be used
whenever chromosomal ideograms are to be drawn. The layout of individual
chromosomes is separated from the placement of chromosomes on a sheet, so that,
for example, karyograms underlying chromosomal disorders can be illustrated
easily as well.
METHODS
The software package is implemented in the Perl5 language (9). Information
from the following configuration files is read and then combined into the
program output:
- the banding pattern and the placement of chromosomes on the sheet;
- optionally, information on labels that are to be placed alongside the
chromosomes;
- optionally, the positions of labels, if their position is given by name
rather than by physical chromosomal position.
The program output is a PostScript file which displays chromosomes as
specified by the configuration files. All configuration files are in the
so-called property list format, which was invented by Apple™ (1). This format
can be parsed easily by computer programs and edited conveniently by hand in
any text editor. Online documentation of the file format is available from the
project web site (2). A Perl5 module which can read and write property list
files is also provided and can be used to convert other data sources into
property list files. In the following, we describe selected aspects of
configuring the program output. Since there is a plethora of options, we refer
to the online documentation for a complete list and the exact syntax of
options. In its simplest form, the program (coloredChromosomes.pl) is invoked
without parameters; it then reads configuration files from default locations
and places its output in the temporary directory ("/tmp").
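To illustrate why property list files are easy to process, the following Python sketch parses a small fragment. It assumes the classic ASCII dialect of the format (braces for dictionaries, parentheses for arrays, "key = value;" pairs); the exact dialect and layout used by the package are described in its online documentation and may differ. The parser is not the Perl5 module shipped with the package, and the key names (chromosomes, name, bands, start) are invented for illustration.

```python
import re

# One token: punctuation, a double-quoted string, or a bare word.
TOKEN = re.compile(r'\s*(?:([{}()=;,])|"((?:[^"\\]|\\.)*)"|([^\s{}()=;,"]+))')


def tokenize(text):
    """Split property list text into ("punct", c) and ("str", s) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        punct, quoted, bare = match.groups()
        if punct is not None:
            tokens.append(("punct", punct))
        elif quoted is not None:
            # Drop the backslash from escapes such as \" inside quoted strings.
            tokens.append(("str", re.sub(r"\\(.)", r"\1", quoted)))
        else:
            tokens.append(("str", bare))
        pos = match.end()
    return tokens


def parse_value(tokens, i):
    """Parse one value starting at tokens[i]; return (value, next_index)."""
    kind, text = tokens[i]
    if kind == "str":
        return text, i + 1
    if text == "{":                      # dictionary: { key = value; ... }
        result, i = {}, i + 1
        while tokens[i] != ("punct", "}"):
            key, i = parse_value(tokens, i)
            if tokens[i] != ("punct", "="):
                raise ValueError("expected '=' after dictionary key")
            value, i = parse_value(tokens, i + 1)
            if tokens[i] != ("punct", ";"):
                raise ValueError("expected ';' after dictionary value")
            result[key] = value
            i += 1
        return result, i + 1
    if text == "(":                      # array: ( value, value, ... )
        items, i = [], i + 1
        while tokens[i] != ("punct", ")"):
            item, i = parse_value(tokens, i)
            items.append(item)
            if tokens[i] == ("punct", ","):
                i += 1
        return items, i + 1
    raise ValueError(f"unexpected token {text!r}")


def parse_plist(text):
    """Parse a complete property list; all leaf values stay strings."""
    value, _ = parse_value(tokenize(text), 0)
    return value


# Hypothetical fragment; not copied from the package's configuration files.
example = """
{
    chromosomes = (
        { name = 1; bands = ( { name = p36; start = 0; },
                              { name = "p35"; start = 28000000; } ); },
        { name = X; bands = (); }
    );
}
"""
config = parse_plist(example)
print(config["chromosomes"][0]["bands"][1]["name"])
```

Because every leaf is returned as a string, numeric fields such as the hypothetical start would have to be converted by the caller. Comments and truncated input are not handled in this sketch.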
Figure 1 displays the default output. In this figure some important
margins are highlighted by double arrows. These margins can be defined in the
configuration file (arrows with letters A - E). By default a human ideogram is
drawn. All chromosomes are placed in lanes and grouped therein. Within a single
lane, chromosomes are aligned at their centromeres. Chromosomes are stretched
vertically to optimally fill the remaining space after subtracting all vertical
margins.
Figure 1: Ideogram output as produced by default parameters. Double arrows and
delimiters indicate spaces that can be set in the configuration file. A: inner
group spacing, B: additional between group spacing, C: chromosome width, D:
left margin, E: bottom margin, F: top margin.
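The vertical layout rule described above (centromere alignment within a lane, then stretching to fill the space left after the vertical margins) can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's code: it assumes a single common scale for all lanes and a fixed gap between lanes, and all function names, parameter names and arm lengths are invented.

```python
def vertical_scale(lanes, page_height, top_margin, bottom_margin, lane_gap):
    """Return drawing units per unit of chromosome length.

    lanes is a list of lanes; each lane is a list of (p_arm, q_arm) lengths.
    """
    usable = page_height - top_margin - bottom_margin - lane_gap * (len(lanes) - 1)
    # With centromeres aligned, a lane is as tall as its longest p arm
    # plus its longest q arm.
    needed = sum(max(p for p, _ in lane) + max(q for _, q in lane) for lane in lanes)
    return usable / needed


def top_offsets(lane, scale):
    """Vertical offset of each chromosome's top so that centromeres line up."""
    longest_p = max(p for p, _ in lane)
    return [(longest_p - p) * scale for p, _ in lane]


lanes = [[(120, 130), (90, 150)], [(60, 100)]]   # invented arm lengths
scale = vertical_scale(lanes, page_height=1000, top_margin=50,
                       bottom_margin=50, lane_gap=40)
print(scale)                         # 2.0
print(top_offsets(lanes[0], scale))  # [0.0, 60.0]
```

In the example the second chromosome of the first lane has the shorter p arm, so its top is pushed down by 60 units to bring its centromere level with that of its neighbour.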
A refinement of this placement structure is shown in Figure 2. This
representation illustrates the concept of subgroups, which can be used to build
up two levels of grouping within a single lane. This option can be used to
display diploid karyograms or to employ complex annotations.

Figure 2: Example of an ideogram using subgroups.
Pairs of chromosomes are drawn (subgroup) and grouped together (chromosomes
1-3, 4-5, 6-12 etc.). Band names are drawn to the left of the chromosome pairs.
The concepts behind the drawing mechanisms are explained in the text.
Conceptually, annotations are separated into internal and external
annotations. Internal annotations are drawn within the shape of a single
chromosome, whereas external annotations appear alongside it. The main program
only draws the shapes and names of the chromosomes. All further designations
are contributed by annotation modules. For example, in Figure 2 the left
chromosome of each pair is annotated with the banding module, which draws a
banding pattern inside the chromosomal shape. In contrast, each right partner
chromosome is painted internally by the plain module, which simply fills it
with a plain colour. Figure 3 also displays an external annotation: the band
names to the left of the chromosome pairs. The corresponding module,
bandingNames, takes band sizes into account and offsets the position of certain
names to avoid overlap. The source code of these modules can be used to derive
new modules. Further examples of annotation are given in the next section.
Program options are documented in detail in the online documentation.

Figure 3: Annotations to visualize the distribution of a set of markers over
the human genome.
A second aspect of genomic annotation is the combination of data from
different sources. We have developed a program to retrieve the location of
arbitrary markers in the genome from the location database (ldb; 7). The whole
database can be downloaded via FTP (file transfer protocol) and stored on a
local disk. Our program (lociLocations.pl) then looks up loci given in a text
file or via standard input and produces a property list file which maps these
locus names to chromosomal locations. Aliases for locus names can be given in
an additional file. The program can also calculate the distribution of
chromosomal distances between the loci whose locations are resolved. This
option can be used to estimate how uniformly a given marker set covers the
genome. Again, more details of program options and the format of input and
output files are given in the online documentation (2). In the following,
several examples of how the program may be applied are described.
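The distance distribution mentioned above can be illustrated with a short Python sketch. It assumes that the resolved locations are available as a mapping from locus name to a (chromosome, position) pair; this data layout, the function name and the marker names are assumptions made for the example and do not describe the output of lociLocations.pl.

```python
from collections import defaultdict
from statistics import median


def marker_gaps(locations):
    """Return distances between neighbouring markers on the same chromosome.

    locations maps a locus name to a (chromosome, position) pair.
    """
    by_chromosome = defaultdict(list)
    for chromosome, position in locations.values():
        by_chromosome[chromosome].append(position)
    gaps = []
    for positions in by_chromosome.values():
        positions.sort()
        # Distance from each marker to the next one along the chromosome.
        gaps.extend(b - a for a, b in zip(positions, positions[1:]))
    return gaps


# Invented markers and positions (arbitrary units).
locations = {
    "M1": ("1", 1.0), "M2": ("1", 4.0), "M3": ("1", 2.0),
    "M4": ("2", 10.0), "M5": ("2", 15.0),
}
gaps = marker_gaps(locations)
print(sorted(gaps))              # [1.0, 2.0, 5.0]
print(median(gaps), max(gaps))   # 2.0 5.0
```

A large maximum gap relative to the median points to a region that the marker set covers poorly.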
Examples of program applications: Figure 3 shows an example of label
annotations. Some candidate genes for a common disease (here: Multiple
Sclerosis) are shown. A small rectangle is drawn inside the chromosome, and a
label connected to the rectangle by a line is drawn outside the chromosome. Two
modules, itags (internal tags) and etags (external tags), drive the drawing of
rectangles and labels, respectively. If two labels would overlap when placed
directly side by side, they are moved and a Bézier curve connects rectangle and
label. The algorithm used to decide on the relocation of labels minimises the
total displacement when a new label is introduced. The result therefore depends
on the order of label placement, but the algorithm shows excellent results in
practice. Figure 4 shows an example of complex annotations (see below for
explanations).
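The idea of moving labels as little as possible can be illustrated with a simplified Python sketch. Note that this is not the package's algorithm: instead of inserting labels one at a time, it places all labels at once along one axis and, among placements that keep the labels in their original order, minimises the sum of squared shifts by pooling adjacent violators (isotonic regression). Page boundaries are ignored, and the function name and values are invented.

```python
def place_labels(desired, height):
    """Return label positions at least `height` apart, close to `desired`."""
    order = sorted(range(len(desired)), key=lambda i: desired[i])
    # Subtracting rank * height turns "no overlap" into "non-decreasing".
    shifted = [desired[idx] - rank * height for rank, idx in enumerate(order)]

    # Pool adjacent violators: each block stores [sum, count] and is
    # represented by its mean.
    blocks = []
    for value in shifted:
        blocks.append([value, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    # Undo the shift and report positions in the caller's original order.
    placed = [0.0] * len(desired)
    rank = 0
    for total, count in blocks:
        mean = total / count
        for _ in range(count):
            placed[order[rank]] = mean + rank * height
            rank += 1
    return placed


print(place_labels([10, 11, 30], height=4))   # [8.5, 12.5, 30.0]
```

In the example the first two labels collide and are pushed apart symmetrically around their common centre, while the third label stays at its marker position and would be connected by a straight line.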

Figure 4: Large-scale annotation resulting from a whole genome screen
searching for genetic association with MS. The loci have been renamed and
positions have been shifted by uniform random noise of 5% of the chromosomal
length to honour a non-disclosure agreement in the GAMES collaboration (5, 10).
We have used the program for a genome screen in Multiple Sclerosis involving
about 6000 microsatellite markers distributed over the whole genome. The
positions of these markers were extracted with the program described above
(lociLocations.pl). Five loci could not be found in the ldb and were located
"manually" using the NCBI database (8) and added to the location file using a
text editor. All participants of the collaborative GAMES project can use this
location file, so that only p-values for individual markers are needed to
represent their results graphically. In our case, statistical testing in a
case/control design could be carried out for 4666 markers. The p-value of each
marker was used for chromosomal annotation. Note, however, that marker names
are random and positions are shifted randomly (by uniform noise of 5% of the
chromosomal length) because of a non-disclosure agreement in the GAMES
collaboration (5). A subgroup configuration is chosen which shows the banding
pattern on the left-hand chromosome of each pair and the annotations on the
right-hand chromosome. Each rectangle drawn inside the chromosome is colour
coded and represents the p-value of the marker at that position. Values < 0.05
(significant results) are highlighted in yellow; values ≥ 0.05 are displayed in
green, red and blue. These colours are blended smoothly. The points of
interpolation and the colours used can be chosen at will in the configuration
file holding the label information. Labels are sorted so that small values are
drawn last, which gives significant results priority where rectangles overlap.
Another striking feature is that not all rectangles have labels attached. A
threshold can be defined that determines, from the value attached to a label,
whether the label is actually drawn (in this case all values < 0.05 have labels
attached). The ideogram in Figure 4 shows that a complex data set can be
visualized in a single diagram, which gives a concise summary of the results.
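The colour coding, drawing order and label threshold described above can be sketched in Python as follows. The interpolation points (0.05, 0.5 and 1.0), the RGB values and all names are assumptions chosen for the example; in the package these settings are taken from the configuration file, whose actual values are not given here.

```python
YELLOW = (1.0, 1.0, 0.0)
GREEN, RED, BLUE = (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

# Hypothetical interpolation points: (p-value, RGB colour).
STOPS = [(0.05, GREEN), (0.5, RED), (1.0, BLUE)]


def colour_for(p, threshold=0.05, stops=STOPS):
    """Map a p-value to an RGB triple with components between 0 and 1."""
    if p < threshold:
        return YELLOW                      # significant results stand out
    for (p0, c0), (p1, c1) in zip(stops, stops[1:]):
        if p <= p1:
            # Linear blend between the two neighbouring stops.
            t = max(0.0, (p - p0) / (p1 - p0))
            return tuple(a + t * (b - a) for a, b in zip(c0, c1))
    return stops[-1][1]


markers = [("M1", 0.30), ("M2", 0.01), ("M3", 0.75)]   # invented data

# Draw large p-values first so that small ones end up on top.
draw_order = sorted(markers, key=lambda marker: marker[1], reverse=True)
# Only markers below the threshold receive a label.
labelled = [name for name, p in markers if p < 0.05]

print([name for name, _ in draw_order])   # ['M3', 'M1', 'M2']
print(labelled)                           # ['M2']
print(colour_for(0.75))                   # (0.5, 0.0, 0.5)
```

Blending in RGB space is only one possible choice; another colour space could be substituted without changing the structure of the sketch.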
Software installation and data conversion: The software has been tested under
Linux and Windows 2000 but should work on any platform with a Perl5
installation. The Perl5 website (9) lists supported platforms. The PostScript
output can be printed directly (Ghostscript offers PostScript filters for
almost any printer; 3). If a bitmap representation is required, tools like Gimp
(4) or ImageMagick (6) can be used. The bitmaps in this publication were
produced with Gimp. Note, however, that both Gimp and ImageMagick require
Ghostscript for PostScript import. All tools mentioned are free software and
are available for most Unix platforms and for Windows.
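As an alternative to the interactive route via Gimp, a bitmap can also be produced by calling Ghostscript directly. The Python sketch below only assembles such a call and runs it if both the executable and the input file are present. The file names are hypothetical, and the executable name gs is an assumption that holds on most Unix systems (Windows builds use a different executable name).

```python
import os
import shutil
import subprocess


def ghostscript_png_command(ps_path, png_path, dpi=150):
    """Build a Ghostscript command line that renders PostScript to PNG."""
    return [
        "gs",
        "-dSAFER", "-dBATCH", "-dNOPAUSE",   # non-interactive batch run
        "-sDEVICE=png16m",                   # 24-bit colour PNG output
        f"-r{dpi}",                          # resolution in dots per inch
        f"-sOutputFile={png_path}",
        ps_path,
    ]


# Hypothetical file names; adjust to the actual program output.
command = ghostscript_png_command("ideogram.ps", "ideogram.png")
print(" ".join(command))

if shutil.which("gs") is not None and os.path.exists("ideogram.ps"):
    subprocess.run(command, check=True)
```

A higher dpi value yields a larger bitmap, which may be useful for densely annotated ideograms.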
CONCLUSIONS
The ideogram drawing software has automated a large-scale ideogram drawing
project (5) without any need for manual postprocessing. The software also has
applications in education, in a broad range of genetic presentations and in
internet visualisation of ideograms for arbitrary species. The package can be
fully customised and is designed for easy extensibility (cf. online
documentation). Full source code is provided to allow flexible user
customisation. It can be downloaded from our web page (2).
ACKNOWLEDGEMENTS
Stefan Böhringer is supported by a grant from the
Heinrich und Alma Vogelsang foundation.
REFERENCES
1. APPLE. Apple computer Inc. http://developer.apple.com/techpubs/macosx/Cocoa/CocoaTopics.html
2. The coloredChromosomes.pl project. http://mhg.uni-bochum.de/cc
3. Ghostscript and Ghostview. http://www.cs.wisc.edu/~ghost
4. GIMP. The GNU Image Manipulation Program. http://www.gimp.org/
5. GAMES. Genetic analysis of Multiple Sclerosis in Europeans. http://www.mrc-bsu.cam.ac.uk/MSgenetics/GAMES
6. IMAGE. Image conversion. http://www.imagemagick.org
7. LDB. Location data base. http://cedar.genetics.soton.ac.uk/public_html/ldb.html
8. NCBI.
9. PERL. Practical extraction and report language. http://www.perl.org
10. Sawcer et al. (2002). Brain in press.
*Reformatted 3/7/09 now to page 58