©1996-2010 All Rights Reserved. Online Journal of Bioinformatics. You may not store these pages in any form except for your own personal use. All other usage or distribution is illegal under international copyright treaties. Permission to use any of these pages in any other way besides the before mentioned must be gained in writing from the publisher. This article is exclusively copyrighted in its entirety to OJVR publications. This article may be copied once but may not be, reproduced or re-transmitted without the express permission of the editors.


OJB™

Online Journal of Bioinformatics

Volume 1: 51-61*, 2002


A software package for drawing ideograms automatically

 

Stefan Böhringer*¹, René Gödde¹, Daniel Böhringer², Thorsten Schulte¹, Jörg T Epplen¹

 

¹Department of Molecular Human Genetics, Ruhr-Universität Bochum, Germany; ²Department of Ophthalmology, University Clinics, Düsseldorf, Germany

*Corresponding author: stefan.boehringer@ruhr-uni-bochum.de, Phone: +49 234 3228101, Fax: +49 234 32 14196, Molecular Human Genetics, Universitätsstr. 150, 44801 Bochum, Germany


ABSTRACT

 

Böhringer S, Gödde R, Böhringer D, Schulte T, Epplen JT., A software package for drawing ideograms automatically, Online J Bioinformatics 1: 51-61, 2002. The advent of genome-wide linkage disequilibrium scanning requires software for conveniently visualizing the results of these screens. Such software is presented in this paper. The software outputs chromosomal ideograms and can highlight arbitrary marker positions. Labels can be attached freely and are positioned automatically to avoid overlap. To automate complex visualization tasks, a tool for extracting named marker positions from the location database (ldb) is provided. The software is highly configurable and can draw arbitrary karyograms, banding patterns and chromosome groupings. Annotations may be customized by using program options or by implementing new annotation subclasses. Output is in Postscript format, which may be converted to other graphics file formats or used directly for high-quality printing.

 

KEYWORDS: ideogram, automation, genome, annotation, karyogram, chromosome

 

 

 

 

 

INTRODUCTION

 

Modern genetic analyses of complex human disorders continue to increase marker density in the genome (see e.g. 1). Genome projects therefore have to visualize information for thousands of markers, a challenging but indispensable task. Data should be presented comprehensively, and the representation should highlight important results clearly. We have developed a software package that meets these demands while remaining as user-friendly as possible. Although the package was developed with these applications in mind, it can be used whenever chromosomal ideograms are to be drawn. Because the layout of individual chromosomes is separated from the placement of chromosomes on a sheet, karyograms of chromosomal disorders, for example, can be illustrated easily as well.

 

METHODS

 

The software package is implemented in the Perl5 language (9). The program reads the following configuration files and combines their information into the program output:

 

-       the banding pattern and the placement of chromosomes on the sheet;

-       optionally, information on labels to be placed alongside the chromosomes;

-       optionally, the chromosomal locations of labels whose position is given by name rather than by physical chromosomal position.

 

The program output is a Postscript file that displays chromosomes as specified by the configuration files. All configuration files use the so-called property list format introduced by Apple™ (1). This format can be parsed easily by computer programs and edited conveniently by hand in any text editor. Online documentation of the file format is available from the project web site (2). A Perl5 module that reads and writes property list files is also provided; it can be used to convert other data sources into property list files. Below we describe selected aspects of configuring the program output. Since there is a plethora of options, we refer to the online documentation for a complete list and the exact syntax. In its simplest form, the program (coloredChromosomes.pl) is invoked without parameters; it then reads configuration files from default locations and writes its output to the temporary directory ("/tmp"). Figure 1 displays the default output, in which some important margins are highlighted by double arrows (letters A-F). These margins can be set in the configuration file. By default, a human ideogram is drawn. Chromosomes are placed in lanes and grouped within each lane. Within a lane, chromosomes are aligned at their centromeres. Chromosomes are scaled vertically to fill the space that remains after all vertical margins have been subtracted.
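To illustrate why the property list format is easy to process, here is a minimal Python sketch of a parser for the old-style (OpenStep) ASCII property list syntax: dictionaries `{ key = value; }`, arrays `( a, b )`, and quoted or bare strings. We assume the configuration files use this textual variant; the function names and the example keys (`title`, `lanes`, `chromosomeWidth`) are hypothetical and do not reproduce the keys coloredChromosomes.pl actually reads. Comments and most escape sequences are not handled, and all scalars come back as strings.

```python
import re

# One token: punctuation, a quoted string (with \" escapes) or a bare word.
_TOKEN = re.compile(r'\s*(?:([{}()=;,])|"((?:[^"\\]|\\.)*)"|([^\s{}()=;,"]+))')


def tokenize(text):
    """Yield ("punct", char) or ("string", value) pairs."""
    pos, text = 0, text.rstrip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at offset {pos}")
        punct, quoted, bare = m.groups()
        if punct is not None:
            yield ("punct", punct)
        elif quoted is not None:
            yield ("string", quoted.replace('\\"', '"'))
        else:
            yield ("string", bare)
        pos = m.end()


def _peek(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    return tokens[pos]


def _expect(tokens, pos, punct):
    if _peek(tokens, pos) != ("punct", punct):
        raise ValueError(f"expected {punct!r} at token {pos}")
    return pos + 1


def _parse_value(tokens, pos):
    kind, val = _peek(tokens, pos)
    if kind == "string":
        return val, pos + 1
    if val == "{":  # dictionary: key = value; ...
        result, pos = {}, pos + 1
        while _peek(tokens, pos) != ("punct", "}"):
            key, pos = _parse_value(tokens, pos)
            pos = _expect(tokens, pos, "=")
            result[key], pos = _parse_value(tokens, pos)
            pos = _expect(tokens, pos, ";")
        return result, pos + 1
    if val == "(":  # array: item, item, ...
        result, pos = [], pos + 1
        while _peek(tokens, pos) != ("punct", ")"):
            item, pos = _parse_value(tokens, pos)
            result.append(item)
            if _peek(tokens, pos) == ("punct", ","):
                pos += 1
        return result, pos + 1
    raise ValueError(f"unexpected {val!r} at token {pos}")


def parse(text):
    """Parse an old-style plist into nested dict / list / str values."""
    tokens = list(tokenize(text))
    value, pos = _parse_value(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing tokens after top-level value")
    return value


example = """
{
    title = "Human ideogram";
    lanes = ( (1, 2, 3), (4, 5) );
    chromosomeWidth = 12;
}
"""
print(parse(example)["lanes"])  # [['1', '2', '3'], ['4', '5']]
```

In practice the package's own Perl5 property list module is the tool to use; the sketch only shows that a recursive-descent reader for this syntax is short.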

 

 

 

 

Figure 1: Ideogram output produced with default parameters. Double arrows and delimiters indicate spacings that can be set in the configuration file. A: inner-group spacing, B: additional between-group spacing, C: chromosome width, D: left margin, E: bottom margin, F: top margin.
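How these margins might combine into drawing coordinates can be sketched in Python. The sketch assumes that A separates neighbouring chromosomes inside a group, that B is added on top of A between groups, and that each lane is as tall as its longest short arm plus its longest long arm, because chromosomes in a lane are aligned at the centromere. The `lane_gap` parameter and all function names are hypothetical; none of these formulas are taken from coloredChromosomes.pl itself.

```python
from dataclasses import dataclass


@dataclass
class Margins:
    inner_group: float     # A: gap between neighbouring chromosomes of one group
    between_group: float   # B: extra gap added between groups
    chrom_width: float     # C: width of one chromosome
    left: float            # D: left margin
    bottom: float          # E: bottom margin
    top: float             # F: top margin
    lane_gap: float = 0.0  # hypothetical vertical gap between lanes


def x_positions(groups, m):
    """Left x coordinate of every chromosome in one lane.

    `groups` is a list of groups, each a list of chromosome names.
    """
    xs, k = {}, 0
    for g, group in enumerate(groups):
        for name in group:
            xs[name] = m.left + k * (m.chrom_width + m.inner_group) + g * m.between_group
            k += 1
    return xs


def vertical_scale(lanes, sheet_height, m):
    """Factor converting chromosome lengths into sheet units.

    `lanes` is a list of lanes; each lane is a list of (p_arm, q_arm) lengths.
    Aligned at the centromere, a lane needs max(p) + max(q) of vertical space.
    """
    lane_heights = [max(p for p, _ in lane) + max(q for _, q in lane) for lane in lanes]
    free = sheet_height - m.top - m.bottom - m.lane_gap * (len(lanes) - 1)
    if free <= 0:
        raise ValueError("margins leave no room for chromosomes")
    return free / sum(lane_heights)
```

With A = 10, B = 20, C = 12 and D = 50, chromosomes "1" and "2" of a first group start at x = 50 and 72, and chromosome "3", the first of a second group, at x = 114, so the gap before a new group is A + B.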


A refinement of this placement structure is shown in Figure 2. This representation illustrates the concept of subgroups, which can be used to build two levels of grouping within a single lane. This option can be used, for example, to display diploid karyograms or to build complex annotations.

 

 

Figure 2: Example of an ideogram using subgroups. Chromosomes are drawn in pairs (subgroups), and pairs are grouped together (chromosomes 1-3, 4-5, 6-12, etc.). Band names are drawn to the left of the chromosome pairs. The concepts behind the drawing mechanisms are explained in the text.


Conceptually, annotations are divided into internal and external annotations. Internal annotations are drawn within the shape of a single chromosome, whereas external annotations appear alongside it. The main program draws only the shapes and names of the chromosomes; all further features are contributed by annotation modules. For example, in Fig. 2 the left chromosome of each pair is annotated by the banding module, which draws a banding pattern inside the chromosomal shape. In contrast, the right chromosome of each pair is painted internally by the plain module, which simply fills it with a plain colour. Figure 2 also displays an external annotation: the band names to the left of each chromosome pair. The corresponding module, bandingNames, takes band sizes into account and insets the positions of certain names to avoid overlap. The source code of these modules can serve as a starting point for new modules. Further examples of annotation are given below, and program options are documented in detail in the online documentation.
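The division of labour between the main program and pluggable internal or external annotation modules can be sketched as a small class hierarchy. The Python classes below are hypothetical stand-ins for the package's Perl modules (banding, plain, bandingNames): they return simple drawing instructions rather than Postscript, and the band names and stain colours in the example are made up.

```python
from abc import ABC, abstractmethod


class Annotation(ABC):
    """Base class: turns a chromosome description into drawing instructions."""

    internal = True  # internal annotations paint inside the chromosome shape

    @abstractmethod
    def draw(self, chrom):
        ...


class Plain(Annotation):
    """Internal: fill the whole chromosome with one colour."""

    def __init__(self, colour):
        self.colour = colour

    def draw(self, chrom):
        return [("fill", 0.0, chrom["length"], self.colour)]


class Banding(Annotation):
    """Internal: fill each band with the colour of its stain."""

    STAIN = {"gneg": "white", "gpos": "black"}

    def draw(self, chrom):
        return [("fill", start, end, self.STAIN[stain])
                for _, start, end, stain in chrom["bands"]]


class BandNames(Annotation):
    """External: write each band name beside the middle of its band."""

    internal = False

    def draw(self, chrom):
        return [("text", (start + end) / 2, name) for name, start, end, _ in chrom["bands"]]


def render(chrom, annotations):
    """Main program: draws only outline and name, delegating everything else.

    In this sketch internal annotations are emitted before the outline so the
    outline stays visible on top; external annotations come last.
    """
    ops = []
    for a in annotations:
        if a.internal:
            ops += a.draw(chrom)
    ops.append(("outline", 0.0, chrom["length"]))
    ops.append(("name", chrom["name"]))
    for a in annotations:
        if not a.internal:
            ops += a.draw(chrom)
    return ops


example = {"name": "X", "length": 10.0,
           "bands": [("p1", 0.0, 4.0, "gneg"), ("q1", 4.0, 10.0, "gpos")]}
left_of_pair = render(example, [Banding(), BandNames()])
right_of_pair = render(example, [Plain("grey")])
```

A new annotation type then only needs a subclass with its own `draw` method; the main `render` loop is unchanged.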

 



Figure 3: Annotations to visualize the distribution of a set of markers over the human genome.

 

A second aspect of genomic annotation is the combination of data from different sources. We have developed a program to retrieve the locations of arbitrary markers in the genome from the location database (ldb; 7). The whole database can be downloaded via FTP (file transfer protocol) and stored on a local disk. Our program (lociLocations.pl) then looks up the loci listed in a text file or given on standard input and produces a property list file that maps these locus names to chromosomal locations. Aliases for locus names can be given in an additional file. Optionally, the program calculates the distribution of chromosomal distances between the loci whose locations were resolved. This option can be used to estimate how uniformly a given marker set saturates the genome. Again, details of the program options and of the input and output file formats are given in the online documentation (2). Several examples of how the program may be applied are described below.
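The look-up step and the distance statistic can be sketched as follows. We assume the location data have already been read into a dictionary mapping locus names to (chromosome, position) pairs; the ldb file format itself is not reproduced, and all marker names, aliases and positions in the example are invented. The distance statistic is interpreted here as the gaps between neighbouring resolved loci on the same chromosome, one plausible reading of the program's option; `resolve` and `neighbour_gaps` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean


def resolve(loci, locations, aliases=None):
    """Map each requested locus to (chromosome, position).

    `aliases` maps alternative names to the name used in `locations`.
    Returns (resolved, missing); missing loci must be located by other means.
    """
    aliases = aliases or {}
    resolved, missing = {}, []
    for name in loci:
        key = name if name in locations else aliases.get(name)
        if key in locations:
            resolved[name] = locations[key]
        else:
            missing.append(name)
    return resolved, missing


def neighbour_gaps(resolved):
    """Distances between consecutive resolved loci on each chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in resolved.values():
        by_chrom[chrom].append(pos)
    gaps = []
    for positions in by_chrom.values():
        positions.sort()
        gaps += [b - a for a, b in zip(positions, positions[1:])]
    return gaps


locations = {  # invented positions, arbitrary units
    "D1S001": ("1", 10.0), "D1S002": ("1", 25.0), "D1S003": ("1", 30.0),
    "D2S001": ("2", 5.0), "D2S002": ("2", 45.0),
}
wanted = ["D1S001", "AFM123", "D1S003", "D2S001", "D2S002", "D9S999"]
resolved, missing = resolve(wanted, locations, aliases={"AFM123": "D1S002"})
gaps = neighbour_gaps(resolved)
print(missing)                                # ['D9S999']
print(sorted(gaps), max(gaps), mean(gaps))    # [5.0, 15.0, 40.0] 40.0 20.0
```

A large maximum gap relative to the mean points to regions the marker set covers poorly.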

 

Examples of program applications: Figure 3 shows an example of label annotations. Some candidate genes for a common disease (here: Multiple Sclerosis) are shown. A small rectangle is drawn inside the chromosome, and a label connected to the rectangle by a line is drawn outside the chromosome. These are drawn by two modules, itags (internal tags) and etags (external tags), respectively. If two labels would overlap when placed directly beside their rectangles, they are moved and a Bézier curve connects each rectangle with its label. When a new label is introduced, the algorithm that decides on label relocation minimises the total displacement of labels. The result therefore depends on the order in which labels are placed, but the algorithm gives excellent results in practice. Figure 4 shows an example of complex annotations (explained below).
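The relocation algorithm is not spelled out here, so the following Python sketch substitutes one concrete variant. Labels in one column beside a chromosome are kept at least one label height apart and in the order of their target positions. When a new label arrives, all labels move as little as possible from their current positions (the new label from its target), measured as squared rather than absolute displacement; this is solved with the pool-adjacent-violators algorithm for isotonic regression. Because existing labels start from where earlier insertions left them, the outcome depends on insertion order, as described above. `LabelColumn`, `spread` and `pava` are hypothetical names.

```python
def pava(values):
    """Pool-adjacent-violators: least-squares fit that is non-decreasing."""
    blocks = []  # each block is [sum, count] of pooled values
    for v in values:
        blocks.append([v, 1])
        # Pool while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def spread(desired, keys, height):
    """Positions closest (least squares) to `desired`, ordered by `keys`,
    with consecutive labels at least `height` apart."""
    order = sorted(range(len(desired)), key=lambda i: keys[i])
    # With z_k = y_k - k*height, the constraint y_{k+1} - y_k >= height
    # becomes "z is non-decreasing", i.e. an isotonic regression problem.
    shifted = [desired[i] - k * height for k, i in enumerate(order)]
    fitted = pava(shifted)
    result = [0.0] * len(desired)
    for k, i in enumerate(order):
        result[i] = fitted[k] + k * height
    return result


class LabelColumn:
    """Labels beside one chromosome; positions run along the chromosome axis."""

    def __init__(self, height):
        self.height = height
        self.targets = []    # ideal position of each label (its marker)
        self.positions = []  # current position of each label

    def add(self, target):
        """Insert a label and move all labels as little as possible."""
        self.targets.append(target)
        desired = self.positions + [target]
        self.positions = spread(desired, self.targets, self.height)
        return list(self.positions)


col = LabelColumn(height=4.0)
col.add(10.0)
print(col.add(10.0))  # [8.0, 12.0]
```

Two labels aimed at the same spot are pushed symmetrically apart; a Bézier connector would then run from each rectangle to its shifted label.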


Figure 4: Large-scale annotation resulting from a whole genome screen for genetic association with MS. Locus names have been replaced by random names, and positions have been shifted by uniform random noise of 5% of the chromosomal length, in accordance with a non-disclosure agreement in the GAMES collaboration (5, 10).


We have used the program for a genome screen in Multiple Sclerosis involving about 6000 microsatellite markers distributed across the whole genome. The positions of these markers were extracted with the program described above (lociLocations.pl). Five loci could not be found in the ldb database; they were located "manually" using the NCBI database (8) and added to the location file with a text editor. All participants of the collaborative GAMES project can use this location file, so that only the p-values of individual markers are needed to represent their results graphically. In our case, statistical tests based on a case/control design could be carried out for 4666 markers, and the p-value of each marker was used for chromosomal annotation. Note, however, that marker names are random and positions have been shifted randomly (by uniform noise of 5% of the chromosomal length) because of a non-disclosure agreement in the GAMES collaboration (5). A subgroup configuration is used: the left chromosome of each pair shows the banding pattern, and the right chromosome carries the annotations. Each rectangle drawn inside a chromosome is colour coded and represents the p-value of the marker at that position. Values < 0.05 (significant results) are highlighted in yellow; values ≥ 0.05 are displayed in green, red and blue. These colours are blended smoothly. The interpolation points and the colours used can be chosen freely in the configuration file holding the label information. Labels are sorted so that small values are drawn last, giving significant results priority when rectangles overlap. Note also that not all rectangles have labels attached: a threshold can be defined on the value attached to each label to decide whether the label is actually drawn (here, all markers with values < 0.05 are labelled).
The ideogram in Figure 4 shows that a complex data set can be visualized in a single diagram, which is a concise summary of the results.
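The colour coding and drawing rules described above can be sketched as follows: piecewise-linear blending between interpolation points, with the point 0.05 listed twice to produce a hard switch from yellow to the green, red and blue range; drawing in order of decreasing p-value so that significant markers end up on top; and a label threshold. The interpolation points, colours and marker names are illustrative assumptions, not the values used for Figure 4.

```python
from bisect import bisect_right

# Hypothetical interpolation points: (p-value, (r, g, b)).
STOPS = [
    (0.0, (1.0, 1.0, 0.0)),   # yellow
    (0.05, (1.0, 1.0, 0.0)),  # yellow up to the significance level ...
    (0.05, (0.0, 1.0, 0.0)),  # ... then green from 0.05 onwards
    (0.5, (1.0, 0.0, 0.0)),   # red
    (1.0, (0.0, 0.0, 1.0)),   # blue
]


def colour_for(p, stops=STOPS):
    """Linearly blend the colours of the two stops surrounding p."""
    values = [v for v, _ in stops]
    if p <= values[0]:
        return stops[0][1]
    if p >= values[-1]:
        return stops[-1][1]
    i = bisect_right(values, p)  # values[i-1] <= p < values[i], so no zero division
    (v0, c0), (v1, c1) = stops[i - 1], stops[i]
    t = (p - v0) / (v1 - v0)
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))


def draw_plan(markers, label_threshold=0.05):
    """Return (name, colour, draw_label) tuples in drawing order.

    Larger p-values are drawn first so that significant markers are drawn
    last (on top); only markers below the threshold get a text label.
    """
    ordered = sorted(markers.items(), key=lambda item: item[1], reverse=True)
    return [(name, colour_for(p), p < label_threshold) for name, p in ordered]


plan = draw_plan({"m1": 0.2, "m2": 0.001, "m3": 0.04})
print([(name, label) for name, _, label in plan])
```

Changing the interpolation points only means editing `STOPS`, mirroring how the configuration file controls the colour scale.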

 

Software installation and data conversion: The software has been tested under Linux and Windows 2000 but should work on any platform with a Perl5 installation; the Perl5 web site (9) lists supported platforms. The Postscript output can be printed directly (Ghostscript offers Postscript filters for almost any printer; 3). If a bitmap representation is required, tools such as Gimp (4) or ImageMagick (6) can be used. The bitmaps in this publication were produced with Gimp. Note, however, that both Gimp and ImageMagick require Ghostscript to import Postscript. All of these tools are free software and are available for most Unix platforms and for Windows.
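For batch conversion, Ghostscript can also rasterise the Postscript output directly. The Python sketch below only assembles and runs a standard Ghostscript command line (`-sDEVICE=png16m` writes 24-bit colour PNG). File names and the resolution are placeholders, the output is assumed to be a single page, and on Windows the console executable is typically called `gswin32c` or `gswin64c` rather than `gs`.

```python
import shutil
import subprocess


def ghostscript_png_command(ps_file, png_file, dpi=300):
    """Build a Ghostscript command line that rasterises a Postscript file to PNG."""
    return [
        "gs",
        "-dBATCH",                  # exit when finished
        "-dNOPAUSE",                # do not pause between pages
        "-dSAFER",                  # restrict file operations
        "-sDEVICE=png16m",          # 24-bit colour PNG output
        f"-r{dpi}",                 # resolution in dots per inch
        f"-sOutputFile={png_file}",
        ps_file,
    ]


def convert(ps_file, png_file, dpi=300):
    """Run Ghostscript if it is on the PATH; return True on success."""
    if shutil.which("gs") is None:
        return False
    result = subprocess.run(ghostscript_png_command(ps_file, png_file, dpi))
    return result.returncode == 0
```

A higher `dpi` yields larger bitmaps suitable for print; lower values suffice for web pages.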

 

CONCLUSIONS

 

The ideogram drawing software has automated a large-scale ideogram drawing project (5) without any need for manual post-processing. The software also has applications in education, in a broad range of genetic presentations and in web-based visualisation of ideograms for arbitrary species. The package can be fully customised and is designed to be easily extended (cf. the online documentation). The full source code is provided to allow flexible customisation and can be downloaded from our web page (2).

 

ACKNOWLEDGEMENTS

 

Stefan Böhringer is supported by a grant from the Heinrich und Alma Vogelsang foundation.

 

 

REFERENCES

 

1. APPLE. Apple Computer Inc. http://developer.apple.com/techpubs/macosx/Cocoa/CocoaTopics.html
2. The coloredChromosomes.pl project. http://mhg.uni-bochum.de/cc
3. Ghostscript and Ghostview. http://www.cs.wisc.edu/~ghost
4. GIMP. The GNU Image Manipulation Program. http://www.gimp.org/
5. GAMES. Genetic analysis of Multiple Sclerosis in Europeans. http://www.mrc-bsu.cam.ac.uk/MSgenetics/GAMES
6. IMAGE. Image conversion. http://www.imagemagick.org
7. LDB. Location data base. http://cedar.genetics.soton.ac.uk/public_html/ldb.html
8. NCBI. National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/
9. PERL. Practical extraction and report language. http://www.perl.org
10. Sawcer et al. (2002). Brain, in press.





*Reformatted   3/7/09 now to page 58