
Command Line Tool

Usage

$ ruby gene_painter.rb -i <alignment> -p <yaml_files> [<options>]

-i or --input           Path to FASTA-formatted multiple sequence alignment
-p or --path            Path to folder containing gene structures in YAML or GFF format
Standard output format  Mark exons by '-' and introns by '|'

Options

Text-based output format

--intron-phase        Mark introns by their phase instead of '|'
--phylo               Mark exons by '0' and introns by '1'
--spaces              Mark exons by space (' ') instead of '-'
--no-standard-output  Skip the standard output format.
--alignment           Output the alignment file with additional lines containing intron phases
--fuzzy N             Align introns that are at most N base pairs apart from each other
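
The marker conventions listed above can be illustrated with a small Python sketch. It is not GenePainter's implementation: the input representation (None for an exon placeholder, an integer for an intron of that phase), the function name render_pattern, and the way the options combine (here --phylo takes precedence) are assumptions made for illustration only.

```python
def render_pattern(elements, intron_phase=False, phylo=False, spaces=False):
    """Render one gene's exon/intron pattern as a single text line.

    `elements` is a hypothetical representation: None stands for an exon
    placeholder, an integer (0, 1 or 2) for an intron of that phase.
    """
    if phylo:
        # --phylo: exons as '0', introns as '1' (assumed to override the rest)
        exon_mark, intron_mark = "0", "1"
    else:
        # --spaces swaps the exon marker; the standard intron marker is '|'
        exon_mark = " " if spaces else "-"
        intron_mark = "|"

    out = []
    for phase in elements:
        if phase is None:
            out.append(exon_mark)
        elif intron_phase and not phylo:
            # --intron-phase: print the phase instead of '|'
            out.append(str(phase))
        else:
            out.append(intron_mark)
    return "".join(out)


gene = [None, 0, None, 2, None]
print(render_pattern(gene))                     # -|-|-
print(render_pattern(gene, intron_phase=True))  # -0-2-
print(render_pattern(gene, phylo=True))         # 01010
```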

Graphical output format

--svg                  Draw a graphical representation of genes in SVG format.
--svg-format FORMAT    Switch between different formats.
                       FORMAT must be one of "normal", "reduced" or "both":
                       "normal" draws details of aligned exons and introns [default]
                       "reduced" focuses on common introns only
                       "both" draws both formats
--pdb FILE             Mark consensus or merged gene structure in the PDB file FILE.
                       The consensus gene structure contains introns conserved in N % of all genes.
                       Specify N with option --consensus N [default: 80%].
                       Two scripts for execution in PyMOL are provided:
                       'color_exons.py' to mark consensus exons
                       'color_splicesites.py' to mark splice junctions of consensus exons
--pdb-chain CHAIN      Mark gene structures for chain CHAIN. [default: Use chain A]
--pdb-ref-prot PROT    Use protein PROT as reference for alignment with the PDB sequence. [default: First protein in alignment]
--pdb-ref-prot-struct  Color only intron positions occurring in the reference protein structure.
--tree                 Generate Newick tree file and SVG representation

Meta information and statistics

--consensus N  Mark all introns conserved in N % of genes. Specify N as a decimal number between 0 and 1.
--merge        Merge all introns into a single exon-intron pattern
--statistics   Output an additional file with statistics about common introns.
               To include information about taxonomy, specify options --taxonomy and --taxonomy-to-fasta.
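
As a rough illustration of what --consensus and --merge describe, the following Python sketch computes both from a table of intron positions. It is not GenePainter's code: the function names, the gene names, the idea that an intron is identified by a single alignment column, and the reading of "conserved in N % of genes" as "present in at least that fraction of genes" are all assumptions.

```python
from collections import Counter


def consensus_introns(introns_by_gene, threshold=0.8):
    """Intron positions present in at least `threshold` (0..1) of all genes."""
    n_genes = len(introns_by_gene)
    # Count each position once per gene, even if a gene lists it twice.
    counts = Counter(
        pos for positions in introns_by_gene.values() for pos in set(positions)
    )
    return sorted(pos for pos, c in counts.items() if c / n_genes >= threshold)


def merged_introns(introns_by_gene):
    """Union of all intron positions, i.e. a single merged pattern."""
    return sorted(set().union(*introns_by_gene.values()))


# Hypothetical gene names and alignment columns.
genes = {
    "geneA": {10, 55, 120},
    "geneB": {10, 55},
    "geneC": {10, 120},
    "geneD": {10, 55, 200},
    "geneE": {10, 55, 120},
}
print(consensus_introns(genes, threshold=0.8))  # [10, 55]
print(consensus_introns(genes, threshold=0.5))  # [10, 55, 120]
print(merged_introns(genes))                    # [10, 55, 120, 200]
```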

Taxonomy

--taxonomy FILE             Use this option to mark introns by taxonomy. FILE is either
                            an NCBI taxonomy database dump file, or
                            an excerpt of the NCBI taxonomy. Lineage must be a semicolon-separated list of taxa from root to species.
--taxonomy-to-fasta FILE    Text-based file mapping gene structure file names to species names.
                            Each entry gives one or more genes as a semicolon-separated list, followed by the species name.
                            The delimiter between gene list and species name must be a colon. The species name itself must be enclosed in double quotes, like this: "SPECIES"
--taxonomy-common-to X,Y,Z  Mark introns common to taxa X,Y,Z. The list must consist of at least one NCBI taxon (scientific name)
--[no-]exclusively-in-taxa  Mark introns occurring (not) exclusively in listed taxa.
                            [default: not exclusively]
--introns-per-taxon         Mark newly gained introns for every inner node in taxonomy.
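
The file format expected by --taxonomy-to-fasta (semicolon-separated gene list, a colon, then the species name in double quotes) can be made concrete with a short Python parser. This is only a sketch of the format as described above, not GenePainter's parser; the gene names are invented, and tolerating whitespace around the delimiters is an assumption.

```python
import re

# gene list, colon, species name in double quotes (whitespace tolerance assumed)
LINE_RE = re.compile(r'^(?P<genes>[^:]+):\s*"(?P<species>[^"]+)"\s*$')


def parse_taxonomy_to_fasta(text):
    """Map each gene structure name to its species name."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unexpected line: {raw!r}")
        for gene in match.group("genes").split(";"):
            gene = gene.strip()
            if gene:
                mapping[gene] = match.group("species")
    return mapping


# Hypothetical example content.
example = '''HsCoro1A;HsCoro1B:"Homo sapiens"
MmCoro1A:"Mus musculus"
'''
print(parse_taxonomy_to_fasta(example))
```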

Parse NCBI taxonomy

--no-grep  Read the NCBI taxonomy dump into RAM. This requires a few hundred additional MB of RAM. [default: taxonomy dump is parsed with grep calls]
--nice     Run grep calls with lower priority. Make sure nice is in your executable path when using this option.

Analysis and output of all or subset of data

--analyse-all-output-all                          Analyse all data and provide full output [default]
--analyse-all-output-selection                    Analyse all data and provide text-based and graphical output for selection only. All introns are analysed, including those not present in selection
--analyse-selection-output-selection              Analyse selected data and provide output for selection only
--analyse-selection-on-all-data-output-selection  Analyse intron positions of selected data in all data and provide output for selection only. Introns present in selection are analysed in all data

Selection criteria for data and output selection

--select-all                          No selection applied (default)
--selection-based-on-regex "REGEX"    Regular expression applied to gene structure file names. Regex must be enclosed in double quotes
--selection-based-on-list X,Y,Z       List of gene structures to be used
--selection-based-on-species SPECIES  Use all gene structures associated with species. Also specify --taxonomy-to-fasta to map gene structure file names to species names
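
To preview which gene structure files a regular expression would select, a pattern can be tried against the file names beforehand. The Python sketch below assumes that the expression may match anywhere in the file name; GenePainter itself is a Ruby program, so its regex dialect and matching rules may differ in detail. The file names and the function name are hypothetical.

```python
import re


def select_by_regex(file_names, pattern):
    """Return the file names in which `pattern` matches somewhere (assumed semantics)."""
    regex = re.compile(pattern)
    return [name for name in file_names if regex.search(name)]


# Hypothetical gene structure file names.
files = ["HsCoro1A.yaml", "MmCoro1A.yaml", "HsCoro1B.gff"]
print(select_by_regex(files, r"^Hs"))      # ['HsCoro1A.yaml', 'HsCoro1B.gff']
print(select_by_regex(files, r"\.yaml$"))  # ['HsCoro1A.yaml', 'MmCoro1A.yaml']
```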

General options

-o or --outfile FILENAME                     Prefix of the output files.
--path-to-output PATH                        Path to the location where output files should be stored.
--range START,STOP                           Restrict genes to range START-STOP in alignment
--[no-]delete-range                          (Not) Delete specified range
--keep-common-gaps                           Keep common gaps in alignment. This option affects only the output of --alignment
--no-best-position-introns                   Always plot introns onto the beginning of a gap.
                                             Default: Align introns if their position differs by alignment gaps only
--[no-]separate-introns-in-textbased-output  (Not) Separate each consecutive pair of introns by an exon placeholder in text-based output formats.
                                             Default: Separate introns unless the output lines get too long.
-h or --help                                 List all options available.

For a complete list of all options available, please refer to the documentation.
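
When GenePainter is called from a script, it can help to assemble the command line as a list of arguments. The following Python sketch only builds and prints such a command using options from the list above; the file and folder names are placeholders, build_command is a hypothetical helper, and actually running the command (for example with subprocess.run) requires Ruby and gene_painter.rb to be available.

```python
import shlex


def build_command(alignment, structures_dir, *options):
    """Assemble a GenePainter call following the usage line above."""
    return ["ruby", "gene_painter.rb", "-i", alignment, "-p", structures_dir, *options]


cmd = build_command(
    "alignment.fas",      # placeholder: FASTA-formatted alignment
    "gene_structures/",   # placeholder: folder with YAML or GFF files
    "--intron-phase",
    "--svg", "--svg-format", "both",
    "--consensus", "0.8",
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```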

Changes in command line parameters from v.1.0 to v.2.0

v.1.0 parameter              v.2.0 parameter
-a                           --alignment
-n                           --intron-phase
-phylo                       --phylo
-s                           --spaces
-svg WIDTH,HEIGHT FORMAT     --svg and --svg-format
-start START and -stop STOP  --range START,STOP
-pdb                         --pdb
-pdb_prot                    --pdb-ref-prot
-ref_prot_struct             --pdb-ref-prot-struct
-consensus                   --consensus, no longer restricted to combination with -pdb
-f and -penalize_endgaps     obsolete
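
For scripts written against v.1.0, the one-to-one renames in the table can be applied mechanically. The Python sketch below covers only those simple renames; -svg, -start/-stop, -f and -penalize_endgaps change their argument structure or are obsolete and have to be adapted by hand. The helper name translate_flags is hypothetical.

```python
# One-to-one renames taken from the table above.
V1_TO_V2 = {
    "-a": "--alignment",
    "-n": "--intron-phase",
    "-phylo": "--phylo",
    "-s": "--spaces",
    "-pdb": "--pdb",
    "-pdb_prot": "--pdb-ref-prot",
    "-ref_prot_struct": "--pdb-ref-prot-struct",
    "-consensus": "--consensus",
}


def translate_flags(args):
    """Replace v.1.0 flags by their v.2.0 names; leave everything else untouched."""
    return [V1_TO_V2.get(arg, arg) for arg in args]


print(translate_flags(["-a", "-n", "-phylo", "-s"]))
# ['--alignment', '--intron-phase', '--phylo', '--spaces']
```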
