Command Line Tool
Usage
$ ruby gene_painter.rb -i <alignment> -p <yaml_files> [<options>]
-i or --input | Path to FASTA-formatted multiple sequence alignment |
-p or --path | Path to folder containing gene structures in YAML or GFF format |
Standard output format | Mark exons by '-' and introns by '|' |
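The standard output format can be illustrated with a small Ruby sketch. It is not GenePainter's own code: `render_pattern` is a hypothetical helper, and it assumes one output character per alignment column with intron positions given as 0-based columns. The marker characters are parameters, so the variations listed under the text-based output options can be mimicked.

```ruby
# Illustrative sketch only; render_pattern is a hypothetical helper,
# not a function of GenePainter.
# introns maps a 0-based alignment column to the intron phase (0, 1 or 2).
def render_pattern(length, introns, exon_char: "-", intron_char: "|", use_phase: false)
  (0...length).map do |column|
    if introns.key?(column)
      # Either print the phase digit or the generic intron marker.
      use_phase ? introns[column].to_s : intron_char
    else
      exon_char
    end
  end.join
end

introns = { 3 => 0, 8 => 2 } # made-up example data
puts render_pattern(12, introns)                   # exons '-', introns '|'
puts render_pattern(12, introns, use_phase: true)  # introns shown by phase
```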
Options
Text-based output format
--intron-phase | Mark introns by their phase instead of '|' |
--phylo | Mark exons by '0' and introns by '1' |
--spaces | Mark exons by space (' ') instead of '-' |
--no-standard-output | Specify to skip standard output format. |
--alignment | Output the alignment file with additional lines containing intron phases |
--fuzzy N | Introns at most N base pairs apart from each other are aligned |
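One way to read the --fuzzy option is as a grouping of nearby intron positions. The Ruby sketch below chains sorted positions into one group while neighbouring positions are at most N base pairs apart. This is an assumption about the behaviour, not the tool's documented algorithm, and `group_fuzzy` is a hypothetical name.

```ruby
# Illustrative sketch only; group_fuzzy is a hypothetical helper.
# Groups intron positions (in base pairs) so that neighbouring positions
# inside one group are at most max_distance apart.
def group_fuzzy(positions, max_distance)
  positions.sort.slice_when { |left, right| right - left > max_distance }.to_a
end

positions = [110, 100, 103, 250, 102] # made-up example data
group_fuzzy(positions, 3).each do |group|
  puts "aligned together: #{group.join(', ')}"
end
```

Note that chaining lets the first and last position of a group lie more than N base pairs apart; a stricter variant would compare each position with the first member of its group instead.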
Graphical output format
--svg | Draw a graphical representation of genes in SVG format. |
--svg-format FORMAT | Switch between different formats. FORMAT must be one of "normal", "reduced" or "both": "normal" draws details of aligned exons and introns [default]; "reduced" focuses on common introns only; "both" draws both formats. |
--pdb FILE | Mark consensus or merged gene structure in PDB file FILE. The consensus gene structure contains introns conserved in N % of all genes; specify N with option --consensus N [default: 80%]. Two scripts for execution in PyMOL are provided: 'color_exons.py' to mark consensus exons and 'color_splicesites.py' to mark splice junctions of consensus exons. |
--pdb-chain CHAIN | Mark gene structures for chain CHAIN. [default: Use chain A] |
--pdb-ref-prot PROT | Use protein PROT as reference for alignment with pdb sequence. [default: First protein in alignment] |
--pdb-ref-prot-struct | Color only intron positions occurring in the reference protein structure. |
--tree | Generate newick tree file and SVG representation |
Meta information and statistics
--consensus N | Mark all introns conserved in at least a fraction N of all genes. Specify N as a decimal number between 0 and 1 (for example, 0.8 for 80 %). |
--merge | Merge all introns into a single exon-intron pattern. |
--statistics | Output an additional file with statistics about common introns. To include information about taxonomy, specify options --taxonomy and --taxonomy-to-fasta. |
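The difference between a consensus and a merged gene structure can be sketched in Ruby as follows. This is a minimal illustration, not GenePainter's implementation: the helper names are hypothetical, introns are reduced to plain alignment positions, and treating the threshold as "at least N" is an assumption.

```ruby
# Illustrative sketch only; both helpers are hypothetical.
# genes maps a gene name to the list of its intron positions.

# Keep introns that occur in at least the given fraction of all genes.
def consensus_introns(genes, threshold)
  counts = Hash.new(0)
  genes.each_value { |positions| positions.uniq.each { |pos| counts[pos] += 1 } }
  counts.select { |_pos, count| count.to_f / genes.size >= threshold }.keys.sort
end

# Collect every intron position seen in any gene.
def merged_introns(genes)
  genes.values.flatten.uniq.sort
end

genes = { # made-up example data
  "GeneA" => [10, 42, 77],
  "GeneB" => [10, 42],
  "GeneC" => [10, 90],
  "GeneD" => [10, 42, 77],
  "GeneE" => [10, 42]
}
puts "consensus (0.8): #{consensus_introns(genes, 0.8).inspect}"
puts "merged:          #{merged_introns(genes).inspect}"
```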
Taxonomy
--taxonomy FILE | Use this option to mark introns by taxonomy. FILE is either an NCBI taxonomy database dump file or an excerpt of the NCBI taxonomy. Lineage must be given as a semicolon-separated list of taxa from root to species. |
--taxonomy-to-fasta FILE | Text-based file mapping gene structure file names to species names. Each entry consists of one or more genes, given as a semicolon-separated list, followed by the species name. The delimiter between gene list and species name must be a colon. The species name itself must be enclosed in double quotes, like this: "SPECIES" |
--taxonomy-common-to X,Y,Z | Mark introns common to taxa X,Y,Z. List must consist of at least one NCBI taxon (scientific name) |
--[no-]exclusively-in-taxa | Mark introns occurring (not) exclusively in listed taxa. [default: not exclusively] |
--introns-per-taxon | Mark newly gained introns for every inner node in taxonomy. |
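The mapping file expected by --taxonomy-to-fasta can be read with a few lines of Ruby. The sketch below follows the format described above (semicolon-separated genes, a colon, then the species name in double quotes). It assumes one entry per line and tolerates surrounding whitespace, neither of which is confirmed by the tool's documentation; `parse_mapping` and the gene names are hypothetical.

```ruby
# Illustrative sketch only; parse_mapping is a hypothetical helper.
# Returns a hash mapping each gene name to its species name.
def parse_mapping(text)
  mapping = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    # Gene list, a colon, then the species name in double quotes.
    match = line.match(/\A(.+?):\s*"([^"]+)"\z/)
    next unless match # skip lines that do not follow the format
    genes = match[1].split(";").map(&:strip).reject(&:empty?)
    genes.each { |gene| mapping[gene] = match[2] }
  end
  mapping
end

sample = <<~TEXT
  GeneA;GeneB:"Homo sapiens"
  GeneC:"Drosophila melanogaster"
TEXT

parse_mapping(sample).each { |gene, species| puts "#{gene} -> #{species}" }
```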
Parse NCBI taxonomy
--no-grep | Read the NCBI taxonomy dump into RAM. This will require several hundred additional MB of RAM. [default: taxonomy dump is parsed with grep calls] |
--nice | Run grep calls with lower priority. Please make sure to have nice in your executable path when using this option. |
Analysis and output of all or subset of data
--analyse-all-output-all | Analyse all data and provide full output [default] |
--analyse-all-output-selection | Analyse all data and provide text-based and graphical output for selection only. All introns are analysed, including those not present in selection |
--analyse-selection-output-selection | Analyse selected data and provide output for selection only |
--analyse-selection-on-all-data-output-selection | Analyse intron positions of selected data in all data and provide output for selection only. Introns present in selection are analysed in all data |
Selection criteria for data and output selection
--select-all | No selection applied [default] |
--selection-based-on-regex "REGEX" | Regular expression applied on gene structure file names. Regex must be enclosed by double quotes |
--selection-based-on-list X,Y,Z | List of gene structures to be used |
--selection-based-on-species SPECIES | Use all gene structures associated with species. Specify also --taxonomy-to-fasta to map gene structure file names to species names |
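Selection by regular expression or by list amounts to filtering the gene structure file names. The Ruby sketch below shows the idea under stated assumptions: the helper names and file names are hypothetical, and whether GenePainter matches against names with or without file extension is not specified here.

```ruby
# Illustrative sketch only; both helpers are hypothetical.

# Keep file names matching the regular expression given as a string.
def select_by_regex(file_names, pattern)
  regex = Regexp.new(pattern)
  file_names.select { |name| regex.match?(name) }
end

# Keep file names whose base name (without extension) is in the list.
def select_by_list(file_names, wanted)
  file_names.select { |name| wanted.include?(File.basename(name, ".*")) }
end

files = ["HsGeneA.yaml", "HsGeneB.yaml", "DmGeneA.gff"] # made-up names
puts select_by_regex(files, "^Hs").inspect
puts select_by_list(files, ["HsGeneB", "DmGeneA"]).inspect
```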
General options
-o or --outfile FILENAME | Prefix of the output files. |
--path-to-output PATH | Path to the location where output files should be stored. |
--range START,STOP | Restrict genes to range START-STOP in alignment |
--[no-]delete-range | (Not) Delete specified range |
--keep-common-gaps | Keep common gaps in alignment. This option affects only the output of --alignment |
--no-best-position-introns | Always plot introns at the beginning of a gap. Default: Align introns if their position differs by alignment gaps only |
--[no-]separate-introns-in-textbased-output | (Not) Separate each consecutive pair of introns by an exon placeholder in text-based output formats. Default: Separate introns unless the output lines get too long. |
-h or --help | List all options available. |
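The effect of --range together with --[no-]delete-range can be pictured on a single exon-intron pattern. The Ruby sketch below is an assumption-laden illustration: `apply_range` is a hypothetical helper, and it treats START and STOP as 0-based, inclusive alignment columns, which may differ from the tool's actual convention.

```ruby
# Illustrative sketch only; apply_range is a hypothetical helper.
# start and stop are assumed to be 0-based, inclusive column indices.
def apply_range(pattern, start, stop, delete: false)
  if delete
    # Remove the range and join what is left on both sides.
    pattern[0...start].to_s + pattern[(stop + 1)..-1].to_s
  else
    # Keep only the columns inside the range.
    pattern[start..stop].to_s
  end
end

pattern = "---|----|---" # made-up exon-intron pattern
puts apply_range(pattern, 2, 5)               # restrict to the range
puts apply_range(pattern, 2, 5, delete: true) # delete the range
```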
Changes in command line parameters from v.1.0 to v.2.0
v.1.0 parameter | v.2.0 parameter |
-a | --alignment |
-n | --intron-phase |
-phylo | --phylo |
-s | --spaces |
-svg WIDTH,HEIGHT FORMAT | --svg and --svg-format |
-start START and -stop STOP | --range START,STOP |
-pdb | --pdb |
-pdb_prot | --pdb-ref-prot |
-ref_prot_struct | --pdb-ref-prot-struct |
-consensus | --consensus, no longer restricted to combination with -pdb |
-f and -penalize_endgaps | obsolete |