RegulatorTrail 1.1
A web service for the identification of key transcriptional regulators

Input data
RegulatorTrail can read various input file formats through which the user can provide the measurement data, genomic regions, or RTIs that should be analyzed. In general, RegulatorTrail tries to detect the metadata of uploaded data automatically: the data format, the identifier type, and the organism the data was derived from. If errors arise during this step, it is important to understand which input types RegulatorTrail supports.
Thus, in the following we discuss the expected input formats and the assumptions that RegulatorTrail makes about their contents.
An identifier is the name of an entity as it is used in a database such as Ensembl, UniProt, or NCBI Gene.
Identifier lists
The simplest way to provide input data to RegulatorTrail is to upload a list of identifiers. An identifier list can be either a (typically short) list of relevant entities or a (typically long) list of entities sorted by relevance.
GDA
SCN3A
SCN3B
RPLP2
GFER
SNORA68
SNORA65
PIP5KL1
BTBD1
RPLP0
BTBD2
BTBD3
...
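As an illustration, reading such a list before upload can be sketched as follows. This is a minimal sketch, not RegulatorTrail's own parser: the function name `read_identifier_list` is hypothetical, and the sketch assumes one identifier per line. It keeps the original order, since the order carries meaning when the list is sorted by relevance, and reports repeated identifiers.

```python
def read_identifier_list(text):
    """Return (identifiers, duplicates) from text with one identifier per line.

    Hypothetical helper for checking a file before upload; the original
    order is preserved because sorted lists encode relevance by position.
    """
    identifiers = []
    duplicates = []
    seen = set()
    for line in text.splitlines():
        identifier = line.strip()
        if not identifier:
            continue  # skip blank lines
        if identifier in seen:
            duplicates.append(identifier)
        seen.add(identifier)
        identifiers.append(identifier)
    return identifiers, duplicates


example = "GDA\nSCN3A\nSCN3B\nRPLP2\n"
identifiers, duplicates = read_identifier_list(example)
print(identifiers, duplicates)
```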
Identifier level scores
Like identifier lists, score lists can be provided in a text-based format containing one identifier per line. The difference is that a score, a numerical value measuring the relevance of the entity, is provided in an additional column. The two columns are separated by whitespace, preferably a tab character.
GDA	0.05501
SCN3A	-0.017374
SCN3B	0.33427200000000046
RPLP2	-0.10048799999999997
GFER	0.08075766666666603
SNORA68	0.2532145
SNORA65	-0.289492
PIP5KL1	0.267125
BTBD1	-0.824291000000001
RPLP0	0.050174750000000046
BTBD2	-0.424771999999999
BTBD3	0.267594
RPLP1	-0.1359804999999995
ATP6	-0.2206155
...
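The two-column layout can be checked locally with a short script. The following is a sketch under stated assumptions: `read_score_list` is a hypothetical helper, not part of RegulatorTrail, and it assumes exactly two whitespace-separated columns per line with a score that Python's `float` can parse.

```python
def read_score_list(text):
    """Parse 'identifier<whitespace>score' lines into (identifier, score) pairs.

    Hypothetical pre-upload check; raises ValueError with the line number
    when a line does not have two columns or the score is not numeric.
    """
    scores = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # splits on tabs as well as spaces
        if len(fields) != 2:
            raise ValueError(
                f"line {number}: expected 2 columns, found {len(fields)}"
            )
        identifier, raw_score = fields
        try:
            score = float(raw_score)
        except ValueError:
            raise ValueError(
                f"line {number}: score {raw_score!r} is not numeric"
            ) from None
        scores.append((identifier, score))
    return scores


example = "GDA\t0.05501\nSCN3A\t-0.017374\n"
print(read_score_list(example))
```

A check like this also catches scores written with a decimal comma (for example `0,05`), which `float` rejects.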
Measurements
RegulatorTrail supports the direct analysis of matrices containing high-throughput measurements. These can be normalized expression values obtained from microarray or RNA-seq experiments, or protein abundances from mass-spectrometry runs. Additionally, we offer rudimentary support for analyzing count data obtained via RNA-seq.
Measurements can be uploaded as a plain-text, tab-separated matrix. Optionally, the first row of the file contains a name for each sample. Each subsequent row contains the measurement data for one identifier in all samples. Thus, each row except the first starts with an identifier followed by N numerical values, where N is the number of samples.
Sample1	Sample2	Sample3
GeneA	0.1	4.3	2.3
GeneB	3.2	-1.2	1.1
GeneC	2.7	9.1	0.3
...
The advantage of uploading matrices of measurements is that sample-based (sometimes called phenotype-based) permutation schemes can be used to determine p-values.
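The matrix layout described above can be verified with a small script before upload. This is a minimal sketch, not RegulatorTrail's actual detection logic: the names `parse_row` and `read_matrix` are hypothetical, and the rule used to recognize the optional header (the first row is a header if it does not parse as an identifier followed by numbers) is an assumption made for illustration.

```python
def parse_row(line):
    """Return (identifier, values) for a data row, or None if it is not one."""
    fields = line.split("\t")
    if len(fields) < 2:
        return None
    try:
        return fields[0], [float(value) for value in fields[1:]]
    except ValueError:
        return None


def read_matrix(text):
    """Parse a tab-separated matrix into (sample_names, {identifier: values}).

    sample_names is None when the file has no header row. Every data row
    must contain the same number N of numerical values.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")

    samples = None
    if parse_row(lines[0]) is None:  # assumed header detection rule
        samples = [name for name in lines[0].split("\t") if name]
        lines = lines[1:]

    expected = len(samples) if samples is not None else None
    rows = {}
    for line in lines:
        parsed = parse_row(line)
        if parsed is None:
            raise ValueError(f"not a valid data row: {line!r}")
        identifier, values = parsed
        if expected is None:
            expected = len(values)  # first data row fixes N
        if len(values) != expected:
            raise ValueError(
                f"{identifier}: expected {expected} values, found {len(values)}"
            )
        rows[identifier] = values
    return samples, rows


example = (
    "Sample1\tSample2\tSample3\n"
    "GeneA\t0.1\t4.3\t2.3\n"
    "GeneB\t3.2\t-1.2\t1.1\n"
)
samples, rows = read_matrix(example)
print(samples, rows)
```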
Microarray data
A major use case of RegulatorTrail is the analysis of microarray data. For this experimental platform, well-established normalization pipelines exist that usually generate normally or log-normally distributed expression values. RegulatorTrail can work directly with this kind of data and offers a range of statistics that can be used to derive scores from expression matrices.
RNA-seq data
RNA-seq data usually comes in the form of count data. This means that for each transcript and sample, the number of reads that were mapped to the transcript is reported. The distribution of this data is fundamentally different from the distribution of microarray data, and hence new methods for the analysis of count data have been developed. RegulatorTrail offers some basic support for directly analyzing count data. For this purpose, it uses the DESeq2 [2], edgeR [3], and RUVSeq [4] R packages to compute scores from count data.
Note that sample-based permutations currently cannot be performed for count data due to the prohibitive runtime of the score computation process.
Others
Data from other experimental platforms can also be used in RegulatorTrail. Here, however, it is up to the user to select an appropriate scoring scheme.
BED files
Open-chromatin regions or histone marks, needed for an INVOKE analysis, can be uploaded in BED file format. In this format, every line represents a region of interest. Each line contains at least three fields:
- Chromosome
- Start position of the region
- End position of the region
chr1	180775	180925
chr1	181395	181545
chr1	273895	274045
chr1	629895	630045
chr1	633855	634005
...
An additional description of the format can be found here.
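The three mandatory fields can be validated with a few lines of code before upload. The sketch below is illustrative only: `read_bed` is a hypothetical helper, it ignores any fields after the third, and it does not handle header or comment lines, since this page does not say whether RegulatorTrail accepts them.

```python
def read_bed(text):
    """Parse BED lines into (chromosome, start, end) tuples.

    Hypothetical pre-upload check: requires at least three fields per
    line, integer positions, and start <= end.
    """
    regions = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(
                f"line {number}: expected at least 3 fields, found {len(fields)}"
            )
        chromosome = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            raise ValueError(
                f"line {number}: start and end must be integers"
            ) from None
        if start < 0 or end < start:
            raise ValueError(f"line {number}: invalid interval {start}-{end}")
        regions.append((chromosome, start, end))
    return regions


example = "chr1\t180775\t180925\nchr1\t181395\t181545\n"
print(read_bed(example))
```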
RTI file format
While RegulatorTrail already offers a large collection of RTIs, it can be desirable to upload custom data that is not yet included. For this purpose, users can upload their own RTIs in a tab-delimited format in which every line represents a single RTI: the first column contains the name of the regulator and the second column the respective target gene.
E2F5	PSMA2P1
E2F5	ZNF879
E2F5	OSMR-AS1
E2F5	CAMK1
E2F5	SPR
E2F5	ZNF700
E2F5	ZNF707
E2F5	CAMK4
E2F5	OR8A3P
E2F5	EDEM2
E2F5	ZC3H10
E2F5	RNF114
E2F5	ZC3H15
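To inspect a custom RTI file before upload, the regulator and target columns can be grouped per regulator. This is a minimal sketch: `read_rtis` is a hypothetical helper, not part of RegulatorTrail, and it assumes that only the first two tab-separated columns matter.

```python
from collections import defaultdict


def read_rtis(text):
    """Group tab-delimited 'regulator<TAB>target' lines by regulator.

    Returns a dict mapping each regulator name to the set of its target
    genes; raises ValueError for lines missing either column.
    """
    targets = defaultdict(set)
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip() or not fields[1].strip():
            raise ValueError(f"line {number}: expected regulator and target")
        targets[fields[0].strip()].add(fields[1].strip())
    return dict(targets)


example = "E2F5\tPSMA2P1\nE2F5\tZNF879\nE2F5\tOSMR-AS1\n"
for regulator, genes in read_rtis(example).items():
    print(regulator, len(genes), sorted(genes))
```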
Troubleshooting
RegulatorTrail does not recognize my score list exported from Excel
MS Excel is a popular tool for managing biological datasets. However, there are some pitfalls, especially when it comes to interoperability with other tools. Excel may reformat gene identifiers as dates. For example, the gene Apr1 is routinely interpreted as April the first. Please make sure that no such conversions have taken place before exporting your data from Excel.
For more information see also Zeeberg et al. [5].
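A quick scan of the identifier column can flag entries that look like converted dates. The sketch below is a heuristic under stated assumptions: `find_date_like` is a hypothetical helper, and the two patterns it checks (day-month such as `1-Apr` and month-day such as `Apr-01`, with English month abbreviations) are assumed examples of what a converted identifier may look like. The exact text depends on Excel's locale and cell format, so the check is not exhaustive.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Assumed patterns: '1-Apr' or 'Apr-01' (case-insensitive).
DATE_LIKE = re.compile(
    rf"^(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{1,2}})$", re.IGNORECASE
)


def find_date_like(identifiers):
    """Return the identifiers that look like dates rather than gene names."""
    return [name for name in identifiers if DATE_LIKE.match(name.strip())]


suspicious = find_date_like(["GDA", "1-Apr", "SCN3A", "Sep-02"])
print(suspicious)
```

Any identifier reported by such a check should be compared against the original data and corrected before upload.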
Bibliography
[1] Global functional profiling of gene expression. Genomics, Elsevier.
[2] Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol.
[3] edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, Oxford Univ Press.
[4] Normalization of RNA-seq data using factor analysis of control genes or samples. Nature biotechnology, Nature Publishing Group.
[5] Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC bioinformatics, BioMed Central Ltd.