Retrieve promoter sequence:

Retrieve the promoter of a gene. A gene is currently identified by its official symbol (e.g. A1BG), its NCBI Gene database identifier (an integer), or an mRNA accession number (e.g. NM_130786). The transcription start position provided by the UCSC Genome Browser is used as the transcription start site (TSS). Other identifiers can easily be converted to NCBI Gene IDs with the Gene ID Conversion Tool offered by the DAVID Bioinformatics Resources.
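Locating a promoter from a TSS depends on strand: on the "+" strand, upstream means lower coordinates, and on the "-" strand it means higher coordinates, with the extracted sequence reverse-complemented. The sketch below is illustrative only. It assumes 0-based coordinates with half-open intervals, and `promoter_interval`, `reverse_complement` and the default window sizes are hypothetical, not the tool's documented settings. Under UCSC's gene-table convention (0-based `txStart`, exclusive `txEnd`), the TSS base is `txStart` for "+" genes and `txEnd - 1` for "-" genes.

```python
# Hypothetical helpers: window sizes are illustrative, not LASAGNA-Search settings.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def promoter_interval(tss, strand, upstream=1000, downstream=0):
    """Return a 0-based, half-open (start, end) promoter interval.

    tss        -- 0-based coordinate of the first transcribed base
    strand     -- "+" or "-"
    upstream   -- number of bases 5' of the TSS to include
    downstream -- number of bases from the TSS onward to include
    """
    if strand == "+":
        # Transcription runs toward higher coordinates, so upstream lies to the left.
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # Transcription runs toward lower coordinates, so upstream lies to the right.
        start, end = tss + 1 - downstream, tss + 1 + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the chromosome start.
    return max(start, 0), end


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (needed for '-' strand promoters)."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `promoter_interval(5000, "-")` gives `(5001, 6001)`, the 1000 bases 5' of a minus-strand TSS, which would then be passed through `reverse_complement` so that the promoter reads 5' to 3'.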

Seven species are currently supported:

Promoters of genes from species other than these cannot be retrieved.

Enter known TFBSs:

Enter the known binding sites of a TF in FASTA format.
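As a rough illustration of what the FASTA input contains, the sketch below splits the pasted text into (header, site) records, joining wrapped sequence lines and upper-casing bases. The function name `parse_fasta` is hypothetical, and the tool's own input validation may differ.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records
```

With made-up input, `parse_fasta(">s1\nACGT\nac\n>s2 desc\nTTGA")` returns `[("s1", "ACGTAC"), ("s2 desc", "TTGA")]`.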

Enter matrix:

Enter a precomputed position-specific count matrix in one of the three supported formats:

  1. The JASPAR format
    A  [ 34  16   7  58  51   0   2 112 116   0  14  66  13  39  36  25 ]
    C  [ 37  33  51  14   4 116 113   0   0   1  65   6  20  43   9  35 ]
    G  [ 27  26  25  41  56   0   1   1   0   0  33  42  73  22  47  29 ]
    T  [ 18  41  33   3   5   0   0   3   0 115   4   2  10  12  24  27 ]

  2. The TRANSFAC format
    PO      A      C      G      T
    01     28      1      2      2      A
    02      1      0      0     32      T
    03      1      0     31      1      G
    04      7     25      0      1      C
    05      0     33      0      0      C
    06      0     33      0      0      C
    07     23      0      0     10      A
    08      2      0      0     31      T
    09     33      0      0      0      A
    10      0      0      0     33      T
    11     31      0      0      2      A
    12     10      0      0     23      T
    13      0      0     33      0      G
    14      0      0     33      0      G
    15     12      4      3     14      W
    16     13      9      8      3      N
    17     11      6      7      9      N
    18      1      4      8     20      T

  3. The UniPROBE format
    Gene:  Aft1-primary  Motif:  A.TGCACCC  Enrichment Score:  0.482942724828625
    A:	0.106206209640922	0.155258373589726	0.362935257239374	0.179981378259842	0.426657762868071	0.13688115038742	0.716619001468936	0.145161072631796	0.0342809276528156	0.160763289170782	0.0109813649457007	0.961083219060667	0.00832020751017246	0.0250020357121778	0.0196753328330524	0.181885409487817	0.265912987550001	0.287381242249375	0.276066006197372	0.123919492224826	0.154128688222547
    C:	0.205920122710712	0.249366061069948	0.19701177393174	0.396922606036913	0.231448649943595	0.0786085508111989	0.0252312407295708	0.0310444550598084	0.0479058505935287	0.0101273271992075	0.96685691379214	0.00920418605632664	0.972061751869787	0.955930572862017	0.740566916117894	0.257711125377008	0.252071568116971	0.155527050151294	0.180275383141046	0.225917310129616	0.236680391255713
    G:	0.231857467992134	0.220654512495749	0.221779465244512	0.222245708297486	0.154409281410371	0.142755096012319	0.0315611879956547	0.288672410229433	0.0392421028440115	0.812386000932422	0.0112396907969206	0.0155411555975395	0.00907985951114708	0.00231660570686073	0.0137829609506435	0.494574967261659	0.241654128057805	0.183562686409529	0.0688601044687691	0.357404663871428	0.458613358343335
    T:	0.456016199656232	0.374721052844577	0.218273503584374	0.200850307405759	0.187484305777963	0.641755202789062	0.226588569805839	0.535122062078962	0.878571118909644	0.0167233826975886	0.010922030465239	0.0141714392854669	0.0105381811088939	0.0167507857189444	0.22597479009841	0.0658284978735163	0.240361316275224	0.373529021189803	0.474798506192812	0.292758533774129	0.150577562178406
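The three formats carry the same information (one row or column of values per base and position), so each reduces to a base-to-values mapping. The parsers below are a minimal sketch written against the examples above, not the tool's actual parser. Assumptions: the TRANSFAC header row starts with `PO` or `P0`, the trailing consensus letter of a TRANSFAC row is ignored, and the UniPROBE `Gene:` line is skipped. All function names are hypothetical.

```python
import re

BASES = "ACGT"


def _check(matrix):
    # All four bases must be present and every row must have the same, non-zero length.
    if set(matrix) != set(BASES):
        raise ValueError("expected one row for each of A, C, G, T")
    lengths = {len(row) for row in matrix.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("rows are empty or have different lengths")
    return matrix


def parse_jaspar(text):
    # Rows look like: "A  [ 34  16   7 ... ]"
    matrix = {}
    for line in text.splitlines():
        m = re.match(r"\s*([ACGT])\s*\[(.*)\]\s*$", line)
        if m:
            matrix[m.group(1)] = [float(x) for x in m.group(2).split()]
    return _check(matrix)


def parse_transfac(text):
    # Header "PO  A  C  G  T", then rows "01  28  1  2  2  A".
    columns = None
    matrix = {b: [] for b in BASES}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] in ("PO", "P0"):
            columns = tokens[1:5]  # column order as declared in the header
        elif tokens[0].isdigit():
            if columns is None:
                raise ValueError("matrix row before the PO header")
            for base, value in zip(columns, tokens[1:5]):
                matrix[base].append(float(value))
    return _check(matrix)


def parse_uniprobe(text):
    # Rows look like: "A:<TAB>0.106<TAB>0.155 ..."; the "Gene: ..." line does not match.
    matrix = {}
    for line in text.splitlines():
        m = re.match(r"\s*([ACGT]):\s+(.*)$", line)
        if m:
            matrix[m.group(1)] = [float(x) for x in m.group(2).split()]
    return _check(matrix)
```

Returning the same dictionary shape from every parser lets the downstream model-building code ignore which input format was used.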
    

Sources of matrices:

Matrix-Derived Models:

A model is derived from a count or probability matrix, for example:

A  [ 34  16   7  58  51   0   2 112 116   0  14  66  13  39  36  25 ]
C  [ 37  33  51  14   4 116 113   0   0   1  65   6  20  43   9  35 ]
G  [ 27  26  25  41  56   0   1   1   0   0  33  42  73  22  47  29 ]
T  [ 18  41  33   3   5   0   0   3   0 115   4   2  10  12  24  27 ]
The positions are assumed to be independent, because information about dependence between positions is lost when a TFBS alignment is converted to a matrix.
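Because positions are treated as independent, a matrix-derived model can score a candidate site as a sum of per-position terms. The sketch below uses the common log-odds construction. It assumes a pseudocount of 0.25 per base and a uniform background, neither of which is specified on this page, so it should not be read as LASAGNA-Search's exact scoring function. `count_to_pssm` and `score_site` are hypothetical names.

```python
import math

BASES = "ACGT"


def count_to_pssm(counts, pseudocount=0.25, background=None):
    """Convert a {base: [counts per position]} matrix into log2-odds columns."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(counts["A"])
    pssm = []
    for j in range(length):
        total = sum(counts[b][j] for b in BASES) + 4 * pseudocount
        # log2(P(base | site) / P(base | background)), with pseudocounts smoothing zeros
        pssm.append({
            b: math.log2((counts[b][j] + pseudocount) / total / background[b])
            for b in BASES
        })
    return pssm


def score_site(pssm, site):
    """Independence assumption: the site score is the sum of per-position scores."""
    site = site.upper()
    if len(site) != len(pssm):
        raise ValueError("site length must equal motif length")
    return sum(col[base] for col, base in zip(pssm, site))
```

With the toy counts `{"A": [3, 0], "C": [0, 0], "G": [0, 3], "T": [0, 0]}`, the site "AG" scores about 3.40 and "CT" scores -4.0, so the consensus outscores the mismatch at both positions.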

LASAGNA Models:

The LASAGNA algorithm is first used to align the known (variable-length) TFBSs of a TF. The alignment is then trimmed to build a PSSM model.
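The LASAGNA alignment step itself is not reproduced here. The sketch below starts from its output: sites already aligned to a common width, with `-` marking positions where a shorter site has no base. It counts bases per column and then trims flanking columns that too few sites cover. The coverage threshold is a hypothetical trimming rule chosen for illustration; the criterion LASAGNA actually uses is not described on this page.

```python
BASES = "ACGT"


def count_matrix(aligned_sites, min_coverage=0.5):
    """Build a {base: [counts]} matrix from gapped, equal-width aligned sites.

    Leading and trailing columns where fewer than `min_coverage` of the sites
    have a base are trimmed (hypothetical rule, not LASAGNA's own criterion).
    """
    n = len(aligned_sites)
    width = len(aligned_sites[0])
    if any(len(s) != width for s in aligned_sites):
        raise ValueError("aligned sites must all have the same width")
    counts = {b: [0] * width for b in BASES}
    for site in aligned_sites:
        for j, base in enumerate(site.upper()):
            if base in counts:  # gaps ('-') and ambiguous letters are not counted
                counts[base][j] += 1
    coverage = [sum(counts[b][j] for b in BASES) / n for j in range(width)]
    kept = [j for j in range(width) if coverage[j] >= min_coverage]
    if not kept:
        raise ValueError("no column meets the coverage threshold")
    lo, hi = kept[0], kept[-1] + 1  # trim only the flanks, keep the core contiguous
    return {b: col[lo:hi] for b, col in counts.items()}
```

For the made-up alignment `["-ACG", "TACG", "-ACT"]`, the first column is covered by only one of three sites and is trimmed, leaving a three-column matrix with counts A=[3,0,0], C=[0,3,0], G=[0,0,2], T=[0,0,1].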

Use TRANSFAC TFBSs:

Use precomputed TF models based on TFBSs in the TRANSFAC Public database aligned by the LASAGNA algorithm.

Use ORegAnno TFBSs:

Use precomputed TF models based on TFBSs in the ORegAnno database aligned by the LASAGNA algorithm.

Use PAZAR TFBSs:

Use precomputed TF models based on TFBSs in the PAZAR database aligned by the LASAGNA-ChIP algorithm.

Use TRANSFAC Matrices:

Use matrices in the TRANSFAC Public database.

Use JASPAR CORE Matrices:

Use matrices in the JASPAR CORE database.

Use UniPROBE Matrices:

Use matrices in the UniPROBE database.

Import results:

Import search results obtained on the same promoter sequences (identified by their headers). For example, we first search the promoter of the yeast gene CTS1 (NCBI Gene ID: 850992) for TFBSs of the 13 TRANSFAC yeast TFs by selecting "Use TRANSFAC TFBSs" and setting the species to "Saccharomyces cerevisiae (yeast)". We then search the same promoter for TFBSs of ACE2 by selecting "Enter known TFBSs" and pasting the known ACE2 binding sites obtained from the ORegAnno database. When this search finishes, we click "Display results in plain text (tab-delimited)" and copy the results. Finally, we return to the search results for the TRANSFAC TFs, click "Import results", and paste the copied results into the text box as illustrated below.



After clicking "Import", the two ACE2 hits are appended to the end of the existing results, as shown below. The results from both searches can then be visualized together by clicking "Display results as images".

K:

The PSSM method we use can also score nucleotide pairs, depending on the value of K, the maximal distance between the two bases of a scored pair. K=0 means that no nucleotide pairs are scored, which is equivalent to the "regular" PSSM method. K=1 means that all adjacent nucleotide pairs are scored, and K=2 means that all adjacent pairs as well as all pairs separated by one intervening nucleotide are scored. The K for each precomputed PSSM model was determined by 10-fold cross-validation experiments.
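The definition of K above determines exactly which position pairs fall within scoring range. The sketch below only enumerates those pairs for a motif of a given length; how each pair is actually weighted in the score is not described on this page and is not modelled here. `scored_pairs` is a hypothetical name.

```python
def scored_pairs(length, k):
    """List the (i, j) position pairs, 0-based, with 1 <= j - i <= k."""
    if k < 0:
        raise ValueError("K must be non-negative")
    return [(i, i + d)
            for i in range(length)
            for d in range(1, k + 1)
            if i + d < length]
```

For a 4-position motif, K=0 gives no pairs (the regular PSSM), K=1 gives `[(0, 1), (1, 2), (2, 3)]`, and K=2 additionally includes the one-gap pairs `(0, 2)` and `(1, 3)`. For K smaller than the motif length L, the number of pairs is (L-1) + (L-2) + ... + (L-K), so a 16-position motif with K=2 has 15 + 14 = 29 pairs.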

AUC:

Area under the ROC curve from 10-fold cross-validation experiments. Its value ranges from 0 to 1, where 0.5 corresponds to random guessing; the greater the value, the more effective the PSSM model.
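The AUC equals the probability that a randomly chosen true binding site scores higher than a randomly chosen background sequence, with ties counted as one half. The sketch below computes it directly from that pairwise definition. It assumes held-out positive and negative scores are already available; how LASAGNA-Search selects negatives and combines the ten folds is not stated on this page. `roc_auc` is a hypothetical name.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))
```

This quadratic version is meant to make the definition clear. For large score sets, a rank-based (Mann-Whitney U) computation gives the same value more efficiently.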