Retrieve the promoter of a gene. A gene is currently identified by its official symbol (e.g. A1BG), its NCBI Gene database identifier (an integer), or an mRNA accession number (e.g. NM_130786). The transcription start position provided by the UCSC Genome Browser is used as the transcription start site (TSS). Other identifiers can be converted to NCBI Gene IDs with the Gene ID Conversion Tool offered by the DAVID Bioinformatics Resources.
Seven species are currently supported:
Enter the known binding sites of a TF in FASTA format.
Enter a precomputed position-specific count matrix in one of the three supported formats:
A [ 34 16  7 58 51   0   2 112 116   0 14 66 13 39 36 25 ]
C [ 37 33 51 14  4 116 113   0   0   1 65  6 20 43  9 35 ]
G [ 27 26 25 41 56   0   1   1   0   0 33 42 73 22 47 29 ]
T [ 18 41 33  3  5   0   0   3   0 115  4  2 10 12 24 27 ]
PO  A   C   G   T
01  28  1   2   2   A
02  1   0   0   32  T
03  1   0   31  1   G
04  7   25  0   1   C
05  0   33  0   0   C
06  0   33  0   0   C
07  23  0   0   10  A
08  2   0   0   31  T
09  33  0   0   0   A
10  0   0   0   33  T
11  31  0   0   2   A
12  10  0   0   23  T
13  0   0   33  0   G
14  0   0   33  0   G
15  12  4   3   14  W
16  13  9   8   3   N
17  11  6   7   9   N
18  1   4   8   20  T
Gene: Aft1-primary Motif: A.TGCACCC Enrichment Score: 0.482942724828625
A: 0.106206209640922 0.155258373589726 0.362935257239374 0.179981378259842 0.426657762868071 0.13688115038742 0.716619001468936 0.145161072631796 0.0342809276528156 0.160763289170782 0.0109813649457007 0.961083219060667 0.00832020751017246 0.0250020357121778 0.0196753328330524 0.181885409487817 0.265912987550001 0.287381242249375 0.276066006197372 0.123919492224826 0.154128688222547
C: 0.205920122710712 0.249366061069948 0.19701177393174 0.396922606036913 0.231448649943595 0.0786085508111989 0.0252312407295708 0.0310444550598084 0.0479058505935287 0.0101273271992075 0.96685691379214 0.00920418605632664 0.972061751869787 0.955930572862017 0.740566916117894 0.257711125377008 0.252071568116971 0.155527050151294 0.180275383141046 0.225917310129616 0.236680391255713
G: 0.231857467992134 0.220654512495749 0.221779465244512 0.222245708297486 0.154409281410371 0.142755096012319 0.0315611879956547 0.288672410229433 0.0392421028440115 0.812386000932422 0.0112396907969206 0.0155411555975395 0.00907985951114708 0.00231660570686073 0.0137829609506435 0.494574967261659 0.241654128057805 0.183562686409529 0.0688601044687691 0.357404663871428 0.458613358343335
T: 0.456016199656232 0.374721052844577 0.218273503584374 0.200850307405759 0.187484305777963 0.641755202789062 0.226588569805839 0.535122062078962 0.878571118909644 0.0167233826975886 0.010922030465239 0.0141714392854669 0.0105381811088939 0.0167507857189444 0.22597479009841 0.0658284978735163 0.240361316275224 0.373529021189803 0.474798506192812 0.292758533774129 0.150577562178406
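For readers who want to check a matrix before pasting it, the first of the three layouts above (one bracketed row of counts per base) can be read with a few lines of Python. This is only a sketch: the function name parse_bracketed_matrix is hypothetical and is not part of the tool, and the sample uses the first four columns of the bracketed example.

```python
import re

def parse_bracketed_matrix(text):
    """Parse rows such as 'A [ 34 16 7 ]' into a dict {base: [counts]}.

    Hypothetical helper for illustration; it works whether the four rows
    are on separate lines or run together on one line.
    """
    matrix = {}
    for base, body in re.findall(r"([ACGT])\s*\[([^\]]*)\]", text):
        matrix[base] = [int(token) for token in body.split()]
    row_lengths = {len(row) for row in matrix.values()}
    if set(matrix) != set("ACGT") or len(row_lengths) != 1:
        raise ValueError("expected four rows (A, C, G, T) of equal length")
    return matrix

# First four columns of the bracketed example matrix.
example = """
A [ 34 16 7 58 ]
C [ 37 33 51 14 ]
G [ 27 26 25 41 ]
T [ 18 41 33 3 ]
"""
counts = parse_bracketed_matrix(example)
column_totals = [sum(counts[b][i] for b in "ACGT") for i in range(4)]
```

Summing each column is a quick sanity check: in a count matrix every column should add up to the number of aligned sites, so the totals should all be equal.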
Sources of matrices:
A model is derived from a count or probability matrix.
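As a rough illustration of deriving a scoring model from a count matrix, the sketch below converts counts to log-odds scores. The pseudocount of 1 and the uniform background of 0.25 are assumptions made for this example, and the names counts_to_pssm and score_site are hypothetical; the exact transformation used by the tool is not described here and may differ.

```python
import math

def counts_to_pssm(counts, pseudocount=1.0, background=0.25):
    """Turn {base: [counts]} into {base: [log2-odds scores]}.

    Assumed for illustration: a fixed pseudocount per base and a
    uniform background probability.
    """
    bases = "ACGT"
    width = len(counts["A"])
    pssm = {b: [] for b in bases}
    for i in range(width):
        total = sum(counts[b][i] for b in bases) + 4 * pseudocount
        for b in bases:
            prob = (counts[b][i] + pseudocount) / total
            pssm[b].append(math.log2(prob / background))
    return pssm

def score_site(pssm, site):
    """Sum the per-position scores of a candidate site (K=0 scoring)."""
    return sum(pssm[base][i] for i, base in enumerate(site))

# Toy two-column count matrix: A dominates column 1, G dominates column 2.
toy_counts = {"A": [3, 0], "C": [0, 0], "G": [0, 3], "T": [0, 0]}
toy_pssm = counts_to_pssm(toy_counts)
```

With this toy matrix, score_site(toy_pssm, "AG") is positive and score_site(toy_pssm, "CT") is negative, which is the behaviour one expects from a log-odds model.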
The LASAGNA algorithm is first used to align the known (variable-length) TFBSs of a TF. The alignment is then trimmed to build a PSSM model.
Use precomputed TF models based on TFBSs in the TRANSFAC Public database aligned by the LASAGNA algorithm.
Use precomputed TF models based on TFBSs in the ORegAnno database aligned by the LASAGNA algorithm.
Use precomputed TF models based on TFBSs in the PAZAR database aligned by the LASAGNA-ChIP algorithm.
Use matrices in the TRANSFAC Public database.
Use matrices in the JASPAR CORE database.
Use matrices in the UniPROBE database.
Import search results on the same promoter sequences (identified by the headers). As an example, we first search the promoter of yeast gene CTS1 (NCBI Gene ID: 850992) for TFBSs of the 13 TRANSFAC yeast TFs. This is done by selecting "Use TRANSFAC TFBSs" and setting the species to "Saccharomyces cerevisiae (yeast)". We then search the same promoter for TFBSs of ACE2. This is done by selecting "Enter known TFBSs" and pasting the known TFBSs of ACE2 obtained from the ORegAnno database. After the search finishes, we click on "Display results in plain text (tab-delimited)" and copy the results. We now go back to the search results of TRANSFAC TFs, click on "Import results", and paste the results in the text box as illustrated below.
The PSSM method we use may score nucleotide pairs depending on the value of K, the maximal distance between two bases. K=0 means that no nucleotide pairs are scored, equivalent to the "regular" PSSM method. K=1 means that all the adjacent nucleotide pairs are scored, while K=2 means that all the adjacent pairs as well as pairs separated by one nucleotide in between are scored. The K for a precomputed PSSM model is determined by 10-fold cross-validation experiments.
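To make the role of K concrete, the following sketch lists which position pairs would be scored for a site of a given width. It only enumerates the pairs as described above; the function name scored_pairs is hypothetical, and how each pair is actually scored is not shown.

```python
def scored_pairs(width, k):
    """Return 0-based position pairs (i, j) with 1 <= j - i <= k.

    k=0 yields no pairs, k=1 yields adjacent pairs, and k=2 adds pairs
    separated by one nucleotide in between.
    """
    return [(i, i + d) for d in range(1, k + 1) for i in range(width - d)]

pairs_k0 = scored_pairs(4, 0)
pairs_k1 = scored_pairs(4, 1)
pairs_k2 = scored_pairs(4, 2)
```

For a site of width 4, K=1 gives the three adjacent pairs and K=2 adds the two pairs (0, 2) and (1, 3), so the number of scored pairs grows by roughly one site width for each increment of K.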
Area under the ROC curve (AUC) from 10-fold cross-validation experiments. Its value ranges from 0 to 1. The greater the value, the more effective the PSSM model.
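The area under the ROC curve can be read as the probability that a randomly chosen true binding site scores higher than a randomly chosen non-site. The sketch below computes it that way from two lists of scores, counting ties as one half. The function name roc_auc and the sample scores are made up for illustration; this is not the evaluation code behind the reported values.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Quadratic in the number of
    scores, which is fine for a small illustration.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))

# Made-up scores: three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([0.9, 0.4], [0.5, 0.1])
```

A value of 0.5 corresponds to scores that do not separate sites from non-sites at all, and 1.0 to perfect separation.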