
Documentation

BMC Bioinformatics Manuscript
User Manual (pdf) (more details and with screen captures)
Slide Presentation (pdf)

Installation

TESeeker can run on any operating system that supports the VirtualBox virtualization software package, which is currently available for Windows, OS X, Linux, and Solaris. To install TESeeker:
  1. Download and install VirtualBox.
  2. Download the TESeeker files.
  3. Open VirtualBox.
  4. Click File, then click Import Appliance..., and complete the wizard, selecting the TESeeker .ovf file as the source.

Usage

After installation, start TESeeker by opening VirtualBox, clicking teseeker in the left frame, and then clicking Start. The virtual appliance hosting TESeeker will then boot. The booted appliance contains 7 desktop items: the Genomes and TELibrary folders, shortcuts to the documentation and web interfaces, and the license. All genome and library files must be placed in the corresponding folders on the desktop and must be in FASTA format with a .fa, .fas, or .fasta file extension. The Pediculus humanus humanus genome and our representative TE library are included within the virtual appliance.
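The file requirements above can be checked before starting a job. The following is a minimal sketch, not part of TESeeker: the function name check_fasta_file is hypothetical, the extension match is assumed to be exact (lower case only), and the FASTA test only confirms that the first non-blank line starts with ">".

```python
from pathlib import Path

# Extensions named in the documentation; an exact, lower-case match is an assumption.
ALLOWED_EXTENSIONS = {".fa", ".fas", ".fasta"}


def check_fasta_file(path):
    """Return a list of problems for one file (an empty list means none were found)."""
    path = Path(path)
    problems = []
    if path.suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unexpected extension {path.suffix!r}")
    # Read only as far as the first non-blank line; a FASTA record starts with '>'.
    with path.open(errors="replace") as handle:
        first_line = next((line for line in handle if line.strip()), "")
    if not first_line.startswith(">"):
        problems.append("first non-blank line is not a FASTA header ('>')")
    return problems


def check_folder(folder):
    """Map each file name in a folder (e.g. Genomes or TELibrary) to its problems."""
    return {
        entry.name: check_fasta_file(entry)
        for entry in sorted(Path(folder).iterdir())
        if entry.is_file()
    }
```

Calling check_folder on the Genomes or TELibrary folder returns a dictionary in which any file with a non-empty list needs attention before it is used in a search.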

Clicking the TESeeker shortcut on the desktop loads the web interface, where users can modify the default parameters, most notably the BLAST Query Library, BLAST Database, and the Desktop Output Folder Name. Hovering the mouse over a parameter name provides a more detailed description. Once the parameters have been set, clicking Submit briefly shows the selected parameters and then starts the search. The browser displays "Running" until the job completes, at which point the webpage notifies the user. Users can then navigate to the specified output folder on the desktop to view the results.

If the user elects to find the coding region only, results are organized as follows within the specified output folder: the codingRegion_files folder contains intermediary output, the output folder contains all the singlets and contigs produced, and the remaining files represent the contigs and singlets produced from CAP3. For example, a file called cap2c_out.fas contains the contig sequences from the second iteration of CAP3, while cap1s_out.fas contains the singlet sequences produced from the first iteration of CAP3.
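The naming convention in the two examples above can be decoded programmatically. The sketch below assumes the general pattern cap<iteration><c or s>_out.fas, which is inferred only from those two file names; the function name describe_cap_file is hypothetical and not part of TESeeker.

```python
import re

# Pattern inferred from the examples cap2c_out.fas and cap1s_out.fas (an assumption).
CAP_FILE_PATTERN = re.compile(r"^cap(\d+)([cs])_out\.fas$")


def describe_cap_file(filename):
    """Return (iteration, kind) for a CAP3 output file name, or None if it does not match."""
    match = CAP_FILE_PATTERN.match(filename)
    if match is None:
        return None
    iteration = int(match.group(1))
    # 'c' marks contig sequences, 's' marks singlet sequences.
    kind = "contigs" if match.group(2) == "c" else "singlets"
    return iteration, kind


if __name__ == "__main__":
    for name in ("cap2c_out.fas", "cap1s_out.fas"):
        print(name, describe_cap_file(name))
```

File names that do not follow the assumed pattern return None, so the function can be used to separate CAP3 output from the other files in the output folder.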

If a consensus sequence is desired, the results are organized as follows within the specified output folder: the codingRegion_files folder contains intermediary output from the coding region search, the folder consen_files contains intermediary files from the consensus search, and the output folder contains the contig and singlet sequences produced from each sequence that was fed into the consensus search. Additionally, all contig and singlet sequences are available in single FASTA files in the specified output folder.

Example Run

TESeeker can be run with the default parameters on the included Pediculus humanus humanus genome. Simply click Submit at the main TESeeker interface and the job will start. The time the job takes depends on your host machine, but it should complete within about 10 minutes. When the web interface reports that the job is done, either click the link to the output folder or navigate to the folder on the desktop. Contig and singlet sequences will be in the top level of the output folder, with the subfolders containing individual sequence data.
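Once the example run finishes, a quick way to inspect the top-level results is to count the sequences in each FASTA file. This is a minimal sketch under stated assumptions: the function names are hypothetical, the output files are assumed to carry one of the .fa, .fas, or .fasta extensions, and the folder path passed in is whatever name was given as the Desktop Output Folder Name.

```python
from pathlib import Path

# Assumed extensions for the top-level FASTA output files.
FASTA_EXTENSIONS = {".fa", ".fas", ".fasta"}


def summarize_fasta(path):
    """Return (number of records, total residues) for one FASTA file."""
    records = 0
    residues = 0
    with Path(path).open() as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records += 1  # each header line starts a new record
            elif line:
                residues += len(line)  # sequence lines contribute their length
    return records, residues


def summarize_output_folder(folder):
    """Summarize every top-level FASTA file in the folder; subfolders are not visited."""
    return {
        entry.name: summarize_fasta(entry)
        for entry in sorted(Path(folder).iterdir())
        if entry.is_file() and entry.suffix in FASTA_EXTENSIONS
    }
```

The returned dictionary maps each file name to its record count and total sequence length, which gives a quick sanity check that the run produced contigs and singlets.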

Additional Tools

There are also BLAST and Extract shortcuts on the desktop within the virtual appliance. These web interfaces offer additional functionality by making it simple to do a custom BLAST search or sequence extraction using files within the Genomes folder.

Technology

TESeeker uses a variety of technologies. The core bioinformatics tools (BLAST, CAP3, ClustalW2, and BioPerl) are tied together through bash scripts. Researchers interact with TESeeker through a web-based form implemented in HTML/PHP and served by the lighttpd web server. The form interacts with the local scripts and uses a PostgreSQL database and CGI/Perl to notify researchers when a job has completed. TESeeker is installed on Ubuntu 10.04 LTS.

If changes need to be made to the core virtual appliance, contact Ryan.C.Kennedy(at)alumni(dot)nd(dot)edu to obtain the administrative password.

Additional Information

Additional documentation with screen captures is available in the User Manual (pdf) listed at the top of this page.

Acknowledgments

TESeeker was supported in part by NIAID/NIH contracts HHSN272200900039C and HHSN266200400039C for "VectorBase: A Bioinformatics Resource Center for Invertebrate Vectors of Human Pathogens." Computational resources were provided in part by the Notre Dame Center for Research Computing.