Japan Bioinformatics KK

 

Data management, distribution, security and analysis

 
 
 

 
 

Introduction to Simbiot

cDNA Analysis

The following image illustrates the basic analysis path for cDNA microarrays.  First, the data of interest are uploaded and submitted to the Simbiot Database.

After submission, the data are “Exported” to create a Raw Data Matrix (RDM).  The data can then be scaled to create a Scaled Data Matrix (SDM).  Please note that scaling options are available only for Illumina microarrays; the Export procedure for Affymetrix chips automatically scales the data using a log base 2 (log2) transformation.  After scaling, a variety of normalization functions can be applied to create a Normalized Data Matrix (NDM).  The Normalized Data Matrix is then ready for direct analysis, or it can be filtered by pathway, gene ontology or other conditions to focus the analysis on genes of interest.
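The RDM, SDM and NDM steps described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not Simbiot's implementation: the function names, the offset parameter and the choice of quantile normalization (one common normalization method) are hypothetical, and the source does not say which normalization functions Simbiot offers.

```python
import numpy as np

def scale_log2(rdm, offset=1.0):
    """Raw Data Matrix -> Scaled Data Matrix using a log2 transformation.
    The offset (an assumption) avoids taking log2 of zero."""
    return np.log2(rdm + offset)

def quantile_normalize(sdm):
    """Scaled Data Matrix -> Normalized Data Matrix.
    Every column (sample) is mapped onto the same reference distribution:
    the row-wise mean of the sorted columns. Ties are broken by position."""
    order = np.argsort(sdm, axis=0)
    ranks = np.argsort(order, axis=0)             # rank of each value in its column
    mean_sorted = np.sort(sdm, axis=0).mean(axis=1)
    return mean_sorted[ranks]

def filter_probes(ndm, probe_ids, ids_of_interest):
    """Keep only the rows whose probe id is in the set of interest."""
    keep = [i for i, pid in enumerate(probe_ids) if pid in ids_of_interest]
    return ndm[keep, :], [probe_ids[i] for i in keep]

# Rows are probes, columns are samples (hypothetical values).
probe_ids = ["p1", "p2", "p3"]
rdm = np.array([[8.0, 16.0], [64.0, 32.0], [1024.0, 2048.0]])
sdm = scale_log2(rdm, offset=0.0)
ndm = quantile_normalize(sdm)
subset, subset_ids = filter_probes(ndm, probe_ids, {"p1", "p3"})
```

After normalization every sample column shares the same set of values, which is what makes the samples directly comparable in the later analysis steps.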

SNP Analysis

The analysis process for SNP microarrays is similar:


The Export request for Affymetrix chips includes scaling and normalization.  Illumina microarrays need to be normalized after Export, and a large number of normalization parameters are available.  Following Export and, where applicable, Normalization, the data can be analyzed using the built-in peer-reviewed algorithms.  The data can also be filtered using a variety of functions, in which case the analytics are applied only to the data of interest.

Processing and Status

Simbiot is an asynchronous system.  When the user makes a request, the request is entered into a queue and scheduled for execution when system resources are available.  In general, any Simbiot request will be in one of the following states:

  1. Pending: The request has been made but not yet scheduled.
  2. Running: The system is executing the request.
  3. Done: The request is complete.
  4. Error: The system is unable to execute the request.

The request status is displayed on the Object Detail Page as well as on the overall project list.  The states are also color-coded.

If an object remains in the Pending state for a long time, the system may be busy processing requests from other users.  The number of pending requests can be displayed by activating the “Queue” link at the top right of the screen.

After a request is complete, its status changes to “Done” and a notification is e-mailed to the user’s registered account.  That notification contains a link to the completed request.
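The request life cycle described above can be modeled as a small state machine over a first-in, first-out queue.  The sketch below is illustrative only; the class and method names are hypothetical and do not describe Simbiot's internal scheduler, and the e-mail notification step is left out.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "Pending"   # requested but not yet scheduled
    RUNNING = "Running"   # currently executing
    DONE = "Done"         # finished successfully
    ERROR = "Error"       # could not be executed

@dataclass
class Request:
    name: str
    status: Status = Status.PENDING

class RequestQueue:
    """Hypothetical queue: requests wait as Pending until they are run."""

    def __init__(self):
        self._pending = deque()

    def submit(self, request):
        request.status = Status.PENDING
        self._pending.append(request)

    def pending_count(self):
        """What a 'Queue' display could report: requests still waiting."""
        return len(self._pending)

    def run_next(self, work):
        """Run the oldest pending request; return it, or None if none wait."""
        if not self._pending:
            return None
        request = self._pending.popleft()
        request.status = Status.RUNNING
        try:
            work(request)
            request.status = Status.DONE
        except Exception:
            request.status = Status.ERROR
        return request
```

A caller would submit requests, poll `pending_count()` to see how busy the queue is, and read each request's `status` to decide how to display it.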

Resubmitting Requests and What-if Analysis

Most analysis results will offer the user an opportunity to change parameters without reclassifying the samples.  This function is made available on the parameters tab:


The parameters depend on the specific analysis function and are the same as the ones used in the original analysis request screens.  The user should set the new parameters, enter a name and comment (if desired) and activate the “Re-Analyze” button to initiate a new analysis request.
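The idea behind re-analysis, reusing the sample classification while changing only the parameters, can be sketched as follows.  The `AnalysisRequest` structure, the `reanalyze` helper and the parameter names (`fdr`, `permutations`) are hypothetical and serve only to illustrate the what-if workflow.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AnalysisRequest:
    name: str
    classification: dict   # sample name -> class label
    parameters: dict       # analysis-specific settings
    comment: str = ""

def reanalyze(original, new_parameters, name=None, comment=""):
    """Create a new request that keeps the original sample classification
    and overrides only the parameters the user changed."""
    merged = {**original.parameters, **new_parameters}
    return replace(
        original,
        parameters=merged,
        name=name or original.name + " (re-analysis)",
        comment=comment,
    )

original = AnalysisRequest(
    name="run1",
    classification={"s1": "control", "s2": "treated"},
    parameters={"fdr": 0.05, "permutations": 100},
)
what_if = reanalyze(original, {"fdr": 0.01})
```

The original request is left untouched, so the two result sets can be compared side by side.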

Common Screen Formats

The Simbiot system relies on some common screen formats that are reused in a number of places.  Some of these formats (those related to analyses) are described below.

Submit Analysis Screen

In general, the analysis submission process requires users to classify samples, set parameters, provide a descriptive name and comment, and initiate the process.  Most submission/classification screens have the following layout:


For some analyses (for example, time course analysis), the order of samples within a classification is significant.  For others (such as expression analysis), it is not.  After classifying the samples, setting the parameters and (optionally) providing a name and comment, the user can activate the “Analyze” button to launch the request.

After the analysis is complete, the user gets access to the analysis object, which also has some common features.  One such feature, resubmission, was discussed above.


Interactive Grid

The interactive grid presents analysis results in a familiar table format.  The grid has a different layout for each analysis and data type.  However, all layouts share some common features.


The outbound links include interfaces to RefSeq and other NCBI resources (Sayers, Barrett et al. 2010), Ensembl (Hubbard, Aken et al. 2009) and SwissProt/UniProt (Boeckmann, Bairoch et al. 2003), all using the appropriate keys, as well as a PubMed search based on gene symbol.  The Probe Id links to the probe detail page, which includes vendor-supplied probe information.  The left-most column contains an arrow that opens a drop-down with detailed information about the item.  Typically, this information includes Gene Ontology (Ashburner, Ball et al. 2000) and KEGG Pathways (Kanehisa and Goto 2000) data, as well as a graph (if a graph has previously been generated for the probe).


Probe Details

For example, the image below contains detailed information about an Affymetrix probe:


The top part of the screen contains the information supplied by the vendor.  The probe sequence is a link to the BLAST page of GenBank (Benson, Karsch-Mizrachi et al. 2010) for further validation.  Below the vendor’s information are one or two boxes containing JBI’s validation of the probes (if available).  This validation consists of an alignment of the probe sequences against the current version of the Ensembl (Hubbard, Aken et al. 2009) transcripts, and it includes links to Ensembl.


Ad-hoc Graphs

The Graph link on the grid provides a connection to the Ad-Hoc graphs interface.  (This interface is also accessible from its own tab in the Analysis Results Detailed View.)


If this screen is entered from the Interactive Grid, the Gene Symbol is already selected and the plot is initiated automatically.  The plot may result in multiple images, one for each probe that corresponds to the given gene symbol.  The user may also select a pre-defined gene set and plot multiple gene symbols (and each gene symbol may in turn result in multiple plots).  In addition to the JPEG format, the user may download a PDF file containing the image.  The pictures are interactive: the user may “mouse-click” on an image to get a better view.


Additional Data

By clicking on the left-most down-arrow, the user may display additional data for a particular probe.  This includes Gene Ontology (Ashburner, Ball et al. 2000) and KEGG Pathways (Kanehisa and Goto 2000) data and the previously generated graph (if any):


Scroll Bar, Column Resize and Sort

In most cases, the left-most columns of the grid contain annotation information and the columns on the right contain computed values.  By moving the scroll-bar to the right, the user may view additional columns. The column size can be changed by a “click-and-drag” on the column partitions.  The entire list can be sorted by a “mouse-click” on any of the computed columns.  “Click” once for ascending order.  “Click” again for descending order.


Saving Gene Sets

After establishing the sort order, the user can use the grid interface to save a set of significant genes.




The save process is initiated by activating the “Save Gene Set” button.  This brings up a window asking for information about the Gene Set.  A name and comment should be entered.  The user may also choose how to select the genes: either the top X genes (based on the current sort order), or all genes whose value is less than (for an ascending sort) or greater than (for a descending sort) a given number.  This number is compared to the current sort variable, and all genes that match the selected criteria are saved into the Gene Set.
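The two selection rules described above (top X genes, or a threshold whose direction follows the sort order) can be expressed as a short function.  This is a minimal sketch; the row layout, the `select_gene_set` name and the example values are assumptions made for illustration.

```python
def select_gene_set(rows, sort_key, ascending=True, top_n=None, threshold=None):
    """Pick genes from grid rows using the current sort column.

    top_n:     keep the first N genes in the current sort order.
    threshold: keep genes below it (ascending sort) or above it
               (descending sort).
    """
    ordered = sorted(rows, key=lambda r: r[sort_key], reverse=not ascending)
    if top_n is not None:
        return [r["gene"] for r in ordered[:top_n]]
    if threshold is not None:
        if ascending:
            return [r["gene"] for r in ordered if r[sort_key] < threshold]
        return [r["gene"] for r in ordered if r[sort_key] > threshold]
    raise ValueError("provide either top_n or threshold")

# Hypothetical grid rows with one computed column.
rows = [
    {"gene": "TP53", "p_value": 0.001},
    {"gene": "BRCA1", "p_value": 0.2},
    {"gene": "MYC", "p_value": 0.03},
]
top_two = select_gene_set(rows, "p_value", ascending=True, top_n=2)
below = select_gene_set(rows, "p_value", ascending=True, threshold=0.05)
above = select_gene_set(rows, "p_value", ascending=False, threshold=0.02)
```

Tying the comparison direction to the sort order means the saved set always starts from the top of the list as the user currently sees it.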


Interactive Graphs

The interactive graphs are accessed from the Graph tab of the analysis object.  That tab presents the JPEG images as well as links to PDF and to the Interactive Graphs application.


Activating the Interactive link launches the application.


The graph contains four main items:

  • The action selection buttons
  • Main image
  • Navigation image
  • List of selected points

Action Selection

The user may select points, deselect points, deselect all points, zoom in or zoom out.  The chosen action determines what happens when the user interacts with the main image.

The Select action allows individual probes to be selected by a “mouse-click” on those points, or multiple points by selecting a rectangle using a “click-and-drag” interface.  The user continues to select points while the Select option is highlighted.

The Deselect action works similarly to Select, but individual and multiple points are removed from the selected set.

Deselect all unselects all points.

Zoom allows the user to select a rectangle and zooms the view into that rectangle.  The user may continue zooming into the image.  When the main image view is in “zoom” mode, the user may interact with the Navigation image to display a different visible area in the main view.  When in “zoom” mode, the user may change the selection to either Select or Deselect to perform those actions in the limited view.

Zoom out returns the main view to the default status.
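The Select, Deselect and Deselect all actions amount to set operations on the points that fall inside a dragged rectangle.  The following sketch assumes points are stored as a mapping from probe id to (x, y) coordinates; all names are hypothetical and it does not reflect how the Interactive Graphs application is actually written.

```python
def points_in_rectangle(points, x0, y0, x1, y1):
    """Return the ids of points inside the rectangle spanned by the two
    corners of a click-and-drag, regardless of drag direction."""
    left, right = sorted((x0, x1))
    bottom, top = sorted((y0, y1))
    return {
        pid
        for pid, (x, y) in points.items()
        if left <= x <= right and bottom <= y <= top
    }

class Selection:
    """Tracks the set of currently selected points."""

    def __init__(self):
        self.selected = set()

    def select(self, ids):
        self.selected |= set(ids)

    def deselect(self, ids):
        self.selected -= set(ids)

    def deselect_all(self):
        self.selected.clear()

# Hypothetical plot coordinates keyed by probe id.
points = {"probe_a": (1.0, 1.0), "probe_b": (2.0, 3.0), "probe_c": (5.0, 5.0)}
selection = Selection()
selection.select(points_in_rectangle(points, 3.0, 4.0, 0.0, 0.0))
```

A single click can be treated as a very small rectangle around the cursor, so the same function serves both individual and multiple selection.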


Selected Points

This view is similar to the interactive grid.  It provides links to detailed annotation information both within Simbiot as well as in the external databases.


To save a gene set (or SNP set), the user should enter a name and comment and press the Save button.  This generates a new gene set containing the genes designated by the selected points.

Integrated Algorithms

Data Extraction and Pre-Processing Functions

Simbiot uses peer-reviewed algorithms for extracting, pre-processing and analyzing the data.  For the Export process, the following Bioconductor (Gentleman, Carey et al. 2004) tools are used:

  1. Affymetrix cDNA chips: Bioconductor affy (Gautier, Cope et al. 2004)
  2. Affymetrix SNP chips: Bioconductor crlmm (Carvalho, Bengtsson et al. 2007)
  3. Illumina cDNA microarrays: Bioconductor lumi (Du, Kibbe et al. 2008)
  4. Illumina SNP microarrays: Bioconductor beadarraySNP (Oosting, Lips et al. 2007)
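The list above is effectively a lookup from platform to Export tool.  As a minimal sketch (the dictionary and function names are hypothetical; only the package names come from the list), it could be written as:

```python
# (vendor, array type) -> Bioconductor package named in the list above.
EXPORT_PACKAGES = {
    ("Affymetrix", "cDNA"): "affy",
    ("Affymetrix", "SNP"): "crlmm",
    ("Illumina", "cDNA"): "lumi",
    ("Illumina", "SNP"): "beadarraySNP",
}

def export_package(vendor, array_type):
    """Look up the Export package for a platform, or fail clearly."""
    try:
        return EXPORT_PACKAGES[(vendor, array_type)]
    except KeyError:
        raise ValueError(f"unsupported platform: {vendor} {array_type}") from None
```

Keeping the mapping in one table makes it easy to see which platforms are covered.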

Data Analysis

For analyzing the data, peer-reviewed and standard analytics have been integrated in a data-independent format:

  • Clustering: Cluster 3.0 (de Hoon, Imoto et al. 2004), R-based clustering functions
  • Standard statistics: R-based statistical functions
  • Expression analysis: samr (Tusher, Tibshirani et al. 2001), Bioconductor limma (Smyth 2004) and LPE (Jain, Thatte et al. 2003) packages
  • Time course analysis: Bioconductor timecourse (Tai and Speed 2009) and maSigPro (Conesa, Nueda et al. 2006) packages
  • SNP Association study and linkage disequilibrium: Bioconductor snpMatrix (Clayton and Leung 2007) package
  • Copy number analysis: Bioconductor snapCGH (Marioni, Thorne et al. 2006; Smith, Marioni et al. 2006).

References

Ashburner, M., C. A. Ball, et al. (2000). "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium." Nat Genet 25(1): 25-9.

Benson, D. A., I. Karsch-Mizrachi, et al. (2010). "GenBank." Nucleic Acids Res 38(Database issue): D46-51.

Boeckmann, B., A. Bairoch, et al. (2003). "The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003." Nucleic Acids Res 31(1): 365-70.

Carvalho, B., H. Bengtsson, et al. (2007). "Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data." Biostatistics 8(2): 485-99.

Clayton, D. and H. T. Leung (2007). "An R package for analysis of whole-genome association studies." Hum Hered 64(1): 45-51.

Conesa, A., M. J. Nueda, et al. (2006). "maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments." Bioinformatics 22(9): 1096-102.

Consortium, U. (2010). "The Universal Protein Resource (UniProt) in 2010." Nucleic Acids Res 38(Database issue): D142-8.

de Hoon, M. J., S. Imoto, et al. (2004). "Open source clustering software." Bioinformatics 20(9): 1453-4.

Du, P., W. A. Kibbe, et al. (2008). "lumi: a pipeline for processing Illumina microarray." Bioinformatics 24(13): 1547-8.

Gautier, L., L. Cope, et al. (2004). "affy--analysis of Affymetrix GeneChip data at the probe level." Bioinformatics 20(3): 307-15.

Gentleman, R. C., V. J. Carey, et al. (2004). "Bioconductor: open software development for computational biology and bioinformatics." Genome Biol 5(10): R80.

Hubbard, T. J., B. L. Aken, et al. (2009). "Ensembl 2009." Nucleic Acids Res 37(Database issue): D690-7.

Jain, N., J. Thatte, et al. (2003). "Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays." Bioinformatics 19(15): 1945-51.

Kanehisa, M. and S. Goto (2000). "KEGG: kyoto encyclopedia of genes and genomes." Nucleic Acids Res 28(1): 27-30.

Marioni, J. C., N. P. Thorne, et al. (2006). "BioHMM: a heterogeneous hidden Markov model for segmenting array CGH data." Bioinformatics 22(9): 1144-6.

Oosting, J., E. H. Lips, et al. (2007). "High-resolution copy number analysis of paraffin-embedded archival tissue using SNP BeadArrays." Genome Res 17(3): 368-76.

Sayers, E. W., T. Barrett, et al. (2010). "Database resources of the National Center for Biotechnology Information." Nucleic Acids Res 38(Database issue): D5-16.

Smith, M. L., J. C. Marioni, et al. (2006). "snapCGH: Segmentation, Normalization and Processing of aCGH Data Users' Guide." Bioconductor.

Smyth, G. K. (2004). "Linear models and empirical bayes methods for assessing differential expression in microarray experiments." Stat Appl Genet Mol Biol 3: Article3.

Tai, Y. C. and T. P. Speed (2009). "On gene ranking using replicated microarray time course data." Biometrics 65(1): 40-51.

Tusher, V. G., R. Tibshirani, et al. (2001). "Significance analysis of microarrays applied to the ionizing radiation response." Proc Natl Acad Sci U S A 98(9): 5116-21.


Please contact Japan Bioinformatics KK for more information.