Index search

The local search engine queries project data that has been transformed to XML and indexed. The currently stored data sets are listed under "Index contents" below. The search engine is built on Solr and Lucene.


Query syntax

| Query type | Example | Comments |
| Single keyword query | "Lipoprotein", "E2F6" | records that contain the word 'Lipoprotein' / 'E2F6' |
| Negative query (exclusion) | "-Lipoprotein" | records that do not contain 'Lipoprotein' |
| Field query | "vindex:biomodels", "identifier:OCT4" | records matching in a specific field, e.g. vindex (the subindex) or identifier |
| Boolean combination (OR, AND; AND is the default and can be omitted) | "(oct4 OR klf) AND sox2 AND -vindex:uniprot" | records containing OCT4 or KLF together with SOX2, excluding Uniprot results |
| Wildcard query (one character) | "te?t" | records that contain 'test', 'tent', 'text', etc. |
| Wildcard query (multiple characters) | "te*t" | records that contain terms starting with 'te' and ending with 't', e.g. 'test', 'tent', 'text', 'tenant', 'testament' |
| Fuzzy search | "proteome~" | records that contain similar terms such as protein, proteomics, proteasome, etc. |
| Range query | "field:[0 TO 0.05]", "vindex:[a* TO z*]" | records with values within a given range (square brackets include both bounds), e.g. p-values up to 0.05 or words starting with certain characters |
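
As a minimal sketch of how the query types listed above could be sent to the search engine programmatically, the following Python snippet builds request URLs for a Solr select handler. The endpoint address is hypothetical (this page does not state the host, port, or path of the local Solr instance), and the quotation marks around the examples above are assumed to be delimiters rather than part of the query. Only the standard Solr parameters q, rows, and wt are used.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real address of the local Solr instance
# is not given on this page and must be substituted.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Return a URL-encoded Solr select URL for a Lucene query string."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url + "?" + urlencode(params)


# Query strings taken from the query syntax examples on this page.
EXAMPLE_QUERIES = [
    "Lipoprotein",                                 # single keyword
    "-Lipoprotein",                                # exclusion
    "vindex:biomodels",                            # field query
    "(oct4 OR klf) AND sox2 AND -vindex:uniprot",  # boolean combination
    "te?t",                                        # single-character wildcard
    "proteome~",                                   # fuzzy search
]

if __name__ == "__main__":
    for q in EXAMPLE_QUERIES:
        print(build_query_url(q))
```

Encoding the query with urlencode matters because characters such as ':', '?', and parentheses have a meaning in both URLs and the query syntax; the resulting URL can then be fetched with any HTTP client.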

Index structure

| Index field | Description | Example |
| title | The title of the record | 'HPPP2 mass spectrum (nr. 53) - identifications: COBA1, COL11A1' |
| identifier | Public database cross-reference | 'IPI00455877.1' |
| vindex | Subindex defining the data type | 'mass_spectra' |
| content | Record content | Mass spectrum identifications; PubMed abstract; Uniprot FASTA sequence, ... |
| annotation | Meta-data associated with the record | Data source, experimenter, ... |
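
The index fields above can be combined into field-restricted queries. The helper functions below are a hypothetical sketch (their names are not part of the search engine): they only assemble query strings in Lucene syntax, under the assumption that all five fields are searchable with the standard query parser.

```python
# Field names as listed in the index structure table.
INDEX_FIELDS = {"title", "identifier", "vindex", "content", "annotation"}


def field_query(field, value):
    """Build a 'field:value' clause; multi-word values become phrase queries."""
    if field not in INDEX_FIELDS:
        raise ValueError(f"unknown index field: {field}")
    if any(ch.isspace() for ch in value):
        # Wrap values containing whitespace in double quotes (Lucene phrase syntax).
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def all_of(*clauses):
    """Join clauses with AND, so that every clause must match."""
    return " AND ".join(clauses)


if __name__ == "__main__":
    query = all_of(
        field_query("vindex", "mass_spectra"),
        field_query("title", "mass spectrum"),
    )
    print(query)
```

Restricting a search to one subindex through vindex keeps large subindexes such as the PubMed or Uniprot entries from dominating the result list.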

Index contents

| Data type | Source | File format | No. of entries | Description |
| Protein mass spectra | PRIDE acc. 8538 | mzData | 745 | Peptide tandem mass spectra (Homo sapiens) with identifications |
| DNA microarrays | GEO acc. GSE3325 | MINiML | 19 | Prostate cancer study; chip platform: Affymetrix U133 Plus 2.0 arrays (Homo sapiens) |
| DNA microarrays | GEO acc. GSE1133 | MINiML | 438 | Novartis gene atlas 2004 (mouse and human arrays) |
| DNA microarrays | GEO acc. GSE10204, GSE11193 | MINiML | 80 | Genetic functional basics of water-binding capacity in pork; chip platform: Affymetrix Porcine Whole Genome Array |
| Studies | MPI Berlin | XML 'study' | 7 | Overview of statistical analyses |
| Test result tables | MPI Berlin | STAT-ML | 94497 | Results of statistical analyses of microarray experiments |
| Microsatellite markers / phenotypes | University Bonn | XML 'pigs' | 873 | Pig marker and trait values |
| Molecular interactions | IntAct | PSI-MI | 5915 | Yeast-2-hybrid datasets from Rual et al. and Stelzl et al. |
| Molecular interactions | CPDB | XML 'cpdb' | 46454 | Interactions involving genes, proteins, and compounds; source: ConsensusPathDB |
| Molecular models | BioModels | SBML | 699 | Mathematical models of gene regulatory pathways |
| Synonyms pig | Affymetrix | XML 'synonyms' | 24123 | Pig genome annotations |
| Synonyms human | Affymetrix | XML 'synonyms' | 54675 | Homo sapiens genome annotations |
| Protein sequences | Uniprot | FASTA | 16.5 million | Protein sequences (FASTA format) |
| Publications | PubMed | XML 'pubmed' | 18.2 million | Publications in PubMed starting from 1970 |
| Foswiki pages | DIPSBC | TXT | 26 | Web pages within the DIPSBC platform |
| Total no. of entries | | | 34,970,538 | |

Topic revision: r5 - 12 Jan 2009 - 13:45:20 - FelixDreher
