Friday, January 28, 2011

1-28-2011 Computing Fragmentation Trees from Tandem Mass Spectrometry Data

Rasche, F, Svatoš, A, Maddula, RK, Böttcher, C, Böcker, S (2010). Computing Fragmentation Trees from Tandem Mass Spectrometry Data. Anal Chem, no page given.

Goal: Annotate MS2 spectra of small molecules with chemical compositions.

Data: Orbitrap, QSTAR, and QTOF spectra of ~200 known compounds (<1 kDa) acquired at different CID energies
Build consensus spectra from different energies

Strategy
  1. Calculate chemical formula of parent mass from MS1 spectrum (given)
  2. Build graph (DAG) of possible formulas to explain peaks
    • Node for each possible chem formula for a single peak (usually 3-4 nodes per peak)
    • Color nodes from the same peak with the same color
    • Edge between nodes if they could be generated by a plausible neutral loss
  3. Build tree from graph giving the most likely pattern of fragmentation to explain spectrum
    • Score nodes based on likelihood of that formula
    • Score edges according to known probabilities of various neutral losses
    • Want a 'colorful' graph, i.e. one that explains most peaks ("colors") in the spectrum, with few peaks having multiple formulas associated
Steps 1-2 are solved heuristically.
Step 3 is solved formally:
Input: DAG + edge weights + set of colors
Output: Directed tree with maximal total edge weight in which each color appears at most once
Solved by dynamic programming.
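
The formal problem in step 3 can be illustrated with a small sketch. This is a minimal dynamic program over color subsets written for these notes; it is not claimed to be the authors' exact formulation, and the function name `best_colorful_tree`, the toy nodes, and all weights are hypothetical. Only edges are scored here (node scores could be folded into the incoming edge weights), and the running time grows exponentially with the number of colors, so it suits only small examples.

```python
from functools import lru_cache

NEG = float("-inf")

def best_colorful_tree(root, color, edges):
    """Best total edge weight of a tree rooted at `root` that uses each color once at most.

    color: dict node -> color index (one color per peak), hypothetical input layout
    edges: dict (parent, child) -> weight
    """
    children = {}
    for (u, v), w in edges.items():
        children.setdefault(u, []).append((v, w))
    n_colors = max(color.values()) + 1

    @lru_cache(maxsize=None)
    def W(v, S):
        # W(v, S): best tree rooted at v using exactly the colors in bitmask S.
        own = 1 << color[v]
        if S == own:
            return 0.0
        rest = S & ~own
        best = NEG
        # Case (a): v has a single child u whose subtree covers all remaining colors.
        for u, w in children.get(v, []):
            if rest & (1 << color[u]):
                sub = W(u, rest)
                if sub > NEG:
                    best = max(best, sub + w)
        # Case (b): split the remaining colors between two trees both rooted at v.
        sub1 = (rest - 1) & rest
        while sub1:
            sub2 = rest & ~sub1
            best = max(best, W(v, sub1 | own) + W(v, sub2 | own))
            sub1 = (sub1 - 1) & rest
        return best

    own = 1 << color[root]
    return max(W(root, S) for S in range(1 << n_colors) if S & own)

# Toy example: parent P, peak A with two candidate formulas (a1, a2), peak B with one (b1).
color = {"P": 0, "a1": 1, "a2": 1, "b1": 2}
edges = {("P", "a1"): 2.0, ("P", "a2"): 3.0, ("a1", "b1"): 4.0, ("P", "b1"): 1.0}
print(best_colorful_tree("P", color, edges))  # 6.0, from the chain P -> a1 -> b1
```

In the toy example the greedy choice (take the heaviest edge P -> a2 first) would end with weight 4.0, while the exact search finds the chain of weight 6.0.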

Conclusions
  • A fragmentation tree is an alternative representation of a spectrum
  • Helps experts with manual peak annotation (somehow).
  • Better than existing greedy solution and other tools for predicting spectra from structure (Mass Frontier)
  • Could be used for comparing fragmentation trees to a library, but this isn't well developed.
  • No clear progress towards goal of determining structure from MS2

Critiques:
  •  Not very systematic in the analysis
  •  Poor paper organization
  •  Are fragment trees useful?
Speaker: Anand
Scribe: Spencer
Slides: here

1-21-2011 Identifying complex patterns of PTMs and Un-targeted database search

Guan, S, Burlingame, AL (2010). Data processing algorithms for analysis of high resolution MSMS spectra of peptides with complex patterns of posttranslational modifications. Mol. Cell Proteomics, 9, 5:804-10.

Aim: For a given Spectrum, Peptide, and set of PTMs, find a subset of plausible PTM modified peptide configurations and their relative abundances.
Achievements: 
- Greedy algorithm, repeatedly selects maximum scoring plausible PTM configurations from set of all possible configurations. 
- Relative abundance is computed by least squares: minimize the squared error between the theoretical peak intensities (predicted from the configuration abundances) and the observed intensities.
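
The abundance step can be sketched as an ordinary least-squares fit. This is a minimal illustration under assumptions that are not in the notes above: the matrix `T` (theoretical intensity of each configuration at each peak, per unit abundance), the toy numbers, and the function name `fit_abundances` are hypothetical, and no non-negativity constraint is enforced.

```python
def fit_abundances(T, y):
    """Solve min_a ||T a - y||^2 through the normal equations (T^T T) a = T^T y.

    T: list of rows, one row per peak, one column per PTM configuration
    y: observed intensity of each peak
    Assumes the configurations are distinguishable (T^T T is invertible).
    """
    n = len(T[0])
    A = [[sum(row[i] * row[j] for row in T) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(T, y)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * p for x, p in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]

# Toy example: peak 1 is specific to configuration 1, peak 2 to configuration 2,
# and peak 3 is shared by both.
T = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
y = [0.7, 0.3, 1.0]
abundances = fit_abundances(T, y)
print([round(a, 3) for a in abundances])
```

Peaks that are shared by every configuration carry no information about the split, so the fit relies on the site-determining peaks; if two configurations produce identical columns in `T`, the system is singular and this sketch fails with a division by zero.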

Responses:
- May not consider very similar configurations
- Will not handle overlapping peaks.
- A first attempt solution, where improvements can be made.
- Validation was lacking/non-existent.


Baliban, RC, DiMaggio, PA, Plazas-Mayorca, MD, Young, NL, Garcia, BA, Floudas, CA (2010). A novel approach for untargeted post-translational modification identification using integer linear optimization and tandem mass spectrometry. Mol. Cell Proteomics, 9, 5:764-79.

Aim: For a given Spectrum and x peptides, identify the PTM and localization. (With the implicit assumption that unrestricted search means a large set of plausible PTMs)
Achievements:
- ILP formulation of the problem to output candidate PTM-peptide matches with post-finagling to acquire best modified peptide matches
- Results appear excellent, with comparison to other tools.
- 5 distinct mass spec datasets.

Critiques:
- Good preprocessing policy.
- Reduce spectra to "b" ions only. This policy varies upon instrument.
- Strategy is slow, but the accuracy of results compensates.
- Unclear what "manual validation" means.
 
Speaker: Xiaowen
Scribe: Anand
Slides: here

Wednesday, January 19, 2011

1-14-2011 Spectrum denoising

"A novel approach to denoising ion trap tandem mass spectra" by Jiarui et al., 2009


spectral pre-processing
goal for pre-processing
1. remove the noise
2. decrease the number of non-identified spectra
3. increase the number of identified peptides

procedure
1. denoising of spectrum
-signal peaks: peaks from y or b
-noisy peaks: other peaks
2. intensity normalization
using 5 interrelation features
1. F1: # of peaks p’ such that p-p’ = an a.a. mass
2. F2: # of peaks p’ such that p+p’ = precursor mass
3. F3: # of peaks p’ such that p-p’ = H2O or NH3
4. F4: # of peaks p’ such that p-p’ = CO or NH
5. F5: # of peaks p’ such that p-p’ = isotope mass
6. score: w0 + w1F1 + w2F2 + w3F3 + w4F4 + w5F5
7. if the score is negative, the peak is excluded as noise
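
The feature-based score above can be sketched as follows. This is an illustration only: the weights, the tolerance, the `complement_sum` parameter (the m/z sum expected for complementary ions), and all function names are hypothetical, and mass differences are taken as absolute values, which is an assumption about how the features are meant.

```python
# Monoisotopic residue masses (Da) of the standard amino acids (L and I share a mass).
AMINO_ACIDS = [57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768,
               103.00919, 113.08406, 114.04293, 115.02694, 128.05858,
               128.09496, 129.04259, 131.04049, 137.05891, 147.06841,
               156.10111, 163.06333, 186.07931]
WATER_AMMONIA = [18.01056, 17.02655]  # H2O, NH3
CO_NH = [27.99491, 15.01090]          # CO, NH
ISOTOPE = [1.00335]                   # 13C - 12C spacing

def count_diff(i, peaks, deltas, tol):
    """Count peaks p' whose absolute m/z distance to peaks[i] matches any delta."""
    return sum(
        1 for j, other in enumerate(peaks)
        if j != i and any(abs(abs(peaks[i] - other) - d) <= tol for d in deltas)
    )

def features(i, peaks, complement_sum, tol):
    """Return [F1, F2, F3, F4, F5] for peaks[i]."""
    f2 = sum(1 for j, other in enumerate(peaks)
             if j != i and abs(peaks[i] + other - complement_sum) <= tol)
    return [count_diff(i, peaks, AMINO_ACIDS, tol), f2,
            count_diff(i, peaks, WATER_AMMONIA, tol),
            count_diff(i, peaks, CO_NH, tol),
            count_diff(i, peaks, ISOTOPE, tol)]

def score(feats, weights):
    """Linear score w0 + w1*F1 + ... + w5*F5."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feats))

def denoise(peaks, complement_sum, weights, tol=0.05):
    """Keep peaks whose score is not negative."""
    return [p for i, p in enumerate(peaks)
            if score(features(i, peaks, complement_sum, tol), weights) >= 0]

# Hypothetical weights w0..w5; the values used in the paper are not in these notes.
WEIGHTS = [-1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
peaks = [200.0, 257.02146, 218.01056, 201.00335, 300.0, 333.3]
print(features(0, peaks, 500.0, 0.05))
print(denoise(peaks, 500.0, WEIGHTS))
```

In this toy spectrum the peak at 333.3 has no related peak, so its score equals the negative bias w0 and it is dropped, while every other peak is supported by at least one relation.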
peak selection
-after intensity normalization, signal peaks are likely to be local maxima
-to select the local maxima, a morphological reconstruction filter is adopted
dataset
-ISB: ESI ion trap 37044 spectra
-TOV: LCQ DECA XP ion trap 22576 spectra
-database: ipi.Human protein database
-Mascot is used to evaluate denoising
Number of identified spectra
-spectrum is identified if its Mascot ion score is larger than the identity threshold
results
-Denoising the spectra increased the number of identifications in the Mascot search
Features of spectrum that other people use in preprocessing
-Number of peaks
-total ion current
-Good Diff fraction
-Total normalized intensity of peaks with associated isotope peaks
-complements
-water losses
-signal to noise ratio
Conclusion
-intensity normalization is too heuristic
-among used features, neutral losses are often observed in noisy peaks
-features were manually selected, and no new feature was introduced
-the benefit of the morphological filter is not clear
-standard target-decoy analysis was not shown
-it is about denoising, but the result of denoising is not directly shown
-the proposed scheme may not be suitable for other tools
-the running time of their algorithm is not shown
discussion
-They increased the # of identifications of MASCOT
-Spectrum preprocessing might help de novo sequencing, but gives no significant improvement for database search
-Preprocessing is highly dependent on scoring function itself


Speaker: Kyowon
Scribe: Sunghee
Slides: here

Monday, January 3, 2011

12/10/2010 Spatial segmentation of imaging mass spectrometry data with edge-preserving image denoising and clustering


Title : Spatial segmentation of imaging mass spectrometry data with edge-preserving image denoising and clustering
by Theodore Alexandrov, Michael Becker, Sören Deininger, Günther Ernst, Liane Wehder, Markus Grasmair, Ferdinand von Eggeling, Herbert Thiele, and Peter Maass
  1. Background: MS imaging enables visualization of the spatial distribution of e.g. compounds, biomarkers, metabolites, peptides or proteins by their molecular masses.
  2. What they did: the authors proposed a new procedure for spatial segmentation of MALDI-imaging datasets.
  3. How they did it: they built a pipeline that consists of
    • spectra preprocessing
    • peak picking
    • edge-preserving denoising of m/z images
    • finally, clustering.
  4. More in detail...
    • Spectra preprocessing uses baseline correction, which reduces the intensity errors.
    • Peak picking selects only 10 peaks in every 10th spectrum, and keeps peaks found in at least 1% of the spectra across the entire sample. Orthogonal Matching Pursuit (OMP) is used since it is simple and fast.
    • Denoising uses Grasmair's modification of Chambolle's total variation minimization algorithm. The parameter theta controls the smoothness.
    • Clustering uses High Dimensional Discriminant Clustering (HDDC), where each cluster is modeled by a Gaussian distribution.
  5. Results
    • Datasets: a rat brain coronal section and a section of a neuroendocrine tumor (NET) invading the small intestine
    • Peak picking using OMP detects major peaks successfully.
    • Denoising with the Grasmair method removed noise efficiently without smoothing out edges. The results show, though, that the choice of the parameter theta is important.
    • The image clustered by the proposed pipeline and the segmentation map of the rat brain were shown to be similar to each other. The edge-preserving denoising affects the clustering result.
    • 3 parameters for peak picking and 2 parameters for denoising and clustering must be tuned for good results.
  6. Conclusion
    • HDDC clustering is better than k-means but slow.
    • It is important for cancer studies.
  7. Criticism
    • What they are optimizing is unclear.
    • Too many parameters can influence the result, yet no optimal values are given for various applications.
    • Slow running time makes it hard to run multiple trials.
Speaker: Jocelyne
Scribe: Kyowon
Slides: here