Today, Natalie presented the paper “Protein and gene model inference based on statistical modeling in k-partite graphs” by Gerster et al. in PNAS.
The paper claims that their method, MIPGEM (Markovian Inference of Proteins and Gene Models), is comparable to previous peptide-scoring approaches such as the N-peptide rule, ProteinProphet, the nested mixture model, the hierarchical statistical model, and MSBayesPro.
The authors claim the method is novel in several ways:
1. They allow dependencies between peptide scores.
2. They allow shared peptides.
3. They model peptide scores as random values to allow for low-quality peptide scoring.
4. They can infer the probability of a gene model being present.
The proposal has two main points:
1. Dependencies between peptides and proteins based on Markovian assumptions
- Their model computes the probability of a protein being present given the probabilities or scores of the observed peptides.
2. Handling of shared and unshared peptides
- A shared peptide increases or decreases the probability that a protein is present, depending on whether its score is above or below the median of all peptide scores.
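The median rule in point 2 can be illustrated with a toy sketch. This is not MIPGEM's actual likelihood; it only labels the qualitative direction in which each shared peptide would push its parent proteins. The function name, the peptide and protein identifiers, and the scores are all hypothetical.

```python
from statistics import median

def shared_peptide_direction(scores, peptide_to_proteins):
    """Label how each shared peptide pushes its parent proteins.

    Toy illustration only: compares each shared peptide's score with the
    median of ALL peptide scores, as described in the notes.
    """
    med = median(scores.values())
    effects = {}
    for peptide, score in scores.items():
        proteins = peptide_to_proteins[peptide]
        if len(proteins) < 2:
            continue  # unshared peptide: skipped in this sketch
        if score > med:
            direction = "raises"
        elif score < med:
            direction = "lowers"
        else:
            direction = "neutral"
        for protein in proteins:
            effects.setdefault(protein, []).append((peptide, direction))
    return effects

# Hypothetical scores and peptide-to-protein mapping.
scores = {"pepA": 0.9, "pepB": 0.2, "pepC": 0.6, "pepD": 0.8, "pepE": 0.5}
peptide_to_proteins = {
    "pepA": ["P1"],
    "pepB": ["P1", "P2"],
    "pepC": ["P2"],
    "pepD": ["P2", "P3"],
    "pepE": ["P3"],
}
effects = shared_peptide_direction(scores, peptide_to_proteins)
print(effects)
```

Here the median score is 0.6, so the shared peptide pepB (0.2) counts against P1 and P2, while pepD (0.8) counts in favour of P2 and P3.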
The paper compares MIPGEM with other methods on two datasets (a mixture of 18 purified proteins and the sigma49 dataset) by plotting the number of true positives against the number of false positives. On the mixture of 18 purified proteins, MIPGEM performs similarly to the other methods, though slightly worse. On the sigma49 dataset, the MIPGEM curve rises straight up at zero false positives and then flattens out, so the method can be used to achieve zero false positives.
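A true-positive versus false-positive curve of the kind used in this comparison can be sketched as follows. This is a minimal illustration, assuming proteins are already ranked from most to least confident and that the set of truly present proteins is known; the function names and the protein identifiers are hypothetical and not taken from the paper.

```python
def tp_fp_curve(ranked_proteins, true_proteins):
    """Cumulative (false_positives, true_positives) after each accepted protein."""
    tp = fp = 0
    curve = []
    for protein in ranked_proteins:
        if protein in true_proteins:
            tp += 1
        else:
            fp += 1
        curve.append((fp, tp))
    return curve

def max_tp_at_zero_fp(curve):
    """Largest number of true positives reachable before the first false positive."""
    return max((tp for fp, tp in curve if fp == 0), default=0)

# Hypothetical ranking: P* are truly present, X* are not.
ranked = ["P1", "P2", "P3", "X1", "P4", "X2"]
truth = {"P1", "P2", "P3", "P4", "P5"}

curve = tp_fp_curve(ranked, truth)
zero_fp_tp = max_tp_at_zero_fp(curve)
missed = len(truth) - zero_fp_tp  # false negatives when stopping at zero FP
print(curve, zero_fp_tp, missed)
```

In this toy ranking, stopping at zero false positives recovers 3 of the 5 true proteins and misses 2, which is the kind of tradeoff that matters when reading such curves.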
To sum up, the paper argues that MIPGEM is reliable for protein and gene model inference; however, our discussion concluded that the method is not much better than other methods.
Specific criticism:
1. The tradeoff of FP and FN does not warrant the added complexity of the method. Zero FP rate was obtained with a very high FN rate.
2. The authors choose a mixture model for conditional probabilities of peptide scores given the presence/absence of proteins. However, this function is given no justification. Nor is any advice given about choosing this function based on the nature of the peptide scores.
3. They discard useful information about spectral counts for peptides, instead adopting a more conservative approach of only accepting the best spectrum per peptide.
4. The method does not present much novelty.
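Criticism 3 can be made concrete with a small sketch: collapsing peptide-spectrum matches to the best spectrum per peptide makes a peptide seen three times indistinguishable from one seen once. The data, the function name, and the assumption that a higher score is better are all hypothetical, for illustration only.

```python
from collections import defaultdict

def summarize_psms(psms):
    """Return (best score per peptide, spectral count per peptide).

    Assumes higher scores are better (an assumption of this sketch).
    """
    best = {}
    counts = defaultdict(int)
    for peptide, score in psms:
        counts[peptide] += 1
        if peptide not in best or score > best[peptide]:
            best[peptide] = score
    return best, dict(counts)

# Hypothetical peptide-spectrum matches: (peptide, score).
psms = [("pepA", 0.95), ("pepA", 0.90), ("pepA", 0.88), ("pepB", 0.95)]
best, counts = summarize_psms(psms)
print(best)    # identical best scores for pepA and pepB
print(counts)  # spectral counts differ: this is the information that is dropped
```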
Speaker: Natalie
Scribe: Yoona