
MPI for Molecular Genetics
Computational Molecular Biology
bgmm: Belief-based Gaussian Mixture Modeling


bgmm is an R package for knowledge-based mixture modeling. It implements mixture modeling variants that differ in the amount of incorporated knowledge and span the entire range from unsupervised to supervised modeling.
Our focus is on partially supervised modeling, which to our knowledge is not supported by any other open-access software. bgmm is also the only R package that implements semi-supervised modeling. The availability of all mixture modeling variants allows the estimates obtained with different models to be compared.
The figure on the left schematically illustrates the percentage of labeled observations, and the certainty of their labels, required by each implemented variant.
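
One way to picture how the variants differ is through the prior that each observation carries into the posterior computation. The sketch below is an illustration in plain Python, not bgmm code and not a statement of how the package is implemented. It assumes that unlabeled observations use the mixing proportions as prior, that a certain label puts all prior mass on one component, that a belief vector replaces the mixing proportions for a labeled observation, and that soft-label plausibilities rescale the mixing proportions. The function names and all numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior(x, means, sds, prior):
    """Posterior component probabilities for x under the given prior."""
    weighted = [p * normal_pdf(x, m, s) for p, m, s in zip(prior, means, sds)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical two-component model and a single observation.
means, sds = [-2.0, 2.0], [1.0, 1.0]
mixing = [0.3, 0.7]
x = 0.5

# Unlabeled observation: the prior is the mixing proportions.
unlabeled = posterior(x, means, sds, mixing)

# Certainly labeled observation (assumed): all prior mass on component 0.
certain = posterior(x, means, sds, [1.0, 0.0])

# Belief-based (assumed): the belief vector is used as the prior.
belief = posterior(x, means, sds, [0.9, 0.1])

# Soft-label (assumed): plausibilities rescale the mixing proportions.
plausibility = [0.9, 0.1]
soft = posterior(x, means, sds, [p * w for p, w in zip(plausibility, mixing)])

for name, post in [("unlabeled", unlabeled), ("certain", certain),
                   ("belief", belief), ("soft-label", soft)]:
    print(name, [round(p, 3) for p in post])
```

Under these assumptions the same observation is pulled toward component 0 more strongly the more certain the label is, which is the sense in which the variants range from unsupervised to supervised.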

The basic functionality of bgmm, described in Biecek et al., includes:
 belief-based mixture modeling: our theoretical contribution to partially supervised mixture modeling,
 soft-label mixture modeling: a partially supervised mixture modeling method proposed by Côme et al.,
 semi-supervised mixture modeling,
 unsupervised mixture modeling,
 specifying constraints on the fitted model structure,
 simulation of data from user-specified model parameters or model structure,
 plotting of fitted models for data of up to two dimensions,
 model selection: fitting a range of models with different structures or component numbers, which are evaluated using GIC scores,
 prediction of classes or clusters for a given set of observations using the fitted models.
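
The model selection step in the list above can be pictured with a small Python sketch. It is not bgmm code. It assumes the common form of a generalized information criterion, minus twice the log-likelihood plus a penalty times the number of free parameters, where a penalty of 2 gives AIC and a penalty of log(n) gives BIC. The log-likelihood values, the sample size and the function name `gic` are made up for illustration.

```python
import math

def gic(log_likelihood, n_params, penalty):
    """Generalized information criterion (assumed form): lower is better."""
    return -2.0 * log_likelihood + penalty * n_params

# Hypothetical fits of one-dimensional mixtures with k = 1, 2, 3 components.
# An unconstrained k-component model has k means, k variances and
# k - 1 free mixing proportions, i.e. 3k - 1 parameters.
n_obs = 200
candidates = {
    1: (-420.0, 2),   # k: (log-likelihood, number of free parameters)
    2: (-380.0, 5),
    3: (-378.5, 8),
}

bic_penalty = math.log(n_obs)  # penalty = log(n) corresponds to BIC
scores = {k: gic(ll, d, bic_penalty) for k, (ll, d) in candidates.items()}
best_k = min(scores, key=scores.get)

for k in sorted(scores):
    print(k, round(scores[k], 2))
print("selected number of components:", best_k)
```

With these made-up numbers, the third component improves the likelihood too little to pay for its three extra parameters, so the two-component model has the lowest score.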

Additionally, bgmm supports the application of mixture modeling to differential gene expression analysis, as proposed in Szczurek et al. The modeled data are one-dimensional log expression ratios of treatment versus control. The labeled observations are genes expected to be differentially expressed, i.e., up- or down-regulated, in this experiment.
bgmm can be applied to fit a two- or three-component mixture model to these input data and knowledge. In the two-component model, a low-variance Gaussian represents the unchanged genes and a high-variance Gaussian the differential genes. In the three-component model, a low-mean Gaussian represents the down-regulated genes, a zero-mean Gaussian the unchanged genes, and a high-mean Gaussian the up-regulated genes (illustrated in the plots on the right).
The posterior probabilities in the fitted model of choice are used to compute the probability of differential expression for each gene in the analyzed experiment.
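
As a rough illustration of this last step for the two-component case, the Python sketch below computes the posterior probability of the high-variance component for a few log ratios. It is not bgmm code. It assumes that both components are centered at zero and that the model parameters have already been fitted. The gene names, parameter values and the function name `de_probability` are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def de_probability(log_ratio, weight_de, sd_unchanged, sd_de, mean=0.0):
    """Posterior probability that a gene belongs to the high-variance
    (differential) component of a two-component mixture.
    Both components share the same mean (an assumption of this sketch)."""
    unchanged = (1.0 - weight_de) * normal_pdf(log_ratio, mean, sd_unchanged)
    differential = weight_de * normal_pdf(log_ratio, mean, sd_de)
    return differential / (unchanged + differential)

# Hypothetical log expression ratios (treatment versus control).
log_ratios = {"geneA": 0.05, "geneB": -0.4, "geneC": 2.1}

# Hypothetical fitted parameters: 20% of genes differential,
# narrow unchanged component, wide differential component.
probs = {
    gene: de_probability(ratio, weight_de=0.2, sd_unchanged=0.3, sd_de=1.5)
    for gene, ratio in log_ratios.items()
}

for gene, p in probs.items():
    print(gene, round(p, 3))
```

In a three-component model, the analogous quantity would be the combined posterior of the down-regulated and up-regulated components.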




Download
The latest release of the bgmm package is available from CRAN.
A demo in HTML format presents the basic model-fitting, model selection, data simulation and prediction functionality of the bgmm package. The presented function calls and output plots cover the functions described in Biecek et al. and more, e.g. modeling of one-dimensional data and application to differential gene expression analysis.
For more details about specific functions, refer to the bgmm reference manual.
 

References
 E. Szczurek, P. Biecek, J. Tiuryn and M. Vingron (2010).
Introducing knowledge into differential expression analysis. J Comput Biol., 17(8):953-67.
 P. Biecek, E. Szczurek, M. Vingron and J. Tiuryn.
The R package bgmm: mixture modeling with uncertain knowledge. Submitted.
 E. Côme, L. Oukhellou, T. Denœux, et al. (2009).
Learning from partially supervised data using mixture models and belief functions. Pattern Recogn., 42:334-348.
