Recent Submissions

• Journal Article

Data-driven coarse graining of large biomolecular structures.

PloS one 2017; 12(8): Art. e0183057
Advances in experimental and computational techniques allow us to study the structure and dynamics of large biomolecular assemblies at increasingly high resolution. However, with increasing structural detail it can be challenging to unravel the mechanism underlying the function of molecular machines. One reason is that atomistic simulations become computationally prohibitive. Moreover, it is difficult to rationalize the functional mechanism of systems composed of tens of thousands to millions of atoms by following each atom's movements. Coarse graining (CG) allows us to understand biological structures from a hierarchical perspective and to gradually zoom into the adequate level of structural detail. This article introduces a Bayesian approach for coarse graining biomolecular structures. We develop a probabilistic model that aims to represent the shape of an experimental structure as a cloud of bead particles. The particles interact via a pairwise potential whose parameters are estimated along with the bead positions and the CG mapping between atoms and beads. Our model can also be applied to density maps obtained by cryo-electron microscopy. We illustrate our approach on various test systems.
• Journal Article

Distribution and evolution of stable single α-helices (SAH domains) in myosin motor proteins

PLOS ONE 2017; 12(4): Art. e0174639
Stable single-alpha helices (SAHs) are versatile structural elements in many prokaryotic and eukaryotic proteins acting as semi-flexible linkers and constant force springs. In this way, SAH-domains function as part of the lever of many different myosins. Canonical myosin levers consist of one or several IQ-motifs to which light chains such as calmodulin bind. SAH-domains provide flexibility in length and stiffness to the myosin levers, and may be particularly suited for myosins working in crowded cellular environments. Although the function of the SAH-domains in human class-6 and class-10 myosins has been well characterised, the distribution of the SAH-domain in all myosin subfamilies and across the eukaryotic tree of life remained elusive. Here, we analysed the largest available myosin sequence dataset consisting of 7919 manually annotated myosin sequences from 938 species representing all major eukaryotic branches using the SAH-prediction algorithm of Waggawagga, a recently developed tool for the identification of SAH-domains. With this approach we identified SAH-domains in more than one third of the supposed 79 myosin subfamilies. Depending on the myosin class, the presence of SAH-domains can range from a few to almost all class members, indicating complex patterns of independent and taxon-specific SAH-domain gain and loss.
• Journal Article

Model-based testing as a service

International Journal on Software Tools for Technology Transfer
The quality of Web services is an important factor for businesses that advertise or sell their services on the Internet. Failures can directly lead to fewer customers or security problems. However, the testing of complex Web services that are organized in service-oriented architectures is a difficult and complex problem. Model-based testing (MBT) is one solution to deal with the complexity of the testing. With MBT, testers do not define the tests directly, but rather specify the structure and behavior of the System Under Test using models. Then, a test strategy is used to derive test cases automatically from the models. However, MBT yields a large number of tests for complex systems, which require lots of resources for their execution, thereby limiting its potential. Within this article, we discuss how cloud computing can be used to provide the required resources for scaling up test campaigns with large numbers of test cases derived using MBT.
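
To make the idea of deriving test cases from a model concrete, the following minimal Python sketch treats the model as a small finite-state machine and uses an all-transitions coverage strategy. The model contents, the state and action names, and the functions `shortest_action_paths` and `derive_tests` are hypothetical illustrations, not the tooling or strategy described in the article.

```python
from collections import deque

# Hypothetical behaviour model of a Web shop service:
# state -> {action: next_state}. All names are assumptions for illustration.
MODEL = {
    "logged_out": {"login": "logged_in"},
    "logged_in": {"add_item": "cart", "logout": "logged_out"},
    "cart": {"checkout": "logged_in", "logout": "logged_out"},
}


def shortest_action_paths(model, start):
    """Breadth-first search: shortest action sequence reaching each state."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for action, nxt in model[state].items():
            if nxt not in paths:
                paths[nxt] = paths[state] + [action]
                queue.append(nxt)
    return paths


def derive_tests(model, start):
    """Return one action sequence per reachable transition
    (all-transitions coverage, one possible test strategy)."""
    reach = shortest_action_paths(model, start)
    tests = []
    for state, actions in model.items():
        if state not in reach:
            continue  # unreachable states yield no test cases
        for action in actions:
            tests.append(reach[state] + [action])
    return tests


tests = derive_tests(MODEL, "logged_out")
for case in tests:
    print(" -> ".join(case))
```

Each derived sequence starts from the initial state and does not depend on the others, which is the property that would let a test campaign be split across many workers; how the article actually schedules tests in the cloud is not specified in the abstract.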
• Journal Article

Directional global three-part image decomposition

EURASIP Journal on Image and Video Processing 2016; 2016(1): Art. 12
We consider the task of image decomposition, and we introduce a new model coined directional global three-part decomposition (DG3PD) for solving it. As key ingredients of the DG3PD model, we introduce a discrete multi-directional total variation norm and a discrete multi-directional G-norm. Using these novel norms, the proposed discrete DG3PD model can decompose an image into two or three parts. Existing models for image decomposition by Vese and Osher (J. Sci. Comput. 19(1–3):553–572, 2003), by Aujol and Chambolle (Int. J. Comput. Vis. 63(1):85–104, 2005), by Starck et al. (IEEE Trans. Image Process. 14(10):1570–1582, 2005), and by Thai and Gottschlich are included as special cases in the new model. Decomposition of an image by DG3PD results in a cartoon image, a texture image, and a residual image. Advantages of the DG3PD model over existing ones lie in the properties enforced on the cartoon and texture images. The geometric objects in the cartoon image have a very smooth surface and sharp edges. The texture image yields oscillating patterns on a defined scale which are both smooth and sparse. Moreover, the DG3PD method achieves the goal of perfect reconstruction by summation of all components better than the other considered methods. Relevant applications of DG3PD are a novel way of image compression as well as feature extraction for applications such as latent fingerprint processing and optical character recognition.
• Journal Article

Modelling modal gating of ion channels with hierarchical Markov models

Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science 2016; 472(2192)
Many ion channels spontaneously switch between different levels of activity. Although this behaviour, known as modal gating, has been observed for a long time, it is currently not well understood. Despite the fact that appropriately representing activity changes is essential for accurately capturing time course data from ion channels, systematic approaches for modelling modal gating are currently not available. In this paper, we develop a modular approach for building such a model in an iterative process. First, stochastic switching between modes and stochastic opening and closing within modes are represented in separate aggregated Markov models. Second, the continuous-time hierarchical Markov model, a new modelling framework proposed here, then enables us to combine these components so that in the integrated model both mode switching and the kinetics within modes are appropriately represented. A mathematical analysis reveals that the behaviour of the hierarchical Markov model naturally depends on the properties of its components. We also demonstrate how a hierarchical Markov model can be parametrized using experimental data and show that it provides a better representation than a previous model of the same dataset. Because evidence is increasing that modal gating reflects underlying molecular properties of the channel protein, it is likely that biophysical processes are better captured by our new approach than in earlier models.
• Journal Article

DOTmark - A Benchmark for Discrete Optimal Transport

IEEE Access p.1-12
The Wasserstein metric or earth mover’s distance (EMD) is a useful tool in statistics, computer science and engineering with many applications to biological or medical imaging, among others. Especially in the light of increasingly complex data, the computation of these distances via optimal transport is often the limiting factor. Inspired by this challenge, a variety of new approaches to optimal transport has been proposed in recent years and along with these new methods comes the need for a meaningful comparison. In this paper, we introduce a benchmark for discrete optimal transport, called DOTmark, which is designed to serve as a neutral collection of problems, where discrete optimal transport methods can be tested, compared to one another, and brought to their limits on large-scale instances. It consists of a variety of grayscale images, in various resolutions and classes, such as several types of randomly generated images, classical test images and real data from microscopy. Along with the DOTmark we present a survey and a performance test for a cross section of established methods ranging from more traditional algorithms, such as the transportation simplex, to recently developed approaches, such as the shielding neighborhood method, and including also a comparison with commercial solvers.
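
As a small illustration of the quantity being benchmarked, the sketch below computes the earth mover's distance for the simplest possible case: two histograms on the same unit-spaced one-dimensional grid with ground cost |i - j|, where the distance reduces to the summed absolute difference of the cumulative distributions. This is a textbook special case, not one of the methods compared in DOTmark, whose instances are two-dimensional images that need a general solver. The function name `emd_1d` is an assumption.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover's distance between two histograms on the same
    unit-spaced 1D grid, with ground cost |i - j|.

    Both histograms are normalised to total mass 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)
    # In 1D the optimal transport cost equals the L1 distance of the CDFs.
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Moving all mass from the first to the last of three bins costs 2 units.
print(emd_1d([1, 0, 0], [0, 0, 1]))
```

In one dimension this runs in linear time, which is why the one-dimensional case is rarely the bottleneck; the computational challenge the abstract describes arises in higher dimensions and at large resolutions.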
• Journal Article

Inferential Structure Determination of Chromosomes from Single-Cell Hi-C Data

PLOS Computational Biology 2016; 12(12): Art. e1005292
Chromosome conformation capture (3C) techniques have revealed many fascinating insights into the spatial organization of genomes. 3C methods typically provide information about chromosomal contacts in a large population of cells, which makes it difficult to draw conclusions about the three-dimensional organization of genomes in individual cells. Recently it became possible to study single cells with Hi-C, a genome-wide 3C variant, demonstrating a high cell-to-cell variability of genome organization. In principle, restraint-based modeling should allow us to infer the 3D structure of chromosomes from single-cell contact data, but it suffers from the sparsity and low resolution of chromosomal contacts. To address these challenges, we adapt the Bayesian Inferential Structure Determination (ISD) framework, originally developed for NMR structure determination of proteins, to infer statistical ensembles of chromosome structures from single-cell data. Using ISD, we are able to compute structural error bars and estimate model parameters, thereby eliminating potential bias imposed by ad hoc parameter choices. We apply and compare different models for representing the chromatin fiber and for incorporating single-cell contact information. Finally, we extend our approach to the analysis of diploid chromosome data.
• Journal Article

A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen–Shannon Divergence

Entropy 2016; 18(10)
Knowledge of protein-DNA interactions is essential to fully understand the molecular activities of life. Many research groups have developed tools, based on either structure or sequence, to predict the DNA-binding residues in proteins. The structure-based methods usually achieve good results, but require knowledge of the protein's 3D structure, while sequence-based methods can be applied to proteins in high throughput, but require good features. In this study, we present a new information theoretic feature derived from Jensen–Shannon Divergence (JSD) between the amino acid distribution of a site and the background distribution of non-binding sites. Our new feature indicates the difference of a certain site from a non-binding site, thus it is informative for detecting binding sites in proteins. We conduct the study with a five-fold cross validation of 263 proteins utilizing the Random Forest classifier. We evaluate the functionality of our new features by combining them with other popular existing features such as position-specific scoring matrix (PSSM), orthogonal binary vector (OBV), and secondary structure (SS). We notice that by adding our features, we can significantly boost the performance of the Random Forest classifier, with a clear increase in sensitivity and Matthews correlation coefficient (MCC).
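
The Jensen–Shannon divergence at the core of this feature can be sketched in a few lines of Python. The example below scores one alignment column against a background distribution; the uniform background, the pseudocount, and the helper names `column_distribution`, `kl`, and `jsd` are assumptions for illustration. The abstract does not say how the article estimates the site distribution or the non-binding background, so this is not the published feature.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1.0):
    """Amino acid frequencies of one alignment column (a string of
    residues), smoothed with a pseudocount; other symbols are ignored."""
    counts = Counter(column)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Placeholder background: uniform over the 20 amino acids (an assumption;
# the article uses a background estimated from non-binding sites).
background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)

conserved = column_distribution("KKKKKKKK")
mixed = column_distribution("ACDEFGHIKLMNPQRSTVWY")
print(jsd(conserved, background), jsd(mixed, background))
```

A strongly conserved column diverges from the background more than a mixed one, which is the intuition behind using the score as a per-site feature for a classifier.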
• Journal Article

Interpretable Multiclass Models for Corporate Credit Rating Capable of Expressing Doubt

Frontiers in Applied Mathematics and Statistics 2016; 2: Art. 16
Corporate credit rating is a process to classify commercial enterprises based on their creditworthiness. Machine learning algorithms can construct classification models, but in general they do not tend to be 100% accurate. Since they can be used as decision support for experts, interpretable models are desirable. Unfortunately, interpretable models are provided by only a few machine learners. Furthermore, credit rating often is a multiclass problem with more than two rating classes. Due to this fact, multiclass classification is often achieved via meta-algorithms using multiple binary learners. However, most state-of-the-art meta-algorithms destroy the interpretability of binary models. In this study, we present Thresholder, a binary interpretable threshold-based disjunctive normal form (DNF) learning algorithm, in addition to modifications of popular multiclass meta-algorithms which maintain the interpretability of our binary classifier. Furthermore, we present an approach to express doubt in the decision of our model. Performance and model size are compared with other interpretable approaches for learning DNFs (RIPPER) and decision trees (C4.5) as well as non-interpretable models like random forests, artificial neural networks, and support vector machines. We evaluate their performances on three real-life data sets divided into three rating classes. In this case study all threshold-based and interpretable models perform equally well and significantly better than other methods. Our new Thresholder algorithm builds the smallest models while its performance is as good as the best methods of our case study. Furthermore, Thresholder marks many potential misclassifications in advance with a doubt label without increasing the classification error.
• Journal Article

LPPS: A Distributed Cache Pushing Based K-Anonymity Location Privacy Preserving Scheme

Mobile Information Systems 2016; 2016 p.1-16
Recent years have witnessed the rapid growth of location-based services (LBSs) for mobile social network applications. To enable location-based services, mobile users are required to report their location information to the LBS servers and receive answers to location-based queries. Location privacy leaks happen when such servers are compromised, which has been a primary concern for information security. To address this issue, we propose the Location Privacy Preservation Scheme (LPPS) based on distributed cache pushing. Unlike existing solutions, LPPS deploys distributed cache proxies to cover users' most visited locations and proactively pushes cache content to mobile users, which can reduce the risk of leaking users’ location information. The proposed LPPS includes three major processes. First, we propose an algorithm to find the optimal deployment of proxies to cover popular locations. Second, we present cache strategies for location-based queries based on the Markov chain model and propose update and replacement strategies for cache content maintenance. Third, we introduce a privacy protection scheme which is proved to achieve a k-anonymity guarantee for location-based services. Extensive experiments illustrate that the proposed LPPS achieves decent service coverage ratio and cache hit ratio with lower communication overhead compared to existing solutions.
• Journal Article

Mapping molecules in scanning far-field fluorescence nanoscopy.

Nature communications 2015; 6
In fluorescence microscopy, the distribution of the emitting molecule number in space is usually obtained by dividing the measured fluorescence by that of a single emitter. However, the brightness of individual emitters may vary strongly in the sample or be inaccessible. Moreover, with increasing (super-) resolution, fewer molecules are found per pixel, making this approach unreliable. Here we map the distribution of molecules by exploiting the fact that a single molecule emits only a single photon at a time. Thus, by analysing the simultaneous arrival of multiple photons during confocal imaging, we can establish the number and local brightness of typically up to 20 molecules per confocal (diffraction sized) recording volume. Subsequent recording by stimulated emission depletion microscopy provides the distribution of the number of molecules with subdiffraction resolution. The method is applied to mapping the three-dimensional nanoscale organization of internalized transferrin receptors on human HEK293 cells.
• Journal Article

Using sparsity information for iterative phase retrieval in x-ray propagation imaging.

Optics express 2016-04-18; 24(8) p.8332-8343
For iterative phase retrieval algorithms in near field x-ray propagation imaging experiments with a single distance measurement, it is indispensable to have a strong constraint based on a priori information about the specimen; for example, information about the specimen's support. Recently, Loock and Plonka proposed to use the a priori information that the exit wave is sparsely represented in a certain directional representation system, a so-called shearlet system. In this work, we extend this approach to complex-valued signals by applying the new shearlet constraint to amplitude and phase separately. Further, we demonstrate its applicability to experimental data.
• Journal Article

Self-Consistent Sources for Integrable Equations Via Deformations of Binary Darboux Transformations

Letters in Mathematical Physics 2016; 106(8) p.1139-1179
We reveal the origin and structure of self-consistent source extensions of integrable equations from the perspective of binary Darboux transformations. They arise via a deformation of the potential that is central in this method. As examples, we obtain in particular matrix versions of self-consistent source extensions of the KdV, Boussinesq, sine-Gordon, nonlinear Schrödinger, KP, Davey–Stewartson, two-dimensional Toda lattice and discrete KP equation. We also recover a (2+1)-dimensional version of the Yajima–Oikawa system from a deformation of the pKP hierarchy. By construction, these systems are accompanied by a hetero binary Darboux transformation, which generates solutions of such a system from a solution of the source-free system and additionally solutions of an associated linear system and its adjoint. The essence of all this is encoded in universal equations in the framework of bidifferential calculus.
• Journal Article

Filter Design and Performance Evaluation for Fingerprint Image Segmentation.

PloS one 2016; 11(5): Art. e0154160
Fingerprint recognition plays an important role in many commercial applications and is used by millions of people every day, e.g. for unlocking mobile phones. Fingerprint image segmentation is typically the first processing step of most fingerprint algorithms and it divides an image into foreground, the region of interest, and background. Two types of error can occur during this step which both have a negative impact on the recognition performance: 'true' foreground can be labeled as background and features like minutiae can be lost, or conversely 'true' background can be misclassified as foreground and spurious features can be introduced. The contribution of this paper is threefold: firstly, we propose a novel factorized directional bandpass (FDB) segmentation method for texture extraction based on the directional Hilbert transform of a Butterworth bandpass (DHBB) filter interwoven with soft-thresholding. Secondly, we provide a manually marked ground truth segmentation for 10560 images as an evaluation benchmark. Thirdly, we conduct a systematic performance comparison between the FDB method and four of the most often cited fingerprint segmentation algorithms showing that the FDB segmentation method clearly outperforms these four widely used methods. The benchmark and the implementation of the FDB method are made publicly available.
• Journal Article

Regularized Newton methods for x-ray phase contrast and general imaging problems

Optics Express 2016; 24(6)
Like many other advanced imaging methods, x-ray phase contrast imaging and tomography require mathematical inversion of the observed data to obtain real-space information. While an accurate forward model describing the generally nonlinear image formation from a given object to the observations is often available, explicit inversion formulas are typically not known. Moreover, the measured data might be insufficient for stable image reconstruction, in which case it has to be complemented by suitable a priori information. In this work, regularized Newton methods are presented as a general framework for the solution of such ill-posed nonlinear imaging problems. For a proof of principle, the approach is applied to x-ray phase contrast imaging in the near-field propagation regime. Simultaneous recovery of the phase and amplitude from a single near-field diffraction pattern without homogeneity constraints is demonstrated for the first time. The presented methods further permit all-at-once phase contrast tomography, i.e. simultaneous phase retrieval and tomographic inversion. We demonstrate the potential of this approach by three-dimensional imaging of a colloidal crystal at 95 nm isotropic resolution.
• Journal Article

Rank Procedures for Repeated Measures with Missing Values

Sociological Methods & Research 2002; 30(3) p.367-393
• Journal Article

Artin's primitive root conjecture and a problem of Rohrlich

Mathematical Proceedings of the Cambridge Philosophical Society 2014; 157(01) p.79-99
Let $\mathbb{K}$ be a number field, Γ a finitely generated subgroup of $\mathbb{K}^*$, for instance the unit group of $\mathbb{K}$, and κ>0. For an ideal $\mathfrak{a}$ of $\mathbb{K}$ let $\mathrm{ind}_\Gamma(\mathfrak{a})$ denote the multiplicative index of the reduction of Γ in $(\mathcal{O}_\mathbb{K}/\mathfrak{a})^*$ (whenever it makes sense). For a prime ideal $\mathfrak{p}$ of $\mathbb{K}$ and a positive integer γ let $\mathcal{I}_\gamma^\kappa(\mathfrak{p})$ be the average of $\mathrm{ind}_{\langle a_1,\dots,a_\gamma\rangle}(\mathfrak{p})^\kappa$ over all tuples $(a_1,\dots,a_\gamma)\in((\mathcal{O}_\mathbb{K}/\mathfrak{p})^*)^\gamma$. Motivated by a problem of Rohrlich we prove, partly conditionally on fairly standard hypotheses, lower bounds for $\sum_{\mathcal{N}\mathfrak{a}\leq x}\mathrm{ind}_{\Gamma}(\mathfrak{a})^\kappa$ and asymptotic formulae for $\sum_{\mathcal{N}\mathfrak{p} \leq x} \mathcal{I}_{\gamma}^\kappa(\mathfrak{p})$.
• Journal Article

Ordering the space of finitely generated groups

Annales de l’institut Fourier 2015; 65(5) p.2091-2144
We consider the oriented graph whose vertices are isomorphism classes of finitely generated groups, with an edge from G to H if, for some generating set T in H and some sequence of generating sets S_i in G, the marked balls of radius i in (G, S_i) and (H, T) coincide. We show that if a connected component of this graph contains at least one torsion-free nilpotent group G, then it consists of those groups which generate the same variety of groups as G. We show on the other hand that the first Grigorchuk group has infinite girth, and hence belongs to the same connected component as free groups. The arrows in the graph define a preorder on the set of isomorphism classes of finitely generated groups. We show that a partial order can be imbedded in this preorder if and only if it is realizable by subsets of a countable set under inclusion. We show that every countable group imbeds in a group of non-uniform exponential growth. In particular, there exist groups of non-uniform exponential growth that are not residually of subexponential growth and do not admit a uniform imbedding into Hilbert space.
• Journal Article

Factors of Influence on the Performance of a Short-Latency Non-Invasive Brain Switch: Evidence in Healthy Individuals and Implication for Motor Function Rehabilitation.

Frontiers in neuroscience 2015; 9: Art. 527
Brain-computer interfacing (BCI) has recently been applied as a rehabilitation approach for patients with motor disorders, such as stroke. In these closed-loop applications, a brain switch detects the motor intention from brain signals, e.g., scalp EEG, and triggers a neuroprosthetic device, either to deliver sensory feedback or to mimic real movements, thus re-establishing the compromised sensory-motor control loop and promoting neural plasticity. In this context, single trial detection of motor intention with short latency is a prerequisite. The performance of the event detection from EEG recordings is mainly determined by three factors: the type of motor imagery (e.g., repetitive, ballistic), the frequency band (or signal modality) used for discrimination (e.g., alpha, beta, gamma, and MRCP, i.e., movement-related cortical potential), and the processing technique (e.g., time-series analysis, sub-band power estimation). In this study, we investigated single trial EEG traces during movement imagination on healthy individuals, and provided a comprehensive analysis of the performance of a short-latency brain switch when varying these three factors. The morphological investigation showed a cross-subject consistency of a prolonged negative phase in MRCP, and a delayed beta rebound in sensory-motor rhythms during repetitive tasks. The detection performance had the greatest accuracy when using ballistic MRCP with time-series analysis. In this case, the true positive rate (TPR) was ~70% for a detection latency of ~200 ms. The results presented here are of practical relevance for designing BCI systems for motor function rehabilitation.
• Journal Article

Convolution Comparison Pattern: An Efficient Local Image Descriptor for Fingerprint Liveness Detection.

PloS one 2016; 11(2): Art. e0148552
We present a new type of local image descriptor which yields binary patterns from small image patches. For the application to fingerprint liveness detection, we achieve rotation invariant image patches by taking the fingerprint segmentation and orientation field into account. We compute the discrete cosine transform (DCT) for these rotation invariant patches and attain binary patterns by comparing pairs of two DCT coefficients. These patterns are summarized into one or more histograms per image. Each histogram comprises the relative frequencies of pattern occurrences. Multiple histograms are concatenated and the resulting feature vector is used for image classification. We name this novel type of descriptor convolution comparison pattern (CCP). Experimental results show the usefulness of the proposed CCP descriptor for fingerprint liveness detection. CCP outperforms other local image descriptors such as LBP, LPQ and WLD on the LivDet 2013 benchmark. The CCP descriptor is a general type of local image descriptor which we expect to prove useful in areas beyond fingerprint liveness detection such as biological and medical image processing, texture recognition, face recognition and iris recognition, liveness detection for face and iris images, and machine vision for surface inspection and material classification.
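
The pipeline described in this abstract (DCT of a patch, binary pattern from pairwise coefficient comparisons, histogram of relative frequencies) can be sketched roughly as follows. The coefficient pairs in `PAIRS`, the patch size, and the function names are assumptions chosen for illustration, and the rotation normalisation based on segmentation and orientation field is omitted entirely, so this is not the published CCP descriptor.

```python
import math
from collections import Counter


def dct2(patch):
    """Unnormalised 2D DCT-II of a square patch given as a list of rows."""
    n = len(patch)
    cos = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
           for u in range(n)]
    return [[sum(patch[x][y] * cos[u][x] * cos[v][y]
                 for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


# Hypothetical pairs of DCT coefficient indices ((u1, v1), (u2, v2)).
PAIRS = [((0, 1), (1, 0)), ((1, 1), (0, 2)), ((2, 0), (1, 2))]


def binary_pattern(patch, pairs=PAIRS):
    """Encode one patch as an integer: one bit per coefficient comparison."""
    coeffs = dct2(patch)
    bits = 0
    for (u1, v1), (u2, v2) in pairs:
        bits = (bits << 1) | int(coeffs[u1][v1] > coeffs[u2][v2])
    return bits


def pattern_histogram(patches, pairs=PAIRS):
    """Relative frequency of each possible pattern over a set of patches."""
    if not patches:
        raise ValueError("at least one patch is required")
    counts = Counter(binary_pattern(p, pairs) for p in patches)
    return [counts[b] / len(patches) for b in range(2 ** len(pairs))]


patch = [[1, 2, 3], [4, 5, 7], [0, 2, 9]]
histogram = pattern_histogram([patch, patch, patch])
print(histogram)
```

With three comparisons the histogram has eight bins; several such histograms, built from different pair sets, could be concatenated into the feature vector that the abstract mentions.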