International Workshop on Multi-Target Prediction

ECML/PKDD 2014, Nancy, France on September 15th, 2014


Invited speakers.

Cédric Archambeau, Amazon Research, Germany

Multi-task learning: a Bayesian approach

I will discuss a Bayesian model for multi-task regression and classification. The model is able to capture correlations between a potentially very large number of tasks, while being sparse in the features (e.g., to facilitate interpretation). The model is based on a general family of group-sparsity-inducing priors built from matrix-variate Gaussian scale mixtures. The amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Finally, I will discuss how this model was used in the context of contact centres to understand the factors that were impacting customer satisfaction and driving performance.
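The model itself is Bayesian, with matrix-variate Gaussian scale-mixture priors and approximate inference. As a rough, purely illustrative analogue of its key effect (feature-level sparsity shared across tasks), the sketch below implements convex multi-task group-lasso regression by proximal gradient descent; all function names, hyperparameters, and the toy data are our own, not from the talk.

```python
import numpy as np

def group_soft_threshold(W, t):
    """Row-wise group soft-thresholding: each feature's coefficient vector
    (one entry per task) is shrunk toward zero jointly, so a feature is
    either kept for all tasks or dropped for all tasks."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

def multitask_group_lasso(X, Y, lam, n_iter=500):
    """Proximal gradient descent for
    min_W (1/2n)||XW - Y||_F^2 + lam * sum_j ||W[j, :]||_2.
    X: (n, d) shared inputs; Y: (n, T), one column per task."""
    n, d = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the smooth part
    W = np.zeros((d, Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y) / n
        W = group_soft_threshold(W - lr * grad, lr * lam)
    return W

# toy problem: two tasks that share the same two relevant features out of ten
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
W_true = np.zeros((10, 2))
W_true[0] = [2.0, -1.0]
W_true[3] = [0.5, 1.5]
Y = X @ W_true + 0.01 * rng.standard_normal((200, 2))
W_hat = multitask_group_lasso(X, Y, lam=0.05)
active = np.flatnonzero(np.linalg.norm(W_hat, axis=1) > 1e-6)
```

In this frequentist stand-in the sparsity level is set by the penalty `lam`; in the Bayesian model described above it is instead learnt from the data via the hyperparameters of the prior.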

Charles Elkan, University of California at San Diego, USA

Massive, Sparse, Efficient Multilabel Learning

Many real-world applications of machine learning have multilabel classification at their core. This talk will present progress towards a multilabel learning method that can handle 10^7 training examples, 10^6 features, and 10^5 labels on a single workstation. A sparse linear model is learned for each label simultaneously by stochastic gradient descent with L2 and L1 regularization. Tractability is achieved through careful use of sparse data structures, and speed is achieved by using the latest stochastic gradient methods that do variance reduction. Both theoretically and practically, these methods achieve order-of-magnitude faster convergence than Adagrad. We have extended them to handle non-differentiable L1 regularization. We show experimental results on classifying biomedical articles into 26,853 scientific categories. [Joint work with Galen Andrew, ML intern at Amazon.]
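The variance-reduced stochastic gradient methods referenced here belong to the SVRG family. Below is a toy, dense-numpy sketch of a proximal SVRG update for a single binary label, with the non-differentiable L1 term handled by soft-thresholding; it is not the authors' system (which relies on sparse data structures to reach the scales quoted above), and the toy data and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_svrg_logistic(X, y, l2, l1, lr, epochs):
    """Prox-SVRG for one binary label: the smooth part (logistic loss + L2)
    gets a variance-reduced stochastic gradient; the non-differentiable L1
    term is handled by a proximal soft-thresholding step."""
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        # full gradient at a snapshot: the variance-reduction anchor
        w_snap = w.copy()
        full_grad = X.T @ (sigmoid(X @ w_snap) - y) / n + l2 * w_snap
        for i in rng.integers(0, n, size=n):
            xi, yi = X[i], y[i]
            gi = (sigmoid(xi @ w) - yi) * xi + l2 * w
            gi_snap = (sigmoid(xi @ w_snap) - yi) * xi + l2 * w_snap
            v = gi - gi_snap + full_grad          # variance-reduced gradient
            w = soft_threshold(w - lr * v, lr * l1)
    return w

# toy binary label depending on 3 of 10 features
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))
w_true = np.zeros(10)
w_true[:3] = [3.0, -2.0, 1.5]
y = (rng.random(500) < sigmoid(X @ w_true)).astype(float)
w = prox_svrg_logistic(X, y, l2=1e-3, l1=0.05, lr=0.05, epochs=30)
acc = np.mean(((X @ w) > 0) == (y == 1))
```

Because both the per-sample and snapshot gradients appear in the update, the stochastic noise shrinks as the iterate approaches the snapshot, which is what yields the faster convergence the abstract contrasts with Adagrad.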

Pierre Geurts, University of Liège, Belgium

Supervised inference of biological networks with single, multiple, and kernelized output trees

Networks provide a natural representation of molecular biology knowledge, in particular to model relationships between biological entities such as genes, proteins, drugs, or diseases. Because of the effort and cost of the experiments needed to elucidate these networks, or simply because such experiments are lacking, computational approaches for network inference have been widely investigated in the literature.

This talk focuses on supervised network inference methods, which rely on supervised learning techniques to infer the network from a training sample of known interacting (and possibly non-interacting) entities, together with additional measurement data. We first formalize the problem as a supervised classification problem on pairs of objects and briefly discuss several important issues in the application and validation of supervised network inference methods, arising both from the need to classify pairs and from the nature of biological networks. We then present several generic approaches to this problem that exploit standard classification methods, multi-label methods, or kernelized output methods. We show that tree-based ensemble methods can be adapted to all these settings and therefore provide an attractive learning toolbox for network inference. We further highlight their interpretability, through single-tree structures and ensemble-derived feature importance scores, and draw interesting connections with clustering techniques. Experiments will be reported on various biological networks.
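The pair-classification formulation can be illustrated with a small sketch: build an order-invariant feature vector for each candidate pair of nodes, then train any off-the-shelf classifier on known edges versus non-edges. A plain logistic regression stands in below for the tree-based ensembles used in the talk; the toy network and all names are ours.

```python
import numpy as np

def pair_features(F, i, j):
    """Order-invariant pair representation: (i, j) and (j, i) must map to
    the same input, since most biological interactions are undirected."""
    return np.concatenate([F[i] + F[j], np.abs(F[i] - F[j])])

def train_logreg(X, y, lr=0.5, n_iter=2000):
    """Plain full-batch logistic regression, standing in for the
    tree-based ensembles advocated in the talk."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# toy network over 40 nodes: i and j interact iff both have a positive
# first (hidden) feature
rng = np.random.default_rng(1)
F = rng.standard_normal((40, 5))
pairs = [(i, j) for i in range(40) for j in range(i + 1, 40)]
X = np.array([pair_features(F, i, j) for i, j in pairs])
y = np.array([float(F[i, 0] > 0 and F[j, 0] > 0) for i, j in pairs])
w, b = train_logreg(X, y)
acc = np.mean(((X @ w + b) > 0) == (y == 1))
```

One of the validation issues alluded to above is visible even in this toy: the candidate pairs sharing a node are not independent examples, so train/test splits must be made at the node level, not the pair level, to avoid optimistic estimates.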

Christoph Lampert, IST Austria, Austria

Predicting Multiple Structured Outputs

The problem of multi-label prediction for structured output sets, i.e. predicting multiple structured outputs from one input, occurs in several practical applications, such as object detection in images, or secondary structure prediction in computational biology. Most conventional multi-label classification techniques are not applicable in this situation, however, because they require explicitly enumerating all possible labels, which is impractical for exponentially sized structured label spaces.

As an alternative, I will discuss a maximum-margin formulation for multi-label structured prediction that remains computationally tractable even for exponentially large label sets, and that shares many beneficial properties with single-label structured prediction approaches, in particular the formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds.
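One way such tractability can arise: when the joint score is linear per label and the task loss is the Hamming distance, margin-rescaled loss-augmented inference decomposes over labels, so the argmax over the exponentially large label set reduces to independent per-label decisions. The sketch below illustrates this decomposition with a stochastic subgradient step on the structured hinge loss; it uses our own notation and toy data, and is not the specific formulation of the talk.

```python
import numpy as np

def loss_augmented_argmax(scores, y_true):
    """argmax_y [ scores @ y + hamming(y, y_true) ] over y in {0,1}^L.
    Both terms decompose over labels, so each y_l is chosen independently:
    y_l = 1 iff scores[l] + (1 - y_true[l]) > y_true[l]."""
    return (scores > 2.0 * y_true - 1.0).astype(float)

def structured_hinge_step(W, x, y_true, lr, reg):
    """One subgradient step on the margin-rescaled structured hinge loss
    max_y [ score(x, y) + Delta(y, y_true) ] - score(x, y_true),
    with score(x, y) = y @ (W @ x)."""
    y_hat = loss_augmented_argmax(W @ x, y_true)
    W -= lr * (np.outer(y_hat - y_true, x) + reg * W)
    return W

# toy data: 4 labels, each a noiseless linear function of a 5-d input
rng = np.random.default_rng(2)
W_true = rng.standard_normal((4, 5))
X_data = rng.standard_normal((300, 5))
Y_data = (X_data @ W_true.T > 0).astype(float)

W = np.zeros((4, 5))
for _ in range(10):                      # epochs of stochastic subgradient
    for x, y in zip(X_data, Y_data):
        W = structured_hinge_step(W, x, y, lr=0.05, reg=1e-4)
acc = np.mean((X_data @ W.T > 0) == (Y_data == 1))
```

For genuinely structured outputs the per-label decisions above are replaced by a combinatorial argmax (e.g. dynamic programming over a chain), but the training loop keeps the same shape, which is what makes the working-set and subgradient machinery of single-label structured prediction carry over.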