Axis 1: Building predictive analytics on time series and data streams

Nathan Huet (PhD, funded by the Chair): Functional Extremes or Karhunen-Loève Expansion for Extreme Data

Abstract: In many fields, data increasingly come with a functional structure. We propose a framework for building an efficient anomaly detection algorithm for functional data based on extreme-value assumptions (e.g. regular variation). To achieve this, extremes and anomalies in functional spaces have to be characterized, theoretical guarantees need to be developed, and a suitable finite-dimensional representation of moderate dimension is needed for the functional data.

See the presentation

Dimitri Bouche (PhD, funded by the Chair): Wind power predictions from nowcasts to 4-hour forecasts: a learning approach with variable selection

Abstract: We study the prediction of short-term wind speed and wind power (every 10 minutes, up to 4 hours ahead). Accurate forecasts of these quantities are crucial to mitigate the negative effects of wind farms' intermittent production on energy systems and markets. At these time scales, outputs of numerical weather prediction models are usually overlooked, even though they should provide valuable information on larger-scale dynamics. In this work, we combine those outputs with local observations using machine learning. To make the results usable by practitioners, we focus on simple, well-known methods that can handle a high volume of data. We first study variable selection through two simple techniques, a linear one and a nonlinear one. We then exploit those results to forecast wind speed and wind power, again with an emphasis on linear versus nonlinear models. For wind power prediction, we also compare the indirect approach (wind speed predictions passed through a power curve) with the direct one (predicting wind power directly).
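The two wind power strategies named in the abstract, passing a wind speed forecast through a power curve and predicting power directly, can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins, the cubic `power_curve` and its parameters are hypothetical, and plain least squares stands in for whatever linear model the authors actually use.

```python
import numpy as np

def power_curve(speed, cut_in=3.0, rated=12.0, rated_power=2.0):
    """Hypothetical power curve (MW): cubic ramp between cut-in and rated speed."""
    ramp = (speed**3 - cut_in**3) / (rated**3 - cut_in**3)
    return rated_power * np.clip(ramp, 0.0, 1.0)

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def rmse(a, b):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic stand-in data (assumption): one local observation and one
# numerical weather prediction (NWP) output per time step.
rng = np.random.default_rng(0)
n = 2000
local_obs = rng.uniform(2.0, 15.0, size=n)            # last observed speed (m/s)
nwp_speed = local_obs + rng.normal(0.0, 1.5, size=n)  # NWP wind speed (m/s)
future_speed = np.clip(
    0.6 * local_obs + 0.4 * nwp_speed + rng.normal(0.0, 1.0, size=n), 0.0, None
)
future_power = power_curve(future_speed)

X = np.column_stack([local_obs, nwp_speed])
train, test = slice(0, 1500), slice(1500, n)

# Indirect approach: forecast wind speed, then apply the power curve.
speed_model = fit_linear(X[train], future_speed[train])
indirect_pred = power_curve(speed_model(X[test]))

# Direct approach: forecast wind power from the same inputs.
power_model = fit_linear(X[train], future_power[train])
direct_pred = power_model(X[test])

rmse_indirect = rmse(indirect_pred, future_power[test])
rmse_direct = rmse(direct_pred, future_power[test])
print(f"indirect RMSE: {rmse_indirect:.3f} MW, direct RMSE: {rmse_direct:.3f} MW")
```

Any ranking of the two approaches obtained from this toy setup says nothing about real wind farms, since the simulated power is an exact function of the simulated speed.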

See the presentation

Emilia Siviero (PhD, funded by the Chair): A Statistical Learning View of Simple Kriging

Abstract: In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly, and guarantees on the generalization capacity of predictive rules learned from such data remain to be established. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X = \{X_s\}_{s \in S}$, $S \subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1, \ldots, s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non-i.i.d. nature of the spatial data $X_{s_1}, \ldots, X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and hopefully pave the way for further developments in statistical learning based on spatial data.
Keywords: Geostatistics, Statistical Learning, Kriging, Covariance Estimation
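As a rough illustration of the simple Kriging task described in the abstract, the sketch below computes the standard predictor for a zero-mean field, $\hat{X}_s = c(s)^\top \Sigma^{-1} (X_{s_1}, \ldots, X_{s_n})^\top$, where $\Sigma$ is the covariance matrix of the observations and $c(s)$ the vector of covariances between the target location and the observed ones. It makes assumptions the abstract does not: the covariance function is treated as known (the hypothetical `exponential_cov`), whereas the plug-in rule studied in the work replaces it with an estimate, and the small `nugget` term is added only for numerical stability.

```python
import numpy as np

def exponential_cov(h, sill=1.0, length=1.0):
    """Isotropic exponential covariance as a function of distance h (assumed model)."""
    return sill * np.exp(-h / length)

def simple_kriging(sites, values, target, cov, nugget=1e-10):
    """Simple Kriging prediction of a zero-mean field at `target`.

    sites:  (n, 2) array of observed locations
    values: (n,) array of observations at those locations
    target: (2,) array, location to predict
    cov:    function mapping distances to covariances
    """
    # Covariance matrix of the observations, from pairwise distances.
    d_obs = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    sigma = cov(d_obs) + nugget * np.eye(len(sites))
    # Covariances between the target location and each observed site.
    c = cov(np.linalg.norm(sites - target, axis=1))
    # Kriging weights solve sigma @ w = c.
    weights = np.linalg.solve(sigma, c)
    return float(weights @ values), weights

# Usage on a regular 5 x 5 grid with simulated observations (illustrative only).
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
rng = np.random.default_rng(0)
observed = rng.normal(size=len(grid))
prediction, w = simple_kriging(grid, observed, np.array([1.5, 2.5]), exponential_cov)
```

With a known covariance, the predictor interpolates the data at observed locations and shrinks toward the zero mean far from them; the difficulty addressed in the abstract is what happens when the covariance has to be estimated from a single realization.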

See the presentation