A two-part course presented by Pavlo Mozharovskyi, in hybrid format.
Anomaly detection (Chandola et al., 2009) is a branch of machine learning that aims to identify observations exhibiting abnormal behavior. Be it measurement errors, disease development, severe weather, production quality defects or failed equipment, financial fraud or crisis events, their timely identification, isolation and explanation constitute an important task in almost any branch of industry and science. When the data are presented in the form of a table that contains properties of individuals (a typical database structure), multivariate anomaly detection methods (Rousseeuw & Hubert, 2018) should be employed. If the data are functions of an argument, e.g., time (such as time series), projection onto a multivariate sub-basis or functional anomaly detection methods (Hubert et al., 2015) can be used.
For both multivariate and functional anomaly detection, the same general steps apply: first, observations are ordered with respect to their normality/outlyingness; then an application-specific threshold is chosen to separate abnormal observations. Defining an appropriate ordering is thus the main task of anomaly detection methods. This ordering can of course be based directly on the probability density (Breunig et al., 2000; Polonik, 1997), but such an approach quickly suffers from the curse of dimensionality, which is the rule rather than the exception in contemporary data analysis. For this reason, non-parametric ordering methods (Schölkopf et al., 2001; Liu et al., 2008), and in particular the notion of data depth (Zuo & Serfling, 2000; Mosler, 2013), have recently attracted increasing attention.
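The two steps above (order, then threshold) can be sketched as follows. This is a minimal illustration, not a method from the course: the helper `rank_and_threshold`, the `contamination` parameter, the synthetic data, and the choice of squared Mahalanobis distance as the outlyingness score are all assumptions made for the example.

```python
import numpy as np

def rank_and_threshold(scores, contamination=0.05):
    """Order observations by outlyingness and flag the top `contamination` fraction.

    `contamination` stands in for the application-specific threshold (hypothetical name).
    """
    scores = np.asarray(scores, dtype=float)
    threshold = np.quantile(scores, 1.0 - contamination)
    order = np.argsort(-scores)  # indices sorted from most to least outlying
    return order, scores > threshold

# Synthetic data (assumption): 200 Gaussian points, the first 5 shifted far away.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:5] += 8.0

# Step 1: an outlyingness score, here the squared Mahalanobis distance to the sample mean.
centered = X - X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
scores = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

# Step 2: threshold the ordering.
order, flags = rank_and_threshold(scores, contamination=0.05)
print("flagged indices:", np.flatnonzero(flags))
```

Any other ordering (a density estimate, an isolation-based score, or a data depth) could replace the score in step 1 without changing step 2.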
Among non-parametric orderings, data depth today occupies a special place. Given an observation, it measures how typical (or deep) this observation is with respect to other available observations of the same nature. Multivariate data depth possesses attractive properties such as robustness and affine invariance, which can be further extended to functional depth (Gijbels & Nagy, 2017). In this tutorial, the methodology is addressed in two parts.
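To make the notion of depth concrete, here is a minimal sketch of one classical depth, the halfspace (Tukey) depth, approximated with random projection directions. The function name `halfspace_depth_approx`, its parameters, and the synthetic data are assumptions for illustration; the random-direction approximation only gives an upper bound on the exact depth and is not necessarily the algorithm used in the course.

```python
import numpy as np

def halfspace_depth_approx(x, data, n_directions=500, seed=0):
    """Approximate halfspace depth of point `x` with respect to the rows of `data`.

    For each random unit direction, compute the fraction of data points on either
    side of x (ties included), and keep the smallest fraction over all directions.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_directions, data.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    proj_data = data @ directions.T          # shape (n_points, n_directions)
    proj_x = directions @ x                  # shape (n_directions,)
    frac_below = (proj_data <= proj_x).mean(axis=0)
    frac_above = (proj_data >= proj_x).mean(axis=0)
    return float(np.minimum(frac_below, frac_above).min())

# Synthetic sample (assumption): 500 standard Gaussian points in the plane.
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 2))

depth_center = halfspace_depth_approx(np.array([0.0, 0.0]), data)
depth_far = halfspace_depth_approx(np.array([4.0, 4.0]), data)
print(depth_center, depth_far)
```

A point near the center of the cloud receives a high depth, while a remote point receives a depth close to zero, so low depth can serve as an outlyingness ordering.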
Part II: Next, the functional (or time-series) framework is treated. In particular, integrated (Claeskens et al., 2014) and curve (Lafaye De Micheaux et al., 2020) functional depths, as well as the functional isolation forest (Staerman et al., 2019), are explained, with a focus on real-world applications such as hurricane tracks or brain imaging.
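As a rough illustration of the integrated approach, the sketch below averages a pointwise univariate halfspace depth over a common time grid. It is a simplification under stated assumptions (univariate curves, a shared grid, uniform weights, synthetic noisy sine curves) and not the multivariate functional depth of the cited paper; the function name `integrated_depth` is hypothetical.

```python
import numpy as np

def integrated_depth(curves):
    """Integrated depth of each curve with respect to the whole sample.

    curves: array of shape (n_curves, n_timepoints), all observed on the same grid.
    """
    curves = np.asarray(curves, dtype=float)
    # Entry [i, t]: fraction of curves at or below (resp. at or above) curve i at time t.
    below = (curves[None, :, :] <= curves[:, None, :]).mean(axis=1)
    above = (curves[None, :, :] >= curves[:, None, :]).mean(axis=1)
    pointwise = np.minimum(below, above)  # univariate halfspace depth at each time point
    return pointwise.mean(axis=1)         # uniform-weight average over the grid

# Synthetic sample (assumption): 30 noisy sine curves plus one curve shifted upward.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
normal_curves = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(30, 50))
shifted_curve = np.sin(2 * np.pi * t) + 3.0
curves = np.vstack([normal_curves, shifted_curve])

depths = integrated_depth(curves)
print("least deep curve:", int(np.argmin(depths)))
```

Because it averages over time, this kind of depth is best suited to curves that are outlying for a substantial part of the domain; a curve that is abnormal only briefly or only in shape may call for other tools.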
Keywords: Anomaly detection, machine learning, data depth, multivariate ordering, functional ordering, robustness, outliers, ranking, computational statistics, time series.