PROGRAM

9h-9h30 Reception in Amphi 2

 

9h30-10h15 Tutorial

Speaker: Mathieu Fontaine

Title: TadGAN: robust unsupervised time series anomaly detection

Abstract:

Anomaly detection is an important task in many real-world applications, such as network intrusion detection, cybersecurity, and predictive maintenance. Deep learning-based methods, particularly generative adversarial networks (GANs), have shown good results for detecting anomalies in time series data. In this presentation we introduce the Time Anomaly Detection GAN (TadGAN), a GAN-based model for detecting anomalies in time series. The model uses two time-aware critics (also known as discriminators), operating in the real and latent spaces respectively, that take into account the order of the data points and their temporal relationships. TadGAN also employs a residual network in the generator that helps capture long-term dependencies in the data. Moreover, a cycle consistency loss is used to obtain an accurate reconstruction of the time series data.
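
To give a flavor of the scoring step, here is a minimal NumPy sketch of TadGAN-style anomaly scoring: reconstruct each window through the generator and combine the reconstruction error with a critic score. The `reconstruct` and `critic_score` functions below are toy stand-ins (a moving average and a distance to a known normal regime), not the trained networks from the paper; only the combination of the two signals reflects the method.

```python
import numpy as np

def reconstruct(x, k=9):
    # Toy stand-in for the encode-decode pass x -> G(E(x)): a moving
    # average that cannot reproduce sharp deviations, mimicking a
    # generator trained only on normal data.
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def critic_score(x, t):
    # Toy stand-in for the time-domain critic: high where the signal
    # departs from the normal regime (here, a sine wave).
    return np.abs(x - np.sin(t))

def anomaly_score(x, t, alpha=0.5):
    # Convex combination of z-normalised reconstruction error and
    # critic score, in the spirit of TadGAN's scoring step.
    rec_err = np.abs(x - reconstruct(x))
    crit = critic_score(x, t)
    rec_err = (rec_err - rec_err.mean()) / (rec_err.std() + 1e-8)
    crit = (crit - crit.mean()) / (crit.std() + 1e-8)
    return alpha * rec_err + (1 - alpha) * crit

t = np.linspace(0, 2 * np.pi, 100)
x = np.sin(t)
x[40:45] += 3.0                     # inject a short anomalous segment
scores = anomaly_score(x, t)
peak = int(np.argmax(scores))       # peaks inside the injected segment
print(peak)
```

In the actual model both terms come from trained networks, and anomalous segments are extracted by thresholding the score.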

References:

[1] Geiger, Alexander, et al. “Tadgan: Time series anomaly detection using generative adversarial networks.” 2020 IEEE International Conference on Big Data (Big Data). IEEE, 2020.

[2] Zhou, Bin, et al. “BeatGAN: Anomalous Rhythm Detection using Adversarially Generated Time Series.” IJCAI 2019.

[3] Zhu, Jun-Yan, et al. “Unpaired image-to-image translation using cycle-consistent adversarial networks.” Proceedings of the IEEE international conference on computer vision. 2017.

 

10h30-11h Short Talk

Speaker: Pavlo Mozharovskyi

Title: Anomaly detection using data depth: the functional setting

Abstract:

Anomaly detection is a branch of machine learning that aims at identifying observations exhibiting abnormal behavior. Be it measurement errors, disease development, severe weather, production quality defects, failed equipment, financial fraud or crisis events, their on-time identification, isolation and explanation constitute a necessary task in almost any branch of industry. Since the contemporary technological level allows for recording large amounts of data at potentially high frequency, the question of anomaly detection in the functional setting becomes increasingly important. This is amplified by the richness of the infinite-dimensional space and the non-negligible occurrence probability of unexpected types of anomalies, i.e. those not present in the training sample. This multitude of abnormalities demands a non-parametric methodology for their identification, while the robustness requirement suggests treating the data directly in the functional space (as opposed to first projecting it onto a finite-dimensional basis). This talk advocates that, in a number of practical situations, functional data depth is an efficient tool for anomaly detection. Data-depth-based methodology treats observations directly as functions, thus transferring depth’s robustness properties, crucial for the anomaly detection task, to functional data. Though computational challenges remain, data depth methodology today includes a number of functional depth notions that possess, together with robustness, such attractive properties as non-parametricity and desired invariances, with functional halfspace, area-of-the-convex-hull, and curve depths being only a few examples. A natural question arises: which depth notions are better suited for the functional anomaly detection problem at hand?
[4] suggest a taxonomy of abnormal observations, but the complexity of the functional space (i) complicates the attribution of real-data anomalies to pre-defined types and (ii) generates a multitude of case-specific sorts of anomalies; see [7] for a detailed benchmark study involving real data. This work is thus an attempt to provide practically important insights into the choice and application of depth notions by showcasing their usefulness for anomaly detection in different settings and by benchmarking against state-of-the-art methods. The simulated and real data explored in the experiments are expected to attract attention and gain practitioners’ trust in the depth methodology.
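
As a concrete illustration of the depth-based approach, here is a minimal NumPy sketch using one well-known functional depth, the modified band depth (with bands formed by pairs of curves), computed via its closed-form expression from pointwise ranks. The curves with the lowest depth are flagged as anomalies. The simulated data below (noisy sine curves plus one shifted curve) is an illustrative assumption, not taken from the talk.

```python
import numpy as np

def modified_band_depth(curves):
    # Modified band depth with bands of two curves: for each curve and
    # time point, the proportion of curve pairs whose band contains the
    # curve's value, averaged over time (closed form from the ranks).
    n = curves.shape[0]
    ranks = curves.argsort(axis=0).argsort(axis=0) + 1   # 1..n at each time
    inside = (n - ranks) * (ranks - 1) + n - 1           # pairs covering the value
    return (inside / (n * (n - 1) / 2)).mean(axis=1)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
normal = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((20, 50))
outlier = np.sin(2 * np.pi * t) + 2.0         # magnitude (shift) anomaly
curves = np.vstack([normal, outlier[None, :]])
depths = modified_band_depth(curves)
flagged = int(np.argmin(depths))              # the shifted curve, index 20
print(flagged)
```

The same recipe applies with other depth notions (functional halfspace, area-of-the-convex-hull, curve depths): compute a depth per observed function, then flag the least deep curves.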

References:

[1] Chandola, V., Banerjee, A. & Kumar, V. (2009). Anomaly detection: A survey. In ACM Computing Surveys.

[2] Claeskens, G., Hubert, M., Slaets, L. & Vakili, K. (2014). Multivariate functional halfspace depth. In Journal of the American Statistical Association.

[3] Gijbels, I. & Nagy, S. (2017). On a general definition of depth for functional data. In Statistical Science.

[4] Hubert, M., Rousseeuw, P.J. & Segaert, P. (2015). Multivariate functional outlier detection. In Statistical Methods & Applications.

[5] Lafaye De Micheaux, P., Mozharovskyi, P. & Vimond, M. (2021). Depth for curve data and applications. In Journal of the American Statistical Association.

[6] Nieto-Reyes, A. & Battey, H. (2016). A topologically valid definition of depth for functional data. In Statistical Science.

[7] Staerman, G., Adjakossa, E., Mozharovskyi, P., Hofer, V., Sen Gupta, J. & Clémençon, S. (2022). Functional anomaly detection: a benchmark study. In International Journal of Data Science and Analytics.

[8] Staerman, G., Mozharovskyi, P. & Clémençon, S. (2020). The area of the convex hull of sampled curves: a robust functional statistical depth measure. In the Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics.

 

11h-11h30 Short Talk

Speaker: Salah Zaiem

Title: Automatic data augmentation for training and adaptation of speech self-supervised models

Abstract:

In this talk, centered around two recent works, we present a conditional-independence-based method for automatically selecting data augmentation policies. This method has proven beneficial in two settings related to self-supervised learning for speech: first, for data augmentation selection and parameterization during the pre-training phase of contrastive self-supervised speech representation learning; second, during the downstream fine-tuning phase, where it allows for better domain adaptation when the downstream task exhibits different acoustic conditions than the pre-training corpora.

References:

[1] Zaiem, S., Parcollet, T., & Essid, S. Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning. Interspeech 2022.

[2] Hsu, W.-N., Sriram, A., Baevski, A., Likhomanenko, T., Xu, Q., Pratap, V., Kahn, J., Lee, A., Collobert, R., Synnaeve, G., & Auli, M. Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training. Interspeech 2021.

[3] Zaiem, S., Parcollet, T., & Essid, S. Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations. 2022.

 

11h30-12h Discussion – Use Cases & Perspectives