This module continues the DATA832 and INFO831 modules, in which the basics of data science were presented through the different paradigms of machine learning and exploratory data analysis. Experiments with basic algorithms highlighted the limitations of simple modeling tools and the need for more advanced methods.
This module presents a set of advanced methods that extend the fundamentals of machine learning. Each approach improves the learning process by focusing on a particular aspect, such as reducing the variance of decisions, handling non-linear problems, or learning from a very large number of examples with automatic feature extraction.
The conceptual presentation of each method is accompanied by a discussion of its implementation and by experiments based on case studies drawn from applied research.
- ensemble methods (bagging, random forests, boosting)
- support vector machines, kernel methods
- deep learning
- reinforcement learning methods
- time series, sequential patterns
- documentary research
- a bibliographic synthesis, i.e. a report and critical analysis of a set of documents addressing the same issue, based on explicit criteria
- Instructor (author): Galichet Sylvie
- Large-scale distributed systems (INFO 833)
- Distributed databases (INFO 834)
This final project builds on the previous projects (PROJ 631, 831, 931) and involves developing a complete process and system, from acquiring and curating data to analyzing and visualizing it. The main difference from the previous projects is that this one no longer runs on a single machine, even a high-end one: data and computation must be distributed over a set of machines organized into a cluster (in the cloud).
The deliverable for this project is a virtual machine or a Docker image containing all the tools (in-house or open-source) needed to reuse the process.
- Instructor (author): Huget Marc-Philippe