Prerequisites
  • DATA832
  • INFO831
  

 
Course description

This module continues DATA832 and INFO831, which introduced the basics of data science through the main paradigms of machine learning and exploratory data analysis. Experimental studies with basic algorithms highlighted the limitations of simple modeling tools and the need for more advanced methods.

This module presents a set of advanced methods that extend the fundamentals of machine learning. Each approach improves the learning process by focusing on a particular aspect, such as reducing the variance of decisions, handling non-linear problems, or learning from a very large number of examples with automatic feature extraction.

The conceptual presentation of each method is accompanied by a discussion of its implementation and by experiments based on case studies drawn from applied research.

 

Contents
  • ensemble methods (bagging, random forests, boosting)
  • support vector machines, kernel methods
  • deep learning
  • reinforcement learning methods
  • time series, sequential patterns

Prerequisites
All modules of the program
  
 
Course description
The objective of this module is to provide the methodological basis for:
  1. documentary research
  2. a bibliographic synthesis, i.e. a report and critical analysis of a set of documents addressing the same issue, based on explicit criteria
 

Prerequisites
  • Large-Scale Distributed Systems (INFO 833)
  • Distributed Databases (INFO 834)
  

 
Course description

This final project builds on previous projects (PROJ 631, 831, 931) and consists of developing a complete process and system, from acquiring and curating data to analyzing and visualizing it. The main difference from the previous projects is that this one no longer runs on a single (even high-end) machine: data and computation must be distributed across a set of machines organised into a cluster (in a cloud).

The deliverable for this project is a virtual machine or a Docker image containing all the tools (in-house or open source) needed to reuse the process.