DATA MINING

CODE
52507
ACADEMIC YEAR
2018/2019
CREDITS
6 credits during the 3rd year of 8766 Mathematical Statistics and Data Management (L-35) GENOVA
SCIENTIFIC DISCIPLINARY SECTOR
SECS-S/01
LANGUAGE
Italian
TEACHING LOCATION
GENOVA (Mathematical Statistics and Data Management)
SEMESTER
2nd Semester

OVERVIEW

Provide the students with the basic skills for extracting knowledge from large data sets.

AIMS AND CONTENT

LEARNING OUTCOMES

Develop the basic skills for extracting knowledge from large data sets, in particular by acquiring:

  • an understanding of the value of data mining in solving real-world problems
  • an understanding of the foundational concepts underlying data mining
  • an understanding of the algorithms commonly used in data mining tools
  • the ability to apply data mining tools to real-world problems

AIMS AND LEARNING OUTCOMES

At the end of the course, students will:

  • be able to understand and handle the main concepts and techniques of data mining
  • be able to apply the main data mining techniques autonomously to solve real-world problems
  • be able to develop further knowledge of data mining techniques and applications

Teaching methods

A combination of traditional lectures and lab sessions.

SYLLABUS/CONTENT

First part: introduction to data mining and applications in fraud detection
Introduction to Data Mining, Data science and big data analytics
Main techniques
The Data Mining Process - CRISP-DM
Seven Classes of Algorithms
            Supervised Learning – Classification
            Unsupervised Learning – Clustering
            Outlier Detection
            Regression
            Reinforcement Learning
            Ranking
            Deep Learning
Top ten data mining algorithms
Examples and applications using WEKA
Applications to marketing, finance and medicine
Big Data and Hadoop
The NoSQL paradigm

 

Second part: Machine Learning Algorithms for Data mining
Introduction to Data Mining and Machine Learning
Taxonomy of Data Mining problems
Statistical Inference
Support Vector Machines (extension to kernels)
Support Vector Regression (extension to kernels)
K-means and Spectral Clustering
Decision Trees and Random Forests
Model Selection and Error Estimation

RECOMMENDED READING/BIBLIOGRAPHY

  • Aggarwal, C. C. Data Mining: The Textbook. Springer, 2015.
  • Shalev-Shwartz, S., and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
  • Witten, I. H., Frank, E., and Hall, M. A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. (The Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann, 2011. ISBN-13: 978-0123748560. Available at the CSB di Ingegneria library (call number 006.312 WIT); also available online at http://www.sciencedirect.com/science/book/9780123748560
  • Phua, C., Lee, V., Smith, K., and Gayler, R. A Comprehensive Survey of Data Mining-based Fraud Detection Research. Computing Research Repository (CoRR), abs/1009.6119, 2005. Available online at http://arxiv.org/abs/1009.6119
  • Cristianini, N., and Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2006. Available at the ING and ECO libraries.
  • Ng, A. Y., Jordan, M. I., and Weiss, Y. On Spectral Clustering: Analysis and an Algorithm. NIPS, 2001. Also available online at http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf
  • Lecture notes (handouts)

TEACHERS AND EXAM BOARD

Office hours: by appointment, arranged by email with Luca Oneto <luca.oneto@unige.it> and Fabrizio Malfanti <fabrizio.malfanti@intelligrate.it>. For organizational issues, contact Eva Riccomagno by email <riccomagno@dima.unige.it>.

Exam Board

FABRIZIO MALFANTI (President)

EVA RICCOMAGNO (President)

LUCA ONETO

LESSONS

LESSONS START

Classes start according to the academic calendar.

TIMETABLE

The timetable of all courses can be consulted on EasyAcademy.

EXAMS

Exam description

To take the exam, you must sign up online.
The exam for the first part consists of the discussion of a group project, on a topic agreed with the lecturer, and of a written test on which the oral examination may be based.
The exam for the second part consists of the discussion of a project on a topic agreed with the lecturer and developed independently by the student.
The final mark is the average of the marks of the two parts, weighted by the number of ECTS credits of each part; since each part is worth 3 ECTS, the two marks carry equal weight.
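As a small illustration of the weighting rule, the Python sketch below computes the final mark as an ECTS-weighted average of the part marks. The function name `final_mark` and the example marks are hypothetical, and the sketch assumes marks on the Italian 30-point scale with no rounding or honours (lode) rule, since the course description does not state one.

```python
from typing import Sequence


def final_mark(part_marks: Sequence[float], part_ects: Sequence[float]) -> float:
    """Return the ECTS-weighted average of the marks of the course parts.

    Hypothetical helper: no rounding or honours (lode) rule is applied,
    because the course description does not state one.
    """
    if len(part_marks) != len(part_ects):
        raise ValueError("each part needs both a mark and an ECTS weight")
    total_ects = sum(part_ects)
    if total_ects <= 0:
        raise ValueError("the total number of ECTS must be positive")
    # Each mark counts in proportion to the credits of its part.
    weighted_sum = sum(mark * ects for mark, ects in zip(part_marks, part_ects))
    return weighted_sum / total_ects


# Example with made-up marks: both parts are worth 3 ECTS,
# so the result equals the plain mean of the two marks.
print(final_mark([27, 30], [3, 3]))  # 28.5
```

With 3 + 3 ECTS the weighted average reduces to the arithmetic mean; the weighting would only change the result if the two parts carried different numbers of credits.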

Assessment methods

The exam assesses whether the student has learned the methods and techniques for extracting knowledge from large data sets, through a small project that requires solving a real-world data mining problem.

FURTHER INFORMATION

The web page of the second part of the course is https://sites.google.com/view/lucaoneto/teaching/dm-smid