APPLIED STATISTICS 2

CODE
52509
ACADEMIC YEAR
2018/2019
CREDITS
6 credits during the 3rd year of 8766 Mathematical Statistics and Data Management (L-35) GENOVA
SCIENTIFIC DISCIPLINARY SECTOR
SECS-S/01
TEACHING LOCATION
GENOVA (Mathematical Statistics and Data Management)
SEMESTER
2nd Semester

OVERVIEW

Experts introduce statistical techniques that they use in their work, or present advances in them, and illustrate their applications through concrete examples.

AIMS AND CONTENT

LEARNING OUTCOMES

Provide statistical tools relevant to specific applications, together with the experience of experts working in the field.

AIMS AND LEARNING OUTCOMES

 

Pattern recognition and applications (24 hours of in-person lectures)
The module introduces the fundamental concepts and algorithms of statistical pattern recognition, with a focus on their industrial applications (e.g. predictive maintenance, process optimization, quality control), their development cycle and performance evaluation. Examples are drawn mainly from computer vision, which often provides the data sets to which pattern recognition methods are applied. The module starts from statistical decision theory and parametric estimation and gives a quick review of the different viewpoints and conceptual approaches to the subject. The common thread is the ability to build industrial systems capable of making statistically optimal decisions on the basis of experience. In light of the Industry 4.0 plan, which is expected to bring a substantial renewal of production processes, it is considered useful to give greater visibility to this area of application of statistical skills.

Measurement models in psychometrics (14 hours of in-person lectures)
The module introduces statistical issues in psychometric theory and the use of statistical software (R) for carrying out basic psychometric analyses.

Demography (4 hours of in-person lectures)
The module illustrates, through a complex example, the issues involved in communicating demographic data to the general population.

Further seminar activities (not assessed) may be organised each year. They are usually presented by data scientists who work in applied contexts such as companies, consumer companies and public bodies.

Teaching methods

Combination of traditional lectures and lab sessions using Matlab and R.

SYLLABUS/CONTENT

 

Pattern recognition and applications
After an overview of pattern recognition and the criteria for its application, the following topics are addressed.

  • Bayesian decision theory. Maximum a posteriori probability. Classification and regression. Naïve Bayes. Construction of an optimal classifier. Parameter estimation. Performance evaluation. Cross-validation.
  • General statistical classifiers. Gaussian mixtures and the EM algorithm. Outlier detection. Some simple non-parametric techniques. Introduction to Bayesian networks and inference on graphs.
  • Dimensionality reduction. Feature selection. Genetic methods. Linear transformations of the sample space: PCA/LDA/ICA. Non-linear maps (t-SNE).
  • Decision trees. The CART method. Bagging and random forests. Boosting. Statistical modelling with trees.
  • Neural nets for classification. Multi-layer models and learning algorithms. Devising a neural classifier. Neural nets as generalised approximators. Introduction to deep learning (convolutional networks and stacked autoencoders).

Theoretical lectures are intertwined with examples of applications such as:

  • Optical character recognition. Construction of classifiers with different levels of complexity (from Naïve Bayes to convolutional networks) for the recognition of handwritten or printed text.
  • Automatic counting systems and event detectors. Image analysis for the detection of faces, people, vehicles, etc. Identification of key features and definition of an optimal binary acceptance test via boosting techniques.
  • Statistical modelling of complex machinery. Definition of non-linear input-output relationships for forecasting a target variable (e.g. energy consumption) from instrumental data with random forests and neural nets.
  • Quality control and predictive maintenance. Probability distribution of sensor data and anomaly/outlier detection. Estimation of the system's residual lifetime (time to failure, TTF).

The applications will be illustrated with the aid of suitable Matlab toolboxes and original data during guided hands-on sessions.

Psychometrics
  • Classical test theory
  • Psychological variables or constructs
  • Definition of the content domain of a construct and its operationalizations
  • Measurement models in psychology: reflective indicator models and formative indicator models
  • Item analysis and reliability
  • Exploratory factor analysis
  • Confirmatory factor analysis
  • Structural equation models
Applications in R (packages 'psych', 'lavaan' and 'semPlot') will be shown.

Demography

The module is based on the careful reading and analysis of the volume Tutto quello che non vi hanno mai detto sull'immigrazione (2015, Laterza) by Gianpiero Dalla Zuanna and Stefano Allievi. Its theme is collecting, analysing and presenting data to help society transform opportunities into new realities.

 

RECOMMENDED READING/BIBLIOGRAPHY

Pattern recognition and applications
Handouts available at the web site http://www.onairweb.com/corsoPR/
Further reading:
R. Duda, P. Hart, D. Stork, Pattern Classification, Wiley (2001)
S. Theodoridis, K. Koutroumbas, Pattern Recognition, Academic Press (2006)
C. Bishop, Pattern Recognition and Machine Learning, Springer (2007)
S. Theodoridis, Machine Learning: A Bayesian and Optimization Perspective, Academic Press (2015)

Psychometrics
Rust, J. & Golombok, S. (2009). Modern psychometrics, 3rd ed. Hove: Routledge (chapters 1, 2, 3, 4, and 7).
Handouts and other teaching material (e.g., R code) will be shared online.

Demography
Gianpiero Dalla Zuanna and Stefano Allievi (2015). Tutto quello che non vi hanno mai detto sull'immigrazione, Laterza.

TEACHERS AND EXAM BOARD

Office hours: By appointment, arranged by email with Luca Oneto <luca.oneto@unige.it> and Fabrizio Malfanti <fabrizio.malfanti@intelligrate.it>. For organizational issues, contact Eva Riccomagno by email <riccomagno@dima.unige.it>.

Office hours: Tuesdays, 12pm-1pm, Dipartimento di Scienze della Formazione, floor 4, room 4A3, Corso A. Podestà, 2, 16128 Genova. If the teacher is not available, this will be announced as soon as possible on the Aulaweb website and on the student online forum. The teacher cannot guarantee his availability outside office hours. However, students who cannot meet him during office hours can arrange an appointment at another date and time by e-mail. A Skype call can also be scheduled on request by e-mail.
Teacher's contacts:
Phone: +39 010 209 53709
E-mail: carlo.chiorri[chioc]unige.it or carlo.chiorri[chioc]gmail.com
Skype: chiorri.psicometria (by appointment only)

Exam Board

EVA RICCOMAGNO (President)

CARLO CHIORRI

LESSONS

LESSONS START

The class will start according to the academic calendar.

 

 

CLASS SCHEDULE

The timetable for all courses is available on EasyAcademy.

EXAMS

Exam description

Pattern recognition and applications
Written exam with multiple-choice questions, followed by a discussion of it.

Psychometrics
Written exam, followed by a discussion of it.

Demography
Written exam with multiple-choice or open questions.

The final mark is the weighted average of the marks of the three parts. The weights are proportional to the hours of classroom lectures.
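
As a rough illustration of the weighting rule above, the following Python sketch computes an hours-weighted average. It assumes the weights use the lecture hours listed in this syllabus (24, 14 and 4). The marks, the 30-point scale and the function name `final_mark` are hypothetical, and any rounding rule applied by the exam board is not specified here.

```python
# Hours of classroom lectures per part, as listed in this syllabus
# (assumed to be the hours used for the weights).
LECTURE_HOURS = {
    "Pattern recognition and applications": 24,
    "Psychometrics": 14,
    "Demography": 4,
}


def final_mark(marks, hours=LECTURE_HOURS):
    """Return the average of the marks, weighted by lecture hours."""
    if set(marks) != set(hours):
        raise ValueError("marks must cover exactly the parts listed in hours")
    total_hours = sum(hours.values())
    return sum(marks[part] * hours[part] for part in hours) / total_hours


# Hypothetical marks out of 30, for illustration only.
example_marks = {
    "Pattern recognition and applications": 28,
    "Psychometrics": 26,
    "Demography": 30,
}
example_final = final_mark(example_marks)
print(round(example_final, 2))
```

Under these assumptions the weights are 24/42, 14/42 and 4/42, so the pattern recognition part counts for a little over half of the final mark.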

Assessment methods

Pattern recognition and applications
The exam consists of 25 multiple-choice questions covering all topics discussed during the course. Answers may be numeric or true/false and may require elementary calculations. Lecture notes and other material are not allowed. A pocket calculator may be useful but is not essential. The duration is 45 minutes. Marking takes place immediately after the exam. Students may justify some answers by providing suitable reasoning.

Psychometrics: Students will be presented with the R output of some statistical analyses carried out on real data. The written part of the exam tests the students' ability to apply what they have learnt through the lectures and the course materials: they will be asked to interpret and comment on the results and to detect flaws in the statistical analyses. In the oral discussion, issues with the answers to the written exam will be reviewed and discussed, and knowledge of psychometric theory will be tested.

Demography: The exam assesses the acquired skills in identifying, in a complex text, specific information and data as well as the statistical analysis underlying them.

FURTHER INFORMATION

Web pages: http://www.onairweb.com/corsoPR/ https://www.dropbox.com/s/groq642v7rbviha/Lezioni%20SMID%202016.zip?dl=0

Prerequisites: Applied Statistics 1

Attendance is highly recommended.