# MATHEMATICAL STATISTICS

7 credits during the 1st year of 9011 Mathematics (LM-40) GENOVA

7 credits during the 2nd year of 9011 Mathematics (LM-40) GENOVA

**Mathematical Statistics and Data Management 8766 (cohort 2018/2019)** - PROBABILITY 87081

**Mathematical Statistics and Data Management 8766 (cohort 2016/2017)** - PROBABILITY 87081

**Mathematical Statistics and Data Management 8766 (cohort 2017/2018)** - PROBABILITY 87081

OVERVIEW

An introduction to the classical theory of statistical models (model identification and estimation, parametric and non-parametric models, exponential models), point estimation (method of moments, likelihood methods and invariant estimators) and methods for evaluating estimators (UMVU estimators, Fisher information, Cramér-Rao inequality).

In the second part, this theory is applied to a class of statistical models that is fundamental in applications. Lab sessions with the software packages SAS and R are an integral part of the course.

## AIMS AND CONTENT

LEARNING OUTCOMES

To formalise estimation problems (parametric and non-parametric) and statistical hypothesis testing in a rigorous mathematical framework, and to formulate and apply appropriate regression models to various types of data sets.

AIMS AND LEARNING OUTCOMES

At the end of the course students will be able to

- recognise estimation problems (both parametric and non-parametric) in applied contexts
- formulate them in a rigorous mathematical framework
- identify suitable regression models for data analysis
- analyse the data with advanced software
- summarise the results of the analysis in a report, including their interpretation and an assessment of their reliability.

Teaching methods

Classroom lectures and exercise sessions. In the second part there will be computer laboratory sessions in which students practise applying the theoretical models learnt in the classroom lectures to describe and predict a phenomenon of interest, based on real case studies and data sets. During the lab sessions students will be able to check their level of understanding of the theory and its application.

SYLLABUS/CONTENT

**Program of the first part of the course:**

*Review of essential probability* including the notion of conditional probability and multivariate normal distribution.

*Statistical models and statistics*: the ideas of data sample and statistical model, identifiability and regular models, the exponential family. Statistics and their distributions. Sufficient, minimal sufficient, ancillary and complete statistics. The Neyman-Fisher lemma. Basu's theorem.

*Point estimators and their properties:* methods for finding point estimators: method of moments, least squares method, maximum likelihood method, invariant estimators. Methods for evaluating estimators: the Rao-Blackwell and Lehmann-Scheffé theorems. UMVU estimators. Expected Fisher information, Cramér-Rao inequality and efficient estimators.
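
As an illustration of the last item, one standard form of the Cramér-Rao inequality is sketched here. The notation is an assumption made for this sketch and is not taken from the course material: $f(x;\theta)$ is the density of a regular one-parameter model, $T$ is an unbiased estimator of $g(\theta)$, and $I(\theta)$ is the expected Fisher information of the sample.

$$
\operatorname{Var}_\theta(T) \;\ge\; \frac{\bigl(g'(\theta)\bigr)^{2}}{I(\theta)},
\qquad
I(\theta) = \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right].
$$

An unbiased estimator whose variance attains this bound for every $\theta$ is called efficient.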

*Statistical hypothesis testing:* the Neyman-Pearson theorem for simple hypotheses, likelihood ratio test.

*Introduction to Bayesian statistics*: prior and posterior probability distributions, conjugate priors, improper and flat priors, comparison with the frequentist approach to estimation.
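
A minimal sketch of a conjugate prior update may help fix ideas for this topic. It assumes a Beta prior for a binomial success probability; the function name `beta_binomial_update` and the numbers used are hypothetical and serve only as an illustration.

```python
def beta_binomial_update(a, b, successes, trials):
    """Return the parameters of the Beta posterior.

    Assumes a Beta(a, b) prior on the success probability and a
    binomial likelihood with `successes` out of `trials`.
    """
    return a + successes, b + (trials - successes)


# Flat prior Beta(1, 1) and illustrative data: 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(1.0, 1.0, 7, 10)

# Bayesian point estimate (posterior mean) next to the frequentist
# maximum likelihood estimate.
posterior_mean = a_post / (a_post + b_post)
mle = 7 / 10
```

With a flat prior the posterior mean is pulled slightly towards 1/2 compared with the maximum likelihood estimate, which is one simple way to compare the two approaches.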

At most one of the last two topics is part of the course for each given year.

**Program of the second part of the course:**

*General linear models.* ANOVA: crossed and nested factors; unbalanced data. Overparametrised models: reparametrisation and generalised inverses, with theoretical considerations and practical implications. Multivariate linear regression models and models for repeated measures.

*Generalised linear models.* Exponential family. Link function. Models for categorical data (binomial, multinomial and Poisson models). Iterative methods for coefficient estimation: Newton-Raphson, scoring. Asymptotic distributions of likelihood-based statistics. Statistical hypothesis testing and goodness-of-fit criteria: deviance, chi-squared. Residuals. Tests and confidence intervals for (subsets of) the model parameters. Odds ratios and log odds ratios. Models for ordinal data and contingency tables.
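
The iterative estimation methods named above can be sketched for the simplest case, a logistic regression with an intercept and one covariate. This is an illustration under stated assumptions, not the course's lab code (the labs use SAS and R): the function names, the simulated data and the true coefficients (-0.5, 1.2) are all hypothetical. With the canonical logit link, Newton-Raphson and Fisher scoring coincide.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic_newton(x, y, n_iter=25, tol=1e-10):
    """Newton-Raphson for the model logit(P(y=1)) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        mu = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2x2, symmetric).
        w = [mi * (1.0 - mi) for mi in mu]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        # Newton step: solve information * step = score explicitly.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (-i01 * s0 + i00 * s1) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Simulated data with hypothetical true coefficients (-0.5, 1.2).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
y = [1 if random.random() < sigmoid(-0.5 + 1.2 * xi) else 0 for xi in x]

b0_hat, b1_hat = fit_logistic_newton(x, y)

# At the maximum likelihood estimate the score vector is (numerically) zero.
mu_hat = [sigmoid(b0_hat + b1_hat * xi) for xi in x]
score0 = sum(yi - mi for yi, mi in zip(y, mu_hat))
score1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu_hat))
```

The explicit 2x2 solve keeps the sketch free of external dependencies; for models with more covariates a linear algebra routine would replace it.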

*Lab sessions* based on the software packages SAS and R.

RECOMMENDED READING/BIBLIOGRAPHY

**First part:**

**Textbooks:**

G. Casella and R.L. Berger, Statistical Inference, Wadsworth 62-2002-02 62-2002-09

D. A. Freedman, Statistical Models, Theory and Practice, Cambridge 62-2009-05

L. Pace and A. Salvan, Teoria della statistica, CEDAM 62-1996-01

M. Gasparini, Modelli probabilistici e statistici, CLUT 60-2006-08

D. Dacunha-Castelle and M. Duflo, Probabilités et Statistiques, Masson 60-1982-18/19/26 and 60-1983-22/23/24

A.C. Davison, Statistical Models, Cambridge University Press, Cambridge, 2003

**Suggested reading:**

David J. Hand, A very short introduction to Statistics, Oxford 62-2008-05

L. Wasserman, All of Statistics, Springer

J. Jacod and P. Protter, Probability Essentials, Springer 60-2004-09

S.L. Lauritzen, Graphical Models, Oxford University Press 62-1996-14

D. Williams, Probability with Martingales, Cambridge Mathematical Textbooks, 1991

Handouts distributed in class

**Second part:**

Dobson A. J. (2001). *An Introduction to Generalized Linear Models*, 2nd edition. Chapman and Hall.

Rogantin M.P. (2010). *Modelli lineari generali e generalizzati*. Available online.

## TEACHERS AND EXAM BOARD

**Office hours:** By appointment, arranged by email with Luca Oneto <luca.oneto@unige.it> and Fabrizio Malfanti <fabrizio.malfanti@intelligrate.it>.
For organisational issues, contact Eva Riccomagno by email <riccomagno@dima.unige.it>.

Exam Board

EVA RICCOMAGNO (President)

MARIA PIERA ROGANTIN (President)

EMANUELA SASSO

## LESSONS

LESSONS START

The class will start according to the academic calendar.

TIMETABLE

## EXAMS

Exam description

The two parts of the course are examined together. There is a written and an oral exam. The marks for each question and the time available (usually three hours) are stated on the exam paper.

Assessment methods

The written exam consists of three or four exercises. One of the exercises consists of commenting on the output of an analysis done with statistical software. Past exams with solutions are available on the websites of the two parts of the course. The oral exam consists of questions on both parts of the course. The course work done during the lab sessions may be discussed in the oral exam, so bring that course work with you to the exam.

## FURTHER INFORMATION

**Web pages of the course:**

First part: http://www.dima.unige.it/~riccomag/Teaching/StatisticaMatematica.html

Second part: http://www.dima.unige.it/~rogantin/ModStat/

**Prerequisites for the first part:** Mathematical Analysis 1 and 2; Probability.

**Prerequisites for the second part:** Topics in statistical inference and the first part of Mathematical Statistics (the latter taken in parallel), with their corresponding prerequisites.