Statistical Modelling course


Third-year course of the B.Sc. in Artificial Intelligence (Universities of Milano-Bicocca, Milano Statale and Pavia).
Academic year 2024/2025, 1st semester.
6 CFU (32 hours of lectures, 24 hours of exercise classes).


Communications
  • The class scheduled for Tuesday, November 12 (12:30) is canceled.
  • The class scheduled for Tuesday, October 15 (12:30) is canceled. We will meet as usual on Thursday, October 17 (11:30).
  • Please enroll in the course on the Kiro platform to receive communications.
  • The exam consists of 2 or 3 exercises, similar to those seen during the course.
    Please bring:
    ✓ a calculator,
    ✓ ID or student card,
    ✓ a pen.
    Course notes, books, and other materials are not allowed (including mobile phones, smartwatches, etc.). You will be given the quantiles needed for the exercises, so you do not need to bring the tables.

Contacts

Lectures
Laura D’Angelo: laura.dangelo@unimib.it

Exercises
Luca Danese: l.danese1@campus.unimib.it


Calendar

(may be subject to change)



Course material
Lecture notes

The course material is largely based on the book “Modello lineare - Teoria e Applicazioni con R” (Grigoletto M., Pauli F., Ventura L., 2017), and on the course material kindly provided by Prof. Nicola Sartori and Prof. Bernardo Nipoti.

1 Oct - Lecture 1
Review of probability and statistics.
Introduction to statistical models; types of regression models (number of variables, parametric/nonparametric).
Probability review (1) - Probability review (2)
Statistics review (1) - Statistics review (2)
Introduction to statistical models

3 Oct - Lecture 2
Simple linear model via ordinary least squares: definition, estimation, properties.
Simple Gaussian linear model: definition, estimation via likelihood.
Simple linear model via OLS - Properties
Gaussian simple linear model

8 Oct - Lecture 3
Exact distribution of the maximum likelihood estimators.
Inference about beta (confidence intervals, tests). Inference about the mean (prediction).
Decomposition of the sum of squares.
Distribution of the estimators and inference about beta
Inference about the mean
Sum of squares decomposition

10 Oct - Lecture 4
Coefficient of determination R^2.
Test about the goodness of fit: general formulation; proof for the case of the simple linear model.
Diagnostics: properties of the residuals.
Coefficient of determination
Test goodness of fit (1) - Test goodness of fit (2)
Residuals

17 Oct - Lecture 5
Diagnostics: plots of residuals vs fitted; eCDF and normal Q-Q plot.
Multiple linear model: model specification, assumptions, matrix notation, interpretation of the regression coefficients.
Diagnostics
Multiple linear model

24 Oct - Lecture 6
Multiple linear model: maximum likelihood estimation, exact distribution of the ML estimators.
Geometric interpretation.
Estimation
Geometric interpretation
Distribution of the estimators

31 Oct - Lecture 7
Gauss-Markov theorem.
Inference in the multiple Gaussian linear model: test about an individual coefficient (t-test); test about a subset of parameters (F-test); geometric interpretation of the comparison between nested models; notable examples of the F-test.
Gauss-Markov theorem
Test about an individual coefficient, Test about a subset of coefficients
Geometric interpretation
Notable examples

7 Nov - Lecture 8
Cuckoo exercise: test about the equality of the means of two Gaussian populations with equal variances.
One-way analysis of variance (ANOVA).
Cuckoo exercise - R code - data
ANOVA

14 Nov - Lecture 9
Two-way analysis of variance (ANOVA).
Analysis of covariance (ANCOVA).
Adjusted R^2.
Two-way ANOVA - ANCOVA - Adjusted R^2

21 Nov - Lecture 10
Introduction to generalized linear models.
Poisson regression: assumptions, interpretation of the parameters, estimation, inference (test about an individual coefficient, test for comparing nested models, test about the overall significance, goodness of fit).
Introduction to GLMs
Poisson regression (1) - Poisson regression (2)

28 Nov - Lecture 11
Introduction to models for binary data.
Logistic regression: assumptions, interpretation of the parameters, estimation, inference (test about an individual coefficient, test for comparing nested models, test about the overall significance, deviance).
Probit regression and interpretation as threshold model.
Models for binary data
Logistic regression (1) - Logistic regression (2)
Threshold models
(not part of the exam) Logistic regression with grouped data


Exercises

22 Oct - Exercise 1
Simple linear model via OLS - Exercises
Simple Gaussian linear model - Exercises part 1, Exercises part 2
Solutions

29 Oct - Exercise 2
Simple Gaussian linear model - Solutions

05 Nov - Exercise 3
Multiple Gaussian linear model - Exercises - Solutions

19 Nov - Exercise 4
Multiple Gaussian linear model - Exercises and solutions

26 Nov - Exercise 5
ANOVA - Exercises and solutions
Gaussian linear regression in R - Simple LM and residuals, Multiple LM

3-5 Dec - Exercises 6 and 7
GLM - Exercises and solutions

11 Dec - Exercise 8
GLM in R - GLM
Recap exercises - Exercises

Past Exams

Exam practice - Exam 00
Solutions Ex. 1 - Solutions Ex. 2
25 Jan 2024 - Exam 01
22 Feb 2024 - Exam 02
27 Jun 2024 - Exam 03
23 Jul 2024 - Exam 04
03 Sep 2024 - Exam 05
24 Sep 2024 - Exam 06

Suggested books

Fox, J., 2015. Applied regression analysis and generalized linear models. Sage Publications.

Abraham, B. and Ledolter, J., 2006. Introduction to Regression Modeling. Duxbury Press. (pdf)