LARGE-SCALE COMPUTING

Code
101799
ACADEMIC YEAR
2021/2022
CREDITS
9 credits during the 1st year of 10852 COMPUTER SCIENCE (LM-18) GENOVA

5 credits during the 2nd year of 8732 Electronic Engineering (LM-29) GENOVA

6 credits during the 2nd year of 9011 Mathematics (LM-40) GENOVA

SCIENTIFIC DISCIPLINARY SECTOR
INF/01
LANGUAGE
English
TEACHING LOCATION
GENOVA (COMPUTER SCIENCE)
SEMESTER
1st Semester

OVERVIEW

Large-scale computing generally refers to the ability of hardware and software systems to adapt dynamically to an increasing load, typically by employing multiple distributed nodes to complete a given processing task. In the Big Data era, large-scale computing models and frameworks have become necessary for data-intensive computations, a class of computing applications that use a data-parallel approach, based on the MapReduce paradigm, to process large volumes of data.
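As a purely illustrative sketch of the MapReduce paradigm mentioned above (not taken from the course material), the following single-process Python program counts words in three phases. The function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical; a real framework such as Hadoop or Spark would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the
    # computation data-parallel; here the lines are processed one by one.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data parallel processing"]
    print(word_count(sample))
```

Because each input line is mapped independently and each key is reduced independently, both phases could in principle be distributed across nodes; only the shuffle step requires moving data between them.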

AIMS AND CONTENT

LEARNING OUTCOMES

Students learn the theoretical, methodological, and technological fundamentals of advanced data processing architectures, large-scale distributed environments, and data-intensive programming, including Docker, HDFS, Hadoop, Spark, and Cloud/IoT platforms.

AIMS AND LEARNING OUTCOMES

The course has three specific aims: 

  1. To introduce students to the main concepts and methodologies of distributed computing, such as the CAP theorem, partitioning and replication, fault tolerance, and coordination.
  2. To let students acquire knowledge and practical skills through assignments on functional, concurrent, and distributed programming in languages such as Python and Scala.
  3. To let students test their acquired knowledge and skills in a final project developed on an Apache Hadoop/Spark cluster architecture, using libraries for batch and streaming data processing such as DataFrames, Spark Streaming, and MLlib.

PREREQUISITES

Good programming skills and a solid background in operating systems, databases, algorithms, and data structures.

TEACHING METHODS

In-person and online lectures, assignments, lab sessions, and a final project.

SYLLABUS/CONTENT

  • Introduction to Distributed Systems and Cloud Computing 
  • Distributed data systems and shared-nothing architectures
  • Partitioning & Replication
  • Fault Tolerance
  • CAP Theorem
  • Hadoop & MapReduce (incl. HDFS, Hadoop Runtime)
  • Spark (Internals, RDD Programming, DataFrames, Spark Streaming)

RECOMMENDED READING/BIBLIOGRAPHY

Materials and references are available in the AulaWeb module of the course.

TEACHERS AND EXAM BOARD

Office hours: by appointment via email

Office hours: by appointment via email or Microsoft Teams. Office: Valle Puggia – 327

Office hours: by appointment via email or Microsoft Teams. Office: Valle Puggia – 301

Exam Board

GIORGIO DELZANNO (President)

GIOVANNA GUERRINI

BARBARA CATANIA (President Substitute)

FEDERICO DASSERETO (Substitute)

LESSONS

LESSONS START

Beginning of the first semester

Class schedule

All class schedules are posted on the EasyAcademy portal.

EXAMS

EXAM DESCRIPTION

Evaluation of the assignments submitted during the semester on GitHub and AulaWeb.

Evaluation of the material for the final project proposal (slides, presentation, and source code on GitHub).

Discussion of the assignments and the final project.
