MSc in Artificial Intelligence at the University of Amsterdam.
Deep Learning 2 is taught in the MSc program in Artificial Intelligence of the University of Amsterdam. In this course we study the theory of deep learning, namely of modern, multilayered neural networks trained on big data. The course is coordinated by Assistant Professors Efstratios Gavves and Wilker Aziz Ferreira.
The lecturers for the course are: Erik Bekkers, Eric Nalisnick, Eric Marcus, Sara Magliacane, Andrew Yates, Wilker Aziz Ferreira, and Efstratios Gavves.
The Teaching Assistants (TAs) are:
Christos Athanasiadis, Ilze Amanda Auzina, Leonard Bereska, Gabriele Cesa, Evgenii Egorov, Bryan Eikema, Rob Hesselink, Cyril Hsu, David Knigge, Miltos Kofinas, Putri Linden, Yongtuo Liu, Thong Nguyen, Frederik Nolte, Samuel Papa, Adeel Pervez, Sharvaree Vadgama, Riccardo Valperga, Haochen Wang
Erik Bekkers
Credits: 2pts
100% of the materials for this course are available.
This module covers the topic of geometric deep learning, touching upon all its five G's (Grids, Groups, Graphs, Geodesics, and Gauges) but with a strong focus on group equivariant deep learning. The impact that CNNs have made in fields such as computer vision, computational chemistry, and physics can largely be attributed to the fact that convolutions allow for weight sharing, geometric stability, and a dramatic decrease in learnable parameters by leveraging symmetries in data and architecture design. These enabling properties arise from the equivariance property of convolutions. That is, if the input image is translated, the output of a convolution is translated accordingly, which in turn means that local information does not get lost in the neural network upon an input transformation (it is just shifted to a different location). With group equivariant deep learning we can hardcode stability and weight sharing over transformations beyond just translations. For example, it allows for sharing of weights (representing complex patterns/representations) over poses and symmetries represented by transformations such as translation + rotation + scaling. The module is split into four lectures.
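The translation-equivariance property described above can be checked numerically. A minimal sketch with NumPy, using a circular 1-D convolution so that the shift is exact (variable names and sizes are illustrative, not from the course materials):

```python
import numpy as np

def circ_conv(x, w):
    """Circular 1-D convolution via the FFT (w is zero-padded to len(x))."""
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w, n)))

rng = np.random.default_rng(0)
x = rng.normal(size=16)     # input signal
w = rng.normal(size=5)      # convolution kernel
shift = 3

lhs = circ_conv(np.roll(x, shift), w)   # transform the input, then convolve
rhs = np.roll(circ_conv(x, w), shift)   # convolve, then transform the output

# Equivariance: both orders give the same result
assert np.allclose(lhs, rhs)
```

The same identity is what guarantees that a shifted input pattern produces a correspondingly shifted feature map rather than a lost detection.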
Eric Nalisnick
Credits: 1pt
Usually we train neural networks (NNs) by way of point estimation: after running gradient descent, each weight is represented by a particular value. In this module, we consider giving NNs a Bayesian treatment. We start by specifying a probability distribution for each weight, p(W), and the goal of training is to obtain the posterior distribution p(W | D), where D denotes the training data, via Bayes' rule. The Bayesian approach provides natural mechanisms for model selection, complexity control, incorporating prior knowledge, uncertainty quantification, and online learning. Yet, Bayesian inference is computationally demanding, and much of the module will focus on scalable methods for estimating the posterior distribution.
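For a single weight with a Gaussian prior and a Gaussian likelihood, the posterior p(W | D) is available in closed form by conjugacy, which makes for a useful mental model before moving to the approximate methods needed for real NNs. A hypothetical one-weight sketch:

```python
import numpy as np

# Toy model: y = w*x + Gaussian noise, Gaussian prior w ~ N(0, prior_var).
# Conjugacy gives the exact Gaussian posterior p(w | D); values are illustrative.
rng = np.random.default_rng(1)
true_w, noise_var, prior_var = 2.0, 0.25, 1.0
x = rng.normal(size=50)
y = true_w * x + rng.normal(scale=np.sqrt(noise_var), size=50)

post_prec = 1.0 / prior_var + (x @ x) / noise_var   # posterior precision
post_var = 1.0 / post_prec                          # shrinks as data accumulates
post_mean = post_var * (x @ y) / noise_var          # pulled toward prior mean 0
```

With 50 observations the posterior mean sits close to the true weight while the posterior variance quantifies the remaining uncertainty, exactly the behaviour the scalable approximations in this module try to preserve.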
Wilker Aziz Ferreira
Credits: 1pt
Add your questions for the Q&A: https://www.dory.app/c/f97ceccf/a8a8baab_dl2probdl1
100% of the materials for this module are published.
In this module you learn to view data as the byproduct of probabilistic experiments. You will parameterise joint probability distributions over observed random variables and perform parameter estimation by regularised gradient-based maximum likelihood estimation.
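As a toy instance of gradient-based maximum likelihood estimation, fitting the mean of a unit-variance Gaussian by gradient ascent on the log-likelihood recovers the sample mean (a hypothetical minimal example, not course code):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=200)   # observed random variables

mu = 0.0   # parameter of the model N(x; mu, 1)
lr = 0.1
for _ in range(200):
    # d/dmu of the average log N(x; mu, 1) is mean(x - mu)
    grad = np.mean(data - mu)
    mu += lr * grad   # gradient ascent on the log-likelihood

# The MLE of a Gaussian mean is the sample mean
assert abs(mu - data.mean()) < 1e-6
```

The same recipe, with the gradients supplied by backpropagation, scales to the neural parameterisations used throughout this module.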
The Q&A session for this module will take place on the 6th of April.
Wilker Aziz Ferreira
Credits: 2pts
In this session you learn to model complex data along with unobserved discrete random variables. Examples you are probably already familiar with include mixture models, factor models, and HMMs. You will learn how to approximate intractable inference using stochastic variational methods and derive efficient gradient estimators for parameter estimation via backpropagation.
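A core tool in this setting is the score-function (REINFORCE) gradient estimator, which handles expectations over discrete latent variables where the reparameterisation trick does not apply. A minimal sketch for a Bernoulli latent with a hypothetical objective f:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):                        # hypothetical objective over a discrete latent
    return (z - 0.3) ** 2

theta = 0.5                      # logit of a Bernoulli variational distribution
p = 1 / (1 + np.exp(-theta))
z = rng.binomial(1, p, size=100_000).astype(float)

# Score-function estimator of d/dtheta E_q[f(z)]:
# E_q[f(z) * d/dtheta log q(z)], with d/dtheta log Bernoulli(z; p) = z - p
grad_est = np.mean(f(z) * (z - p))

# Closed form for this two-outcome toy case, for comparison
grad_exact = p * (1 - p) * (f(1.0) - f(0.0))
```

The estimator is unbiased but noisy; taming that noise is exactly the subject of the next module.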
Wilker Aziz Ferreira
Credits: 1pt
In this session you will learn about techniques to reduce variance in gradient estimation for models with discrete unobserved random variables. We will look into: control variates, Rao-Blackwellisation, continuous relaxations and proxy gradients, sparse projections, and mixed random variables.
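To illustrate the first of these techniques, a constant baseline acts as a control variate: subtracting it from f leaves the score-function estimator unbiased (the score has mean zero) but can sharply reduce its variance. A hypothetical sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):                                  # hypothetical objective
    return (z - 0.3) ** 2

theta = 0.5
p = 1 / (1 + np.exp(-theta))               # Bernoulli(p) with logit theta
z = rng.binomial(1, p, size=(1000, 64)).astype(float)
score = z - p                              # grad of the log-prob wrt the logit

naive = (f(z) * score).mean(axis=1)        # plain REINFORCE, one value per batch
baseline = p * f(1.0) + (1 - p) * f(0.0)   # constant control variate, E[f]
cv = ((f(z) - baseline) * score).mean(axis=1)

# Same expectation, lower variance across the 1000 batches
assert abs(naive.mean() - cv.mean()) < 0.01
assert cv.var() < naive.var()
```

In practice the baseline is estimated (e.g. a running average or a learned function of the input) rather than computed analytically as here.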
Sara Magliacane
Credits: 1pt
100% of the materials are available!
This module will focus on the connections between causality and deep learning. We will start by introducing the basic concepts in causality (e.g. causal graphs, d-separation, interventions). Then we will focus on how DL can help causality, specifically for the task of causal discovery (learning the causal graph from data), which will also be the main topic of the tutorial. Finally, we will discuss how causality (or ideas from causality research) can help DL and RL, focusing in particular on dealing with distribution shifts.
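The distinction between observing and intervening can be made concrete with a two-variable structural causal model (a hypothetical toy example): intervening on the cause shifts the effect, while intervening on the effect leaves the cause untouched, which is what reveals the edge direction.

```python
import numpy as np

# Toy SCM with X -> Y:  X := N_X,  Y := 2*X + N_Y  (coefficients are made up)
rng = np.random.default_rng(0)
n = 100_000

def sample(do_x=None, do_y=None):
    x = np.full(n, do_x) if do_x is not None else rng.normal(size=n)
    if do_y is not None:
        y = np.full(n, do_y)       # do(Y=c) cuts the X -> Y mechanism
    else:
        y = 2 * x + rng.normal(size=n)
    return x, y

_, y_obs = sample()               # observational: E[Y] = 0
_, y_dox = sample(do_x=1.0)       # do(X=1): E[Y] moves to 2
x_doy, _ = sample(do_y=1.0)       # do(Y=1): X keeps its marginal, E[X] = 0

assert abs(y_obs.mean()) < 0.05
assert abs(y_dox.mean() - 2.0) < 0.05
assert abs(x_doy.mean()) < 0.05
```

The asymmetry between the last two lines is the kind of signal that interventional data contributes to causal discovery.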
Eric Marcus
Credits: 2pts
In the high-performance module, we will investigate how to scale up your deep learning workloads. By the end of this course you will be able to run your projects on (super)computers efficiently and with minimal human intervention. We will also discuss how to track the performance of your jobs and find possible bottlenecks.
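As a small taste of bottleneck hunting, timing an interpreted Python loop against its vectorised equivalent already shows where a profiler would point (a hypothetical micro-benchmark, not course material):

```python
import time
import numpy as np

x = np.random.default_rng(0).normal(size=1_000_000)

t0 = time.perf_counter()
s_loop = 0.0
for v in x:                      # interpreted per-element loop: a classic bottleneck
    s_loop += v * v
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
s_vec = float(x @ x)             # vectorised: a single optimised BLAS call
t_vec = time.perf_counter() - t0

# Same result, but the vectorised version is dramatically faster
assert t_vec < t_loop
```

Wall-clock timing like this is the crudest tool in the box; the module covers systematic job tracking and profiling on shared (super)computers.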
Efstratios Gavves
Credits: 2pts
100% of this module's materials are available.
In this module we will study generative models beyond variational autoencoders: normalizing flows, energy-based models such as score-matching neural networks and diffusion models, Hopfield networks, and so forth.
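Normalizing flows rest on the change-of-variables formula, log p(x) = log p_base(z) + log |det dz/dx|. A minimal one-dimensional affine-flow sketch (parameters are illustrative):

```python
import numpy as np

# Affine flow z = (x - mu) / sigma mapping data to a standard normal base
mu, sigma = 2.0, 0.5

def log_prob(x):
    z = (x - mu) / sigma
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))  # log N(z; 0, 1)
    return log_base - np.log(sigma)                  # + log|dz/dx| = -log sigma

# Sanity check: this equals the log-density of N(mu, sigma^2) directly
x = 2.3
direct = -0.5 * ((x - mu) ** 2 / sigma**2 + np.log(2 * np.pi * sigma**2))
assert np.isclose(log_prob(x), direct)
```

Real flows stack many such invertible transformations with learned parameters; the log-determinant term accumulates across layers in exactly the same way.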
Efstratios Gavves
Credits: 1pt
In this module we will study the interface and overlap between neural networks, dynamical systems, ordinary/partial/stochastic differential equations, and physics-based neural networks. We will study how and where dynamical systems can be found in neural networks with implicit functions and neural ODEs. We will also see how neural networks can be used to model dynamical systems such as Navier-Stokes with physics-informed neural networks, as well as with Fourier-inspired architectures and autoregressive neural networks.
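One connection studied here: a residual update x + h*f(x) is an explicit Euler step of the ODE dx/dt = f(x), so a deep residual network can be read as a discretised dynamical system. A sketch with hypothetical linear dynamics whose exact solution is a circle:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])     # rotation dynamics: trajectories are circles

def f(x):
    return A @ x

def euler(x0, h, steps):
    x = x0.copy()
    for _ in range(steps):
        x = x + h * f(x)        # one "residual block" per integration step
    return x

x0 = np.array([1.0, 0.0])
# Integrate to t = 2*pi with small steps; the exact flow returns to the start
x_end = euler(x0, h=2 * np.pi / 10_000, steps=10_000)
assert np.linalg.norm(x_end - x0) < 0.01
```

A neural ODE replaces the fixed f with a learned network and the Euler loop with an adaptive solver, but the correspondence between depth and integration time is the same.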
Efstratios Gavves
Credits: 2pts
The materials for this module will be added soon!
In this module we will study how one can integrate complex structure into neural networks. We will start with a quick introduction to Monte Carlo simulation for neural networks. We will discuss Gumbel-Softmax for differentiable models and the Gumbel-Argmax trick for sampling from categorical distributions, as well as the Gumbel-Straight-Through variation. We will continue with subset sampling with Gumbel-Top-k relaxations, sampling graphs in latent spaces for relational inference with GNN encoders/decoders, and sampling latent permutations with Gumbel-Sinkhorn. Last, we will discuss how discrete and continuous sampling connects to the analysis of functions from mathematics.
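The Gumbel-Argmax trick states that taking the argmax of logits plus i.i.d. Gumbel noise yields an exact categorical sample, and replacing the argmax with a temperature-controlled softmax gives the differentiable Gumbel-Softmax relaxation. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([1.0, 0.0, -1.0])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Gumbel-Argmax: argmax(logits + Gumbel noise) ~ Categorical(softmax(logits))
g = rng.gumbel(size=(200_000, 3))
samples = np.argmax(logits + g, axis=1)
freq = np.bincount(samples, minlength=3) / len(samples)
assert np.allclose(freq, softmax(logits), atol=0.01)

# Gumbel-Softmax: relax the hard argmax into a softmax with temperature tau,
# giving a "soft one-hot" vector that gradients can flow through
tau = 0.5
soft = softmax((logits + rng.gumbel(size=3)) / tau)
```

As tau goes to 0 the relaxed sample approaches a one-hot vector; the Straight-Through variant uses the hard sample forward and the relaxed gradient backward.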
Andrew Yates
Credits: 1pt
Neural Information Retrieval is the application of deep learning models to ranking tasks, such as ranking documents by their relevance to a given query. After briefly introducing IR basics, this module will cover the three families of models commonly used for this task and their tradeoffs.
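As a sketch of the dense-retrieval family covered here, documents and queries are embedded into a shared vector space and ranked by inner product. The embeddings below are random stand-ins for a learned encoder:

```python
import numpy as np

rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(100, 64))       # pretend outputs of a document encoder
query = doc_vecs[42] + 0.01 * rng.normal(size=64)   # query near document 42

scores = doc_vecs @ query                   # one relevance score per document
ranking = np.argsort(-scores)               # highest score ranks first
assert ranking[0] == 42                     # the nearest document tops the list
```

The other families traded off against this one differ in when query and document interact: cross-encoders score the pair jointly (slow but accurate), while sparse models score over learned term weights.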
Tutorials for the Geometric Deep Learning module: Regular Group Convolutions; Steerable CNNs; Equivariant graph tutorial (with and without solutions); Geometric Latent Spaces.
Tutorials for the Bayesian Deep Learning module: Tutorial 1: Bayesian NNs with Pyro (with and without solutions); Tutorial 2: Comparing to Non-Bayesian Methods for Uncertainty (with and without solutions). Recordings will be added soon.
If you have any questions or recommendations for the website or the course, you can always drop us a line! Knowledge should be free, so feel free to use any of the material provided here (but please be so kind as to cite us). In case you are a course instructor and you want the solutions, please send us an email.