UVA Deep Learning II Course


MSc in Artificial Intelligence for the University of Amsterdam.

About


Deep Learning II is taught in the MSc program in Artificial Intelligence of the University of Amsterdam. In this course we study the theory of deep learning, namely of modern, multi-layered neural networks trained on big data. The course is coordinated by Assistant Professors Efstratios Gavves and Wilker Aziz Ferreira.



The lecturers for the course are: Erik Bekkers, Eric Nalisnick, Eric Marcus, Sara Magliacane, Andrew Yates, Wilker Aziz Ferreira and Stratis Gavves.



The Teaching Assistants (TAs) are: Christos Athanasiadis, Ilze Amanda Auzina, Leonard Bereska, Gabriele Cesa, Evgenii Egorov, Bryan Eikema, Rob Hesselink, Cyril Hsu, David Knigge, Miltos Kofinas, Putri Linden, Yongtuo Liu, Thong Nguyen, Frederik Nolte, Samuel Papa, Adeel Pervez, Sharvaree Vadgama, Riccardo Valperga, Haochen Wang



Modules


Erik Bekkers

Credits: 2pts

100% of the materials for this course are available.

This module covers the topic of geometric deep learning, touching upon all its five G's (Grids, Groups, Graphs, Geodesics, and Gauges) but with a strong focus on group equivariant deep learning. The impact that CNNs made in fields such as computer vision, computational chemistry and physics, can largely be attributed to the fact that convolutions allow for weight sharing, geometric stability, and a dramatic decrease in learnable parameters by leveraging symmetries in data and architecture design. These enabling properties arise from the equivariance property of convolutions. That is, if the input image is translated, the output of a convolution is translated accordingly, which in turn means that local information does not get lost in the neural network upon an input transformation (it is just shifted to a different location). With group equivariant deep learning we can hard-code stability and weight sharing over transformations beyond just translations. E.g., it allows for sharing of weights (representing complex patterns/representations) over poses and symmetries represented by transformations such as translation + rotation + scaling. This module is split into 4 lectures:

  • Lecture 1 (Wednesday 6th of April on-site Q&A): Regular group convolutional neural networks (G-CNNs). In this lecture we cover the basics of group convolutional NNs and show how to leverage symmetries in data and practical problems.
  • Lecture 2 (Tuesday 12th of April on-site Q&A): Steerable G-CNNs. In this lecture we introduce a very general class of G-CNNs that can handle (rotational) symmetries in a flexible and powerful way. These methods are at the core of the most successful approaches to handling 3D data such as atomic point clouds, and also at the core of gauge equivariant methods that are applicable to arbitrary Riemannian manifolds (e.g. 2D shapes embedded in 3D space).
  • Lecture 3 (Tuesday 19th of April on-site Q&A): Equivariant graph NNs. Many problems in computational chemistry and computational physics are nowadays solved via graph NNs. The state-of-the-art (SotA) methods in these domains derive their effectiveness from the geometric structure and symmetries present in the data and the underlying physics. In this lecture we go over some of the state-of-the-art geometric graph NNs, discussing several instances of the equivariant message passing paradigm.
  • Lecture 4 (Tuesday 26th of April on-site Q&A): Geometric latent space models. In this last lecture we take a step back from the group equivariance paradigm and look at what other types of geometric structure can be leveraged in neural networks. We will discuss some models that are based on non-Euclidean latent spaces and go over the required geometric tools to handle them.
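The translation equivariance described above can be checked numerically. The following is a minimal sketch, not course material: it assumes a 1D signal with periodic (wrap-around) boundaries, and the helper names `circular_correlate` and `shift` as well as the toy signal and kernel are made up for illustration.

```python
def circular_correlate(signal, kernel):
    """Cross-correlate `signal` with `kernel` using periodic boundaries."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, t):
    """Cyclically translate `signal` by `t` positions to the right."""
    n = len(signal)
    return [signal[(i - t) % n] for i in range(n)]


# Hypothetical toy input and filter.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 5.0, 1.0]
kernel = [1.0, -2.0, 1.0]

# Equivariance: translating the input and then filtering gives the same
# result as filtering first and then translating the output.
conv_then_shift = shift(circular_correlate(signal, kernel), 3)
shift_then_conv = circular_correlate(shift(signal, 3), kernel)
equivariant = conv_then_shift == shift_then_conv
```

Group equivariant networks aim for the same commutation property with respect to larger groups (for example rotations), which this translation-only sketch does not cover.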

Documents:

Lecture recordings:

Eric Nalisnick

Credits: 1pts

Usually we train neural networks (NNs) by way of point estimation. After running gradient descent, each weight is represented by a particular value. In this module, we consider giving NNs a Bayesian treatment. We start by specifying a probability distribution for each weight, p(W), and the goal of training is to obtain the posterior distribution p(W | D), where D denotes the training data, via Bayes' rule. The Bayesian approach provides natural mechanisms for model selection, complexity control, incorporating prior knowledge, uncertainty quantification, and online learning. Yet, Bayesian inference is computationally demanding, and much of the module will focus on scalable methods for estimating the posterior distribution.
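To make the step from p(W) to p(W | D) concrete, here is a rough sketch for a model with a single weight, where the posterior can be evaluated on a grid. Everything in it is an illustrative assumption rather than course material: the toy data, the standard normal prior, the Gaussian likelihood with noise standard deviation 0.5, and the grid itself. Real networks have far too many weights for this, which is why the module focuses on scalable approximations.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


# Hypothetical toy data, roughly following y = 2 * x.
data = [(0.5, 1.1), (1.0, 1.9), (1.5, 3.2), (2.0, 3.9)]

# Candidate values for the single weight w.
grid = [i / 100 for i in range(-300, 301)]

# Bayes' rule in log space: log p(w | D) = log p(w) + log p(D | w) - log p(D).
log_joint = [
    gaussian_logpdf(w, 0.0, 1.0)
    + sum(gaussian_logpdf(y, w * x, 0.5) for x, y in data)
    for w in grid
]

# Normalise over the grid (subtract the max for numerical stability).
max_log = max(log_joint)
unnormalised = [math.exp(v - max_log) for v in log_joint]
evidence = sum(unnormalised)
posterior = [u / evidence for u in unnormalised]

posterior_mean = sum(w * p for w, p in zip(grid, posterior))
```

Instead of a single point estimate, the result is a full distribution over w, from which quantities such as the posterior mean or a credible interval can be read off.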

Documents:

Lecture recordings:

Wilker Aziz Ferreira

Credits: 1pts

Add your questions for the Q&A: https://www.dory.app/c/f97ceccf/a8a8baab_dl2---probdl1

100% of the materials for this module are published.

In this module you learn to view data as a byproduct of probabilistic experiments. You will parameterise joint probability distributions over observed random variables and perform parameter estimation by regularised gradient-based maximum likelihood estimation.
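As a small illustration of regularised gradient-based maximum likelihood estimation, the sketch below fits a single Bernoulli distribution parameterised by a logit. The data, the L2 penalty strength `weight_decay`, the learning rate and the number of steps are all hypothetical choices made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical observations of a binary random variable (7 ones out of 10).
data = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
weight_decay = 0.01   # assumed strength of the L2 regulariser
learning_rate = 0.5   # assumed step size


def gradient(theta):
    """Gradient of the mean log-likelihood minus (weight_decay / 2) * theta**2."""
    mean_x = sum(data) / len(data)
    return mean_x - sigmoid(theta) - weight_decay * theta


theta = 0.0  # logit of the Bernoulli parameter
for _ in range(2000):
    theta += learning_rate * gradient(theta)  # gradient ascent

p_hat = sigmoid(theta)
```

The regulariser pulls the logit towards zero, so `p_hat` ends up slightly below the unregularised estimate of 0.7.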

The Q&A session for this module will take place on the 6th of April.

Documents:

Lecture recordings:

Wilker Aziz Ferreira

Credits: 1pts

In this session you will learn about techniques to reduce variance in gradient estimation for models with discrete unobserved random variables. We will look into: control variates, Rao-Blackwellisation, continuous relaxations and proxy gradients, sparse projections and mixed random variables.
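Of the techniques listed above, a control variate is the simplest to illustrate. The sketch below uses the score-function estimator for the gradient of E[f(b)] with respect to the parameter p of a Bernoulli variable b, with and without a constant baseline. The function f, the value of p, the sample size and the choice of E[f(b)] as the baseline are assumptions for this toy example; in practice the baseline has to be estimated or learned.

```python
import random
import statistics


def f(b):
    """Hypothetical downstream function of the discrete variable."""
    return (b - 0.45) ** 2


def score(b, p):
    """Derivative of log Bernoulli(b; p) with respect to p."""
    return 1.0 / p if b == 1 else -1.0 / (1.0 - p)


def gradient_samples(p, baseline, n, rng):
    """Single-sample score-function estimates of d/dp E[f(b)]."""
    samples = []
    for _ in range(n):
        b = 1 if rng.random() < p else 0
        samples.append((f(b) - baseline) * score(b, p))
    return samples


p = 0.3
rng = random.Random(0)

# Plain estimator (no control variate).
plain = gradient_samples(p, 0.0, 20000, rng)

# Same estimator with a constant baseline; the score has zero mean,
# so subtracting a constant leaves the estimator unbiased.
expected_f = p * f(1) + (1 - p) * f(0)
with_baseline = gradient_samples(p, expected_f, 20000, rng)

true_gradient = f(1) - f(0)
plain_variance = statistics.pvariance(plain)
baseline_variance = statistics.pvariance(with_baseline)
```

Both sets of samples average to roughly the true gradient, while the variance of the baseline version is much smaller.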

Documents:

Lecture recordings:

Sara Magliacane

Credits: 1pts

100% of the materials are available!

This module will focus on the connections between causality and deep learning. We will start by introducing the basic concepts in causality (e.g. causal graphs, d-separation, interventions). Then we will focus on how DL can help causality, specifically for the task of causal discovery (learning the causal graph from data), which will also be the main topic of the tutorial. Finally we will discuss how causality (or ideas from causality research) can help DL and RL, focusing in particular on dealing with distribution shifts.

Documents:

Lecture recordings:

Eric Marcus

Credits: 2pts

In the high performance module, we will investigate how to scale up your deep learning performance. By the end of this course you will be able to run your projects on (super)computers with high efficiency and minimal human intervention. We will also discuss how to track the performance of your jobs and find possible bottlenecks. The topics include:

  • Multi-GPU training: understand how to effectively distribute models and data over any number of GPUs
  • Large scale hyperparameter tuning: launch grid or automated Bayesian searches on any number of nodes with a 'flip of a switch', and follow the results live from your laptop.

Documents:

Lecture recordings:

Efstratios Gavves

Credits: 2pts

100% of this module's materials are available.

In this module we will study generative models beyond variational autoencoders: normalizing flows, energy-based models like score-matching neural networks and diffusion models, Hopfield networks, and so forth.

Documents:

Lecture recordings:

Efstratios Gavves

Credits: 1pts

In this module we will study the interface and overlap between neural networks, dynamical systems, ordinary/partial/stochastic differential equations, and physics-based neural networks. We will study how and where dynamical systems can be found in neural networks with implicit functions and neural ODEs. We will also see how neural networks can be used to model dynamical systems like Navier-Stokes with physics-informed neural networks, as well as with Fourier-inspired architectures and autoregressive neural networks.

Documents:

Lecture recordings:

Efstratios Gavves

Credits: 2pts

The materials for this module will be added soon!

In this module we will study how one can integrate complex structure into neural networks. We will start with a quick introduction to Monte Carlo simulation for neural networks. We will discuss Gumbel-Softmax for differentiable models and the Gumbel-Argmax trick for sampling from categorical distributions, as well as the Gumbel-Straight-Through variation. We will continue with subset sampling with Gumbel-Topk relaxations, sampling graphs in latent spaces in relational inference with a GNN encoder/decoder, and sampling latent permutations with Gumbel-Sinkhorn. Last, we will discuss how discrete and continuous sampling connects to the analysis of functions in mathematics.
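The Gumbel-Argmax trick and its Gumbel-Softmax relaxation can be sketched in a few lines. This is an illustrative sketch, not the module's reference code: the class probabilities, the temperature of 0.5, the sample count and the function names are assumptions for the example.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse transform sampling."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly in (0, 1)
    return -math.log(-math.log(u))


def gumbel_argmax(log_probs, rng):
    """Exact categorical sample: argmax of log-probabilities plus Gumbel noise."""
    perturbed = [lp + sample_gumbel(rng) for lp in log_probs]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def gumbel_softmax(log_probs, temperature, rng):
    """Relaxed sample: softmax of the perturbed log-probabilities."""
    perturbed = [(lp + sample_gumbel(rng)) / temperature for lp in log_probs]
    max_value = max(perturbed)
    exps = [math.exp(v - max_value) for v in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
probs = [0.1, 0.6, 0.3]  # hypothetical categorical distribution
log_probs = [math.log(p) for p in probs]

# Empirical class frequencies of Gumbel-Argmax samples should approach `probs`.
n = 20000
counts = [0] * len(probs)
for _ in range(n):
    counts[gumbel_argmax(log_probs, rng)] += 1
frequencies = [c / n for c in counts]

# A relaxed sample is a point on the probability simplex rather than a class index.
relaxed = gumbel_softmax(log_probs, temperature=0.5, rng=rng)
```

Lower temperatures push the relaxed sample closer to a one-hot vector, at the cost of higher-variance gradients when the relaxation is used inside a differentiable model.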

Documents:

Lecture recordings:

No recordings.

Andrew Yates

Credits: 1pts

Neural Information Retrieval is the application of deep learning models to ranking tasks, such as ranking documents by their relevance to a given query. After briefly introducing IR basics, this module will cover the three families of models commonly used for this task and their trade-offs.

Documents:

Lecture recordings:

Tutorials


Deadline:

This module covers the topic of geometric deep learning, touching upon all its five G's (Grids, Groups, Graphs, Geodesics, and Gauges) but with a strong focus on group equivariant deep learning. The impact that CNNs made in fields such as computer vision, computational chemistry and physics, can largely be attributed to the fact that convolutions allow for weight sharing, geometric stability, and a dramatic decrease in learnable parameters by leveraging symmetries in data and architecture design. These enabling properties arise from the equivariance property of convolutions. That is, if the input image is translated, the output of a convolution is translated accordingly, which in turn means that local information does not get lost in the neural network upon an input transformation (it is just shifted to a different location). With group equivariant deep learning we can hard-code stability and weight sharing over transformations beyond just translations. E.g., it allows for sharing of weights (representing complex patterns/representations) over poses and symmetries represented by transformations such as translation + rotation + scaling.

Tutorials:

Lecture recordings:


Deadline:

Usually we train neural networks (NNs) by way of point estimation. After running gradient descent, each weight is represented by a particular value. In this module, we consider giving NNs a Bayesian treatment. We start by specifying a probability distribution for each weight, p(W), and the goal of training is to obtain the posterior distribution p(W | D), where D denotes the training data, via Bayes' rule. The Bayesian approach provides natural mechanisms for model selection, complexity control, incorporating prior knowledge, uncertainty quantification, and online learning. Yet, Bayesian inference is computationally demanding, and much of the module will focus on scalable methods for estimating the posterior distribution.

Tutorials:

Lecture recordings:

Recordings will be added soon.

Deadline:

In this module you learn to view data as a byproduct of probabilistic experiments. You will parameterise joint probability distributions over observed random variables and perform parameter estimation by regularised gradient-based maximum likelihood estimation.

Tutorials:

Lecture recordings:

No recordings.

Deadline:

In this session you learn to model complex data along with unobserved discrete random variables. Examples you are probably already familiar with include mixture models, factor models, and HMMs. You will learn how to approximate intractable inference using stochastic variational methods and derive efficient gradient estimators for parameter estimation via backpropagation.

Tutorials:

Lecture recordings:

No recordings.

Deadline:

In this session you will learn about techniques to reduce variance in gradient estimation for models with discrete unobserved random variables. We will look into: control variates, Rao-Blackwellisation, continuous relaxations and proxy gradients, sparse projections and mixed random variables.

Tutorials:

No documents.

Lecture recordings:

No recordings.

Deadline:

This module will focus on the connections between causality and deep learning. We will start by introducing the basic concepts in causality (e.g. causal graphs, d-separation, interventions). Then we will focus on how DL can help causality, specifically for the task of causal discovery (learning the causal graph from data), which will also be the main topic of the tutorial. Finally we will discuss how causality (or ideas from causality research) can help DL and RL, focusing in particular on dealing with distribution shifts.

Tutorials:

Lecture recordings:

No recordings.

Deadline:

In the high performance module, we will investigate how to scale up your deep learning performance. By the end of this course you will be able to run your projects on (super)computers with high efficiency and minimal human intervention. We will also discuss how to track the performance of your jobs and find possible bottlenecks.


Deadline:

In this module we will study generative models beyond variational autoencoders: normalizing flows, energy-based models like score-matching neural networks and diffusion models, Hopfield networks, and so forth.

Tutorials:

Lecture recordings:

No recordings.

Deadline:

In this module we will study the interface and overlap between neural networks, dynamical systems, ordinary/partial/stochastic differential equations, and physics-based neural networks. We will study how and where dynamical systems can be found in neural networks with implicit functions and neural ODEs. We will also see how neural networks can be used to model dynamical systems like Navier-Stokes with physics-informed neural networks, as well as with Fourier-inspired architectures and autoregressive neural networks.

Tutorials:

Lecture recordings:

No recordings.

Deadline:

Neural Information Retrieval is the application of deep learning models to ranking tasks, such as ranking documents by their relevance to a given query. After briefly introducing IR basics, this module will cover the three families of models commonly used for this task and their trade-offs.

Tutorials:

Lecture recordings:

No recordings.

Contact us!


If you have any questions or recommendations for the website or the course, you can always drop us a line! Knowledge should be free, so feel free to use any of the material provided here (but please be so kind as to cite us). If you are a course instructor and you want the solutions, please send us an email.