Erik Bekkers
Credits: 2pts
100% of the materials for this module are available.
This module covers the topic of geometric deep learning, touching upon all five of its G's (Grids, Groups, Graphs, Geodesics, and Gauges), but with a strong focus on group equivariant deep learning. The impact that CNNs have made in fields such as computer vision, computational chemistry, and physics can largely be attributed to the fact that convolutions allow for weight sharing, geometric stability, and a dramatic decrease in learnable parameters by leveraging symmetries in data and architecture design. These enabling properties arise from the equivariance property of convolutions. That is, if the input image is translated, the output of a convolution is translated accordingly, which in turn means that local information does not get lost in the neural network upon an input transformation (it is just shifted to a different location). With group equivariant deep learning we can hard-code stability and weight sharing over transformations beyond just translations. For example, it allows for sharing of weights (representing complex patterns/representations) over poses and symmetries represented by transformations such as translation + rotation + scaling. This module is split into 4 lectures.
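As a quick illustration of the equivariance property described above (a minimal sketch, not part of the module materials; it assumes PyTorch and uses circular padding so that the translation group acts exactly on the pixel grid):

```python
# Minimal sketch: a 2D convolution with circular padding commutes with translations.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(1, 1, 8, 8)    # one grayscale 8x8 image
kernel = torch.randn(1, 1, 3, 3)   # one 3x3 filter

def conv(x):
    # Circular padding keeps the output the same size and the symmetry exact.
    return F.conv2d(F.pad(x, (1, 1, 1, 1), mode="circular"), kernel)

shift = lambda x: torch.roll(x, shifts=(2, 3), dims=(-2, -1))

# Equivariance: convolving a shifted image equals shifting the convolved image.
print(torch.allclose(conv(shift(image)), shift(conv(image)), atol=1e-5))  # True
```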
Eric Nalisnick
Credits: 1pts
Usually we train neural networks (NNs) by way of point estimation: after running gradient descent, each weight is represented by a particular value. In this module, we consider giving NNs a Bayesian treatment. We start by specifying a probability distribution p(W) for each weight, and the goal of training is to obtain the posterior distribution p(W | D), where D denotes the training data, via Bayes' rule. The Bayesian approach provides natural mechanisms for model selection, complexity control, incorporating prior knowledge, uncertainty quantification, and online learning. Yet Bayesian inference is computationally demanding, and much of the module will focus on scalable methods for estimating the posterior distribution.
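As a toy illustration of this Bayesian treatment (a sketch only, not the module's exercises; it assumes NumPy, a single weight, and arbitrarily chosen noise and prior scales), the posterior p(w | D) can be computed on a grid directly from Bayes' rule:

```python
# Toy sketch: exact grid-based Bayes' rule for one weight w in y = w * x + noise,
# i.e. p(w | D) is proportional to p(D | w) p(w).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 2.0 * x + 0.5 * rng.normal(size=20)   # data generated with true w = 2

w_grid = np.linspace(-5, 5, 1001)
log_prior = -0.5 * w_grid**2              # standard normal prior, up to a constant
# Gaussian log-likelihood with noise std 0.5, summed over the data set.
log_lik = np.array([-0.5 * np.sum((y - w * x) ** 2) / 0.5**2 for w in w_grid])

log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, w_grid)            # normalise so the density integrates to 1

print("posterior mean of w:", np.trapz(w_grid * post, w_grid))
```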
Wilker Aziz Ferreira
Credits: 1pts
Add your questions for the Q&A: https://www.dory.app/c/f97ceccf/a8a8baab_dl2---probdl1
100% of the materials for this module are published.
In this module you learn to view data as a byproduct of probabilistic experiments. You will parameterise joint probability distributions over observed random variables and perform parameter estimation by regularised gradient-based maximum likelihood estimation.
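As a small, self-contained sketch of regularised gradient-based maximum likelihood estimation (illustrative only; model, data, and hyperparameters are invented for the example and assume PyTorch):

```python
# Sketch: MLE for a Bernoulli model p(y=1 | x) = sigmoid(w.x + b), trained by
# gradient descent on the negative log-likelihood plus an L2 regulariser.
import torch

torch.manual_seed(0)
X = torch.randn(200, 3)
true_w = torch.tensor([1.0, -2.0, 0.5])
y = torch.bernoulli(torch.sigmoid(X @ true_w))

w = torch.zeros(3, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([w, b], lr=0.5)

for _ in range(500):
    logits = X @ w + b
    nll = torch.nn.functional.binary_cross_entropy_with_logits(logits, y, reduction="mean")
    loss = nll + 0.01 * w.pow(2).sum()    # regularised negative log-likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()

print("estimated weights:", w.detach())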
The Q&A session for this module will take place at 6th of April.
Wilker Aziz Ferreira
Credits: 2pts
In this session you learn to model complex data along with unobserved discrete random variables. Examples you are probably already familiar with include mixture models, factor models, and HMMs. You will learn how to approximate intractable inference using stochastic variational methods and derive efficient gradient estimators for parameter estimation via backpropagation.
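The following sketch (illustrative only, with an invented one-observation model and PyTorch assumed) shows the core mechanics: a single-sample Monte Carlo estimate of the ELBO for a model with one Bernoulli latent, written as a surrogate loss so that ordinary backpropagation yields an unbiased gradient estimator:

```python
# Sketch: one-sample ELBO with a score-function surrogate for a Bernoulli latent z.
import torch

torch.manual_seed(0)
x = torch.tensor(1.5)                                  # one observed scalar
theta = torch.tensor([0.0, 1.0], requires_grad=True)   # means of p(x | z=0), p(x | z=1)
phi = torch.tensor(0.0, requires_grad=True)            # logit of q(z=1 | x)

q = torch.distributions.Bernoulli(logits=phi)
z = q.sample()                                         # one sample from the approximate posterior

log_pz = torch.log(torch.tensor(0.5))                  # uniform prior over z
log_px_given_z = torch.distributions.Normal(theta[z.long()], 1.0).log_prob(x)
log_qz = q.log_prob(z)

elbo = log_pz + log_px_given_z - log_qz
# Surrogate objective: gradients w.r.t. theta flow through the likelihood term,
# gradients w.r.t. phi come from the score-function (REINFORCE) term.
surrogate = elbo + log_qz * elbo.detach()
(-surrogate).backward()
print("grad wrt theta:", theta.grad, "grad wrt phi:", phi.grad)
```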
Wilker Aziz Ferreira
Credits: 1pts
In this session you will learn about techniques to reduce variance in gradient estimation for models with discrete unobserved random variables. We will look into: control variates, Rao-Blackwellisation, continuous relaxations and proxy gradients, sparse projections and mixed random variables.
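As a minimal sketch of why variance reduction matters (illustrative only; the Bernoulli model, the downstream objective, and the baseline value are invented and PyTorch is assumed), a constant baseline acting as a control variate can shrink the spread of the score-function gradient estimator without changing its mean:

```python
# Sketch: score-function gradient of E_q[f(z)] for a Bernoulli z,
# with and without a constant baseline b as a control variate.
import torch

torch.manual_seed(0)
phi = torch.tensor(0.0, requires_grad=True)           # logit of q(z=1)
f = lambda z: (z - 2.0) ** 2                          # some downstream objective

def grad_sample(baseline):
    q = torch.distributions.Bernoulli(logits=phi)
    z = q.sample()
    # d/dphi E_q[f(z)] estimated as (f(z) - b) * d log q(z) / dphi
    g, = torch.autograd.grad((f(z) - baseline) * q.log_prob(z), phi)
    return g

for b in (0.0, 2.5):                                  # 2.5 is roughly E[f(z)] under q
    grads = torch.stack([grad_sample(b) for _ in range(2000)])
    print(f"baseline={b}: mean={grads.mean().item():.3f}, std={grads.std().item():.3f}")
```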
Sara Magliacane
Credits: 1pts
100% of the materials are available!
This module will focus on the connections between causality and deep learning. We will start by introducing the basic concepts in causality (e.g. causal graphs, d-separation, interventions). Then we will focus on how DL can help causality, specifically for the task of causal discovery (learning the causal graph from data), which will also be the main topic of the tutorial. Finally, we will discuss how causality (or ideas from causality research) can help DL and RL, focusing in particular on dealing with distribution shifts.
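To make the distinction between observing and intervening concrete (a toy sketch with an invented linear structural causal model, assuming NumPy; not part of the module materials):

```python
# Toy sketch: a confounded linear SCM Z -> X, Z -> Y, X -> Y, where conditioning
# on X and intervening on X give different answers.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

Z = rng.normal(size=n)                     # hidden common cause
X = Z + rng.normal(size=n)
Y = 2.0 * X + 3.0 * Z + rng.normal(size=n)

# Observational: conditioning on X near 1 also tells us something about Z.
obs = Y[np.abs(X - 1) < 0.05].mean()

# Interventional do(X = 1): X is set by fiat, so Z keeps its own distribution.
X_do = np.ones(n)
Y_do = 2.0 * X_do + 3.0 * Z + rng.normal(size=n)

print("E[Y | X near 1] :", round(obs, 2))          # about 3.5 (= 2 + 3 * E[Z | X = 1])
print("E[Y | do(X = 1)]:", round(Y_do.mean(), 2))  # about 2.0
```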
Eric Marcus
Credits: 2pts
In the high-performance module, we will investigate how to scale up your deep learning performance. By the end of this module you will be able to run your projects on (super)computers with high efficiency and minimal human intervention. We will also discuss how to track the performance of your jobs and find possible bottlenecks. The topics include:
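In the spirit of tracking performance and finding bottlenecks, here is a minimal sketch (not the module's material; it assumes PyTorch's built-in profiler and an invented toy model) of profiling a single training step:

```python
# Sketch: profile one training step and print the most expensive operators.
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
                            torch.nn.Linear(1024, 10))
x = torch.randn(256, 1024)
target = torch.randint(0, 10, (256,))
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
    model, x, target = model.cuda(), x.cuda(), target.cuda()

with profile(activities=activities, record_shapes=True) as prof:
    loss = loss_fn(model(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# A first place to look for bottlenecks: the operator-level time breakdown.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```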
Efstratios Gavves
Credits: 2pts
100% of this module's materials are available.
In this module we will study generative models beyond variational autoencoders: normalizing flows, energy-based models such as score-matching neural networks and diffusion models, Hopfield networks, and so forth.
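As a minimal sketch of the change-of-variables computation behind normalizing flows (illustrative only, with a single invented affine transform and PyTorch assumed):

```python
# Sketch: log-likelihood of a one-step affine flow x = a * z + b, z ~ N(0, 1).
import torch

torch.manual_seed(0)
log_a = torch.tensor(0.5, requires_grad=True)   # parameterise a > 0 via its log
b = torch.tensor(1.0, requires_grad=True)
base = torch.distributions.Normal(0.0, 1.0)

x = torch.randn(1000) * 2.0 + 1.0               # "data" to fit

# log p_X(x) = log p_Z(f_inv(x)) + log |d f_inv / dx|, with f_inv(x) = (x - b) / a.
z = (x - b) * torch.exp(-log_a)
log_px = base.log_prob(z) - log_a

loss = -log_px.mean()                           # maximum likelihood for the flow parameters
loss.backward()
print("nll:", loss.item(), "d nll / d log_a:", log_a.grad.item())
```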
Efstratios Gavves
Credits: 1pts
In this module we will study the interface and overlap between neural networks, dynamical systems, ordinary/partial/stochastic differential equations, and physics-based neural networks. We will study how and where dynamical systems can be found in neural networks with implicit functions and neural ODEs. We will also see how neural networks can be used to model dynamical systems such as Navier-Stokes with physics-informed neural networks, as well as with Fourier-inspired architectures and autoregressive neural networks.
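The following sketch (not the module's code; the network, integration scheme, and loss are invented for illustration and assume PyTorch) shows the basic idea of a neural ODE: a small network defines the vector field and gradients flow through the unrolled solver:

```python
# Sketch: a tiny "neural ODE" with dh/dt = f(h), integrated by explicit Euler steps.
import torch

torch.manual_seed(0)
f = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2))

def odeint_euler(f, h0, t0=0.0, t1=1.0, steps=20):
    h, dt = h0, (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h)          # one explicit Euler step
    return h

h0 = torch.randn(8, 2)             # a batch of initial states
target = torch.zeros(8, 2)         # drive trajectories towards the origin

h1 = odeint_euler(f, h0)
loss = ((h1 - target) ** 2).mean()
loss.backward()                     # backprop through the unrolled solver
print("loss:", loss.item())
```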
Efstratios Gavves
Credits: 2pts
The materials for this module will be added soon!
In this module we will study how one can integrate complex structure into neural networks. We will start with a quick introduction to Monte Carlo simulation for neural networks. We will discuss Gumbel-Softmax for differentiable models, the Gumbel-Argmax trick for sampling from categorical distributions, and the Gumbel-Straight-Through variation. We will continue with subset sampling via Gumbel-Top-k relaxations, sampling graphs in latent spaces for relational inference with GNN encoders/decoders, and sampling latent permutations with Gumbel-Sinkhorn. Last, we will discuss how discrete and continuous sampling connects to the analysis of functions in mathematics.
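A minimal sketch of the Gumbel tricks mentioned above (illustrative only; the logits, temperature, and downstream loss are invented, and PyTorch is assumed):

```python
# Sketch: Gumbel-Argmax sampling, its Gumbel-Softmax relaxation, and the
# straight-through variant for a 3-class categorical distribution.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.tensor([1.0, 0.5, -1.0], requires_grad=True)
tau = 0.5                                                   # relaxation temperature

gumbel = -torch.log(-torch.log(torch.rand_like(logits)))    # Gumbel(0, 1) noise
hard = torch.argmax(logits + gumbel)                        # Gumbel-Argmax: exact sample, not differentiable
soft = F.softmax((logits + gumbel) / tau, dim=-1)           # Gumbel-Softmax: differentiable relaxation

# Straight-through: use the discrete one-hot sample in the forward pass but
# route gradients through the relaxed sample in the backward pass.
one_hot = F.one_hot(soft.argmax(), num_classes=3).float()
st = one_hot + soft - soft.detach()

downstream = (st * torch.tensor([1.0, 2.0, 3.0])).sum()     # some loss that consumes the sample
downstream.backward()
print(hard.item(), st, logits.grad)                         # gradients reach the logits via the soft path
```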
Andrew Yates
Credits: 1pts
Neural Information Retrieval is the application of deep learning models to ranking tasks, such as ranking documents by their relevance to a given query. After briefly introducing IR basics, this module will cover the three families of models commonly used for this task and their trade-offs.
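To make the ranking task concrete (a toy sketch only; the vocabulary, encoder, and scoring scheme are invented, PyTorch is assumed, and this is just one common neural IR pattern rather than the module's own taxonomy of models):

```python
# Sketch: rank documents for a query by dot products of learned text embeddings.
import torch

torch.manual_seed(0)
vocab = {"neural": 0, "information": 1, "retrieval": 2, "ranking": 3, "cooking": 4}
embed = torch.nn.Embedding(len(vocab), 16)          # toy encoder: mean of word embeddings

def encode(text):
    ids = torch.tensor([vocab[w] for w in text.split() if w in vocab])
    return embed(ids).mean(dim=0)

query = "neural ranking"
docs = ["neural information retrieval", "ranking", "cooking"]

scores = torch.stack([encode(d) @ encode(query) for d in docs])
ranking = torch.argsort(scores, descending=True)
# With untrained embeddings the order is arbitrary; training the encoder would
# push relevant documents to the top.
print([docs[i] for i in ranking.tolist()])
```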
Tutorials and lecture recordings
Geometric deep learning:
Regular Group Convolutions
GDL - Steerable CNNs
Equivariant graph tutorial
Equivariant graph tutorial (without solutions)
Geometric Latent Spaces Tutorial
Bayesian deep learning:
Tutorial 1: Bayesian NNs with Pyro (with solutions)
Tutorial 1: Bayesian NNs with Pyro (without solutions)
Tutorial 2: Comparing to Non-Bayesian Methods for Uncertainty (with solutions)
Tutorial 2: Comparing to Non-Bayesian Methods for Uncertainty (without solutions)
Lecture recordings will be added soon.
For the remaining modules, no tutorial documents or lecture recordings are listed yet.