What I present below is a concise, self-contained summary of everything you need to know to implement (or to understand someone else's implementation of) the algorithm for training a Multi-Layer Perceptron (MLP) through gradient descent. It also equips you for an in-depth understanding of generalisations of this fundamental algorithm. I call this the "MLP cheatsheet". You can also listen to this lecture on YouTube: MLP cheatsheet

Everything you need to know is summarised in the following chart (open the image in a separate window for higher resolution). I'll take you through the elements of this chart bit by bit below.

MLP configuration

The cheatsheet above describes an MLP network with 'd' inputs and 'm' outputs. That is, we are building a network function $F: \mathbb{R}^d \to \mathbb{R}^m$ as follows:

Indices

The mathematical representation of an MLP requires multi-index notation. For the sake of consistency, I use 'k' for the index of...
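As an aside, the network-function shape described under "MLP configuration" can be made concrete with a small sketch. This is my own illustration rather than the cheatsheet's code: the hidden width and the tanh activation are assumptions for demonstration only, while the mapping from $\mathbb{R}^d$ to $\mathbb{R}^m$ follows the configuration above.

```python
# A minimal sketch of an MLP as a function from R^d to R^m.
# The hidden width 'h' and the tanh activation are illustrative
# assumptions, not prescribed by the cheatsheet.
import numpy as np

rng = np.random.default_rng(0)

d, h, m = 3, 5, 2             # d inputs, assumed hidden width h, m outputs
W1 = rng.normal(size=(h, d))  # first-layer weights
b1 = np.zeros(h)              # first-layer biases
W2 = rng.normal(size=(m, h))  # output-layer weights
b2 = np.zeros(m)              # output-layer biases

def mlp(x):
    """Forward pass: x in R^d -> output in R^m."""
    z = np.tanh(W1 @ x + b1)  # hidden activations
    return W2 @ z + b2        # network output

x = rng.normal(size=d)
print(mlp(x))                 # a point in R^m
```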