**Hebbian Learning Rule and STDP**

*"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."* (Hebb, 1949)

![[ETH/ETH - Deep Learning in Artificial & Biological Neuronal Networks/Images - ETH Deep Learning in Artificial & Biological Neuronal Networks/image71.png]]

**Spike-Timing-Dependent Plasticity** (STDP) is the Hebbian idea that synaptic weights change over time based on the relative timing of pre- and postsynaptic spikes. If the postsynaptic neuron fires at the same time as or just after the presynaptic neuron (whether its firing was driven by that presynaptic action potential or by other nearby neurons), this suggests that the connection between the two neurons should be reinforced: the weight of the synaptic junction is increased so that future action potentials propagate more efficiently. This is known as **Long-Term Potentiation** (LTP). Conversely, if the postsynaptic neuron fires just before the presynaptic neuron, the junction in the pre-to-postsynaptic direction is likely unnecessary or even counterproductive, so the weight of that synapse decreases over time. This is known as **Long-Term Depression** (LTD).

We can start by writing a simple Hebbian weight update as the product of the input and output with some scaling factor:

$$\Delta w = \alpha \, x \, y = \alpha \, x \, (w \cdot x)$$

The more closely aligned $w$ and $x$ are, the larger $y = w^{T}x$ is, and by definition of the dot product $y = 0$ when $w$ is orthogonal to $x$. This leads to the weight vector gradually rotating towards the input vector, or, for a dataset, towards the mean of the cloud of input data. We can mitigate this bias by applying a zero-mean transformation to the dataset to center the data around the origin, but this leads to a different behaviour: the weight vector then tends to align with the direction of greatest variance.

Let's explore some other ways to express this Hebbian update rule:

$$\Delta w = \alpha \, x \, y = \alpha \, (w \cdot x) \, x = \alpha \, (x \, x^{T}) \, w$$

where the last step rewrites the inner product as an outer product acting on $w$.

Side note: *"When the input x and output y are correlated, their product is positive, which results in a positive update to the weight (increases). When they are uncorrelated, the product is close to 0, which results in a small or no update to the weight."*

![[ETH/ETH - Deep Learning in Artificial & Biological Neuronal Networks/Images - ETH Deep Learning in Artificial & Biological Neuronal Networks/image72.png]]

We note that $x \, x^{T}$, averaged over the inputs, is the correlation matrix of $x$, which we denote as $C$. We now arrive at

$$\Delta w = \alpha \, (x \, x^{T}) \, w = \alpha \, C \, w,$$

which, absorbing $\alpha$ into the time scale, leads to the following update over time:

$$\frac{dw}{dt} = C \, w$$

The lecture slides go into more detail here: substituting the classical solution $w(t) = u \, e^{\lambda t}$ for some vector $u$ shows that $u$ must be an eigenvector of $C$ with eigenvalue $\lambda$. Since $C$ is positive semi-definite, its eigenvalues are non-negative, so the components of $w$ along eigenvectors with $\lambda > 0$ grow exponentially and the weights blow up.
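To make the blow-up concrete, here is a minimal NumPy sketch (not taken from the lecture; the learning rate, toy dataset, and iteration counts are illustrative assumptions) that applies the plain Hebbian update $\Delta w = \alpha \, x \, y$ to zero-mean 2-D Gaussian data. The norm of $w$ grows without bound, while its direction converges to the leading eigenvector of the correlation matrix, i.e. the direction of greatest variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: zero-mean 2-D Gaussian inputs whose principal axis
# (std 2.0 vs 0.5) is rotated 30 degrees from the x-axis.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = (rng.normal(size=(1000, 2)) * np.array([2.0, 0.5])) @ R.T

alpha = 0.001                    # learning rate (illustrative)
w = rng.normal(size=2) * 0.1     # small random initial weight vector

for epoch in range(20):
    for x in X:
        y = w @ x                # postsynaptic activity y = w . x
        w += alpha * y * x       # plain Hebbian update: dw = alpha * x * y
    print(f"epoch {epoch:2d}  |w| = {np.linalg.norm(w):.3e}  "
          f"direction = {w / np.linalg.norm(w)}")

# |w| grows without bound (exponentially, matching dw/dt = C w), while
# the direction of w converges to the leading eigenvector of
# C = E[x x^T], i.e. the direction of greatest variance of the data.
```

Running this, the printed norm climbs by orders of magnitude every epoch while the printed direction settles near $\pm(\cos 30°, \sin 30°)$, which is exactly the exponential divergence predicted by the eigenvalue argument above.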
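For the STDP picture itself, a common way to model the pairwise timing dependence is an exponential window: pre-before-post spike pairs produce LTP, post-before-pre pairs produce LTD, and the magnitude decays as the spikes move apart in time. The sketch below uses this standard pair-based form; the amplitudes and time constants are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for a single pre/post spike pair.

    delta_t = t_post - t_pre in milliseconds.
    delta_t >= 0 (pre fires before or with post) -> potentiation (LTP).
    delta_t <  0 (post fires before pre)         -> depression (LTD).
    Amplitudes and time constants are illustrative, not lecture values.
    """
    delta_t = np.asarray(delta_t, dtype=float)
    ltp = a_plus * np.exp(-delta_t / tau_plus)    # used where delta_t >= 0
    ltd = -a_minus * np.exp(delta_t / tau_minus)  # used where delta_t < 0
    return np.where(delta_t >= 0, ltp, ltd)

# Pre-before-post strengthens the synapse, post-before-pre weakens it,
# and the effect decays as the spikes move further apart in time.
dts = np.array([-40.0, -10.0, -1.0, 1.0, 10.0, 40.0])
for dt, dw in zip(dts, stdp_dw(dts)):
    print(f"delta_t = {dt:+6.1f} ms  ->  dw = {dw:+.5f}")
```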