Gradient Descent

May 27, 2018

If you are familiar with calculus, you may know that a minimum or maximum of a function can be found by setting its derivative to zero. In other words, a point where the slope (m) of the function is zero is a candidate for its minimum or maximum.

Gradient descent is one approach to finding the minimum value of a cost function. The algorithm updates the θ values step by step, moving in the direction of the negative slope, and stops when the slope reaches zero.
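The idea can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `gradient_descent_1d`, the tolerance `tol`, the step limit `max_steps`, and the example cost J(θ) = (θ − 3)² are all assumptions made up for this sketch.

```python
def gradient_descent_1d(grad, theta, alpha=0.1, tol=1e-8, max_steps=10_000):
    """Step against the slope until the slope is (almost) zero."""
    for _ in range(max_steps):
        slope = grad(theta)
        if abs(slope) < tol:           # slope is practically zero: stop
            break
        theta = theta - alpha * slope  # move in the direction of the negative slope
    return theta


# Hypothetical cost J(theta) = (theta - 3)^2, whose derivative is 2 * (theta - 3).
theta_min = gradient_descent_1d(lambda t: 2 * (t - 3), theta=0.0)
print(theta_min)  # very close to 3, where the slope is zero
```

In floating-point arithmetic the slope rarely becomes exactly zero, which is why the sketch stops once it falls below a small tolerance.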

The above image from hackernoon explains this concept well. (Please note, the parameter representation here is w not θ).

The gradient descent update rule is given by

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

α - Learning rate (Length of each step)

J(θ₀,θ₁) - Cost function

j - Index of the parameter being updated (here j = 0, 1)
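To make the update rule concrete, here is a hedged sketch for two parameters. The post does not say which cost function J(θ₀,θ₁) it has in mind, so this example assumes the squared-error cost of a straight-line fit, J(θ₀,θ₁) = 1/(2m) Σ (θ₀ + θ₁x − y)². The data points and the helper names `cost` and `gradient_step` are invented for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Assumed cost: J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_step(theta0, theta1, xs, ys, alpha):
    """One update of theta_j := theta_j - alpha * dJ/dtheta_j for j = 0 and j = 1."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m                             # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
    # Both slopes are computed from the old values before either parameter changes.
    return theta0 - alpha * grad0, theta1 - alpha * grad1


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = 0.0, 0.0
for _ in range(5000):
    theta0, theta1 = gradient_step(theta0, theta1, xs, ys, alpha=0.1)

print(round(theta0, 3), round(theta1, 3), cost(theta0, theta1, xs, ys))
```

Both parameters are updated together from the same old values, so the slope for θ₁ is never computed with an already-updated θ₀.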

The value of α plays a key role in how gradient descent behaves. If α is too small, each step is tiny and the algorithm needs many iterations to converge. If α is too large, a step may jump past the minimum (this is called overshooting), and the algorithm may fail to converge or even diverge.
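The effect of α can be seen in a small experiment. The sketch below assumes the toy cost J(θ) = θ² (derivative 2θ, minimum at θ = 0); the helper `run` and the three α values are chosen only for illustration.

```python
def run(alpha, theta=1.0, steps=20):
    """Apply theta := theta - alpha * dJ/dtheta for J(theta) = theta^2 (derivative 2*theta)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta


small = run(alpha=0.001)   # tiny steps: after 20 updates still far from the minimum at 0
moderate = run(alpha=0.1)  # steady progress toward 0
large = run(alpha=1.1)     # every step jumps past 0 and lands farther away than before

print(small, moderate, large)
```

With the small α the value barely moves from its starting point, with the moderate α it gets close to 0, and with the large α its magnitude grows on every step instead of shrinking.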

The screenshot from Andrew Ng's course explains both issues perfectly.

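In the equation

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

the derivative term (the slope) shrinks as θ approaches the minimum and reaches zero at the minimum, i.e.

$$\theta_j := \theta_j - \alpha \cdot 0 = \theta_j$$

so θj no longer changes: the algorithm has reached a minimum.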
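A short worked example may help here. It assumes a single parameter θ, the made-up cost J(θ) = (θ − 3)² with derivative 2(θ − 3), and α = 0.1. Each line shows one update from a different value of θ:

```latex
\begin{align*}
\theta = 5:   &\quad \frac{dJ}{d\theta} = 4, \quad \theta := 5 - 0.1 \cdot 4 = 4.6 \\
\theta = 3.5: &\quad \frac{dJ}{d\theta} = 1, \quad \theta := 3.5 - 0.1 \cdot 1 = 3.4 \\
\theta = 3:   &\quad \frac{dJ}{d\theta} = 0, \quad \theta := 3 - 0.1 \cdot 0 = 3
\end{align*}
```

The step gets shorter as the slope gets smaller, even though α stays fixed, and at θ = 3 the update leaves θ unchanged.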
