Gradient descent is always hungry for data. How should we feed it? There are three common ways to feed data to gradient descent (GD): Batch, Stochastic, and Mini-Batch. They differ in how many training examples are used to compute each parameter update. Is one of them best? Why do machine learning practitioners lean towards Mini-Batch GD? We will…
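To make the three feeding strategies concrete, here is a minimal sketch, assuming a plain linear model trained with mean squared error on synthetic data. The function name `gradient_descent`, the learning rates, and the toy dataset are all hypothetical choices made for illustration. The only thing that changes between the three variants is `batch_size`.

```python
import numpy as np


def gradient_descent(X, y, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y ~ X @ w by minimizing mean squared error.

    batch_size == len(X)  -> Batch GD (one update per epoch)
    batch_size == 1       -> Stochastic GD (one update per example)
    anything in between   -> Mini-Batch GD
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Shuffle once per epoch so batches differ between epochs.
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the mean squared error on this batch only.
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
    return w


# Toy, noiseless data (an assumption for this sketch).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w

w_batch = gradient_descent(X, y, batch_size=len(X))   # Batch GD
w_sgd = gradient_descent(X, y, batch_size=1, lr=0.01)  # Stochastic GD
w_mini = gradient_descent(X, y, batch_size=32)         # Mini-Batch GD
```

On this easy problem all three settings should recover weights close to `true_w`. What differs is the number of updates per epoch and how noisy each update is, which is the trade-off behind the questions above.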