Skip to content

What is Cost Function in Linear regression?

Loss function

In the last article we saw Linear regression in detail, the goal is to sales prediction and automobile consulting company case study. you can follow this my previous article on Linear Regression using python with an automobile company case study.

Welcome to the module on Cost function.

Consider the situation, you are trying to solve the classification problem, i.e. classify data into categories. Suppose the data is pertaining to the weight and height of two different categories of fishes denoted by red and blue points in the scatter plot below.

In fact, all three classifications have high accuracy, but the 3rd solution has the best solution. Because it classifies all the points perfectly is because the line is almost exactly in between the two groups.

This is where the Cost function concepts come in. Cost function algorithm leverage to reach to an optimal solution. The agenda of the concept to understand how to minimize and maximize the cost function using an algorithm.

Cost Function

Let say we want to predict the salary of a person based on his experience, bellow table is just a made up data.

sample data set

Now let’s make a scatter plot of these data point and now we need to fit a straight line that is the best fit line. Now in the bellow diagram if you take (6,6), now consider the straight line given that.

Y=mx + c at this time on Xi we have a value Yi which is coming from data set and the predicated value Ypred = mXi + C now we would like to define a cost function which is based on the difference between Yi and Ypred which (Yi-Ypred)² (remember the residual and RSS.)

scatter plot

And this is what we would like to minimize, which is sum of all the point which are in the data set, we would like to take this square error term and sum it over all the data-point and minimize the sum which is.

cost function

This gives us cost function which we would like to minimize, so just to give you a perspective using this equation we want to find ‘m’ and ‘C’ such that the sum of above expression is minimum because that would give us the best line fit. (A best straight line where the error is minimum).

Now the question is how to minimize this, very simple recall you high school Math (Diffraction).

For example, With one variable.

cost function with one variable

With two variable.

cost function with two variable

So basically, what we have done, we found out the will minimize the given cost function. Now if we talk about our equation.

And calculate the cost function with respect to (w.r.t) m and C we will get two linear equation check the bellow calculation.

cost function (w.r.t) m and c

And now check this bellow implementation if we put our data-point and calculate.

calculate m and c

So, we are managed to solve m and c and find out which straight line that fits our data-point. Now, if we put the value of m and c in the bellow equation, we will get the regression line.

Fitting a straight line, the cost function was the sum of squared errors, but it will vary from algorithm to algorithm. To minimize the sum of squared errors and find the optimal m and c, we differentiated the sum of squared errors w.r.t the parameters m and c. We then solved the linear equations to obtain the values m and c. In most cases, you will have to minimize the cost function.

Differentiate the function w.r.t the parameter and equate to 0.

  1. For minimization — the function value of the double differential should be greater than 0.
  2. For maximization — the function value of the double differential should be less than 0.

Type of minimization or maximization

  1. Constrained
  2. Unconstrained

I will not go to detail of constrained minimization and maximization since it’s not been used much in machine learning except SVM (support vector machine), for more detail about constrained optimization you can follow this link

But I will give you some intuition about constrained and unconstrained optimization problem.

So, you go out with your friends after long time, but everyone has budget constraints of 1000 Rs. you basically want to have maximum fun but you have a budget constraint so you want to maximize something based on constraint this would be a constraint maximization problem.

similarly for unconstrained problem you just want to minimize and maximize output but there are no constraint involved the problem of minimizing sum of square error (RSS) which we have been discussing, does not have any constraint apply on X and Y which we are trying to estimate therefore this is the problem the unconstrained minimization problem.

constrain minimization problem has some condition and restrictions to impose on the range of parameters that is the values of parameter can take.

let’s get an intuition about the constrained and unconstrained problems. For example on given function (see the bellow image), is a constraint which means x can take value more than or equal to B then we can see the minimum value of the cost function can take at x=b which means X can’t take value A=0, because of this constraints the minimum value of cost function will take at B.

So, the cost function for given equation would be 4(Four). So, the minimum value we can reach with this constrained are 4(Four), where unconstrained way it would be (0) zero.

In this way we have two possible solution depending whether constrained and unconstrained.

We saw the example of optimization using differentiation, there are two ways to go about unconstrained optimization.

  1. Differentiation
  2. Using Gradient descent

Gradient descent we will see in next blog, this time pretty much that’s it about the Cost function.


If you recall the equation for the line that’s fit the data in Linear Regression, is given as:

Where β0 is the intercept of the fitted line and β1 is the coefficient for the independent variable x. As discuss above similarly we can calculate the value of β0 and β1 through differentiation.


OK, that’s it, we are done now. If you have any questions or suggestions, please feel free to reach out to me. I’ll come up with more Machine Learning topic soon.

4.7 3 votes
Article Rating
Notify of
Newest Most Voted
Inline Feedbacks
View all comments
Manish Kumar

Nice article.


Thanks, Manish for appreciation.


[…] Fitting a straight line, the cost function was the sum of squared errors, but it will vary from algo… […]