Cost Function in Logistic Regression

We have spent a good amount of time understanding the decision boundary, and how to overcome the problem of the sharp curve by working with probabilities. Check out the previous blog, Logistic Regression for Machine Learning using Python.

In the logistic regression model, the output of the classifier lies between 0 and 1.

$$0 \le h_\theta(x) \le 1$$

logistic regression hypothesis

So to establish the hypothesis, we used the sigmoid function, also known as the logistic function.

$$g(z) = \frac{1}{1 + e^{-z}}, \qquad h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

sigmoid function or logistic function (Fig-1)
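
As a quick sketch of this hypothesis in Python (using NumPy, with a made-up two-point dataset; the function names are my own, not from the original post):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """Logistic regression hypothesis h_theta(x) = sigmoid(theta^T x)."""
    return sigmoid(X @ theta)

# Hypothetical data: two points, each with a bias term and one feature
X = np.array([[1.0, 2.0],
              [1.0, -1.0]])
theta = np.array([0.5, 1.0])
print(hypothesis(theta, X))  # outputs are strictly between 0 and 1
```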

So let’s fit the parameters θ for logistic regression.

Likelihood Function

So let’s say we have a dataset X with m data points. Logistic regression models the probability of the outcome as below.

$$P(y_i = 1 \mid x_i; \theta) = h_\theta(x_i), \qquad P(y_i = 0 \mid x_i; \theta) = 1 - h_\theta(x_i)$$

This follows from the basic probability rule: if the probability of the success event is P, then the probability of the fail event is (1 − P). That is what y_i indicates above.

These two cases can be combined into a single expression, as below.

$$P(y_i \mid x_i; \theta) = h_\theta(x_i)^{y_i}\,(1 - h_\theta(x_i))^{1 - y_i}$$

likelihood for a single data point

This expresses the probability of observing the label y_i given the input x_i, i.e. P(y_i | x_i).

The likelihood of the entire dataset X is the product of the likelihoods of the individual data points. Think of a coin toss with outcomes H or T: if the probability of H is P, then the probability of T is (1 − P), and the likelihood of a sequence of tosses is the product of the individual probabilities.

So the likelihood of the whole dataset is:

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x_i)^{y_i}\,(1 - h_\theta(x_i))^{1 - y_i}$$

logistic regression likelihood (Fig-4)
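
Here is a minimal NumPy sketch of that product, assuming hypothetical predicted probabilities p and labels y for m = 4 data points:

```python
import numpy as np

def likelihood(p, y):
    """Likelihood of labels y given predicted probabilities p:
    the product over data points of p_i^y_i * (1 - p_i)^(1 - y_i)."""
    return np.prod(p**y * (1 - p)**(1 - y))

# Hypothetical predictions and labels
p = np.array([0.9, 0.2, 0.8, 0.6])
y = np.array([1, 0, 1, 1])
print(likelihood(p, y))  # 0.9 * 0.8 * 0.8 * 0.6 = 0.3456
```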

Now the principle of maximum likelihood says we should choose the parameters θ that maximize this likelihood L(θ). Before going further, recall the odds and log-odds.

$$\text{odds} = \frac{p}{1 - p}, \qquad \text{log-odds} = \log\frac{p}{1 - p}$$

odds and log-odds (Fig-5)

For example, if p = 0.8, the odds are 0.8/0.2 = 4 and the log-odds are log 4 ≈ 1.386.

As we can see, taking the log of the odds of the sigmoid hypothesis leaves us with a linear equation in x:

$$\log\frac{h_\theta(x)}{1 - h_\theta(x)} = \theta^T x$$

So, in order to find the parameters θ of the hypothesis, we can either maximize the likelihood or minimize a cost function.

Now we can take the log of the logistic regression likelihood equation above. The log turns the product into a sum, which is much easier to work with.

$$\log L(\theta) = \sum_{i=1}^{m} \Big[ y_i \log h_\theta(x_i) + (1 - y_i)\log\big(1 - h_\theta(x_i)\big) \Big]$$

maximum likelihood estimation

MLE stands for Maximum Likelihood Estimation.
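
A small sketch of the log-likelihood, reusing the same hypothetical p and y as before:

```python
import numpy as np

def log_likelihood(p, y):
    """Log-likelihood: the log turns the product into a sum,
    which is numerically far more stable for large m."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical predictions and labels
p = np.array([0.9, 0.2, 0.8, 0.6])
y = np.array([1, 0, 1, 1])
print(log_likelihood(p, y))  # log(0.3456) ≈ -1.0624
```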

Cost Function

I would recommend first checking out this blog on The Intuition Behind Cost Function.

In logistic regression, we create a decision boundary, and this will give us a better sense of what the logistic regression function is computing.

logistic regression function

As we know, the cost function for linear regression is the residual sum of squares:

$$J(\theta) = \sum_{i=1}^{m} \big(h_\theta(x_i) - y_i\big)^2$$

linear regression cost function

We can also write it as below, averaging over the m observations and taking half of each squared error (the 1/2 simplifies the derivative later).

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\big(h_\theta(x_i) - y_i\big)^2$$

linear and logistic cost function
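
As a rough sketch (assuming a design matrix X whose first column is all ones for the bias term; the data is hypothetical), this linear-regression cost could be computed like so:

```python
import numpy as np

def squared_error_cost(theta, X, y):
    """Linear-regression cost: J(theta) = (1/m) * sum of (1/2)(h(x_i) - y_i)^2,
    where h(x_i) = theta^T x_i is the linear hypothesis."""
    m = len(y)
    residuals = X @ theta - y
    return np.sum(0.5 * residuals**2) / m

# Hypothetical data: bias column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.5])
print(squared_error_cost(np.array([0.0, 1.0]), X, y))
```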

As we can see, in logistic regression h(x) is nonlinear (the sigmoid function). For linear regression, the cost function is convex: it has only one global minimum. With a nonlinear hypothesis plugged into the squared error, the cost is non-convex, so there is a possibility of multiple local minima rather than one global minimum.

So, to overcome this problem of local minima and obtain the global minimum, we can define a new cost function, taking the same form we saw in the likelihood.

$$\text{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

logistic cost function

This is the cross-entropy loss, also called log loss: the logistic regression cost function. Cross-entropy loss measures the performance of a classification model whose output is a probability value between 0 and 1.

So the combined cost function is as below.

$$\text{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$$

cost function (Fig-8)

Now we can plug this expression (Fig-8) into the overall cost function.

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \text{Cost}\big(h_\theta(x_i), y_i\big) = -\frac{1}{m} L(\theta)$$

relation between cost function and likelihood function (Fig-9)

As we can see, L(θ) in Fig-9 is the log-likelihood function, so we can establish a relation between the cost function and the log-likelihood function. You can check out Maximum likelihood estimation in detail.

Maximizing L(θ) is equivalent to minimizing −L(θ), and using the average cost over all data points, our cost function becomes:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m} \Big[ y_i \log(h_\theta(x_i)) + (1 - y_i)\log(1 - h_\theta(x_i)) \Big]$$

logistic regression cost function
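
A minimal NumPy sketch of this cost function, under the same bias-column design-matrix assumption and with hypothetical data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Logistic regression cost:
    J(theta) = -(1/m) * sum of y_i*log(h(x_i)) + (1-y_i)*log(1-h(x_i))."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

# Hypothetical data: bias column plus one feature
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1, 0, 1])
print(cost(np.zeros(2), X, y))  # with theta = 0, h = 0.5 everywhere, so J = log(2)
```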

Choosing this cost function is a great idea for logistic regression, because maximum likelihood estimation is a standard statistical principle for finding efficient parameter estimates for a model, and the resulting cost function is convex in nature.

Gradient Descent

Now we can minimize this cost function using gradient descent. The main goal of gradient descent is to minimize the cost value, i.e. min over θ of J(θ).

Repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

gradient descent

Now, to minimize our cost function, we run the gradient descent update on each parameter θj simultaneously, i.e.

$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\big(h_\theta(x_i) - y_i\big)\,x_{ij}$$

gradient descent update
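
Putting it together, here is a sketch of batch gradient descent for logistic regression (the toy dataset, learning rate, and iteration count are illustrative assumptions, not tuned values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression.
    Each step: theta_j := theta_j - (alpha/m) * sum((h(x_i) - y_i) * x_ij),
    applied to all parameters simultaneously."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)          # predictions for current theta
        gradient = (X.T @ (h - y)) / m  # partial derivatives of J(theta)
        theta -= alpha * gradient       # simultaneous update of all theta_j
    return theta

# Hypothetical toy data: bias column plus one feature
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, -1.0], [1.0, 2.0]])
y = np.array([1, 1, 0, 1])
print(gradient_descent(X, y))
```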

Additional reading on Gradient descent

Gradient Descent for Logistic Regression Simplified – Step by Step Visual Guide


Gradient descent is an optimization algorithm used to find the values of the parameters that minimize a cost function. To solve for the gradient, we iterate through our data points using the current parameter values and compute the partial derivatives.

OK, that’s it, we are done now. If you have any questions or suggestions, please feel free to reach out to me. I’ll come up with more Machine Learning topics soon.
