Logistic Regression
Logistic regression could be called a qualitative response / discrete choice model in the terminology of statistics and economics, but for the purposes of these notes, we can think of it as a supervised classification algorithm.
Logistic regression is used when the dependent variable is categorical, that is, yes or no, true or false, male or female, etc. These examples are all binary - if we have more than two categories, the model is referred to as multinomial logistic regression.
Why does it matter? Wikipedia, once again, keeping it real:
Logistic regression is used widely in many fields, including the medical and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression.
Another:
Logistic regression may be used to predict whether a patient has a given disease (e.g. diabetes; coronary heart disease), based on observed characteristics of the patient (age, sex, body mass index, results of various blood tests, etc.).
Another:
The technique can also be used in engineering, especially for predicting the probability of failure of a given process, system or product. It is also used in marketing applications such as prediction of a customer's propensity to purchase a product or halt a subscription, etc.
To summarize, logistic regression is similar to linear regression; the main difference is that the dependent variable is categorical.
There is a nice example here on Wikipedia where we want to predict whether or not a student passes a class (the category) based on how many hours they spent studying (a version of this example is sketched in code further down).
Details
At the center of logistic regression we find what is called the logistic function. It can take any value as an input (as in, negative infinity to positive infinity) and translate that value to something between zero and one. This allows us to think of the output as a probability rather than something absolute.
In order to make sure that this regression is bounded, we get to use what is called the logistic or the logit transformation:

$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right)$$

which is really just the "log odds ratio", since $p$ is the probability of something happening, and $1 - p$ is the probability of it not happening. Then we just take the log of their ratio!
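As a quick sanity check, here is a minimal sketch of the logit transformation in Python (the function name `logit` and the sample probabilities are mine, chosen for illustration):

```python
import math

def logit(p):
    """Log odds ratio: log(p / (1 - p)) for a probability p in (0, 1)."""
    return math.log(p / (1 - p))

# Probabilities near 0 map to large negative values,
# 0.5 maps to exactly 0, and probabilities near 1 map to large positive values.
for p in [0.01, 0.5, 0.99]:
    print(p, logit(p))
```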
Here is the logistic function:

$$F(t) = \frac{1}{1 + e^{-t}}$$

where

- $e$ is the base of the exponential function (Euler's number)
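And a minimal sketch of the logistic function itself, which is the inverse of the logit above; note how it squashes any real input into the interval $(0, 1)$:

```python
import math

def logistic(t):
    """Map any real number t to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-t))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 gives exactly 0.5.
for t in [-10, -1, 0, 1, 10]:
    print(t, logistic(t))
```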
If we then think of $t$ as a linear combination of a single explanatory variable $x$, we have:

$$t = \beta_0 + \beta_1 x$$

where

$\beta_0$ is the intercept from the linear regression equation, that is, the point that determines where something switches from false to true, failure to success, etc.

$\beta_1 x$ is the regression coefficient $\beta_1$ multiplied by some value $x$ of the predictor
and then we can rewrite the logistic function to look like this:

$$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

So the output of $F(x)$ is considered to be the probability of true rather than false, success rather than failure, etc. And by changing the values of $\beta_0$ and $\beta_1$, we can get drastically different results from our logistic regression. So naturally we ask: how do we find the best values? Well, since the function predicts probabilities, we can fit it using maximum likelihood.
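To make that fitting step concrete, here is a minimal sketch of maximum-likelihood estimation by gradient ascent on the log-likelihood, using made-up hours-studied / pass-fail data in the spirit of the Wikipedia example mentioned earlier (the data values, learning rate, and iteration count are illustrative assumptions, not anything from the sources here):

```python
import math

# Made-up data: hours studied (x) and whether the student passed (y).
hours  = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0,   0,   0,   0,   1,   0,   1,   1,   1,   1]

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# Gradient ascent on the log-likelihood of the data:
# d/d(b0) = sum(y - p), d/d(b1) = sum((y - p) * x), where p = F(b0 + b1*x).
b0, b1 = 0.0, 0.0          # intercept and slope, starting at zero
learning_rate = 0.05
for _ in range(20000):
    grad_b0 = grad_b1 = 0.0
    for x, y in zip(hours, passed):
        error = y - logistic(b0 + b1 * x)   # residual on the probability scale
        grad_b0 += error
        grad_b1 += error * x
    b0 += learning_rate * grad_b0
    b1 += learning_rate * grad_b1

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print("P(pass | 3 hours) =", round(logistic(b0 + b1 * 3.0), 3))
```

In practice you would reach for a library routine such as scikit-learn's `LogisticRegression`, which solves the same maximum-likelihood problem with a more robust optimizer; the loop above just makes the likelihood maximization explicit.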
Read more
C Shalizi from CMU - here
https://onlinecourses.science.psu.edu/stat414/node/191
http://www.biostathandbook.com/simplelogistic.html
Andrew Ng video on the general hypothesis here
Andrew Ng video on the decision boundary here
Andrew Ng video on the cost function & fitting of parameters here