This is much easier to work with.
---
# 8. Loss function (important connection)
In ML, we minimize loss instead of maximizing likelihood.
So we define:
[
\text{Loss} = -\ell(\beta)
]
[
= -\sum \left[
y \log(p) + (1-y)\log(1-p)
\right]
]
This is called:
> **Log Loss / Cross-Entropy Loss**
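The definition above can be checked numerically. A minimal sketch, using illustrative toy values for `y` and `p`, comparing a hand-computed mean log loss against scikit-learn's `log_loss` (which averages over samples by default):

```python
# Minimal sketch: log loss computed by hand vs. scikit-learn.
# y and p are illustrative toy values, not real data.
import numpy as np
from sklearn.metrics import log_loss

y = np.array([1, 0, 1, 1])           # true labels
p = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities

# Loss = -[ y*log(p) + (1-y)*log(1-p) ], averaged over samples
manual = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(manual, log_loss(y, p))  # the two values agree
```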
---
# 9. Intuition of log loss
Let’s test it.
---
## Case 1: correct and confident
- true (y=1)
- predicted (p=0.9)
[
\text{Loss} = -\log(0.9) \approx 0.105
]
Small loss → good
---
## Case 2: wrong and confident
- true (y=1)
- predicted (p=0.1)
[
\text{Loss} = -\log(0.1) \approx 2.303
]
Huge loss → heavily punished
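The two cases above can be verified in a couple of lines (recall that for a true label (y=1) the per-sample loss reduces to (-\log(p))):

```python
# Per-sample loss is -log(p) when y = 1.
import math

confident_right = -math.log(0.9)  # correct and confident
confident_wrong = -math.log(0.1)  # wrong and confident

print(round(confident_right, 3))  # → 0.105
print(round(confident_wrong, 3))  # → 2.303
```

Being confidently wrong costs roughly 20× more than being confidently right.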
---
# 10. Why this is powerful
This loss:
- rewards correct predictions
- punishes confident wrong predictions strongly
- is smooth and differentiable
---
# 11. How we actually find β
Unlike linear regression, the likelihood equations have no closed-form solution.
So we use iterative optimization:
- Gradient Descent
- or quasi-Newton solvers like L-BFGS (scikit-learn's default)
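In practice the solver is hidden behind the library call. A minimal sketch with scikit-learn's `LogisticRegression` (the data here is synthetic, purely for illustration):

```python
# Sketch: fitting logistic regression with the default L-BFGS solver.
# The dataset is synthetic and only for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels

clf = LogisticRegression(solver="lbfgs")  # maximizes the log-likelihood
clf.fit(X, y)
print(clf.coef_, clf.intercept_)          # the learned β
```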
---
# 12. What is happening during training?
Iteratively:
1. guess β
2. compute probabilities
3. compute loss
4. update β
5. repeat until convergence
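The five steps above can be sketched as a plain gradient-descent loop in NumPy. The learning rate, iteration count, and synthetic data are illustrative choices, not a production setup:

```python
# Minimal gradient-descent sketch of the training loop above.
# Step size (0.1) and iteration budget (1000) are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
X = np.hstack([np.ones((200, 1)), X])     # prepend an intercept column

beta = np.zeros(3)                         # 1. guess β (start at zero)
for _ in range(1000):
    p = 1 / (1 + np.exp(-X @ beta))        # 2. compute probabilities
    loss = -np.mean(y * np.log(p)
                    + (1 - y) * np.log(1 - p))  # 3. compute loss
    grad = X.T @ (p - y) / len(y)          # gradient of the mean log loss
    beta -= 0.1 * grad                     # 4. update β
    # 5. repeat (fixed iteration budget stands in for convergence)

print(beta, loss)
```

The gradient `X.T @ (p - y)` falls straight out of differentiating the log loss through the sigmoid, which is part of why this pairing is so convenient.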
---
# 13. Key insight (this is the real takeaway)
Logistic Regression is:
> a probabilistic model
> trained by maximizing likelihood
> equivalent to minimizing log-loss
---
# 14. Why not use MSE?
If we used:
[
(y - p)^2
]
Problems:
- it does not correspond to maximum likelihood for a Bernoulli outcome
- combined with the sigmoid, it is non-convex in β (harder to optimize)
- it penalizes confident wrong predictions only weakly (bounded by 1)
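The weaker-penalty point is easy to see numerically. For a confidently wrong prediction ((y=1), (p=0.01)):

```python
# Toy comparison of penalties for a confidently wrong prediction.
import math

y, p = 1, 0.01
mse = (y - p) ** 2        # bounded: can never exceed 1
logloss = -math.log(p)    # unbounded: grows as p → 0

print(round(mse, 2))      # → 0.98
print(round(logloss, 2))  # → 4.61
```

MSE tops out near 1 no matter how wrong the model is, while log loss keeps growing, which is exactly the "heavy punishment" behavior from section 9.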
---
# 15. Connection to your project
In EMS prediction, MLE pushes the model to assign high probability to the patients who actually deteriorate.
So:
- better calibration
- better ranking
- meaningful probabilities
---
# 16. One-line summary
MLE chooses parameters that make the observed outcomes most probable under the model.
---