2026-03-24 16:58 Tags:

1. Problem Setup

We are solving a binary classification problem: each input $x$ has a label $y \in \{0, 1\}$.

Goal:

We want a model that outputs a probability between 0 and 1.


2. Why Linear Regression Fails

Linear regression assumes a direct linear fit of the output:

$$\hat{y} = w^\top x + b$$

Problems:

  1. Output is unbounded: $\hat{y} \in (-\infty, +\infty)$ → invalid for probabilities

  2. No probabilistic interpretation

  3. Not suitable for classification boundaries


You are already very close. The missing piece is:

why do we go from probability to odds, and then from odds to log-odds?

Let’s do only that piece.


Step 1: What are we trying to build?

We want a model that takes in a linear score of the features:

$$z = w^\top x + b$$

This $z$ can be anything:

  • very negative

  • zero

  • very positive

But we want the final output to be a probability, and a probability must satisfy:

$$0 < P < 1$$

So yes: real-number input, probability output.


Step 2: Why not just force z to equal probability?

Suppose we say:

$$P = z$$

Then if $z=3$, probability is 3.

Impossible.

If $z=-2$, probability is -2.

Also impossible.

So we need some transformation

$$P = f(z)$$

that turns any real number into something between 0 and 1.
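A tiny numerical check of the problem (the loop values are illustrative, not from the note):

```python
# Treating the raw score z directly as a probability fails whenever
# z falls outside (0, 1).
for z in [-2.0, 0.5, 3.0]:
    is_valid = 0.0 < z < 1.0
    print(f"z = {z:+.1f} -> valid probability? {is_valid}")
```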


Step 3: What kind of transformation do we need?

We need a function that does this:

  • input: any real number

  • output: between 0 and 1

That is the actual problem.

Now here is the key:

there are many functions that can do this.

So the real question is not:

why must it be odds?

It is:

why is odds/log-odds a convenient bridge?

That is the part we now unpack.


Step 4: Start from probability

Probability is bounded:

$$0 < P < 1$$

This bounded interval is annoying for linear modeling.

Why?

Because linear models naturally live on the whole real line:

$$w^\top x + b \in (-\infty, +\infty)$$

So instead of trying to make linear models live inside $(0,1)$, we do the opposite:

take probability, and transform it into something unbounded.

This is the important move.

We are not yet choosing odds because of magic.

We are choosing a way to “unbound” probability.


Step 5: First transformation: probability to odds

Define:

$$\text{odds} = \frac{P}{1-P}$$

Let’s see what this does.

If $P=0.5$: odds $= \frac{0.5}{0.5} = 1$

If $P=0.8$: odds $= \frac{0.8}{0.2} = 4$

If $P=0.2$: odds $= \frac{0.2}{0.8} = 0.25$

So now odds lives in:

$$(0, +\infty)$$

Good news:

  • no longer capped at 1

Bad news:

  • still cannot be negative

  • still not the full real line

So odds gets us part of the way, but not all the way.
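This behavior can be checked numerically; a minimal sketch (the function name `odds` and the sample values are mine):

```python
def odds(p: float) -> float:
    """Map a probability in (0, 1) to odds in (0, +inf)."""
    return p / (1.0 - p)

for p in [0.2, 0.5, 0.8, 0.99]:
    print(f"P = {p:.2f} -> odds = {odds(p):.2f}")
# Odds grow without bound as P -> 1, but never go below 0.
```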


Step 6: Second transformation: odds to log-odds

Now take the log:

$$\log(\text{odds}) = \log\frac{P}{1-P}$$

What happens now?

  • if odds is very small, log is very negative

  • if odds = 1, log = 0

  • if odds is very large, log is very positive

Now the range becomes:

$$(-\infty, +\infty)$$

Perfect.

This is exactly the same range as the linear predictor:

$$z = w^\top x + b$$

So now we can say:

$$\log\frac{P}{1-P} = w^\top x + b$$

and that is legal, natural, and mathematically clean.
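The probability-to-log-odds map and its inverse (the sigmoid) can be sketched in a few lines; the function names are mine:

```python
import math

def logit(p: float) -> float:
    """Probability -> log-odds: spans the whole real line."""
    return math.log(p / (1.0 - p))

def sigmoid(z: float) -> float:
    """Log-odds -> probability: the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

for p in [0.1, 0.5, 0.9]:
    z = logit(p)
    print(f"P = {p:.1f} -> log-odds = {z:+.3f} -> back to P = {sigmoid(z):.1f}")
```

Round-tripping through `logit` and `sigmoid` recovers the original probability, which is exactly what lets a linear model on the log-odds scale still produce valid probabilities.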


Step 7: So why “go to odds” first?

Because taking the log of probability directly does not give the full real line.

Let’s compare.

If you take just:

$$\log P$$

then since $P \in (0,1)$,

$$\log P \in (-\infty, 0)$$

This only gives you negative numbers.

Not enough.

If you take:

$$\log(1-P)$$

same problem: it only covers $(-\infty, 0)$.

So we need a transformation of probability that:

  • respects the fact that probability has two sides: event and non-event

  • gives something that can range from very small to very large

  • becomes the full real line once we take the log

That is why:

$$\frac{P}{1-P}$$

is so useful.

It compares:

probability of event / probability of non-event

That is the odds.

Then log turns that positive ratio into the whole real line.
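The comparison above can be seen numerically; a small sketch (sample probabilities are mine):

```python
import math

ps = [0.001, 0.3, 0.5, 0.7, 0.999]

# log(P) alone is always negative: it never reaches the positive half-line.
log_p = [math.log(p) for p in ps]

# log-odds is negative below P=0.5, zero at P=0.5, positive above:
# it covers the full real line.
log_odds = [math.log(p / (1 - p)) for p in ps]

for p, lp, lo in zip(ps, log_p, log_odds):
    print(f"P = {p:.3f}  log(P) = {lp:+.3f}  log-odds = {lo:+.3f}")
```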


Step 8: Intuition in plain language

Probability alone says:

  • chance event happens

Odds says:

  • how much more likely the event is than the non-event

Log-odds says:

  • a version of that comparison that can be modeled linearly

So the path is:

  1. probability is bounded

  2. odds removes the upper bound

  3. log-odds removes the lower bound too

  4. now it matches a linear expression
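The four steps above can be written as one chain of transformations (in the notation used earlier):

```latex
P \in (0,1)
\;\xrightarrow{\;P \,\mapsto\, P/(1-P)\;}\;
\text{odds} \in (0,+\infty)
\;\xrightarrow{\;\log\;}\;
\log\frac{P}{1-P} \in (-\infty,+\infty)
\;=\; w^\top x + b
```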


Step 9: The key bridge

You were asking:

how did we suddenly turn to odds ratio?

The answer is:

we did not jump there randomly.

We needed to transform probability from a bounded scale to an unbounded scale.

The odds is an intermediate step that compares event vs non-event:

$$\text{odds} = \frac{P}{1-P}$$

Then the log of that gives a quantity on the full real line:

$$\log\frac{P}{1-P} \in (-\infty, +\infty)$$

That makes it compatible with:

$$w^\top x + b$$

Step 10: One sentence summary

We go to odds, then log-odds, because probability is bounded in $(0,1)$, while linear models live on $(-\infty, +\infty)$, and log-odds is the simplest transformation that connects those two worlds.
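To close the loop, here is a minimal end-to-end sketch of the whole idea: model the log-odds as $wx + b$, fit by gradient descent on the log-loss, and read off valid probabilities. The toy data, learning rate, and variable names are all mine, not from the note:

```python
import math

# Toy 1-D data: label tends to be 1 when x is positive (made-up data).
xs = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z: float) -> float:
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.0, 0.0
lr = 0.1

# Gradient descent on the log-loss; the model is: log-odds = w*x + b.
for _ in range(2000):
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)   # transform the linear score into (0, 1)
        gw += (p - y) * x
        gb += (p - y)
    w -= lr * gw / len(xs)
    b -= lr * gb / len(xs)

# Predictions are now always valid probabilities.
for x in [-2.0, 0.0, 2.0]:
    print(f"x = {x:+.1f} -> P(y=1) = {sigmoid(w * x + b):.3f}")
```

The linear part $wx + b$ is free to take any real value; the sigmoid (the inverse of the log-odds map) is what guarantees the output lands in $(0,1)$.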