2026-03-24 16:58 Tags:

1. Problem Setup

We are solving a binary classification problem: each input $x$ has a label $y \in \{0, 1\}$.

Goal:

We want a model that outputs a probability between 0 and 1.


2. Why Linear Regression Fails

Linear regression assumes a direct linear fit of the output:

$$\hat{y} = w^\top x + b$$

Problems:

  1. Output is unbounded: $\hat{y} \in (-\infty, +\infty)$ → invalid for probabilities

  2. No probabilistic interpretation

  3. Not suitable for classification boundaries


You are already very close. The missing piece is:

why do we go from probability to odds, and then from odds to log-odds?

Let’s do only that piece.


Step 1: What are we trying to build?

We want a model that takes in a linear score of the features:

$$z = w^\top x + b$$

This $z$ can be anything:

  • very negative

  • zero

  • very positive

But we want the final output to be a probability, and a probability must satisfy:

$$0 < P < 1$$

So yes: real-number input, probability output.


Step 2: Why not just force z to equal probability?

Suppose we say:

$$P = z$$

Then if $z=3$, probability is 3.

Impossible.

If $z=-2$, probability is -2.

Also impossible.

So we need some transformation

$$P = f(z)$$

that turns any real number into something between 0 and 1.
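A tiny numerical check of the problem (the loop values are illustrative, not from the note):

```python
# Treating the raw score z directly as a probability fails whenever
# z falls outside (0, 1).
for z in [-2.0, 0.5, 3.0]:
    is_valid = 0.0 < z < 1.0
    print(f"z = {z:+.1f} -> valid probability? {is_valid}")
```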


Step 3: What kind of transformation do we need?

We need a function that does this:

  • input: any real number

  • output: between 0 and 1

That is the actual problem.

Now here is the key:

there are many functions that can do this.

So the real question is not:

why must it be odds?

It is:

why is odds/log-odds a convenient bridge?

That is the part we now unpack.


Step 4: Start from probability

Probability is bounded:

$$0 < P < 1$$

This bounded interval is annoying for linear modeling.

Why?

Because linear models naturally live on the whole real line:

$$w^\top x + b \in (-\infty, +\infty)$$

So instead of trying to make linear models live inside $(0,1)$, we do the opposite:

take probability, and transform it into something unbounded.

This is the important move.

We are not yet choosing odds because of magic.

We are choosing a way to “unbound” probability.


Step 5: First transformation: probability to odds

Define:

$$\text{odds} = \frac{P}{1-P}$$

Let’s see what this does.

If $P=0.5$: odds $= \frac{0.5}{0.5} = 1$

If $P=0.8$: odds $= \frac{0.8}{0.2} = 4$

If $P=0.2$: odds $= \frac{0.2}{0.8} = 0.25$

So now odds lives in:

$$(0, +\infty)$$

Good news:

  • no longer capped at 1

Bad news:

  • still cannot be negative

  • still not the full real line

So odds gets us part of the way, but not all the way.
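This behavior can be checked numerically; a minimal sketch (the function name `odds` and the sample values are mine):

```python
def odds(p: float) -> float:
    """Map a probability in (0, 1) to odds in (0, +inf)."""
    return p / (1.0 - p)

for p in [0.2, 0.5, 0.8, 0.99]:
    print(f"P = {p:.2f} -> odds = {odds(p):.2f}")
# Odds grow without bound as P -> 1, but never go below 0.
```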


Step 6: Second transformation: odds to log-odds

Now take the log:

$$\log(\text{odds}) = \log\frac{P}{1-P}$$

What happens now?

  • if odds is very small, log is very negative

  • if odds = 1, log = 0

  • if odds is very large, log is very positive

Now the range becomes:

$$(-\infty, +\infty)$$

Perfect.

This is exactly the same range as the linear predictor:

$$z = w^\top x + b$$

So now we can say:

$$\log\frac{P}{1-P} = w^\top x + b$$

and that is legal, natural, and mathematically clean.
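The probability-to-log-odds map and its inverse (the sigmoid) can be sketched in a few lines; the function names are mine:

```python
import math

def logit(p: float) -> float:
    """Probability -> log-odds: spans the whole real line."""
    return math.log(p / (1.0 - p))

def sigmoid(z: float) -> float:
    """Log-odds -> probability: the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

for p in [0.1, 0.5, 0.9]:
    z = logit(p)
    print(f"P = {p:.1f} -> log-odds = {z:+.3f} -> back to P = {sigmoid(z):.1f}")
```

Round-tripping through `logit` and `sigmoid` recovers the original probability, which is exactly what lets a linear model on the log-odds scale still produce valid probabilities.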


Step 7: So why “go to odds” first?

Because taking the log of probability directly does not give the full real line.

Let’s compare.

If you take just:

$$\log P$$

then since $P \in (0,1)$,

$$\log P \in (-\infty, 0)$$

This only gives you negative numbers.

Not enough.

If you take:

$$\log(1-P)$$

same problem: it only covers $(-\infty, 0)$.

So we need a transformation of probability that:

  • respects the fact that probability has two sides: event and non-event

  • gives something that can range from very small to very large

  • becomes the full real line once we take the log

That is why:

$$\frac{P}{1-P}$$

is so useful.

It compares:

probability of event / probability of non-event

That is the odds.

Then log turns that positive ratio into the whole real line.
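The comparison above can be seen numerically; a small sketch (sample probabilities are mine):

```python
import math

ps = [0.001, 0.3, 0.5, 0.7, 0.999]

# log(P) alone is always negative: it never reaches the positive half-line.
log_p = [math.log(p) for p in ps]

# log-odds is negative below P=0.5, zero at P=0.5, positive above:
# it covers the full real line.
log_odds = [math.log(p / (1 - p)) for p in ps]

for p, lp, lo in zip(ps, log_p, log_odds):
    print(f"P = {p:.3f}  log(P) = {lp:+.3f}  log-odds = {lo:+.3f}")
```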


Step 8: Intuition in plain language

Probability alone says:

  • chance event happens

Odds says:

  • how much more likely the event is than the non-event

Log-odds says:

  • a version of that comparison that can be modeled linearly

So the path is:

  1. probability is bounded

  2. odds removes the upper bound

  3. log-odds removes the lower bound too

  4. now it matches a linear expression
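The four steps above can be written as one chain of transformations (in the notation used earlier):

```latex
P \in (0,1)
\;\xrightarrow{\;P \,\mapsto\, P/(1-P)\;}\;
\text{odds} \in (0,+\infty)
\;\xrightarrow{\;\log\;}\;
\log\frac{P}{1-P} \in (-\infty,+\infty)
\;=\; w^\top x + b
```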


Step 9: The key bridge

You were asking:

how did we suddenly turn to odds ratio?

The answer is:

we did not jump there randomly.

We needed to transform probability from a bounded scale to an unbounded scale.

The odds is an intermediate step that compares event vs non-event:

$$\text{odds} = \frac{P}{1-P}$$

Then the log of that gives a quantity on the full real line:

$$\log\frac{P}{1-P} \in (-\infty, +\infty)$$

That makes it compatible with:

$$w^\top x + b$$

Step 10: One sentence summary

We go to odds, then log-odds, because probability is bounded in $(0,1)$, while linear models live on $(-\infty, +\infty)$, and log-odds is the simplest transformation that connects those two worlds.
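To close the loop, here is a minimal end-to-end sketch of the whole idea: model the log-odds as $wx + b$, fit by gradient descent on the log-loss, and read off valid probabilities. The toy data, learning rate, and variable names are all mine, not from the note:

```python
import math

# Toy 1-D data: label tends to be 1 when x is positive (made-up data).
xs = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z: float) -> float:
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.0, 0.0
lr = 0.1

# Gradient descent on the log-loss; the model is: log-odds = w*x + b.
for _ in range(2000):
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)   # transform the linear score into (0, 1)
        gw += (p - y) * x
        gb += (p - y)
    w -= lr * gw / len(xs)
    b -= lr * gb / len(xs)

# Predictions are now always valid probabilities.
for x in [-2.0, 0.0, 2.0]:
    print(f"x = {x:+.1f} -> P(y=1) = {sigmoid(w * x + b):.3f}")
```

The linear part $wx + b$ is free to take any real value; the sigmoid (the inverse of the log-odds map) is what guarantees the output lands in $(0,1)$.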