2026-03-18 14:54 Tags:

Introduction to Cross Validation


Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Data Example

df = pd.read_csv("../DATA/Advertising.csv")
df.head()

Train | Test Split Procedure

Workflow

  1. Clean and adjust data as necessary for X and y

  2. Split data into Train/Test for both X and y

  3. Fit/Train scaler on training X data

  4. Scale X test data

  5. Create model

  6. Fit/Train model on X_train

  7. Evaluate model on X_test by creating predictions and comparing to y_test

  8. Adjust parameters as necessary and repeat steps 5-7


Create X and y

X = df.drop('sales', axis=1)
y = df['sales']

Train Test Split

from sklearn.model_selection import train_test_split
 
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

Scale Data

from sklearn.preprocessing import StandardScaler
 
scaler = StandardScaler()
scaler.fit(X_train)
 
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

Note:

  • The scaler is fit only on X_train

  • Then the same fitted scaler is used to transform both X_train and X_test
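
The point above can be seen with a tiny made-up example (the data here is hypothetical, just to show that the test set is transformed with the training statistics):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: fit on train only
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[10.0]])

scaler = StandardScaler()
scaler.fit(X_train)  # learns mean=2.0, std≈0.816 from the training data only

# The test point is scaled with the TRAINING mean/std,
# so it can land far outside the training data's scaled range
print(scaler.transform(X_test))
```

Fitting the scaler on the combined data instead would leak information from the test set into training.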


Create Model

from sklearn.linear_model import Ridge
 
# Poor alpha choice on purpose!
model = Ridge(alpha=100)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Evaluation

from sklearn.metrics import mean_squared_error
 
mean_squared_error(y_test, y_pred)

Adjust Parameters and Re-evaluate

model = Ridge(alpha=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Another Evaluation

mean_squared_error(y_test, y_pred)

Observation:

  • alpha=1 performs much better than alpha=100 in this example

  • This process can be repeated until satisfied with performance metrics

Note:

  • RidgeCV can automate this for Ridge regression

  • The purpose here is to understand the general cross-validation process for any model
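
As a sketch of the automated alternative mentioned above (the data here is synthetic, just to show the RidgeCV API shape):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Hypothetical data, only to demonstrate the API
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# RidgeCV tries each candidate alpha using internal cross validation
model = RidgeCV(alphas=[0.1, 1.0, 10.0, 100.0])
model.fit(X, y)
print(model.alpha_)  # the alpha that scored best
```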


Train | Validation | Test Split Procedure

This is also called a hold-out set approach.

Key idea:

  • Do not adjust parameters based on the final test set

  • Use the final test set only for reporting final expected performance


Workflow

  1. Clean and adjust data as necessary for X and y

  2. Split data into Train/Validation/Test for both X and y

  3. Fit/Train scaler on training X data

  4. Scale X_eval and X_test data

  5. Create model

  6. Fit/Train model on X_train

  7. Evaluate model on evaluation data by creating predictions and comparing to y_eval

  8. Adjust parameters as necessary and repeat steps 5-7

  9. Get final metrics on test set

    • not allowed to go back and adjust after this

Create X and y

X = df.drop('sales', axis=1)
y = df['sales']

Split Twice: Train | Validation | Test

from sklearn.model_selection import train_test_split
 
# 70% of data is training data, set aside other 30%
X_train, X_OTHER, y_train, y_OTHER = train_test_split(
    X, y, test_size=0.3, random_state=101
)
 
# Remaining 30% is split into evaluation and test sets
# Each is 15% of the original data size
X_eval, X_test, y_eval, y_test = train_test_split(
    X_OTHER, y_OTHER, test_size=0.5, random_state=101
)

Scale Data

from sklearn.preprocessing import StandardScaler
 
scaler = StandardScaler()
scaler.fit(X_train)
 
X_train = scaler.transform(X_train)
X_eval = scaler.transform(X_eval)
X_test = scaler.transform(X_test)

Create Model

from sklearn.linear_model import Ridge
 
# Poor Alpha Choice on purpose!
model = Ridge(alpha=100)
model.fit(X_train, y_train)
y_eval_pred = model.predict(X_eval)

Evaluation

from sklearn.metrics import mean_squared_error
 
mean_squared_error(y_eval, y_eval_pred)

Adjust Parameters and Re-evaluate

model = Ridge(alpha=1)
model.fit(X_train, y_train)
y_eval_pred = model.predict(X_eval)

Another Evaluation

mean_squared_error(y_eval, y_eval_pred)

Final Evaluation

After this step, parameters should no longer be changed.

y_final_test_pred = model.predict(X_test)
mean_squared_error(y_test, y_final_test_pred)

Cross Validation with cross_val_score

X = df.drop('sales', axis=1)
y = df['sales']

Train Test Split

from sklearn.model_selection import train_test_split
 
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

Scale Data

from sklearn.preprocessing import StandardScaler
 
scaler = StandardScaler()
scaler.fit(X_train)
 
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

Create Model

from sklearn.linear_model import Ridge
 
model = Ridge(alpha=100)

Run Cross Validation

from sklearn.model_selection import cross_val_score
 
# SCORING OPTIONS:
# https://scikit-learn.org/stable/modules/model_evaluation.html
 
scores = cross_val_score(
    model,
    X_train,
    y_train,
    scoring='neg_mean_squared_error',
    cv=5
)
scores

Note:

  • cv=5 means 5-fold cross validation

  • For error metrics like MSE, scikit-learn returns the negated value (e.g. neg_mean_squared_error), so a higher score (closer to zero) corresponds to a lower error

  • To interpret MSE more naturally, take the absolute value of the mean
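
The sign convention can be verified directly (synthetic data below, only to show that every fold score comes back negative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Hypothetical data to demonstrate the sign convention
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=100)

scores = cross_val_score(Ridge(alpha=1), X, y,
                         scoring='neg_mean_squared_error', cv=5)

print(scores)              # every entry is negative
print(abs(scores.mean()))  # the average MSE as a positive number
```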


Average CV Score

abs(scores.mean())

Adjust Model Based on Metrics

model = Ridge(alpha=1)
 
scores = cross_val_score(
    model,
    X_train,
    y_train,
    scoring='neg_mean_squared_error',
    cv=5
)

Mean CV Error

abs(scores.mean())

Final Evaluation

# Need to fit the model first!
model.fit(X_train, y_train)
 
y_final_test_pred = model.predict(X_test)
mean_squared_error(y_test, y_final_test_pred)

Cross Validation with cross_validate

Difference from cross_val_score

cross_validate differs from cross_val_score in two ways:

  1. It allows specifying multiple metrics for evaluation

  2. It returns a dictionary containing:

    • fit times

    • score times

    • test scores

    • optionally training scores and fitted estimators


Return Values

For single metric evaluation

If scoring is a string, callable, or None, the keys will be:

['test_score', 'fit_time', 'score_time']

For multiple metric evaluation

The returned dictionary contains keys like:

['test_<scorer1_name>', 'test_<scorer2_name>', 'test_<scorer...>', 'fit_time', 'score_time']

Training Scores

  • return_train_score=False by default

  • This saves computation time

  • To evaluate training scores too, set return_train_score=True
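
A minimal sketch of the extra key this produces, again on made-up data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Hypothetical data, only to show the returned keys
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=100)

res = cross_validate(Ridge(alpha=1), X, y,
                     scoring='neg_mean_squared_error',
                     cv=5, return_train_score=True)

print(sorted(res.keys()))
# ['fit_time', 'score_time', 'test_score', 'train_score']
```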


Create X and y

X = df.drop('sales', axis=1)
y = df['sales']

Train Test Split

from sklearn.model_selection import train_test_split
 
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

Scale Data

from sklearn.preprocessing import StandardScaler
 
scaler = StandardScaler()
scaler.fit(X_train)
 
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

Create Model

from sklearn.linear_model import Ridge
 
model = Ridge(alpha=100)

Run cross_validate

from sklearn.model_selection import cross_validate
 
# SCORING OPTIONS:
# https://scikit-learn.org/stable/modules/model_evaluation.html
 
scores = cross_validate(
    model,
    X_train,
    y_train,
    scoring=['neg_mean_absolute_error', 'neg_mean_squared_error', 'max_error'],
    cv=5
)
scores

View Results

pd.DataFrame(scores)
pd.DataFrame(scores).mean()

Adjust Model Based on Metrics

model = Ridge(alpha=1)
 
scores = cross_validate(
    model,
    X_train,
    y_train,
    scoring=['neg_mean_absolute_error', 'neg_mean_squared_error', 'max_error'],
    cv=5
)
 
pd.DataFrame(scores).mean()

Final Evaluation

# Need to fit the model first!
model.fit(X_train, y_train)
 
y_final_test_pred = model.predict(X_test)
mean_squared_error(y_test, y_final_test_pred)

Summary

Train/Test Split

  • Simple

  • Fast

  • Good starting point

  • But model tuning may depend too much on one split

Train/Validation/Test Split

  • Separates tuning from final testing

  • Test set stays untouched until the end

  • More reliable than only train/test

cross_val_score

  • Performs cross-validation directly

  • Good for one evaluation metric

  • Returns an array of scores

cross_validate

  • More flexible than cross_val_score

  • Supports multiple metrics

  • Also returns fit time and score time


cross_val_score vs cross_validate

1. Core difference

Function        | Purpose
cross_val_score | Simple CV → returns only scores
cross_validate  | Advanced CV → returns detailed results

2. cross_val_score

What it does

scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')

Output

array([-10.2, -9.8, -11.0, -10.5, -9.9])

👉 Only gives:

  • test scores for each fold

When to use

  • You only care about one metric

  • You want something quick and simple


3. cross_validate

What it does

scores = cross_validate(
    model,
    X,
    y,
    cv=5,
    scoring=['neg_mean_squared_error', 'neg_mean_absolute_error']
)

Output

{
  'test_neg_mean_squared_error': [...],
  'test_neg_mean_absolute_error': [...],
  'fit_time': [...],
  'score_time': [...]
}

👉 Returns a dictionary


What extra info you get

  • Multiple metrics

  • Fit time

  • Score time

  • (optional) training scores


4. Side-by-side comparison

Feature          | cross_val_score | cross_validate
Multiple metrics | ❌              | ✅
Fit time         | ❌              | ✅
Score time       | ❌              | ✅
Train score      | ❌              | ✅ (optional)
Output type      | array           | dict

5. Subtle but important

cross_val_score

scores.mean()

👉 directly usable


cross_validate

pd.DataFrame(scores).mean()

👉 need to extract from dict
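
A runnable sketch of that extraction step, with hypothetical data and the metric name forming the dictionary key:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Hypothetical data just to make the snippet self-contained
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=50)

scores = cross_validate(Ridge(alpha=1), X, y,
                        scoring=['neg_mean_squared_error'], cv=5)

# Each metric's per-fold array is pulled out by its 'test_<name>' key
print(scores['test_neg_mean_squared_error'].mean())
```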


6. When YOU should use which

For a typical ML project involving model comparison:

Use cross_val_score when:

  • quick check of model performance

  • tuning one metric (e.g. AUC)
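
As a sketch of the single-metric case mentioned above, here AUC on a classifier (the data is synthetic via make_classification, purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical binary-classification data for a quick AUC check
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

scores = cross_val_score(LogisticRegression(), X, y,
                         scoring='roc_auc', cv=5)
print(scores.mean())
```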

Use cross_validate when:

  • comparing multiple metrics

  • analyzing model behavior

  • debugging performance


7. One line intuition

  • cross_val_score = just give me the score

  • cross_validate = give me everything about training + evaluation