Today we will learn one of the MOST important concepts in Machine Learning: evaluating a model on data it has never seen.
Without this:
❌ Models look good during training
❌ But fail in the real world
🎯 Goal of Day-10
You will:
✅ Understand train/test split
✅ Learn overfitting vs underfitting
✅ Build a more realistic ML workflow
🧠 Problem With Previous Days
Until now:
model.fit(X, y)
model.predict(X)
👉 We trained AND tested on the SAME data ❌
That’s cheating.
🧠 Real-Life Example
Imagine:
- A student sees the exam questions before the exam
- Scores 100%
Does that mean the student is smart? ❌
Same in ML: a high score on data the model has already seen proves very little.
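To make the "cheating" concrete, here is a minimal sketch (not code from the earlier days). It assumes a 1-nearest-neighbor classifier as a stand-in "memorizer" and purely random labels, so there is nothing real to learn. The names `X_demo`, `y_demo`, and `memorizer` are made up for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 5))      # random features (assumption)
y_demo = rng.integers(0, 2, size=400)   # random labels: no real pattern exists

# "Cheating": fit and score on the same rows
memorizer = KNeighborsClassifier(n_neighbors=1)
memorizer.fit(X_demo, y_demo)
same_data_acc = memorizer.score(X_demo, y_demo)

# Honest check: hold out 20% of the rows before fitting
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
memorizer.fit(Xd_train, yd_train)
held_out_acc = memorizer.score(Xd_test, yd_test)

print("Same-data accuracy:", same_data_acc)
print("Held-out accuracy:", held_out_acc)
```

On the same data the memorizer scores perfectly, because each row's nearest neighbor is itself. On held-out rows it should land somewhere near a coin flip, which is the honest answer for random labels.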
🚀 Solution → Train/Test Split
Split data:
- Train set → model learns
- Test set → model is evaluated
Common split:
80% Train
20% Test
🚀 Part 1 – Import Library
from sklearn.model_selection import train_test_split
🚀 Part 2 – Create Dataset
import pandas as pd
data = {
"Hours": [1,2,3,4,5,6,7,8],
"Pass": [0,0,0,0,1,1,1,1]
}
df = pd.DataFrame(data)
X = df[["Hours"]]
y = df["Pass"]
🚀 Part 3 – Split Data
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=42
)
🧠 Meaning
| Parameter | Meaning |
|---|---|
| test_size=0.2 | 20% test data |
| random_state=42 | Same split every run |
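A quick way to check both parameters is to look at the sizes of the pieces and to split twice with the same seed. This is a small sketch that rebuilds the dataset from Part 2; the names `X_train2`, `X_test2`, and `same_split` are only for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Pass": [0, 0, 0, 0, 1, 1, 1, 1],
})
X = df[["Hours"]]
y = df["Pass"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Second split with the same random_state, to compare row selection
X_train2, X_test2, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Train rows:", len(X_train))  # 8 rows minus the test rows
print("Test rows:", len(X_test))    # 20% of 8 is 1.6, rounded up to 2
same_split = list(X_test.index) == list(X_test2.index)
print("Same test rows both times:", same_split)
```

Note that with only 2 test rows, accuracy can only come out as 0.0, 0.5, or 1.0. That is fine for learning the workflow, but a real project needs a much larger test set.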
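🚀 Parts 4–6 – Train, Predict, Check Accuracy
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
model = LogisticRegression()
model.fit(X_train, y_train)  # learn from the training set only
y_pred = model.predict(X_test)  # predict on rows the model has not seen
print(y_pred)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)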
🧠 Overfitting Explained
❌ Overfitting
Model memorizes training data.
Training accuracy:
100%
But test accuracy:
50%
👉 Bad real-world performance
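Here is a rough sketch of overfitting in action. It is an illustration under assumptions, not part of the Day-10 code: the data is synthetic (`make_classification` with noisy labels), the model is a decision tree, and the depth limit of 3 is an arbitrary choice. All variable names are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): flip_y=0.2 gives about 20% of rows a random label
X_noisy, y_noisy = make_classification(
    n_samples=600, n_features=10, n_informative=3, flip_y=0.2, random_state=0
)
Xn_train, Xn_test, yn_train, yn_test = train_test_split(
    X_noisy, y_noisy, test_size=0.2, random_state=42
)

# No depth limit: the tree keeps splitting until it fits every training row
deep_tree = DecisionTreeClassifier(random_state=0).fit(Xn_train, yn_train)
# Depth limit (assumed value): the tree is forced to stay simple
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xn_train, yn_train)

for name, tree in [("deep", deep_tree), ("shallow", shallow_tree)]:
    print(
        name,
        "train:", tree.score(Xn_train, yn_train),
        "test:", tree.score(Xn_test, yn_test),
    )
```

The deep tree memorizes the training rows, noise included, so its training accuracy is perfect while its test accuracy is clearly lower. The shallow tree usually shows a smaller gap between the two numbers.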
🧠 Underfitting
Model is too simple to capture the pattern.
- Bad training accuracy
- Bad testing accuracy
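Underfitting can be sketched the same way. The example below assumes a made-up pattern where the label is 1 only in a middle band of a single feature. A logistic regression on that one feature can only draw a single cut-off, so it cannot describe "low is 0, middle is 1, high is 0". The data and all names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x_band = rng.uniform(-3, 3, size=(1000, 1))
y_band = (np.abs(x_band[:, 0]) < 1.5).astype(int)  # 1 only in the middle band

xb_train, xb_test, yb_train, yb_test = train_test_split(
    x_band, y_band, test_size=0.2, random_state=42
)

# Too simple for this pattern: one cut-off on one feature
too_simple = LogisticRegression().fit(xb_train, yb_train)
# Two levels of splits are enough to isolate the middle band
flexible = DecisionTreeClassifier(max_depth=2, random_state=0).fit(xb_train, yb_train)

print("Logistic train:", too_simple.score(xb_train, yb_train))
print("Logistic test:", too_simple.score(xb_test, yb_test))
print("Tree train:", flexible.score(xb_train, yb_train))
print("Tree test:", flexible.score(xb_test, yb_test))
```

The logistic model scores poorly on BOTH train and test, which is the signature of underfitting. The slightly more flexible tree handles the same data well.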
🧠 Ideal Model
Good:
- Training accuracy
- Testing accuracy
AND close to each other.
📈 Visualization Concept
Think of training and test accuracy side by side.
Both being good and close to each other is usually healthier than:
- 100% train
- 50% test
🚀 Part 7 – Compare Train vs Test Accuracy
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print("Train Accuracy:", train_acc)
print("Test Accuracy:", test_acc)
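Once you have both numbers, a tiny helper can turn them into a rough label. This is only a sketch: the function name `describe_fit` and the thresholds `good=0.8` and `max_gap=0.1` are assumptions chosen for illustration, not standard values. The 90%/88% pair in the usage lines is also just an example.

```python
def describe_fit(train_acc, test_acc, good=0.8, max_gap=0.1):
    """Give a rough label for a train/test accuracy pair.

    The thresholds are illustrative assumptions, not fixed rules.
    """
    gap = train_acc - test_acc
    if train_acc < good and test_acc < good:
        return "possible underfitting"   # bad on both sets
    if gap > max_gap:
        return "possible overfitting"    # much better on train than on test
    return "looks healthy"               # good scores, close together


print(describe_fit(1.00, 0.50))  # the 100% train / 50% test case
print(describe_fit(0.55, 0.52))  # bad on both
print(describe_fit(0.90, 0.88))  # good and close
```

Treat the label as a hint for where to look next, not as a verdict. Sensible thresholds depend on the problem and on how large the test set is.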
⚠ Important Interview Question
Q:
Why do we split train and test data?
Answer:
To evaluate model performance on unseen data.
🧠 Real AI Insight
In real projects:
- Data leakage (test information slipping into training) = huge issue
- Overfitting = common problem
- Evaluation matters more than training
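One common way leakage sneaks in is preprocessing. If you fit a scaler on ALL rows before splitting, the test rows have already influenced training. A minimal sketch of one way to avoid that, assuming scikit-learn's `make_pipeline` and `StandardScaler` (neither is used elsewhere in this post) and the same Hours/Pass numbers as above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Split FIRST, so the test rows stay unseen
h_train, h_test, p_train, p_test = train_test_split(
    hours, passed, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training rows only, then fits the model
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(h_train, p_train)

# The scaler's learned mean matches the training rows, not the full dataset
scaler = pipe.named_steps["standardscaler"]
print("Scaler mean:", scaler.mean_[0])
print("Train-only mean:", h_train.mean())
print("Test accuracy:", pipe.score(h_test, p_test))
```

The point of the pipeline is ordering: every step that learns from data learns from the training set alone, and the test set is only transformed and scored.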
🎯 End of Day-10 Goals
You now:
✅ Understand train/test split
✅ Understand overfitting vs underfitting
✅ Can test a model properly
Github Link: https://github.com/dotnetfullstackdeveloper/ai-engineer-journey/blob/main/Week-02-Machine-Learning/Day-10