Welcome to one of the most important concepts in Machine Learning and AI.
Until now you've learned:
- Data Cleaning ✅
- Feature Engineering ✅
- Regression ✅
- Classification ✅
- Clustering ✅
Today you'll learn:
How to reduce data complexity while keeping most of the information.
This is exactly what PCA does.
🎯 Goal of Day-15
You will:
✅ Understand dimensionality reduction
✅ Learn PCA basics
✅ Reduce features intelligently
✅ Visualize high-dimensional data
🧠 Why PCA Exists
Imagine a dataset:
| Age | Experience | Salary | Bonus | Performance |
|---|---|---|---|---|
| 25 | 2 | 50000 | 5000 | Good |
Now imagine the dataset has:
- 100 columns
- 1000 columns
- 5000 columns
Problems:
❌ Slower training
❌ More memory usage
❌ Harder visualization
❌ More noise
🧠 Real World Example
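Consider:
- Customer Age
- Years of Experience
These are often correlated: older customers tend to have more years of experience.
Instead of keeping both as separate features, PCA can combine them into one new feature (call it Experience_Score) that captures most of the information in both.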
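To make the idea concrete, here is a minimal sketch. The age and experience values are made up for illustration, and `experience_score` is just a hypothetical name for the single component PCA produces.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical values: experience grows roughly in step with age.
age = np.array([22, 25, 28, 32, 36, 41, 45, 50], dtype=float)
experience = np.array([0, 2, 5, 8, 12, 17, 20, 26], dtype=float)
X = np.column_stack([age, experience])

# Pearson correlation between the two columns.
correlation = np.corrcoef(age, experience)[0, 1]

# Standardize, then keep a single principal component.
X_scaled = StandardScaler().fit_transform(X)
pca_one = PCA(n_components=1)
experience_score = pca_one.fit_transform(X_scaled)

print("Correlation:", round(correlation, 3))
print("Variance kept by one component:", pca_one.explained_variance_ratio_[0])
print("Shape of the new feature:", experience_score.shape)
```

The stronger the correlation between the two columns, the more of their combined variance the single component keeps.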
🚀 Part 1 – Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
🚀 Part 2 – Create Dataset
data = {
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [40, 45, 50, 55, 70, 80, 90, 95],
    "Attendance": [60, 65, 70, 75, 80, 85, 90, 95]
}
df = pd.DataFrame(data)
print(df.head())
🚀 Part 3 – Scale Data
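PCA is driven by variance, so a feature with a large numeric range can dominate the components. Standardize every feature to zero mean and unit variance first.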
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
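As a quick sanity check, the sketch below rebuilds the dataset from Part 2 and compares the columns before and after scaling. The variable names `scaled` and `scaled_df` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [40, 45, 50, 55, 70, 80, 90, 95],
    "Attendance": [60, 65, 70, 75, 80, 85, 90, 95],
})

# Raw variances differ a lot; Marks has by far the largest spread.
print(df.var())

scaled = StandardScaler().fit_transform(df)
scaled_df = pd.DataFrame(scaled, columns=df.columns)

# After scaling, every column has mean ~0 and population std 1.
print(scaled_df.mean().round(6))
print(scaled_df.std(ddof=0).round(6))
```

Note that `StandardScaler` divides by the population standard deviation, which is why `ddof=0` is used in the check.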
🚀 Part 4 – Apply PCA
Reduce 3 features → 2 features
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(scaled_data)
print(reduced_data)
🧠 What Happened?
Original:
Hours
Marks
Attendance
3 dimensions
Now:
PC1
PC2
2 dimensions
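The change can be confirmed directly from the array shapes. The sketch below repeats the steps from Parts 2 to 4 so it runs on its own, and also checks one property of the new columns: PC1 and PC2 are uncorrelated with each other.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [40, 45, 50, 55, 70, 80, 90, 95],
    "Attendance": [60, 65, 70, 75, 80, 85, 90, 95],
})

scaled_data = StandardScaler().fit_transform(df)
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(scaled_data)

print("Before:", scaled_data.shape)   # 8 rows, 3 columns
print("After: ", reduced_data.shape)  # 8 rows, 2 columns

# Correlation between the two new columns (expected to be ~0).
pc_correlation = np.corrcoef(reduced_data[:, 0], reduced_data[:, 1])[0, 1]
print("Correlation between PC1 and PC2:", round(pc_correlation, 6))
```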
🚀 Part 5 – Create PCA DataFrame
pca_df = pd.DataFrame(
    reduced_data,
    columns=["PC1", "PC2"]
)
print(pca_df.head())
🚀 Part 6 – Visualize PCA
plt.scatter(
    pca_df["PC1"],
    pca_df["PC2"]
)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Visualization")
plt.show()
🚀 Part 7 – Explained Variance
MOST IMPORTANT PART
print(pca.explained_variance_ratio_)
Example output (illustrative values; yours will differ):
[0.95, 0.04]
Meaning:
- PC1 captures 95% of the variance
- PC2 captures 4% of the variance
Total: 99% of the variance retained
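A common follow-up is deciding how many components to keep. The sketch below is one possible approach, not the only one: it looks at the cumulative explained variance and picks the smallest number of components that reaches a cut-off. The 0.95 threshold is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [40, 45, 50, 55, 70, 80, 90, 95],
    "Attendance": [60, 65, 70, 75, 80, 85, 90, 95],
})
scaled_data = StandardScaler().fit_transform(df)

# Fit with all components to see how the variance is distributed.
pca_full = PCA().fit(scaled_data)
ratios = pca_full.explained_variance_ratio_
cumulative = np.cumsum(ratios)
print("Per component:", ratios.round(4))
print("Cumulative:   ", cumulative.round(4))

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.95  # assumed cut-off; choose what suits your task
n_needed = int(np.searchsorted(cumulative, threshold) + 1)
print("Components needed:", n_needed)

# scikit-learn makes the same choice when n_components is a float in (0, 1).
pca_auto = PCA(n_components=threshold, svd_solver="full").fit(scaled_data)
print("Chosen by PCA:", pca_auto.n_components_)
```

In this particular dataset Attendance is an exact linear function of Hours (5 × Hours + 55), so the three columns carry only two dimensions of information and the cumulative share reaches 1.0 after two components.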
🧠 Interview Question
What is PCA?
Answer:
PCA is a dimensionality reduction technique that transforms the data into a smaller set of new, uncorrelated features (principal components) while preserving as much of the variance (information) as possible.
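If the interviewer asks how PCA finds those components, one textbook route is the eigendecomposition of the covariance matrix. The sketch below is a simplified NumPy version for illustration (scikit-learn itself uses SVD-based solvers), applied to the same Hours, Marks and Attendance values used in this post.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.column_stack([
    [1, 2, 3, 4, 5, 6, 7, 8],           # Hours
    [40, 45, 50, 55, 70, 80, 90, 95],   # Marks
    [60, 65, 70, 75, 80, 85, 90, 95],   # Attendance
]).astype(float)

# Step 1: standardize each column (zero mean, unit variance).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features.
cov = np.cov(X_std, rowvar=False)

# Step 3: eigen-decomposition; eigh returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 4: share of variance per component, then project onto the top 2.
ratio = eigenvalues / eigenvalues.sum()
projected = X_std @ eigenvectors[:, :2]

# Compare with scikit-learn (component signs may be flipped).
sk_pca = PCA(n_components=2)
sk_reduced = sk_pca.fit_transform(X_std)
print("Same variance ratios:", np.allclose(ratio[:2], sk_pca.explained_variance_ratio_))
print("Same projection up to sign:", np.allclose(np.abs(projected), np.abs(sk_reduced)))
```

The sign of a principal component is arbitrary, which is why the comparison uses absolute values.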
🧠 Real AI Uses
PCA is used in:
- Face Recognition
- Image Compression
- Fraud Detection
- Recommendation Systems
- Data Visualization
⚠ Important Concept
PCA does NOT select columns.
It creates new features, called principal components.
Each principal component is a weighted combination of all the original columns.
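This can be seen in the fitted model. In the sketch below, `pca.components_` holds the weights, and rebuilding the components by hand (center the data, then take weighted sums) gives the same result as `fit_transform`. The names `loadings` and `manual` are chosen here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [40, 45, 50, 55, 70, 80, 90, 95],
    "Attendance": [60, 65, 70, 75, 80, 85, 90, 95],
})
scaled_data = StandardScaler().fit_transform(df)

pca = PCA(n_components=2)
reduced_data = pca.fit_transform(scaled_data)

# Rows are components; columns are the weights on each original feature.
loadings = pd.DataFrame(pca.components_, index=["PC1", "PC2"], columns=df.columns)
print(loadings.round(3))

# Rebuild the components by hand: center the data, then take weighted sums.
manual = (scaled_data - pca.mean_) @ pca.components_.T
print("Matches fit_transform:", np.allclose(manual, reduced_data))
```

Every original column gets a weight in every component, so no column is simply kept or dropped.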
🧠 Real AI Engineer Insight
For large datasets:
1000 features
↓
50 PCA components
Training often becomes:
- Faster
- Less noisy
- Lighter on memory
The trade-off: principal components are harder to interpret than the original columns.
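Here is a rough sketch of that 1000 → 50 reduction. The data is synthetic (random hidden factors plus noise, generated only for this example), so the numbers say nothing about any real dataset, and the actual speed-up depends on the model you train afterwards.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 300 samples, 1000 features driven by 20 hidden factors plus noise.
n_samples, n_features, n_hidden = 300, 1000, 20
hidden = rng.normal(size=(n_samples, n_hidden))
mixing = rng.normal(size=(n_hidden, n_features))
X = hidden @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

X_scaled = StandardScaler().fit_transform(X)

# random_state fixes the randomized solver scikit-learn may pick for large inputs.
pca = PCA(n_components=50, random_state=0)
X_reduced = pca.fit_transform(X_scaled)

print("Before:", X_scaled.shape)
print("After: ", X_reduced.shape)
print("Variance retained:", pca.explained_variance_ratio_.sum())
```

Because the synthetic features are built from only 20 hidden factors, 50 components keep almost all of the variance here; real data is rarely this tidy.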
🎯 End of Day-15 Goals
You now:
✅ Understand PCA
✅ Reduce dimensions
✅ Visualize transformed data
✅ Interpret explained variance