Day-15 – PCA (Principal Component Analysis) + Dimensionality Reduction

Welcome to one of the most important concepts in Machine Learning and AI.

Until now you've learned:

  • Data Cleaning ✅
  • Feature Engineering ✅
  • Regression ✅
  • Classification ✅
  • Clustering ✅

Today you'll learn:

How to reduce data complexity while keeping most of the information.

This is exactly what PCA does.


🎯 Goal of Day-15

You will:

✅ Understand dimensionality reduction
✅ Learn PCA basics
✅ Reduce features intelligently
✅ Visualize high-dimensional data


🧠 Why PCA Exists

Imagine a dataset:

Age   Experience   Salary   Bonus   Performance
25    2            50000    5000    Good

Now imagine the same dataset with:

100 columns
1000 columns
5000 columns

Problems:

❌ Slower training
❌ More memory usage
❌ Harder visualization
❌ More noise


🧠 Real World Example

Consider:

  • Customer Age
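  • Years of Experience

These two are often correlated: older customers tend to have more experience.

Instead of keeping both columns, PCA can combine them into a single new feature, say Experience_Score, that captures most of the information in both.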
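
To make the idea concrete, here is a minimal sketch on made-up data. The `age` and `experience` arrays are synthetic assumptions (experience is generated to grow with age, plus some noise), not real customer data, and names such as `pca_one` and `combined` are only illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical data: experience grows with age, plus a little noise
age = rng.uniform(22, 60, size=200)
experience = (age - 22) + rng.normal(0, 1.5, size=200)

# Stack the two columns and standardize them
X = np.column_stack([age, experience])
X_scaled = StandardScaler().fit_transform(X)

# Ask PCA for a single component that replaces both columns
pca_one = PCA(n_components=1)
combined = pca_one.fit_transform(X_scaled)

print("Correlation between the two columns:", np.corrcoef(age, experience)[0, 1])
print("Share of variance kept by one component:", pca_one.explained_variance_ratio_[0])
print("New shape:", combined.shape)
```

Because the two columns move together so closely, one component should keep almost all of their variance. With weakly related columns, the same code would report a much lower share.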


🚀 Part 1 – Import Libraries

import pandas as pd
import matplotlib.pyplot as plt

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


🚀 Part 2 – Create Dataset

data = {
    "Hours": [1,2,3,4,5,6,7,8],
    "Marks": [40,45,50,55,70,80,90,95],
    "Attendance": [60,65,70,75,80,85,90,95]
}

df = pd.DataFrame(data)

print(df.head())


🚀 Part 3 – Scale Data

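PCA looks for the directions with the most variance, so a column with large numbers (like Marks) can dominate a column with small numbers (like Hours). Standardize every feature to mean 0 and standard deviation 1 before applying PCA.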

scaler = StandardScaler()

scaled_data = scaler.fit_transform(df)
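
A quick way to see why scaling matters is to fit PCA twice, once on the raw columns and once on the standardized ones, and compare how much weight each column gets in the first component. This is a sketch that rebuilds the same small dataset so it runs on its own; `df_demo`, `raw_loadings`, and `scaled_loadings` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df_demo = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [40, 45, 50, 55, 70, 80, 90, 95],
    "Attendance": [60, 65, 70, 75, 80, 85, 90, 95],
})

# PCA on raw values: columns with bigger numbers carry more variance
raw_pca = PCA(n_components=1).fit(df_demo)

# PCA on standardized values: every column starts with the same variance
scaled_pca = PCA(n_components=1).fit(StandardScaler().fit_transform(df_demo))

# components_[0] holds the weight of each original column in PC1.
# The sign of a component is arbitrary, so compare absolute values.
raw_loadings = np.abs(raw_pca.components_[0])
scaled_loadings = np.abs(scaled_pca.components_[0])

print("Raw PC1 weights:   ", dict(zip(df_demo.columns, raw_loadings.round(2))))
print("Scaled PC1 weights:", dict(zip(df_demo.columns, scaled_loadings.round(2))))
```

On the raw data, Hours gets a much smaller weight than Marks simply because its values are smaller. After scaling, the three weights come out roughly equal.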


🚀 Part 4 – Apply PCA

Reduce 3 features → 2 features

pca = PCA(n_components=2)

reduced_data = pca.fit_transform(scaled_data)

print(reduced_data)


🧠 What Happened?

Original:

Hours
Marks
Attendance

3 dimensions

Now:

PC1
PC2

2 dimensions


🚀 Part 5 – Create PCA DataFrame

pca_df = pd.DataFrame(
    reduced_data,
    columns=["PC1", "PC2"]
)

print(pca_df.head())


🚀 Part 6 – Visualize PCA

plt.scatter(
    pca_df["PC1"],
    pca_df["PC2"]
)

plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Visualization")

plt.show()


🚀 Part 7 – Explained Variance

MOST IMPORTANT PART

print(pca.explained_variance_ratio_)

Example output (illustrative; your numbers will differ):

[0.95, 0.04]


Meaning:

  • PC1 captures 95% of the variance (information)
  • PC2 captures 4% of the variance

Total: 99% of the variance is retained by the two components
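
In practice you often want the running total rather than the individual ratios, so you can decide how many components to keep. The sketch below rebuilds the small dataset so it runs on its own; the 0.95 threshold is just an assumed target, and `full_pca` and `auto_pca` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df_demo = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [40, 45, 50, 55, 70, 80, 90, 95],
    "Attendance": [60, 65, 70, 75, 80, 85, 90, 95],
})
scaled = StandardScaler().fit_transform(df_demo)

# Keep every component so the full variance breakdown is visible
full_pca = PCA().fit(scaled)

# Running total: variance kept by the first 1, 2, 3 components
cumulative = np.cumsum(full_pca.explained_variance_ratio_)
print("Cumulative variance:", cumulative.round(4))

# A float between 0 and 1 tells scikit-learn to keep the smallest
# number of components that reaches that share of variance
auto_pca = PCA(n_components=0.95).fit(scaled)
print("Components needed for 95%:", auto_pca.n_components_)
```

Passing a fraction to `n_components` saves you from guessing the number of components by hand, which becomes more useful as the column count grows.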


🧠 Interview Question

What is PCA?

Answer:

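PCA is a dimensionality reduction technique that transforms the original features into a smaller set of new, uncorrelated features (principal components), ordered so that the first few preserve as much of the variance (information) as possible.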


🧠 Real AI Uses

PCA is used in:

  • Face Recognition
  • Image Compression
  • Fraud Detection
  • Recommendation Systems
  • Data Visualization


⚠ Important Concept

PCA does NOT select columns.

It creates:

New Features

called:

Principal Components

Each principal component is a weighted mix of all the original columns.
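
You can check this yourself: the sketch below rebuilds the principal components by hand from the weights scikit-learn stores in `components_` and compares them with what `transform` returns. The array `X` repeats the small Hours/Marks/Attendance dataset so the snippet runs on its own; `by_hand` and `via_sklearn` are illustrative names.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Columns: Hours, Marks, Attendance
X = np.array([
    [1, 40, 60], [2, 45, 65], [3, 50, 70], [4, 55, 75],
    [5, 70, 80], [6, 80, 85], [7, 90, 90], [8, 95, 95],
], dtype=float)

X_scaled = StandardScaler().fit_transform(X)
pca_demo = PCA(n_components=2).fit(X_scaled)

# What scikit-learn gives us
via_sklearn = pca_demo.transform(X_scaled)

# The same thing by hand: center the data, then take a weighted sum
# of the original columns using the weights in components_
by_hand = (X_scaled - pca_demo.mean_) @ pca_demo.components_.T

print("Weights for PC1 (Hours, Marks, Attendance):", pca_demo.components_[0].round(3))
print("Manual result matches transform():", np.allclose(via_sklearn, by_hand))
```

No original column is copied or dropped. Every column contributes to every component, just with different weights.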


🧠 Real AI Engineer Insight

For large datasets:

1000 features → 50 PCA components

Training often becomes:

  • Faster
  • Less noisy
  • More efficient
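
Here is a rough sketch of that 1000 → 50 reduction. The data is synthetic and built on purpose from 20 hidden factors plus noise, so the result is friendlier than most real datasets; all sizes and names (`X_wide`, `reducer`, `X_small`) are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_samples, n_features, n_latent = 500, 1000, 20

# Synthetic wide dataset: 1000 columns driven by only 20 hidden factors
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
noise = 0.5 * rng.normal(size=(n_samples, n_features))
X_wide = latent @ mixing + noise

# Scale first, then reduce to 50 components, in one pipeline
reducer = make_pipeline(StandardScaler(), PCA(n_components=50, random_state=0))
X_small = reducer.fit_transform(X_wide)

# make_pipeline names each step after its class in lowercase
retained = reducer.named_steps["pca"].explained_variance_ratio_.sum()

print("Before:", X_wide.shape)
print("After: ", X_small.shape)
print("Share of variance retained:", round(retained, 3))
```

A model trained on `X_small` sees 20 times fewer columns per row. How much variance survives on your own data depends on how strongly the columns are related, so always check `explained_variance_ratio_` before committing to a number of components.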


🎯 End of Day-15 Goals

You now:

✅ Understand PCA
✅ Reduce dimensions
✅ Visualize transformed data
✅ Interpret explained variance


GitHub Link: https://github.com/dotnetfullstackdeveloper/ai-engineer-journey/blob/main/Week-02-Machine-Learning/Day-15%20-%20PCA%20%2B%20Dimensionality%20Reduction

