Welcome to Day-3 – NumPy + Pandas (Real Data Handling Begins).
Today is very important because AI/ML is mostly about data.
Think of it like this:

- Backend developer → works with APIs
- Database engineer → works with tables
- AI engineer → works with datasets

Today we'll learn the two most important Python libraries:

- NumPy → numerical computing
- Pandas → table-like data analysis
NumPy works like a high-performance array engine for Python.
Example 1 – Create Array
In Colab run:
import numpy as np
numbers = np.array([10, 20, 30, 40, 50])
print(numbers)
Output:
[10 20 30 40 50]
Example 2 – Basic Operations
Run:
print("Mean:", np.mean(numbers))
print("Max:", np.max(numbers))
print("Min:", np.min(numbers))
print("Sum:", np.sum(numbers))
This is how ML libraries process numerical data.
Mean (average) is one of the most common statistics in data analysis:
mean = (x1 + x2 + ... + xn) / n
In AI, averages like this are used for:

- model evaluation
- normalization
- feature engineering
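As a quick sketch of the second use case, here is the mean formula computed by hand and then used for z-score normalization (subtract the mean, divide by the standard deviation) — a common preprocessing step; the sample numbers are made up for illustration:

```python
import numpy as np

scores = np.array([10, 20, 30, 40, 50])

# Mean computed manually, matching the formula above
manual_mean = np.sum(scores) / len(scores)
print(manual_mean)  # 30.0

# A common ML use: z-score normalization (mean 0, std 1)
normalized = (scores - np.mean(scores)) / np.std(scores)
print(normalized)
```

After normalization, the array has mean 0 — exactly what many ML models expect as input.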
Example 3 – Vector Operations
Run:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)
print(a * b)
Output:
[5 7 9]
[4 10 18]
This vectorized computation is why NumPy is powerful.
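To see vectorization at work a bit more, here is a short sketch (same arrays as above) showing broadcasting — a scalar applied to every element with no Python loop — and a dot product built from the element-wise operations we just used:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Broadcasting: the scalar is applied to every element at once,
# with no explicit Python loop
print(a * 10)   # [10 20 30]
print(a + 100)  # [101 102 103]

# Dot product: element-wise multiply, then sum
print(np.dot(a, b))  # 1*4 + 2*5 + 3*6 = 32
```

These loops run in optimized C inside NumPy, which is why vectorized code is much faster than an equivalent Python for-loop.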
🧠 Part 2 – Pandas (Working With Data Tables)
Pandas is like Excel + SQL inside Python.
Step 1 – Import Pandas
"Name": ["Alice", "Bob", "Charlie"],
"Age": [25, 30, 35],
"Salary": [50000, 60000, 70000]
}
df = pd.DataFrame(data)
print(df)
This structure is called a DataFrame.
Think of it like a SQL table or an Excel sheet.
Step 3 – Basic Data Exploration
Run:
print(df.head())
Shows the first five rows.
Step 4 – Get Column
print(df["Salary"])
Step 5 – Calculate Statistics
print(df["Salary"].mean())
🧠 Part 3 – Filtering Data (Very Important)
Filtering works like a SQL WHERE clause.
high_salary = df[df["Salary"] > 55000]
print(high_salary)
Equivalent SQL:
SELECT * FROM employees
WHERE salary > 55000
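Conditions can also be combined, just like AND/OR in SQL. A minimal sketch using the same example DataFrame (note: each condition needs its own parentheses, and Pandas uses & and | instead of and/or):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Equivalent to: WHERE salary > 55000 AND age < 33
result = df[(df["Salary"] > 55000) & (df["Age"] < 33)]
print(result)  # only Bob matches
```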
🧠 Part 4 – Load CSV Dataset
AI work usually starts with CSV datasets.
There are multiple ways to load a CSV file in Colab:
✅ Method 1 – Upload CSV from Your Laptop (Easiest):
Step 1: Run this in Colab
from google.colab import files
uploaded = files.upload()
👉 It will open a file picker → select your .csv file
Step 2: Read CSV using Pandas
import pandas as pd
df = pd.read_csv("your_file_name.csv")
print(df.head())
Note: Use exact file name (case-sensitive)
🧠 Example
If your file is: Test.csv then
df = pd.read_csv("Test.csv")
df.head()
✅ Method 2 – Upload from Left Sidebar (UI Way)
In Colab:
- Left side → Click folder icon 📁
- Click Upload
- Select CSV file
- File appears in /content/
Then:
df = pd.read_csv("/content/test.csv")
✅ Method 3 – Load from URL (Advanced)
If dataset is online:
url = "https://example.com/data.csv"
df = pd.read_csv(url)
df.head()
🔍 Useful Commands After Loading
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # Structure of data
df.describe() # Statistics
df.columns # Column names
Example:
df = pd.read_csv("data.csv")
(In Colab, upload the file from the sidebar first.)
Then explore:
df.head()
df.info()
df.describe()
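One more check worth running right after loading: counting missing values per column. Since no real CSV ships with this lesson, the sketch below builds a small DataFrame in memory to stand in for a loaded file (np.nan marks a missing cell):

```python
import numpy as np
import pandas as pd

# Stand-in for a freshly loaded CSV; np.nan marks a missing value
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [70, np.nan, 90],
})

# Missing values per column: a routine first check after loading data
print(df.isnull().sum())

# Shape: (rows, columns)
print(df.shape)  # (3, 2)
```

Real datasets almost always have gaps, so this check is usually the first thing done after df.head().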
🎯 Mini Practice (Do This)
Create dataset:
import pandas as pd

data = {
    "Name": ["A", "B", "C", "D"],
    "Marks": [70, 85, 90, 60],
    "Age": [20, 21, 19, 22]
}
df = pd.DataFrame(data)
Now try:
1️⃣ Print students with Marks > 80
2️⃣ Find average marks
3️⃣ Find maximum age
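Try the three tasks yourself first; one possible solution sketch, using only the operations covered above:

```python
import pandas as pd

data = {
    "Name": ["A", "B", "C", "D"],
    "Marks": [70, 85, 90, 60],
    "Age": [20, 21, 19, 22],
}
df = pd.DataFrame(data)

# 1. Students with Marks > 80
print(df[df["Marks"] > 80])   # B and C

# 2. Average marks: (70 + 85 + 90 + 60) / 4 = 76.25
print(df["Marks"].mean())

# 3. Maximum age
print(df["Age"].max())        # 22
```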
🎯 End of Day-3 Goals
You should now understand:
✅ NumPy arrays
✅ Basic statistics
✅ Pandas DataFrame
✅ Filtering datasets
✅ Loading data
These are core AI data skills.
💡 Important insight for developers:
Most AI projects spend 70–80% time on data processing, not model building.
So mastering Pandas early is a huge advantage.
If you have any queries, please let me know. Thanks.