Ojasa Mirai


🐼 Pandas Basics — Your Data Analysis Toolkit

Pandas is Python's primary data manipulation library. Learn the fundamentals to start working with data effectively.

🎯 What is Pandas?

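Pandas provides labeled data structures and tools for data analysis. Its two main structures are the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional table with labeled rows and columns).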

import pandas as pd

# Series: 1D array with labels
ages = pd.Series([25, 30, 28, 35], index=['Alice', 'Bob', 'Carol', 'David'])
print(ages)
print(ages['Alice'])  # Access by label

# DataFrame: 2D table with labeled rows and columns
data = {
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 28],
    'city': ['New York', 'London', 'Paris']
}
df = pd.DataFrame(data)
print(df)
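The two structures are closely related: selecting a single column from a DataFrame returns a Series that shares the DataFrame's row index. A minimal sketch reusing the sample data above (the names `people`, `age_column`, and `ages_frame` are illustrative only):

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 28],
})

# Selecting one column returns a Series
age_column = people['age']
print(type(age_column).__name__)               # Series

# The Series carries the same row labels as the DataFrame
print(age_column.index.equals(people.index))   # True

# A Series can be turned back into a one-column DataFrame
ages_frame = age_column.to_frame()
print(ages_frame.shape)                        # (3, 1)
```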

📦 Installation and Import

# Install pandas (run once)
# pip install pandas

# Import pandas
import pandas as pd

# Check version
print(pd.__version__)

🔄 Creating DataFrames

import pandas as pd

# From a dictionary of lists (keys become column names)
df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 28]
})

# From list of lists
df2 = pd.DataFrame(
    [['Alice', 25], ['Bob', 30], ['Carol', 28]],
    columns=['name', 'age']
)

# From CSV file (assumes data.csv exists in the working directory)
df3 = pd.read_csv('data.csv')

# Same dictionary-of-lists pattern with numeric columns
df4 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
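Two further variations may be useful. The sketch below builds a DataFrame from a list of dictionaries (one dictionary per row) and reads CSV text from memory, so it runs without a `data.csv` file on disk. The names `records` and `csv_text` and their contents are made up for illustration.

```python
import io
import pandas as pd

# A list of dictionaries: each dictionary is one row
records = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
]
df_records = pd.DataFrame(records)
print(df_records)

# read_csv accepts file-like objects, so an in-memory string works too
csv_text = "name,age\nAlice,25\nBob,30\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

Both calls should produce the same two-row table, which makes `io.StringIO` a convenient way to try `read_csv` options on small samples.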

🔍 Exploring DataFrames

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'David'],
    'age': [25, 30, 28, 35],
    'salary': [50000, 60000, 55000, 70000]
})

# Basic information
print(df.shape)           # (4, 3) - rows, columns
print(df.columns)         # Column names
print(df.index)           # Row indices
df.info()                 # Prints data types and non-null counts (returns None, so no print() needed)

# First and last rows
print(df.head())          # First 5 rows
print(df.head(2))         # First 2 rows
print(df.tail())          # Last 5 rows

# Statistical summary
print(df.describe())      # Mean, std, min, max, etc.
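The result of `describe()` is itself a DataFrame, so individual statistics can be looked up by label. A short sketch on the same sample data (the variable name `summary` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'David'],
    'age': [25, 30, 28, 35],
    'salary': [50000, 60000, 55000, 70000]
})

# Column data types, one entry per column
print(df.dtypes)

# describe() summarizes numeric columns only by default
summary = df.describe()
print(list(summary.columns))          # ['age', 'salary']

# Individual statistics can be pulled out of the summary by label
print(summary.loc['mean', 'age'])     # 29.5
print(summary.loc['max', 'salary'])   # 70000.0
```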

📊 Accessing Data

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 28],
    'city': ['New York', 'London', 'Paris']
})

# Access column (returns Series)
print(df['name'])
print(df['age'].mean())

# Access row by position
print(df.iloc[0])         # First row

# Access row by label
df.index = ['person1', 'person2', 'person3']
print(df.loc['person1'])

# Access specific cell
print(df.loc['person1', 'name'])      # 'Alice'
print(df.iloc[0, 0])                  # First row, first column

# Multiple columns
print(df[['name', 'age']])
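One difference between `loc` and `iloc` is easy to miss when slicing: `iloc` excludes the end position, as Python lists do, while `loc` includes the end label. A minimal sketch, assuming the same `person1` to `person3` labels used above:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Carol'], 'age': [25, 30, 28]},
    index=['person1', 'person2', 'person3'],
)

# iloc slices by position and excludes the end position
by_position = df.iloc[0:2]
print(list(by_position.index))   # ['person1', 'person2']

# loc slices by label and includes the end label
by_label = df.loc['person1':'person2']
print(list(by_label.index))      # ['person1', 'person2']

# Rows and columns can be selected together
print(df.loc['person2':'person3', 'age'].tolist())   # [30, 28]
```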

✏️ Modifying DataFrames

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30]
})

# Add new column
df['city'] = ['New York', 'London']

# Add new row
new_person = pd.DataFrame({'name': ['Carol'], 'age': [28], 'city': ['Paris']})
df = pd.concat([df, new_person], ignore_index=True)

# Modify column
df['age'] = df['age'] + 1

# Drop column
df = df.drop(columns='city')

# Rename column
df = df.rename(columns={'name': 'full_name'})
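The results above are reassigned to `df` because methods such as `drop()` and `rename()` return a new DataFrame by default and leave the original unchanged. The following sketch illustrates this (the names `without_age`, `renamed`, and `with_next_year` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})

# drop() and rename() return new DataFrames; df itself is untouched
without_age = df.drop(columns='age')
renamed = df.rename(columns={'name': 'full_name'})

print(list(df.columns))            # ['name', 'age']
print(list(without_age.columns))   # ['name']
print(list(renamed.columns))       # ['full_name', 'age']

# assign() adds a column without modifying the original either
with_next_year = df.assign(age_next_year=df['age'] + 1)
print(with_next_year['age_next_year'].tolist())   # [26, 31]
```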

🎨 Real-World Example: Student Grades Analysis

import pandas as pd

# Create DataFrame
grades_data = {
    'student': ['Alice', 'Bob', 'Carol', 'David'],
    'math': [92, 78, 95, 88],
    'english': [88, 85, 91, 82],
    'science': [95, 80, 93, 87]
}

df = pd.DataFrame(grades_data)

# Calculate average per student
df['average'] = (df['math'] + df['english'] + df['science']) / 3

# Find top performer
top_student = df.loc[df['average'].idxmax()]
print(f"Top student: {top_student['student']} ({top_student['average']:.2f})")

# Count high performers (>90)
high_performers = df[df['average'] > 90]
print(f"Students with average > 90: {len(high_performers)}")

# Subject averages
print(f"Average math: {df['math'].mean():.2f}")
print(f"Average english: {df['english'].mean():.2f}")
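The same analysis can be written so that it scales to more subjects. This is one possible variation, not the only approach: `mean(axis=1)` averages across the listed columns for each row, `sort_values()` ranks the students, and `idxmax(axis=1)` reports each student's strongest subject. The `subjects` list and the `best_subject` column are names introduced here for illustration.

```python
import pandas as pd

grades = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Carol', 'David'],
    'math': [92, 78, 95, 88],
    'english': [88, 85, 91, 82],
    'science': [95, 80, 93, 87]
})

subjects = ['math', 'english', 'science']

# axis=1 averages across the subject columns, one value per row
grades['average'] = grades[subjects].mean(axis=1)

# Rank students from highest to lowest average
ranking = grades.sort_values('average', ascending=False)
print(ranking[['student', 'average']].round(2))

# Best subject for each student (column label of the row maximum)
grades['best_subject'] = grades[subjects].idxmax(axis=1)
print(grades[['student', 'best_subject']])
```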

📈 Basic Statistics

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
})

# Descriptive statistics
print(df.mean())          # Mean of each column
print(df.median())        # Median
print(df.std())           # Sample standard deviation (ddof=1 by default)
print(df.sum())           # Sum
print(df.min())           # Minimum
print(df.max())           # Maximum

# For specific column
print(df['A'].mean())
print(df['B'].sum())
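Several statistics can also be computed in one call with `agg()`. The sketch below uses the same sample data and also shows how the `ddof` parameter switches `std()` between the sample and population standard deviation:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
})

# agg() computes several statistics in one call
stats = df.agg(['mean', 'min', 'max'])
print(stats)

# std() defaults to the sample standard deviation (ddof=1)
sample_std = df['A'].std()
# ddof=0 gives the population standard deviation
population_std = df['A'].std(ddof=0)
print(round(sample_std, 4), round(population_std, 4))   # 1.5811 1.4142
```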

📊 Data Types

Type	Description	Example
int64	Integer	42
float64	Decimal number	3.14
object	Text or mixed Python objects	"Alice"
bool	True/False	True
datetime64	Date/time	2024-01-01
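Data loaded from files often arrives as text even when it represents numbers or dates. A minimal sketch of inspecting and converting column types follows. The column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'count': ['1', '2', '3'],                              # numbers stored as text
    'joined': ['2024-01-01', '2024-02-15', '2024-03-30'],  # dates stored as text
    'active': [True, False, True],
})
print(df.dtypes)

# Convert text to integers and to datetimes
df['count'] = df['count'].astype('int64')
df['joined'] = pd.to_datetime(df['joined'])
print(df.dtypes)

# Typed columns unlock type-specific operations
print(df['count'].sum())                  # 6
print(df['joined'].dt.month.tolist())     # [1, 2, 3]
```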

🔑 Key Takeaways

✅ Pandas provides Series (1D) and DataFrames (2D) for data handling

✅ Use `head()` and `info()` to explore data structure

✅ Access columns with `df['column']` and rows with `loc`/`iloc`

✅ Use vectorized operations for efficient calculations

✅ Statistical methods such as `mean()` and `sum()` skip missing values (NaN) by default
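The last takeaway, on how statistics treat null values, can be illustrated with a short sketch. The `scores` data is made up for this example, and `np.nan` stands in for a missing value:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Carol'],
    'score': [90.0, np.nan, 80.0],
})

# Count missing values per column
print(scores.isna().sum())

# mean() skips NaN by default, so this averages 90 and 80
print(scores['score'].mean())               # 85.0

# skipna=False propagates the missing value instead
print(scores['score'].mean(skipna=False))   # nan

# fillna() returns a copy with the gaps replaced
filled = scores['score'].fillna(0)
print(filled.tolist())                      # [90.0, 0.0, 80.0]
```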

