Thursday, November 13, 2025

📘 Pandas in Python – A Complete Guide from Easy to Advanced

Data is at the heart of modern technology, from AI and ML to analytics and automation.

In Python, one of the most powerful and widely used libraries for handling data is Pandas. If you are a beginner who wants to master data manipulation, this blog will take you from basic to advanced, step by step.


🟢 1. Introduction to Pandas

Pandas is an open-source Python library used for:

  • Handling structured data

  • Cleaning data

  • Analyzing data

  • Reading/writing CSV and Excel files

  • Performing mathematical and statistical operations

Pandas is built on top of NumPy and works extremely well with Matplotlib, Seaborn, and Machine Learning libraries.


🛠️ 2. Installing Pandas

Use pip:

pip install pandas

Then import it:

import pandas as pd

🟢 3. Core Data Structures in Pandas

Pandas has two main structures:

Series

A one-dimensional labeled array (like a column).

import pandas as pd

s = pd.Series([10, 20, 30])
print(s)

DataFrame

A two-dimensional table (like an Excel sheet or SQL table).

df = pd.DataFrame({
    "Name": ["Raghav", "Aman"],
    "Age": [20, 22]
})
print(df)
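To make the link between the two structures concrete, here is a small sketch. The index labels "a", "b", "c" are assumptions added for illustration; the DataFrame is the one from the example above.

```python
import pandas as pd

# A Series with custom labels (the labels are made up for illustration)
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
value_b = s["b"]  # look up a value by its label

# Each column of a DataFrame is itself a Series
df = pd.DataFrame({"Name": ["Raghav", "Aman"], "Age": [20, 22]})
ages = df["Age"]

print(value_b)
print(type(ages).__name__)
print(df.shape)  # (number of rows, number of columns)
```

Thinking of a DataFrame as a collection of Series that share one index makes most of the later operations easier to follow.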

🔵 4. Reading and Writing Files

Pandas makes it extremely easy to import and export data.

✔ Read CSV

df = pd.read_csv("data.csv")

✔ Write CSV

df.to_csv("output.csv", index=False)

✔ Read Excel

df = pd.read_excel("file.xlsx")

✔ Write Excel

df.to_excel("output.xlsx", index=False)
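If you want to try the CSV round trip without creating files, one option is an in-memory buffer. This is only a sketch: the buffer stands in for "data.csv" and "output.csv", and the sample rows are the ones used earlier in this post.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Raghav", "Aman"], "Age": [20, 22]})

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False keeps the row labels out of the CSV

# Rewind the buffer and read the CSV text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

For .xlsx files, read_excel and to_excel rely on an optional engine such as openpyxl, so you may need to install that separately.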

🟡 5. Basic Operations in Pandas

✔ View top/bottom rows

df.head()
df.tail()

✔ View basic info

df.info()
df.describe()

✔ Selecting Columns

df['Name']
df[['Name', 'Age']]

✔ Selecting Rows (loc & iloc)

df.loc[0]   # by index label
df.iloc[0]  # by integer position
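With the default 0, 1, 2... index, loc and iloc look identical, so the difference is easier to see with custom labels. In this sketch the ID-style labels (101, 102, 103) and the third row are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Raghav", "Aman", "Priya"], "Age": [20, 22, 21]},
    index=[101, 102, 103],  # hypothetical ID-style labels
)

by_label = df.loc[102]     # the row whose index label is 102
by_position = df.iloc[0]   # the first row, whatever its label is

# loc slices include the end label; iloc slices exclude the end position
loc_slice = df.loc[101:102]   # two rows: labels 101 and 102
iloc_slice = df.iloc[0:1]     # one row: position 0 only

print(by_label["Name"], by_position["Name"])
print(len(loc_slice), len(iloc_slice))
```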

🟣 6. Data Cleaning with Pandas (Very Important)

Data cleaning is often the most time-consuming step in real-world projects, and one of the most important.

✔ Handling Missing Values

df.isnull()    # True where a value is missing
df.dropna()    # remove rows with missing values
df.fillna(0)   # fill missing values with 0

✔ Removing Duplicates

df.drop_duplicates()

✔ Changing data types

df['Age'] = df['Age'].astype(int)

✔ Renaming Columns

df.rename(columns={"Name": "FullName"}, inplace=True)
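The cleaning steps above can be chained into one pipeline. This is a minimal sketch on invented rows (a duplicated "Aman" row and a missing age are assumptions for the demo). The order matters: astype(int) raises an error if the column still contains missing values, so they are removed first.

```python
import pandas as pd

# Invented sample data: one exact duplicate row and one missing Age
raw = pd.DataFrame({
    "Name": ["Raghav", "Aman", "Aman", "Priya"],
    "Age": [20.0, 22.0, 22.0, float("nan")],
})

cleaned = (
    raw.drop_duplicates()                       # drop the repeated "Aman" row
       .dropna(subset=["Age"])                  # drop rows where Age is missing
       .astype({"Age": int})                    # safe now: no NaN left in Age
       .rename(columns={"Name": "FullName"})    # same rename as above
       .reset_index(drop=True)                  # renumber rows 0, 1, ...
)

print(cleaned)
```

Each of these methods returns a new DataFrame, so raw is left untouched and the chain can be re-run safely.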

🔵 7. Filtering & Conditional Selection

df[df["Age"] > 18]
df[(df["Age"] > 18) & (df["City"] == "Delhi")]
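Here is a runnable sketch of the same filters on invented data (the names, ages, and cities are assumptions). It also shows isin for matching several values and ~ for negating a condition. Each condition needs its own parentheses because & and | bind more tightly than comparison operators.

```python
import pandas as pd

# Invented sample data for the demo
df = pd.DataFrame({
    "Name": ["Raghav", "Aman", "Priya", "Kabir"],
    "Age": [20, 17, 25, 30],
    "City": ["Delhi", "Delhi", "Mumbai", "Pune"],
})

adults_in_delhi = df[(df["Age"] > 18) & (df["City"] == "Delhi")]  # AND
delhi_or_pune = df[df["City"].isin(["Delhi", "Pune"])]            # match any of several values
not_delhi = df[~(df["City"] == "Delhi")]                          # NOT

print(adults_in_delhi)
print(delhi_or_pune)
print(not_delhi)
```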

🟠 8. Sorting Data

df.sort_values("Age")
df.sort_values(["City", "Age"], ascending=[True, False])
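A small sketch of the multi-column sort on invented rows, to show what the ascending list does: cities go A to Z, and within each city the ages go from highest to lowest.

```python
import pandas as pd

# Invented sample data for the demo
df = pd.DataFrame({
    "City": ["Mumbai", "Delhi", "Delhi"],
    "Age": [25, 20, 30],
})

# City ascending, then Age descending within each city
by_city_then_age = (
    df.sort_values(["City", "Age"], ascending=[True, False])
      .reset_index(drop=True)
)

print(by_city_then_age)
```

sort_values returns a sorted copy, so df keeps its original row order unless you assign the result back.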

🔵 9. Adding & Removing Columns

df["Marks"] = [85, 90]          # add column
df = df.drop("Marks", axis=1)   # remove column (drop returns a new DataFrame)
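New columns are often computed from existing ones rather than typed in by hand. In this sketch the "Passed" column and the pass mark of 40 are assumptions chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Raghav", "Aman"], "Marks": [85, 90]})

# Derive a new column from an existing one (the threshold 40 is hypothetical)
df["Passed"] = df["Marks"] >= 40

# drop returns a new DataFrame; df itself still has the Marks column
without_marks = df.drop(columns="Marks")

print(without_marks)
```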

🟣 10. Grouping & Aggregation (Medium → Hard)

Grouping helps summarize data.

⭐ Group by a column

df.groupby("City")["Age"].mean()

⭐ Multiple aggregations

df.groupby("City").agg({"Age": ["mean", "max", "min"]})
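The dictionary form above produces two-level column names. If flat, readable names are preferred, named aggregation is one alternative. The sample rows and the output names (mean_age, oldest, youngest) are assumptions for this sketch.

```python
import pandas as pd

# Invented sample data for the demo
df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai"],
    "Age": [20, 30, 25],
})

# Named aggregation: output_column=(input_column, function)
summary = df.groupby("City").agg(
    mean_age=("Age", "mean"),
    oldest=("Age", "max"),
    youngest=("Age", "min"),
)

print(summary)
```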

🟣 11. Merging, Joining & Concatenation (Hard)

✔ Merge (like a SQL JOIN; INNER by default)

pd.merge(df1, df2, on="ID")

✔ Join (index-based join)

df1.join(df2)

✔ Concatenate (stack data)

pd.concat([df1, df2])
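To see how the three operations differ, here is a sketch with two small invented tables that share an "ID" column (all values are assumptions for the demo).

```python
import pandas as pd

# Invented sample tables: IDs 2 and 3 appear in both
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Raghav", "Aman", "Priya"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "City": ["Delhi", "Mumbai", "Pune"]})

inner = pd.merge(df1, df2, on="ID")              # only IDs present in both tables
left = pd.merge(df1, df2, on="ID", how="left")   # every df1 row; City is missing for ID 1

# join matches on the index, so move ID into the index first
joined = df1.set_index("ID").join(df2.set_index("ID"))

# concat stacks rows; ignore_index=True renumbers them 0..n-1
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(left)
print(joined)
print(stacked)
```

As a rough rule, merge suits key columns, join suits tables that already share an index, and concat suits tables with the same columns that just need stacking.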

🔴 12. Working with Date & Time (Advanced)

df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
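Real date columns often contain a few values that cannot be parsed. One way to handle that, sketched here on invented strings, is errors="coerce", which turns bad values into NaT (the datetime equivalent of a missing value) instead of raising an error.

```python
import pandas as pd

# Invented sample data; the last value is deliberately not a date
df = pd.DataFrame({"Date": ["2025-01-15", "2025-02-20", "not a date"]})

df["Date"] = pd.to_datetime(df["Date"], errors="coerce")  # unparseable values become NaT
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Weekday"] = df["Date"].dt.day_name()

print(df)
```

Rows with NaT can then be handled with the same dropna or fillna tools shown in the data cleaning section.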

🔴 13. Pivot Tables (Advanced)

Works like Excel pivot tables.

df.pivot_table(values="Sales", index="City", columns="Product", aggfunc="sum")
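A runnable sketch of the same pivot on invented sales rows (the cities, products, and amounts are assumptions). The extra fill_value=0 argument replaces missing city/product combinations with 0 instead of NaN.

```python
import pandas as pd

# Invented sample data for the demo
df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Mumbai", "Delhi"],
    "Product": ["Pen", "Book", "Pen", "Pen", "Pen"],
    "Sales": [100, 250, 80, 120, 50],
})

# Rows are cities, columns are products, cells are summed sales
table = df.pivot_table(
    values="Sales",
    index="City",
    columns="Product",
    aggfunc="sum",
    fill_value=0,  # Mumbai sold no books, so that cell becomes 0
)

print(table)
```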

🔴 14. Visualizing Data with Pandas (Easy Integration)

Pandas integrates with Matplotlib:

import matplotlib.pyplot as plt

df["Age"].plot(kind="bar")
plt.show()

🟣 15. Real-World Example: Mini Data Analysis

df = pd.read_csv("sales.csv")

# Clean
df.dropna(inplace=True)

# Analyze
total = df["Amount"].sum()
top_city = df.groupby("City")["Amount"].sum().idxmax()

print("Total Sales:", total)
print("Top City:", top_city)
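If you do not have a sales.csv file at hand, the same analysis can be tried on inline text. In this sketch the CSV contents are invented and an in-memory buffer stands in for the file; the "City" and "Amount" column names follow the example above.

```python
import io
import pandas as pd

# Stand-in for sales.csv; the rows are invented, and Pune has a missing Amount
csv_text = """City,Amount
Delhi,100
Mumbai,250
Delhi,200
Pune,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Clean: drop rows where Amount is missing
df = df.dropna(subset=["Amount"])

# Analyze: overall total and the city with the highest total
total = df["Amount"].sum()
by_city = df.groupby("City")["Amount"].sum()
top_city = by_city.idxmax()

print("Total Sales:", total)
print("Top City:", top_city)
```

Keeping the per-city totals in by_city makes it easy to inspect the full ranking, not just the winner.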

🏁 Conclusion

Pandas is one of the most essential libraries for anyone working with data.
This blog covered Pandas from easy to advanced topics, including:

  • Series and DataFrames

  • Reading/writing CSV & Excel

  • Data cleaning

  • Filtering, sorting, grouping

  • Merge/join/concat

  • Dates, pivot tables

  • Visualization

Once you master these basics, you are ready for:

✔ Data Science
✔ Machine Learning
✔ Big Data Analysis
✔ Real-world projects
