Thursday, December 11, 2025

COVID-19 Dataset Analysis Project

 

COVID-19 Dataset Analysis Project (Using Kaggle Dataset)

(Python · Pandas · Matplotlib)


1. Project Title

COVID-19 Data Analysis Using Python (Exploratory Data Analysis on Global COVID Statistics)


2. Dataset Source

Download from Kaggle:
“Novel Corona Virus 2019 Dataset” or
“COVID-19 World Vaccination Progress”
Search on Kaggle → Download CSV.

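You will mainly use these files from the “Novel Corona Virus 2019 Dataset” (the code in this post reads only covid_19_data.csv):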

  • covid_19_data.csv

  • time_series_covid_19_confirmed.csv

  • time_series_covid_19_deaths.csv

  • time_series_covid_19_recovered.csv
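
The time_series_* files are laid out differently from covid_19_data.csv: they are wide, with one column per date. The sketch below shows one possible way to reshape such a file into long format with melt(). It assumes four identifier columns (Province/State, Country/Region, Lat, Long) followed by date columns written like 1/22/20. The small frame is a made-up stand-in for the real CSV, so check the column names against your download before reusing it.

```python
import pandas as pd

# Made-up stand-in for time_series_covid_19_confirmed.csv (values are illustrative).
# Assumed layout: four identifier columns, then one column per date.
wide = pd.DataFrame({
    "Province/State": [None, "Hubei"],
    "Country/Region": ["India", "China"],
    "Lat": [21.0, 30.97],
    "Long": [78.0, 112.27],
    "1/22/20": [0, 444],
    "1/23/20": [0, 444],
    "1/24/20": [1, 549],
})

id_cols = ["Province/State", "Country/Region", "Lat", "Long"]

# melt() turns every date column into its own (Date, Confirmed) row.
long_df = wide.melt(id_vars=id_cols, var_name="Date", value_name="Confirmed")
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")

# One cumulative series per country, summed over provinces.
by_country = long_df.groupby(["Country/Region", "Date"])["Confirmed"].sum()
print(by_country)
```

With the real file, replace the hand-built frame with pd.read_csv("time_series_covid_19_confirmed.csv") and keep the rest unchanged if the layout matches.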


3. Project Objectives

✔ Understand global COVID-19 spread through data
✔ Identify most affected countries
✔ Visualize daily vs cumulative cases
✔ Analyze death & recovery trends
✔ Explore correlation between features
✔ Perform country-wise time-series analysis


4. Python Libraries Required

pip install pandas matplotlib seaborn numpy

5. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

6. Load Dataset

df = pd.read_csv("covid_19_data.csv")
df.head()

7. Basic Data Information

df.info()
df.isnull().sum()
df.describe()

8. Data Cleaning

Rename messy column names:

df.rename(columns={
    'Country/Region': 'Country',
    'Province/State': 'State',
    'ObservationDate': 'Date'
}, inplace=True)

Convert date:

df['Date'] = pd.to_datetime(df['Date'])

Fill missing values:

df['State'] = df['State'].fillna('')
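
The cleaning steps above can also be collected into one function, which makes them easy to rerun on a fresh download. This is a minimal sketch, not the only way to do it: clean_covid_frame is a hypothetical helper name, the MM/DD/YYYY date format is an assumption about the CSV, and the whitespace strip on Country is an extra step added here for illustration. The sample rows are made up so the function can be tried without the dataset.

```python
import pandas as pd

def clean_covid_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw frame; the input is left untouched."""
    out = raw.rename(columns={
        "Country/Region": "Country",
        "Province/State": "State",
        "ObservationDate": "Date",
    })
    # Assumption: dates are written as MM/DD/YYYY, e.g. 01/22/2020.
    out["Date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y")
    out["State"] = out["State"].fillna("")
    # Extra step (not in the original post): strip stray spaces so that
    # " India" and "India" fall into the same group later.
    out["Country"] = out["Country"].str.strip()
    return out

# Made-up sample rows in the assumed covid_19_data.csv layout.
sample = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "01/23/2020"],
    "Province/State": [None, "Hubei"],
    "Country/Region": [" India", "Mainland China"],
    "Confirmed": [0.0, 444.0],
    "Deaths": [0.0, 17.0],
    "Recovered": [0.0, 28.0],
})

cleaned = clean_covid_frame(sample)
print(cleaned.dtypes)
```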

9. Global Numbers Overview

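Total Confirmed, Deaths, Recovered

The Confirmed, Deaths, and Recovered columns are cumulative, so summing every row counts the same cases once per day. Take the totals from the latest date instead:

latest = df[df['Date'] == df['Date'].max()]
total_confirmed = latest['Confirmed'].sum()
total_deaths = latest['Deaths'].sum()
total_recovered = latest['Recovered'].sum()
print(total_confirmed, total_deaths, total_recovered)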

10. Country-wise Summary

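# Add up states within each country per date first; taking max() on the raw rows
# would return the largest single state instead of the country total.
country_daily = df.groupby(["Country", "Date"])[["Confirmed", "Deaths", "Recovered"]].sum()
country_summary = (
    country_daily.groupby(level="Country")
    .max()
    .sort_values("Confirmed", ascending=False)
)
country_summary.head(10)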

11. Plot: Top 10 Countries by Confirmed Cases

top10 = country_summary.head(10)
plt.figure(figsize=(10,6))
plt.bar(top10.index, top10['Confirmed'])
plt.xticks(rotation=45)
plt.title("Top 10 Countries with Highest Confirmed Cases")
plt.xlabel("Country")
plt.ylabel("Confirmed Cases")
plt.show()

12. Death Rate and Recovery Rate

country_summary["Death_Rate"] = (country_summary["Deaths"] / country_summary["Confirmed"]) * 100
country_summary["Recovery_Rate"] = (country_summary["Recovered"] / country_summary["Confirmed"]) * 100
country_summary.head()
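
A country with zero confirmed cases makes the division above undefined. One possible guard, sketched here on made-up numbers, is to turn zero denominators into NaN before dividing so those rows come out as missing. The country labels A, B, and C and all the counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up summary table; "C" has no confirmed cases.
summary = pd.DataFrame(
    {"Confirmed": [1000, 200, 0], "Deaths": [20, 1, 0], "Recovered": [900, 150, 0]},
    index=["A", "B", "C"],
)

# Replace 0 with NaN so the rate is reported as missing, not as inf or an error.
confirmed = summary["Confirmed"].replace(0, np.nan)
summary["Death_Rate"] = (summary["Deaths"] / confirmed * 100).round(2)
summary["Recovery_Rate"] = (summary["Recovered"] / confirmed * 100).round(2)
print(summary)
```

For country A this gives 20 / 1000 × 100 = 2.0 percent deaths and 900 / 1000 × 100 = 90.0 percent recoveries.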

13. Heatmap (Correlation)

plt.figure(figsize=(6,4))
sns.heatmap(df[['Confirmed','Deaths','Recovered']].corr(), annot=True)
plt.title("Correlation Between COVID-19 Metrics")
plt.show()

14. Time Series Analysis (Global Trend)

global_daily = df.groupby("Date")[["Confirmed","Deaths","Recovered"]].sum()
plt.figure(figsize=(12,6))
plt.plot(global_daily.index, global_daily["Confirmed"], label="Confirmed")
plt.plot(global_daily.index, global_daily["Deaths"], label="Deaths")
plt.plot(global_daily.index, global_daily["Recovered"], label="Recovered")
plt.legend()
plt.title("Global Trend of COVID-19 Cases Over Time")
plt.xlabel("Date")
plt.ylabel("Cases")
plt.show()
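
The objectives mention daily vs cumulative cases, and the trend above is cumulative. A rough way to get daily new cases is to difference the cumulative series. The sketch below uses a made-up five-day series. Clipping negative values to zero (they usually come from data corrections) and the 3-day rolling window are choices made for this example, and a 7-day window is more common on real data.

```python
import pandas as pd

# Made-up cumulative counts; the drop on the last day mimics a data correction.
cumulative = pd.Series(
    [10, 15, 15, 30, 28],
    index=pd.date_range("2020-03-01", periods=5, freq="D"),
    name="Confirmed",
)

# diff() gives the day-over-day change; the first day keeps its own count.
daily_new = cumulative.diff().fillna(cumulative.iloc[0]).clip(lower=0)

# Short rolling mean to smooth reporting noise.
rolling = daily_new.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"cumulative": cumulative, "daily_new": daily_new, "rolling": rolling}))
```

On the real data the same idea would apply to global_daily["Confirmed"], and the result can be drawn with plt.plot or plt.bar next to the cumulative curve.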

15. Country-Specific Analysis (e.g., India)

india = df[df["Country"]=="India"].groupby("Date")[["Confirmed","Deaths","Recovered"]].sum()
plt.figure(figsize=(10,5))
plt.plot(india.index, india["Confirmed"], label="Confirmed")
plt.plot(india.index, india["Deaths"], label="Deaths")
plt.plot(india.index, india["Recovered"], label="Recovered")
plt.title("COVID-19 Trend in India")
plt.legend()
plt.show()
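
To repeat this analysis for other countries, the filter-and-group step can be wrapped in a small function. This is only a sketch: country_trend is a hypothetical helper name, and it assumes the frame already has the Country and Date columns from the cleaning step. The sample rows are made up.

```python
import pandas as pd

METRICS = ["Confirmed", "Deaths", "Recovered"]

def country_trend(df: pd.DataFrame, country: str) -> pd.DataFrame:
    """Sum all states of one country per date and return a date-indexed frame."""
    subset = df[df["Country"] == country]
    if subset.empty:
        # Fail loudly on a misspelled name instead of returning an empty frame.
        raise ValueError(f"No rows found for country: {country!r}")
    return subset.groupby("Date")[METRICS].sum()

# Made-up rows: two states reported on the same date for one country.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-01"]),
    "Country": ["India", "India", "India", "Brazil"],
    "State": ["Kerala", "Delhi", "", ""],
    "Confirmed": [3, 1, 6, 2],
    "Deaths": [0, 0, 1, 0],
    "Recovered": [3, 0, 3, 0],
})

trend = country_trend(sample, "India")
print(trend)
```

The returned frame can be passed to the same plt.plot calls used above, with the country name changed in the title.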

16. Key Findings (Write in Report)

✔ USA, India, and Brazil had the highest confirmed case counts
✔ Confirmed cases grew roughly exponentially in the early months
✔ Confirmed cases and deaths are highly correlated
✔ Death rate varies by country, which may reflect differences in health systems, testing, and demographics
✔ Recovery rate rose over time; linking this to the vaccination rollout needs the separate vaccination dataset
✔ India shows rapid growth during its second wave
