Pandas Data Cleaning Script for CSVs

2026-05-05 12:17:25+08

Data cleaning is often said to take up as much as 80% of a data scientist's time. This prompt has an AI assistant write a Python script, using the Pandas library, that automates the repetitive parts of cleaning a new dataset.

The Core Prompt

Write a Python script using Pandas to clean a CSV file named "data.csv". The script should: 1. Remove duplicate rows. 2. Fill missing values in the "price" column with the median. 3. Convert the "date" column to datetime objects. 4. Remove any rows where "email" is invalid.
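A script produced from this prompt might look roughly like the following. It is a minimal sketch, not the output of any particular model, and it makes assumptions the prompt leaves open. An "invalid" email is taken to mean anything missing or failing a simple something@domain.tld pattern. Dates that cannot be parsed become NaT instead of raising an error. Non-numeric prices are coerced to NaN, and so also get the median, before filling. The helper names clean_dataframe and clean_csv and the output file data_cleaned.csv are hypothetical.

```python
import pandas as pd

# Assumed rule for a "valid" email (not a full RFC 5322 check):
# no spaces, exactly one "@", and a dot in the domain part.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"


def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four steps from the prompt, in order, and return a new frame."""
    # 1. Remove exact duplicate rows.
    df = df.drop_duplicates().copy()

    # 2. Fill missing prices with the median. Non-numeric strings are coerced
    #    to NaN first (an assumption) so the median can be computed.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    df["price"] = df["price"].fillna(df["price"].median())

    # 3. Convert to datetime; values that cannot be parsed become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")

    # 4. Drop rows whose email is missing or fails EMAIL_PATTERN.
    valid = df["email"].astype("string").str.fullmatch(EMAIL_PATTERN, na=False)
    return df[valid].reset_index(drop=True)


def clean_csv(in_path: str = "data.csv", out_path: str = "data_cleaned.csv") -> pd.DataFrame:
    """Read in_path, clean it, and write the result to out_path (hypothetical name)."""
    cleaned = clean_dataframe(pd.read_csv(in_path))
    cleaned.to_csv(out_path, index=False)
    return cleaned
```

Calling clean_csv() reads data.csv from the working directory. Keeping the cleaning logic in clean_dataframe means it can be tried on a small in-memory DataFrame without touching any file.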

The Logic

Listing each step explicitly gives the AI a clear structure to follow, so the generated script tends to come out as separate operations that you can test one at a time against the problems in your own data.
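One way to make a single step testable is to isolate it in its own function and check it against a tiny hand-made frame. The prompt never defines "invalid", so the regular expression here is an assumption, and the names remove_invalid_emails and test_remove_invalid_emails are hypothetical. Pinning the rule down in one small function is what lets you test it.

```python
import pandas as pd

# Hypothetical definition of a "valid" email: something@domain.tld with no spaces.
# Adjust the pattern to whatever rule your data actually needs.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"


def remove_invalid_emails(df: pd.DataFrame, column: str = "email") -> pd.DataFrame:
    """Return a copy of df without rows whose email is missing or fails EMAIL_PATTERN."""
    is_valid = df[column].astype("string").str.fullmatch(EMAIL_PATTERN, na=False)
    return df[is_valid].reset_index(drop=True)


def test_remove_invalid_emails():
    # Covers a valid address, a missing value, an address without "@",
    # and an address containing a space.
    df = pd.DataFrame({"email": ["ann@example.com", None, "no-at-sign.com", "bo b@example.com"]})
    result = remove_invalid_emails(df)
    assert result["email"].tolist() == ["ann@example.com"]
```

Running the file with pytest collects test_remove_invalid_emails automatically. If your definition of a valid email changes, only this function and its test need updating.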

Usage Tips

  • Explain Why: Ask the AI to "add comments explaining why we use the median instead of the mean for the price."
  • Visualization: Add a step to "generate a simple histogram of the cleaned prices."
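The reasoning behind the first tip can be checked with a few made-up numbers. In this sketch a single extreme value pulls the mean far above every typical price, while the median stays in the middle of the ordinary values. Filling gaps with the median is therefore less likely to invent unrealistic prices when the column contains outliers.

```python
import pandas as pd

# Made-up prices for illustration: four ordinary values, one extreme outlier,
# and one missing entry.
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 1000.0, None])

mean_price = prices.mean()      # skips NaN; pulled upward by the 1000.0 outlier
median_price = prices.median()  # skips NaN; the middle value is barely affected

filled_with_mean = prices.fillna(mean_price)
filled_with_median = prices.fillna(median_price)

print(f"mean:   {mean_price:.1f}")    # mean:   209.2
print(f"median: {median_price:.1f}")  # median: 12.0
```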

Example AI Output

# Fill missing prices with the column median (NaN values are skipped when computing it)
df['price'] = df['price'].fillna(df['price'].median())