Python Data Pipeline Builder
Added Apr 1, 2026
About This Prompt
Data pipelines are the backbone of modern data infrastructure, but building reliable ones from scratch involves many subtle decisions around error handling, retry logic, and validation. This prompt generates a complete ETL pipeline with production-grade patterns built in from the start. The Pydantic validation ensures data quality at every stage, while the retry logic and structured logging make the pipeline resilient and debuggable. The modular package structure means you can extend and maintain the code as your data needs grow.
Variables to Customize
[PIPELINE_PURPOSE]
What the pipeline accomplishes
Example: syncs customer data from Salesforce to a PostgreSQL analytics database
[DATA_SOURCE]
Where data comes from
Example: Salesforce REST API
[TRANSFORMATIONS]
How data should be transformed
Example: deduplicating records, normalizing phone numbers, enriching with company size data, and calculating lead scores
[DESTINATION]
Where processed data goes
Example: PostgreSQL database with upsert logic
Tips for Best Results
- Specify your Python version so the generated code uses compatible syntax (for example, the `str | None` unions in the sample below require Python 3.10+)
- Mention your orchestration tool (Airflow, Prefect, etc.) for integration-ready code
- Request a Dockerfile and docker-compose.yml for containerized deployment
Example Output
```python
# pipeline/extract/salesforce.py
from typing import AsyncIterator
from pydantic import BaseModel
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

class SalesforceConfig(BaseModel):
    # Connection settings; these field names are illustrative
    base_url: str
    api_token: str


class SalesforceContact(BaseModel):
    id: str
    email: str
    first_name: str
    last_name: str
    company: str | None = None
    phone: str | None = None


@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=30))
async def fetch_contacts(config: SalesforceConfig) -> AsyncIterator[SalesforceContact]:
    """Yield validated contacts from the Salesforce REST API, retrying on transient errors."""
    ...
```