Python Data Pipeline Builder
Added Apr 1, 2026
About This Prompt
Data pipelines are the backbone of modern data infrastructure, but building reliable ones from scratch involves many subtle decisions around error handling, retry logic, and validation. This prompt generates a complete ETL pipeline with production-grade patterns built in from the start. The Pydantic validation ensures data quality at every stage, while the retry logic and structured logging make the pipeline resilient and debuggable. The modular package structure means you can extend and maintain the code as your data needs grow.
Variables to Customize
[PIPELINE_PURPOSE]
What the pipeline accomplishes
Example: syncs customer data from Salesforce to a PostgreSQL analytics database
[DATA_SOURCE]
Where data comes from
Example: Salesforce REST API
[TRANSFORMATIONS]
How data should be transformed
Example: deduplicating records, normalizing phone numbers, enriching with company size data, and calculating lead scores
[DESTINATION]
Where processed data goes
Example: PostgreSQL database with upsert logic
Tips for Best Results
- Specify your Python version so the generated code uses compatible syntax (for example, the `str | None` unions in the sample below require Python 3.10+)
- Mention your orchestration tool (Airflow, Prefect, etc.) for integration-ready code
- Request a Dockerfile and docker-compose.yml for containerized deployment
Example Output
```python
# pipeline/extract/salesforce.py
from typing import AsyncIterator
from pydantic import BaseModel
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

class SalesforceConfig(BaseModel):
    # Connection settings; these field names are illustrative
    base_url: str
    api_token: str


class SalesforceContact(BaseModel):
    id: str
    email: str
    first_name: str
    last_name: str
    company: str | None = None
    phone: str | None = None


@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=30))
async def fetch_contacts(config: SalesforceConfig) -> AsyncIterator[SalesforceContact]:
    """Yield validated contacts from the Salesforce REST API, retrying on transient errors."""
    ...
```