AgriData Pipeline

Senior Data Engineer
2023

An ETL pipeline that processes terabytes of multi-spectral satellite imagery to monitor crop health, predict yields, and alert smallholder farmers to potential pest infestations before they become visible to the naked eye.

Context

Smallholder farmers in Kenya lose up to 40% of their yield to pests and diseases. Early detection is crucial, but satellite data is noisy, expensive, and hard to process at scale.

The Engineering

My role was to make the data pipeline reliable and cost-effective.

  • Orchestration: Used Airflow DAGs to coordinate the retrieval of imagery from Planet Labs.
  • Processing: Normalized the satellite bands and derived vegetation indices (NDVI, EVI) in serverless Cloud Functions, which absorb burst loads during satellite passes without paying for idle compute.
  • Data Warehousing: Structured the geospatial data in BigQuery to allow agronomy researchers to run SQL queries over "maps" effectively.
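
The index math behind the processing step can be sketched as follows. This is a minimal illustration rather than the production Cloud Function code: the function names are hypothetical, the inputs are assumed to be surface-reflectance values scaled to 0-1, and the EVI coefficients are the commonly published MODIS defaults, which may differ from what the pipeline used.

```python
from typing import Iterable, Optional, Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        # Assumption: treat pixels with no signal as neutral instead of failing.
        return 0.0
    return (nir - red) / denom


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced Vegetation Index with the standard coefficients
    (gain 2.5, C1 = 6, C2 = 7.5, L = 1)."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    if denom == 0:
        return 0.0
    return 2.5 * (nir - red) / denom


def tile_mean_ndvi(pixels: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Mean NDVI over (nir, red) pixel pairs; None when the tile is empty."""
    values = [ndvi(nir, red) for nir, red in pixels]
    return sum(values) / len(values) if values else None
```

Healthy vegetation reflects strongly in near-infrared and absorbs red light, so values closer to 1 indicate denser, healthier canopy. A per-farm mean like the one returned by `tile_mean_ndvi` is the kind of value a column such as `ndvi_mean` could hold.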

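-- Sample BigQuery query logic: mean NDVI per farm, with nearby farms grouped into clusters

WITH farm_health AS (
  SELECT
    farm_id,
    -- GEOGRAPHY columns cannot be used in GROUP BY; assumes one geometry per farm
    ANY_VALUE(geometry) AS geometry,
    AVG(ndvi_mean) AS health_index
  FROM `agri_data.satellite_reads`
  WHERE read_date BETWEEN '2023-01-01' AND '2023-01-31'
  GROUP BY farm_id
)
SELECT
  farm_id,
  -- epsilon = 50 meters, at least 2 farms per cluster
  ST_CLUSTERDBSCAN(geometry, 50, 2) OVER () AS cluster_id,
  health_index
FROM farm_health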

Result

The pipeline cut data latency from 2 weeks to 48 hours, allowing interventions to happen within days instead of weeks. It currently monitors over 14,000 hectares of maize and coffee farms.

Tech Stack

Apache Airflow, Google Cloud Platform, BigQuery, Planet Labs API, Terraform

Connect

Friday Chai in Nairobi

I dedicate Friday afternoons to deep dives and small talk. If you're in town, let's geek out over distributed systems, civic tech, or why you love homesteading... or just show me what you're building.

Email chai@emmanuelallan.com to grab a slot.

→ Usually at Røst or Klaus

For everything else: hello@emmanuelallan.com