Job description:

We are looking for a highly skilled Python Engineer with deep experience building large-scale web scraping pipelines and AI-powered data processing systems. This role is focused on extracting, normalizing, and enriching large volumes of structured and unstructured data using Python, LLMs (e.g., OpenAI), and AWS-based containerized infrastructure.

You will own the end-to-end lifecycle of data ingestion: from scraping and document processing, through AI-driven enrichment and classification, to deployment in scalable cloud environments.

Key Responsibilities

Design, build, and maintain high-reliability Python scraping systems for collecting data from complex, dynamic, and unstructured web sources (HTML, PDFs, APIs, documents).
Implement AI-assisted extraction, classification, summarization, and normalization pipelines using large language models (e.g., OpenAI).
Develop resilient scraping architectures with rate-limiting, retries, proxy management, CAPTCHA handling, and change detection.
Build data processing pipelines that clean, transform, deduplicate, and enrich scraped content for downstream analytics and ML workflows.
Develop and maintain containerized Python services using Docker and deploy them at scale via AWS ECS and related services.
Integrate LLMs into automated workflows for document parsing, entity extraction, taxonomy mapping, and insight generation.
Design and expose internal APIs for triggering scraping jobs, processing data, and retrieving AI-generated outputs.
Manage cloud resources across AWS (ECS, S3, Lambda, RDS, CloudWatch) with a focus on scalability, reliability, and cost efficiency.
Optimize scraping and AI pipelines for performance, throughput, and fault tolerance.
Implement monitoring, logging, and alerting for long-running scraping and AI workloads.
Write clear technical documentation covering scraping logic, AI workflows, and deployment patterns.

Qualifications

Strong Python engineering background with a focus on data ingestion and scraping systems.
Extensive experience building web scrapers using tools such as BeautifulSoup, Scrapy, Playwright, Selenium, or similar frameworks.
Hands-on experience integrating LLM APIs (e.g., OpenAI) into production systems.
Proven ability to handle unstructured data (HTML, PDFs, text blobs) and convert it into structured outputs.
Experience containerizing Python applications with Docker and deploying them using AWS ECS.
Solid understanding of AWS services including S3, ECS, Lambda, RDS, and CloudWatch.
Experience designing and consuming RESTful APIs.
Familiarity with CI/CD pipelines, Git-based workflows, and automated testing.
Strong grasp of software engineering best practices: modularity, observability, error handling, and performance optimization.
Ability to work independently in ambiguous problem spaces and iterate quickly.
Clear written and verbal communication skills, especially around complex technical systems.

Pay: E£27,500.00 - E£35,000.00 per month

Work Location: Remote

Tags: Full-time, Remote, AI, Generative AI

Python - AI Scraping Engineer
