Andrew Harris
Portfolio

Selected work · Data Acquisition & Engineering

Open-source projects demonstrating expertise in web data acquisition, agentic AI systems, and full-stack development.

7 projects · Scraping · RAG · Full-stack · Python · TypeScript

Nodesnack

Live

Production web data API with 80+ endpoints spanning 13 platforms. Unified REST interface for scraping, search, and structured data extraction - built on the same proxy, rate-limit, and anti-bot infrastructure I deploy for client engagements.

The production backbone. Clients who need vetted endpoints consume Nodesnack directly; clients who need custom coverage get bespoke endpoints built on the same stack.

REST API Production Multi-platform Proxy Infrastructure Rate Limiting Anti-bot

Ransometry

Live

Open-web ransomware and extortion monitor. Polls ground-truth leak-site feeds, SEC EDGAR 8-Ks, and curated researcher posts at sub-minute cadence, correlates multi-source signal into incidents, and surfaces a live Mercator map, alert feed, and 345-gang catalogue.

Production example of real-time OSINT ingestion with LLM enrichment - the same pattern I deploy when a client needs continuous threat or market signal stitched across dozens of noisy feeds.

Next.js Supabase Render Worker RSS · Mastodon · Bluesky SEC EDGAR Anthropic Haiku
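
As an illustration of what "correlating multi-source signal into incidents" can mean, here is a minimal sketch, not Ransometry's actual code. It assumes each feed item has already been reduced to a source label, a victim name, and a timestamp. The `Signal` and `Incident` types, the `normalize` rule, and the 7-day window are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Signal:
    source: str        # hypothetical labels, e.g. "leak-site", "edgar-8k"
    victim: str        # victim name as it appeared in the feed
    seen_at: datetime


@dataclass
class Incident:
    victim_key: str
    first_seen: datetime
    last_seen: datetime
    sources: set = field(default_factory=set)


def normalize(name: str) -> str:
    """Assumed normalisation: lowercase, drop punctuation and legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = [w for w in cleaned.split() if w not in {"inc", "llc", "ltd", "corp"}]
    return " ".join(words)


def correlate(signals, window=timedelta(days=7)):
    """Merge signals about the same victim into one incident when each new
    signal arrives within `window` of the previous one for that victim."""
    incidents = []
    latest_by_key = {}
    for sig in sorted(signals, key=lambda s: s.seen_at):
        key = normalize(sig.victim)
        inc = latest_by_key.get(key)
        if inc is None or sig.seen_at - inc.last_seen > window:
            # No recent incident for this victim: open a new one.
            inc = Incident(key, sig.seen_at, sig.seen_at)
            incidents.append(inc)
            latest_by_key[key] = inc
        inc.last_seen = sig.seen_at
        inc.sources.add(sig.source)
    return incidents
```

In this sketch, an incident confirmed by more than one source (`len(incident.sources) > 1`) could be ranked above a single-source mention. That ranking rule is an assumption of the example, not a documented feature.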

Uktena

Live

Universal web scraper with a 3-tier hybrid fallback strategy for maximum reliability. Combines stealth browser automation with residential proxy rotation and intelligent request fingerprinting. Designed to handle sites with aggressive bot detection while maintaining ethical scraping practices and rate limits.

The playbook I reach for when a target site has serious anti-bot defenses - lets us commit to a delivery date without gambling on one fragile technique.

Python Playwright Stealth Browser Residential Proxies Fingerprinting
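
The tiered-fallback idea can be sketched generically: try the cheapest strategy first and escalate only when it fails. This is a minimal illustration and not Uktena's implementation. The three tier names and the stub fetchers are hypothetical stand-ins, since the real tiers are not described here.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a URL and returns HTML, or None/"" when it got nothing usable.
Fetcher = Callable[[str], Optional[str]]


class AllTiersFailed(Exception):
    """Raised when every tier raised an error or returned nothing."""


def fetch_with_fallback(url: str, tiers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order; return the first non-empty result
    together with the name of the tier that produced it."""
    errors = []
    for name, fetch in tiers:
        try:
            html = fetch(url)
        except Exception as exc:  # a blocked or failed tier should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if html:
            return name, html
        errors.append(f"{name}: empty response")
    raise AllTiersFailed("; ".join(errors))


# Hypothetical stub tiers so the sketch runs without network access.
def plain_http(url: str) -> Optional[str]:
    raise RuntimeError("403 blocked")


def proxied_http(url: str) -> Optional[str]:
    return None  # pretend the proxy returned an empty body


def stealth_browser(url: str) -> Optional[str]:
    return "<html>ok</html>"


tier, html = fetch_with_fallback(
    "https://example.com",
    [("plain-http", plain_http), ("proxied-http", proxied_http), ("stealth-browser", stealth_browser)],
)
```

Returning the tier name alongside the HTML makes it easy to log how often each tier is needed per target.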

Courtwise

Agentic RAG platform designed to make legal information accessible. Combines autonomous web crawling with semantic search to index and query legal documents. Uses LLM-powered extraction to identify relevant statutes, case law, and procedural information, then surfaces answers through a conversational interface.

End-to-end reference for AI clients wanting agentic extraction: crawl → chunk → embed → retrieve → generate, with cost controls at each step.

Python LangChain RAG Vector DB LLM Semantic Search Web Crawling
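
The chunk → embed → retrieve portion of that pipeline can be sketched in plain Python. This is a toy illustration under stated assumptions, not Courtwise's code. The bag-of-words `embed` function stands in for a real embedding model, an in-memory list stands in for the vector DB, and the crawl and generate steps are left out.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list:
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(w.strip(".,;:()").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

In a sketch like this, `size`, `overlap`, and `k` are the obvious knobs for bounding cost: they set how many chunks get embedded and how much retrieved text reaches the generation step.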

Catchmark

Live

Full-stack fishing spot discovery platform for anglers. React/TypeScript frontend with interactive maps, Express.js backend API, and PostGIS-powered geospatial queries for location-based search. Demonstrates modern full-stack architecture with spatial data handling.

Proof I ship end-to-end, not just data pipelines - when a client needs delivery through a UI or API, not just a CSV drop.

React TypeScript Express.js PostgreSQL PostGIS Geospatial

Serpopotamus

Enterprise-grade Google Search and Maps SERP scraping platform. Features a Flask-based web UI for managing scrape jobs, PostgreSQL for persistent storage, intelligent rate limiting and proxy rotation, and batch CSV processing for bulk operations.

The same architectural patterns behind ZoomInfo's 500M+ queries/month SERP engine - the scaffolding I reuse when a client needs Google Search or Maps data.

Python Flask PostgreSQL SERP API Rate Limiting Proxy Management
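
Rate limiting and proxy rotation are often built from two small pieces: a token bucket and a round-robin rotator. The sketch below shows that common pattern under assumptions of this example. It is not Serpopotamus's actual code, and the class names, parameters, and injectable clock are hypothetical.

```python
import itertools
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the bucket can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ProxyRotator:
    """Hand out proxies in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(list(proxies))

    def next(self) -> str:
        return next(self._cycle)
```

A scrape worker would call `try_acquire()` before each request and sleep briefly when it returns `False`, taking the next proxy from the rotator for each outgoing request.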

Yelptopus

Yelp business data extraction platform with deep pagination support. Features a Flask web UI for configuring and monitoring scrape jobs, PostgreSQL storage with deduplication, and real-time progress tracking. Handles Yelp's pagination limits through intelligent cursor management.

Pattern I reuse whenever a client needs directory-style data with strict pagination caps - cursor strategy, dedup, and resumability baked in.

Python Flask PostgreSQL Deep Pagination Real-time Tracking
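
Resumable pagination with deduplication can be sketched as follows. This is a minimal illustration and not Yelptopus's implementation. `fetch_page(offset, page_size)` is a hypothetical callable supplied by the caller, the JSON checkpoint file is an assumed storage choice, and no real Yelp endpoint or parameter is implied.

```python
import json
from pathlib import Path


def crawl(fetch_page, checkpoint: Path, page_size: int = 2) -> list:
    """Walk pages from the last saved offset, skipping rows already seen.

    fetch_page(offset, page_size) must return a list of dicts with an "id" key,
    and an empty list when there is nothing left.
    """
    state = {"offset": 0, "seen": []}
    if checkpoint.exists():
        # Resume from where the previous run stopped.
        state = json.loads(checkpoint.read_text())
    offset = state["offset"]
    seen = set(state["seen"])
    new_rows = []
    while True:
        rows = fetch_page(offset, page_size)
        if not rows:
            break
        for row in rows:
            if row["id"] not in seen:  # dedup across pages and across runs
                seen.add(row["id"])
                new_rows.append(row)
        offset += len(rows)
        # A real job would persist new_rows to storage before this checkpoint.
        checkpoint.write_text(json.dumps({"offset": offset, "seen": sorted(seen)}))
    return new_rows
```

Because the checkpoint is written after every page, an interrupted run restarts at the last completed page instead of at the beginning.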