Andrew Harris
Data acquisition consulting

I help companies acquire hard-to-get web data at scale.

Andrew Harris · Former ZoomInfo engineering leader · Vancouver, WA

Built and operated systems processing 1B+ pages and 500M+ search results per month. I work with AI startups, data companies, and founders who need crawl reliability, proprietary datasets, or a serious acquisition strategy - not generic data engineering.

Audit · $5,000
Proof-of-Value Build · from $15k
Fractional Head of Data Acquisition · from $5k/mo
Currently onboarding new clients - reply within one business day

1B+ pages crawled per month at ZoomInfo
500M+ SERP queries per month through the engine I architected
80+ production endpoints live across Nodesnack

Who I work with

AI startups that need proprietary data to make their model useful - RAG products, vertical AI platforms, research tools.

Data companies whose crawlers are failing, brittle, or expensive - sales intelligence, recruiting, market intelligence vendors.

Founders sitting on records with no strategy - millions of rows already, no path to expand the moat.

Agencies that need senior data acquisition expertise without a full-time hire.

What I'm not: a general software consultant, a Python contractor, a generic "AI consultant," or staff augmentation.

How to work with me

Three offers, ordered by depth. Most engagements start with the Audit.

Offer 1 · Lead engagement

Data Acquisition Audit

$5,000 · one-time · 20-30 page written report

A written review of your current (or proposed) web data acquisition stack. Built so you walk away with concrete architecture decisions, even if we never work together again.

  • Crawl architecture review - source coverage, scheduling, dedup, freshness, failure modes
  • Cost review - cost per acquired document, vendor spend, where you're overpaying
  • Anti-bot review - detection surface, proxy strategy, fingerprinting posture, sustainability
  • Infrastructure review - reliability, observability, on-call exposure
  • Risk review - legal, compliance, vendor concentration, single points of failure
  • Executive summary - what to fix this quarter, what to build next, what to stop spending on

The Audit is the natural first step. It earns the conversation that produces the next two offers.

Offer 2 · Build

Proof-of-Value Build

Fixed fee, typically $15k-$30k · scope-driven · defined deliverable

A focused, fixed-fee build of one critical piece of acquisition infrastructure. Scoped against your audit (or your existing roadmap) and delivered end-to-end.

  • Initial crawler implementation for a high-value source
  • Extraction pipeline with structured output for LLM ingestion
  • SERP acquisition system (the pattern behind 500M+ queries/month at ZoomInfo)
  • Competitor monitoring or market-signal pipeline
  • LLM ingestion workflow - markdown-native, embedding-ready

Fixed fee, written scope, defined "done." No open-ended hourly billing.

Offer 3 · Ongoing

Fractional Head of Data Acquisition

From $5,000 / month · judgment, not labor

A senior owner of your data acquisition strategy without the cost of a director-level full-time hire. Embedded enough to make decisions, not so embedded that I become a ticket-clearer.

  • Architecture reviews and design oversight
  • Vendor evaluation (proxies, scraping APIs, data brokers, infra)
  • Hiring support - JD design, technical interviews, calibration
  • Strategy: what to build internally, what to buy, what to retire
  • Monthly written briefing + ad-hoc async access

What I don't do: sprint work, ticket ownership, staff augmentation. Clients pay for judgment, not labor.

How we work together

  1. Written intake. You describe the data you need, what it costs you today, and the systems it feeds. 20 minutes, async, no call required.
  2. Scope within 48 hours. I send a written scope, deliverables, and start date - matched to the right offer.
  3. Start within the week. Audits typically deliver inside 2 weeks. Build engagements ship first artifacts in week one.

Pre-scoped feeds (productized menu)

Already know exactly what you need? These are productized, recurring data feeds built on the same infrastructure I deploy for clients. Each feed is scoped to an input list you provide - tickers, company names, SKUs, ZIP codes, counties, VINs, domains - whatever the source takes as a lookup. Avg delta and avg price reflect typical client volume; the real quote scales with list size and refresh cadence. Delivery is scheduled, normalized, and lands in S3, a webhook, or your database. If you're not sure which feed (or whether to build vs. buy), start with the Audit.

Source | Category | Monthly delivery | Avg delta | Avg price
LinkedIn (public profiles) | Firmographics | For your company list - employee counts, org charts, hiring trends, tech-role mix, company-page updates | ~30k | $1,550
LinkedIn Jobs | Talent | For your company list or role-filter set - open postings, seniority, skills, applicant counts | ~40k | $1,550
Amazon | Retail | For your ASIN, keyword, or category list - pricing, reviews, BSR, availability, seller buy-box | ~20k | $1,550
SEC EDGAR | Financial | For your ticker or CIK list - 10-K / 10-Q / 8-K filings, insider transactions, 13F holdings | ~3k | $300
Google Maps | Local | For your location + keyword combos - POIs with hours, reviews, categories, popular times | ~10k | $400
Zillow | Real Estate | For your ZIP, address, or MLS-area list - listings, Zestimates, transaction history, price changes | ~10k | $500
Indeed | Talent | For your company or role-keyword list - postings, salaries, locations, posting age | ~20k | $500
Glassdoor | Firmographics | For your company list - reviews, salaries, interview questions, CEO approval, benefits | ~8k | $500
Crunchbase | Financial | For your company or investor list - funding rounds, acquisitions, IPOs, board members | ~8k | $450
Walmart | Retail | For your SKU or category list - pricing, reviews, availability, pickup options by store | ~20k | $1,550
County Assessor/Recorder (3,000+ counties) | Real Estate | For your target counties - monthly delta of ownership, deeds, tax assessments, liens, mortgages | ~200k | $2,800
Secretary of State (50 states) | Corporate | For your entity list - filings, registered agents, annual reports, UCC records across all 50 SoS portals | ~50k | $2,300
OFAC Sanctions (SDN) | Compliance | Daily delta of the full SDN list - new and updated entries, alt names, vessels, IDs | ~300 | $250
PACER | Legal | For your party, district, or docket-type filters - federal filings, opinions, bankruptcy records | ~5k | $500
State Court Systems (50 states) | Legal | For your party, docket-type, or county filters - civil and criminal records, judgments, liens, evictions | ~25k | $2,000
NPI Registry (NPPES) | Healthcare | For your NPI, name, or taxonomy filter - provider lookups with specialty, address, affiliations | ~10k | $200
Booking.com / Expedia | Travel | For your property or destination + date-range combos - pricing, availability, reviews, cancellation policies | ~25k | $2,100
BuiltWith | Firmographics | For your domain list - detected technology stack, stack-change deltas, category coverage | ~10k | $400
Google News | News | For your keyword or entity list - daily article aggregation across publishers, deduped and clustered | ~40k | $800
Reddit (public subreddits) | Sentiment | For your keyword or subreddit list - posts, comments, sentiment, thread velocity | ~75k | $800
FINRA BrokerCheck | Compliance | For your CRD or broker-name list - registrations, disciplinary records, employment history | ~10k | $450
Greenhouse / Lever / Workable | Talent | For your company list - open roles, team signals, hiring velocity from public ATS boards | ~10k | $500
Custom careers pages | Talent | For your company list - bespoke scrape of any company's /careers site, including those not on a standard ATS | per company | $0.20–$3 / co.
X (Twitter) | Sentiment | For your keywords or account list - mentions, sentiment, engagement, influencer reach | ~100k | $1,900
FDA (openFDA) | Healthcare | For your drug or device list - approvals, adverse events, recalls, 510(k), inspections | ~5k | $300
Yahoo Finance | Financial | For your ticker list - quotes, historicals, fundamentals, analyst estimates, options chains | ~10k | $600
Kayak / Google Flights / Skyscanner | Travel | For your origin-destination + date pairs - airfare pricing, fare calendars, price-change tracking | ~30k | $1,900
CoinGecko / CoinMarketCap | Crypto | For your token list - price, volume, market cap, exchange-level data | ~15k | $800
Etherscan / BscScan / Polygonscan | Crypto | For your wallet or contract list - transactions, token transfers, gas, internal calls | ~100k | $800
Pharmacy & Medical Boards (50 states) | Healthcare | For your NPI or name list - license verification and disciplinary status across all 50 state boards | ~40k | $2,200
State Attorney General Actions | Compliance | Daily sweep of all 50 AG sites - new enforcement actions, data-breach notifications, settlements | ~2k | $2,000
MarineTraffic / VesselFinder | Supply Chain | For your vessel, IMO, or fleet list - positions, port calls, ETAs, voyage history | ~30k | $800
State Procurement Portals (50 states) | Gov Contracts | Daily sweep across all 50 state portals - new RFPs, contract awards, vendor registrations matching your keyword filters | ~40k | $2,600
Port Authority Sites (50+ ports) | Supply Chain | For the ports you select - container volumes, congestion metrics, vessel schedules, berth assignments | ~20k | $2,600
GoodRx | Healthcare | For your drug + ZIP list - pharmacy-level prices, coupons, generic alternatives | ~15k | $550
Carfax (public listings) | Automotive | For your VIN list - history summaries, accident indicators, service records | ~10k | $1,350
Wayfair | Retail | For your SKU or category list - pricing, reviews, availability, sale status | ~15k | $1,350
Instacart / FreshDirect | Grocery | For your SKU + ZIP list - grocery pricing and availability by store and retailer | ~25k | $1,350
StockX / GOAT | Resale | For your product or SKU list - resale pricing, sales history, price premiums, size-level liquidity | ~20k | $2,100
Shodan | Security | For your IP-range, org, or product-string set - open ports, service banners, vulnerabilities | ~40k | $800
NFT Marketplaces (OpenSea / Blur) | Crypto | For your collection list - floor prices, volume, sales, rarity, holder concentration | ~30k | $1,600
Glassnode (public) | Crypto | For your asset list - on-chain metrics, exchange flows, miner data, HODL waves | ~10k | $1,050
Dow Jones / World-Compliance Watchlists | Compliance | For your entity list - PEP flags, adverse media, watchlist screening with change tracking | ~100k | $1,050
State Insurance Rate Filings | Insurance | For your carrier or line-of-business filters - filed rates, forms, actuarial justifications, approval status | ~3k | $1,750
FlightAware / FlightRadar24 | Aviation | For your tail-number, route, or operator list - live flight data, delays, airport performance, history | ~60k | $1,600
DAT Freight & Analytics | Logistics | For your origin-destination lane list - spot rates, rate trends, capacity signals | ~8k | $1,050
Blind (public threads) | Sentiment | For your company or topic list - thread content, sentiment, compensation signals, layoff chatter | ~15k | $1,050

Record volumes and prices are averages - both scale with your input list size and how often you want the feed refreshed. A few feeds (e.g. custom careers pages) are priced per unit rather than at a flat monthly rate. Don't see the source you need? The same infrastructure handles nearly any public web source - email for a tailored quote.
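
To compare feeds on a like-for-like basis, it can help to divide a feed's average price by its average delta. The Python sketch below does that for three rows from the table. The function name and the choice of rows are illustrative only, and because real quotes scale with list size and refresh cadence, treat the results as rough guides rather than quoted rates.

```python
# Rough per-record price for a few feeds, using the table's average monthly
# delta (records) and average monthly price (USD). These are averages, not
# quotes: actual pricing depends on list size and refresh cadence.
feeds = {
    "SEC EDGAR": (3_000, 300),
    "LinkedIn (public profiles)": (30_000, 1_550),
    "County Assessor/Recorder": (200_000, 2_800),
}


def price_per_record(avg_delta: int, avg_price: float) -> float:
    """Return the average price divided by the average record delta."""
    return avg_price / avg_delta


for name, (avg_delta, avg_price) in feeds.items():
    print(f"{name}: ${price_per_record(avg_delta, avg_price):.3f} per record")
```

Feeds with a large delta tend to come out cheaper per record, which is one reason the flat monthly price alone can be a misleading basis for comparison.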

Common questions

Why start with the Audit?
Because most data acquisition problems are scoped wrong before they're built wrong. The Audit gives you a written architecture and cost review you can act on with or without me - and it makes any follow-on engagement vastly more accurate.

How fast can the Audit turn around?
Typical turnaround is 2 weeks from intake to delivered report. If your timeline is tight, mention it in the intake and I'll say up front whether it's realistic.

Do you sign NDAs?
Yes. Standard mutual NDA, digitally signed before any discovery work. If you have your own template, send it with the intake.

Will the Fractional role cover my engineering team's tickets?
No. The Fractional role is judgment, not labor - architecture, vendor calls, hiring, strategy. If you need someone writing scrapers and clearing tickets, you need a different hire (and I'll tell you what to look for).

Do you work with non-AI companies?
Yes - any company whose business depends on hard-to-acquire web data. Sales intelligence, recruiting platforms, market intel vendors, compliance, OSINT. If you have a real acquisition problem, email me and I'll tell you honestly whether I'm the right fit.

About me

Currently leading dark-web and threat-intelligence data collection at Recorded Future. Before that, a decade at ZoomInfo scaling web acquisition from startup to IPO - SERP engines processing 500M+ queries/month, crawler frameworks handling 1B+ pages/month, agentic extraction on LLMs and domain-specific SLMs, and multi-vendor integration across 600M+ people and 100M+ companies. I run Nodesnack on the side - the same infrastructure I use for client engagements. See resume and portfolio for context.

Get in touch

Most engagements start with a written intake for the Data Acquisition Audit. Email andrew@abharrismethods.com.

The intake — 6 questions

The button above pre-fills these in your email client. Answer what you can; partial intakes are fine.

  1. You
     Name, role, company.
  2. What you're building
     And why web data matters to it.
  3. The acquisition problem
     What data you need, and what's broken or missing today.
  4. Current setup
     Nothing yet · DIY scrapers · vendor (which?) · inherited mess.
  5. What it's costing you today
     Manual hours, vendor spend, or product capability you can't ship.
  6. Scale & timing
     Sources, volume, refresh cadence, when you'd want to start.