Who I work with
- AI startups that need proprietary data to make their models useful - RAG products, vertical AI platforms, research tools.
- Data companies whose crawlers are failing, brittle, or expensive - sales intelligence, recruiting, market intelligence vendors.
- Founders sitting on records but with no strategy - millions of rows already, no path to expand the moat.
- Agencies that need senior data acquisition expertise without a full-time hire.
What I'm not: a general software consultant, a Python contractor, a generic "AI consultant," or staff augmentation.
How to work with me
Three offers, ordered by depth. Most engagements start with the Audit.
Data Acquisition Audit
A written review of your current (or proposed) web data acquisition stack. Built so you walk away with concrete architecture decisions, even if we never work together again.
- Crawl architecture review - source coverage, scheduling, dedup, freshness, failure modes
- Cost review - cost per acquired document, vendor spend, where you're overpaying
- Anti-bot review - detection surface, proxy strategy, fingerprinting posture, sustainability
- Infrastructure review - reliability, observability, on-call exposure
- Risk review - legal, compliance, vendor concentration, single points of failure
- Executive summary - what to fix this quarter, what to build next, what to stop spending on
The Audit is the natural first step. It earns the conversation that produces the next two offers.
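To make the "cost per acquired document" metric from the cost review concrete, here is a minimal sketch. It is an illustration only, not the Audit's actual methodology: the `MonthlySpend` fields, the `cost_per_document` helper, and every dollar figure are hypothetical. The assumption is that the metric is total monthly spend divided by the number of documents successfully acquired.

```python
from dataclasses import dataclass


@dataclass
class MonthlySpend:
    """Hypothetical monthly cost line items, in USD."""
    proxies: float
    scraping_apis: float
    compute: float
    engineering_hours: float
    hourly_rate: float

    def total(self) -> float:
        # Vendor and infrastructure spend plus the cost of engineering time.
        return (self.proxies + self.scraping_apis + self.compute
                + self.engineering_hours * self.hourly_rate)


def cost_per_document(spend: MonthlySpend, attempted: int, success_rate: float) -> float:
    """Total monthly spend divided by documents actually acquired."""
    acquired = attempted * success_rate
    if acquired <= 0:
        raise ValueError("no documents acquired; cost per document is undefined")
    return spend.total() / acquired


# Made-up numbers, purely for illustration.
spend = MonthlySpend(proxies=4000, scraping_apis=2500, compute=1500,
                     engineering_hours=40, hourly_rate=100)
unit_cost = cost_per_document(spend, attempted=2_000_000, success_rate=0.8)
print(f"${unit_cost:.4f} per document")
```

Dividing by acquired documents rather than attempted requests is a deliberate choice in this sketch: a crawler with a low success rate looks cheap per request and expensive per usable document.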
Proof-of-Value Build
A focused, fixed-fee build of one critical piece of acquisition infrastructure. Scoped against your audit (or your existing roadmap) and delivered end-to-end.
- Initial crawler implementation for a high-value source
- Extraction pipeline with structured output for LLM ingestion
- SERP acquisition system (the pattern behind 500M+ queries/month at ZoomInfo)
- Competitor monitoring or market-signal pipeline
- LLM ingestion workflow - markdown-native, embedding-ready
Fixed fee, written scope, defined "done." No open-ended hourly billing.
Fractional Head of Data Acquisition
A senior owner of your data acquisition strategy without the cost of a director-level full-time hire. Embedded enough to make decisions, not so embedded that I become a ticket-clearer.
- Architecture reviews and design oversight
- Vendor evaluation (proxies, scraping APIs, data brokers, infra)
- Hiring support - JD design, technical interviews, calibration
- Strategy: what to build internally, what to buy, what to retire
- Monthly written briefing + ad-hoc async access
What I don't do: sprint work, ticket ownership, staff augmentation. Clients pay for judgment, not labor.
How we work together
- Written intake. You describe the data you need, what it costs you today, and the systems it feeds. 20 minutes, async, no call required.
- Scope within 48 hours. I send a written scope, deliverables, and start date - matched to the right offer.
- Start within the week. Audits typically deliver inside 2 weeks. Build engagements ship first artifacts in week one.
Pre-scoped feeds (productized menu)
Already know exactly what you need? These are productized, recurring data feeds built on the same infrastructure I deploy for clients. Each feed is scoped to an input list you provide - tickers, company names, SKUs, ZIP codes, counties, VINs, domains - whatever the source takes as a lookup. Avg delta and avg price reflect typical client volume; the real quote scales with list size and refresh cadence. Delivery is scheduled, normalized, and lands in S3, a webhook, or your database. If you're not sure which feed (or whether to build vs. buy), start with the Audit.
| Source | Category | Monthly delivery | Avg delta (records) | Avg price (monthly) |
|---|---|---|---|---|
| LinkedIn (public profiles) | Firmographics | For your company list - employee counts, org charts, hiring trends, tech-role mix, company-page updates | ~30k | $1,550 |
| LinkedIn Jobs | Talent | For your company list or role-filter set - open postings, seniority, skills, applicant counts | ~40k | $1,550 |
| Amazon | Retail | For your ASIN, keyword, or category list - pricing, reviews, BSR, availability, seller buy-box | ~20k | $1,550 |
| SEC EDGAR | Financial | For your ticker or CIK list - 10-K / 10-Q / 8-K filings, insider transactions, 13F holdings | ~3k | $300 |
| Google Maps | Local | For your location + keyword combos - POIs with hours, reviews, categories, popular times | ~10k | $400 |
| Zillow | Real Estate | For your ZIP, address, or MLS-area list - listings, Zestimates, transaction history, price changes | ~10k | $500 |
| Indeed | Talent | For your company or role-keyword list - postings, salaries, locations, posting age | ~20k | $500 |
| Glassdoor | Firmographics | For your company list - reviews, salaries, interview questions, CEO approval, benefits | ~8k | $500 |
| Crunchbase | Financial | For your company or investor list - funding rounds, acquisitions, IPOs, board members | ~8k | $450 |
| Walmart | Retail | For your SKU or category list - pricing, reviews, availability, pickup options by store | ~20k | $1,550 |
| County Assessor/Recorder (3,000+ counties) | Real Estate | For your target counties - monthly delta of ownership, deeds, tax assessments, liens, mortgages | ~200k | $2,800 |
| Secretary of State (50 states) | Corporate | For your entity list - filings, registered agents, annual reports, UCC records across all 50 SoS portals | ~50k | $2,300 |
| OFAC Sanctions (SDN) | Compliance | Daily delta of the full SDN list - new and updated entries, alt names, vessels, IDs | ~300 | $250 |
| PACER | Legal | For your party, district, or docket-type filters - federal filings, opinions, bankruptcy records | ~5k | $500 |
| State Court Systems (50 states) | Legal | For your party, docket-type, or county filters - civil and criminal records, judgments, liens, evictions | ~25k | $2,000 |
| NPI Registry (NPPES) | Healthcare | For your NPI, name, or taxonomy filter - provider lookups with specialty, address, affiliations | ~10k | $200 |
| Booking.com / Expedia | Travel | For your property or destination + date-range combos - pricing, availability, reviews, cancellation policies | ~25k | $2,100 |
| BuiltWith | Firmographics | For your domain list - detected technology stack, stack-change deltas, category coverage | ~10k | $400 |
| Google News | News | For your keyword or entity list - daily article aggregation across publishers, deduped and clustered | ~40k | $800 |
| Reddit (public subreddits) | Sentiment | For your keyword or subreddit list - posts, comments, sentiment, thread velocity | ~75k | $800 |
| FINRA BrokerCheck | Compliance | For your CRD or broker-name list - registrations, disciplinary records, employment history | ~10k | $450 |
| Greenhouse / Lever / Workable | Talent | For your company list - open roles, team signals, hiring velocity from public ATS boards | ~10k | $500 |
| Custom careers pages | Talent | For your company list - bespoke scrape of any company's /careers site, including those not on a standard ATS | per company | $0.20–$3 / co. |
| X (Twitter) | Sentiment | For your keywords or account list - mentions, sentiment, engagement, influencer reach | ~100k | $1,900 |
| FDA (openFDA) | Healthcare | For your drug or device list - approvals, adverse events, recalls, 510(k), inspections | ~5k | $300 |
| Yahoo Finance | Financial | For your ticker list - quotes, historicals, fundamentals, analyst estimates, options chains | ~10k | $600 |
| Kayak / Google Flights / Skyscanner | Travel | For your origin-destination + date pairs - airfare pricing, fare calendars, price-change tracking | ~30k | $1,900 |
| CoinGecko / CoinMarketCap | Crypto | For your token list - price, volume, market cap, exchange-level data | ~15k | $800 |
| Etherscan / BscScan / Polygonscan | Crypto | For your wallet or contract list - transactions, token transfers, gas, internal calls | ~100k | $800 |
| Pharmacy & Medical Boards (50 states) | Healthcare | For your NPI or name list - license verification and disciplinary status across all 50 state boards | ~40k | $2,200 |
| State Attorney General Actions | Compliance | Daily sweep of all 50 AG sites - new enforcement actions, data-breach notifications, settlements | ~2k | $2,000 |
| MarineTraffic / VesselFinder | Supply Chain | For your vessel, IMO, or fleet list - positions, port calls, ETAs, voyage history | ~30k | $800 |
| State Procurement Portals (50 states) | Gov Contracts | Daily sweep across all 50 state portals - new RFPs, contract awards, vendor registrations matching your keyword filters | ~40k | $2,600 |
| Port Authority Sites (50+ ports) | Supply Chain | For the ports you select - container volumes, congestion metrics, vessel schedules, berth assignments | ~20k | $2,600 |
| GoodRx | Healthcare | For your drug + ZIP list - pharmacy-level prices, coupons, generic alternatives | ~15k | $550 |
| Carfax (public listings) | Automotive | For your VIN list - history summaries, accident indicators, service records | ~10k | $1,350 |
| Wayfair | Retail | For your SKU or category list - pricing, reviews, availability, sale status | ~15k | $1,350 |
| Instacart / FreshDirect | Grocery | For your SKU + ZIP list - grocery pricing and availability by store and retailer | ~25k | $1,350 |
| StockX / GOAT | Resale | For your product or SKU list - resale pricing, sales history, price premiums, size-level liquidity | ~20k | $2,100 |
| Shodan | Security | For your IP-range, org, or product-string set - open ports, service banners, vulnerabilities | ~40k | $800 |
| NFT Marketplaces (OpenSea / Blur) | Crypto | For your collection list - floor prices, volume, sales, rarity, holder concentration | ~30k | $1,600 |
| Glassnode (public) | Crypto | For your asset list - on-chain metrics, exchange flows, miner data, HODL waves | ~10k | $1,050 |
| Dow Jones / World-Compliance Watchlists | Compliance | For your entity list - PEP flags, adverse media, watchlist screening with change tracking | ~100k | $1,050 |
| State Insurance Rate Filings | Insurance | For your carrier or line-of-business filters - filed rates, forms, actuarial justifications, approval status | ~3k | $1,750 |
| FlightAware / FlightRadar24 | Aviation | For your tail-number, route, or operator list - live flight data, delays, airport performance, history | ~60k | $1,600 |
| DAT Freight & Analytics | Logistics | For your origin-destination lane list - spot rates, rate trends, capacity signals | ~8k | $1,050 |
| Blind (public threads) | Sentiment | For your company or topic list - thread content, sentiment, compensation signals, layoff chatter | ~15k | $1,050 |
Record volumes and prices are averages - both scale with your input list size and how often you want the feed refreshed. A few feeds (e.g. custom careers pages) are priced per unit rather than at a flat monthly rate. Don't see the source you need? The same infrastructure handles nearly any public web source - email for a tailored quote.
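For readers unfamiliar with the term, a "delta" here means the records that changed between two refreshes of a feed. The sketch below shows one common way to compute it; it is an assumption about the general technique, not a description of the infrastructure behind these feeds. The `fingerprint` and `compute_delta` helpers, the snapshot layout (records keyed by the lookup value from the input list), and the sample data are all hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable content hash; keys are sorted so field order doesn't matter."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compute_delta(previous: dict[str, dict], current: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two snapshots keyed by lookup value (ticker, SKU, domain, ...)."""
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current
        if k in previous and fingerprint(current[k]) != fingerprint(previous[k])
    )
    return {"added": added, "updated": updated, "removed": removed}


# Made-up snapshots for a hypothetical domain-keyed feed.
last_refresh = {
    "example.com": {"cms": "wordpress", "cdn": "none"},
    "example.org": {"cms": "drupal", "cdn": "fastly"},
    "example.net": {"cms": "ghost", "cdn": "none"},
}
this_refresh = {
    "example.com": {"cdn": "cloudflare", "cms": "wordpress"},  # cdn changed
    "example.org": {"cdn": "fastly", "cms": "drupal"},         # same content, different key order
    "example.edu": {"cms": "hugo", "cdn": "none"},             # new in this refresh
}

delta = compute_delta(last_refresh, this_refresh)
print(delta)
```

Under this model the delta grows with both the size of the input list and how much the source changes between refreshes, which is consistent with the note above that volume scales with list size and refresh cadence.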
Common questions
Why start with the Audit?
How fast can the Audit turn around?
Do you sign NDAs?
Will the Fractional role cover my engineering team's tickets?
Do you work with non-AI companies?
About me
Currently leading dark-web and threat-intelligence data collection at Recorded Future. Before that, a decade at ZoomInfo scaling web acquisition from startup to IPO - SERP engines processing 500M+ queries/month, crawler frameworks handling 1B+ pages/month, agentic extraction on LLMs and domain-specific SLMs, and multi-vendor integration across 600M+ people and 100M+ companies. I run Nodesnack on the side - the same infrastructure I use for client engagements. See resume and portfolio for context.
Get in touch
Most engagements start with a written intake for the Data Acquisition Audit. Email andrew@abharrismethods.com.
The intake email covers the six items below. Answer what you can; partial intakes are fine.
1. You - name, role, company.
2. What you're building - and why web data matters to it.
3. The acquisition problem - what data you need, and what's broken or missing today.
4. Current setup - nothing yet · DIY scrapers · vendor (which?) · inherited mess.
5. What it's costing you today - manual hours, vendor spend, or product capability you can't ship.
6. Scale & timing - sources, volume, refresh cadence, when you'd want to start.
Replies within one business day.