LEAD GEN
Wellfound Startup Database
Startup profiles with funding rounds, team size, tech stack, and hiring status. Built for a VC firm tracking early-stage companies across specific verticals.
The challenge.
A seed-stage VC firm needed to systematically track early-stage startups across the fintech, healthtech, and climate verticals. Wellfound (formerly AngelList Talent) has the best startup data, but it offers no bulk export and enforces aggressive rate limiting. The firm's analysts were manually reviewing profiles one by one.
The approach.
Scrapy Crawl Architecture
Built a Scrapy spider with depth-first crawling through Wellfound's category and location filters. Used rotating datacenter proxies with per-request delays calibrated to stay under rate limit thresholds.
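As a rough illustration of the throttling and proxy rotation described above, the sketch below pairs a settings dictionary with a small downloader middleware. The setting names are standard Scrapy settings, but every value, the proxy URLs, and the class name RotatingProxyMiddleware are assumptions made for this example, not the project's actual configuration.

```python
import random

# Hypothetical values; real delays would be tuned against observed rate limits.
CRAWL_SETTINGS = {
    "DOWNLOAD_DELAY": 3.0,                # base delay between requests, in seconds
    "RANDOMIZE_DOWNLOAD_DELAY": True,     # jitter the delay so timing is less regular
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one in-flight request per domain
    "AUTOTHROTTLE_ENABLED": True,         # back off automatically when latency rises
    "DEPTH_PRIORITY": 0,                  # Scrapy default, which crawls depth-first
}


class RotatingProxyMiddleware:
    """Downloader middleware sketch: assigns a random proxy to each request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    def process_request(self, request, spider=None):
        # Scrapy's built-in HttpProxyMiddleware reads the proxy from request.meta.
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # returning None lets Scrapy continue processing the request
```

In a real project the dictionary could be assigned to the spider's custom_settings attribute, and the middleware would need to be registered under DOWNLOADER_MIDDLEWARES with a proxy list supplied by the proxy provider.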
Funding & Team Extraction
Parsed startup profiles for founding date, funding stage, total raised, last round size, team headcount, key hires, and active job postings. Captured investor names and board members when listed.
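To make the extraction step concrete, here is a minimal sketch of normalizing scraped profile fields into a typed record. The raw field names ("name", "founded", "stage", "total_raised") and the "$3.2M" amount format are assumptions for illustration; the real page markup and the project's actual parser may differ.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_MONEY_RE = re.compile(r"^\$?\s*(\d[\d,]*(?:\.\d+)?)\s*([KMB]?)$", re.IGNORECASE)


def parse_usd(text: str) -> Optional[int]:
    """Convert strings such as '$3.2M' or '750K' to whole dollars, else None."""
    match = _MONEY_RE.match(text.strip())
    if match is None:
        return None
    number, suffix = match.groups()
    # Decimal avoids floating-point rounding when scaling the amount.
    return int(Decimal(number.replace(",", "")) * _MULTIPLIERS[suffix.upper()])


@dataclass
class StartupProfile:
    company_name: str
    vertical: str
    founded_year: Optional[int]
    funding_stage: Optional[str]
    total_raised_usd: Optional[int]
    team_size: Optional[int]


def build_profile(raw: dict) -> StartupProfile:
    """Map a dict of scraped strings (hypothetical keys) onto a typed record."""
    return StartupProfile(
        company_name=raw["name"].strip(),
        vertical=raw["vertical"].strip(),
        founded_year=int(raw["founded"]) if raw.get("founded") else None,
        funding_stage=raw.get("stage") or None,
        total_raised_usd=parse_usd(raw.get("total_raised") or ""),
        team_size=int(raw["team_size"]) if raw.get("team_size") else None,
    )
```

Missing or unparseable values become None rather than zero, so an undisclosed round is not mistaken for a company that has raised nothing.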
Tech Stack & Signal Analysis
Extracted listed tech stacks and job posting requirements to infer actual technology usage. Built a scoring model that flags startups showing growth signals: rapid hiring, new funding, tech stack expansion.
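The scoring idea can be sketched as a comparison between two snapshots of the same company. The weights, the 25% hiring threshold, and the Snapshot fields below are all hypothetical placeholders chosen for this example; the actual model and its parameters are not described in this write-up.

```python
from dataclasses import dataclass

# Hypothetical point values per signal; the maximum score is 10.
WEIGHTS = {"hiring": 5, "funding": 3, "stack": 2}


@dataclass(frozen=True)
class Snapshot:
    team_size: int
    total_raised_usd: int
    tech_stack: frozenset


def growth_score(prev: Snapshot, curr: Snapshot, hiring_threshold: float = 0.25) -> int:
    """Sum the points for each growth signal seen between two snapshots."""
    score = 0
    # Rapid hiring: headcount grew by at least the threshold fraction.
    if prev.team_size > 0:
        growth = (curr.team_size - prev.team_size) / prev.team_size
        if growth >= hiring_threshold:
            score += WEIGHTS["hiring"]
    # New funding: total raised increased since the previous snapshot.
    if curr.total_raised_usd > prev.total_raised_usd:
        score += WEIGHTS["funding"]
    # Tech stack expansion: at least one technology not seen before.
    if curr.tech_stack - prev.tech_stack:
        score += WEIGHTS["stack"]
    return score
```

A company that goes from 12 to 18 people, closes a round, and adds a new technology would score the full 10 points under these assumed weights, while an unchanged company scores 0.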
VC-Ready Database
Stored everything in PostgreSQL with weekly diff reports highlighting new startups, funding events, and team changes. Built custom views for each target vertical with configurable alert thresholds.
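The weekly diff logic can be illustrated in memory, without a database. This sketch assumes each snapshot is a dict keyed by company name with "total_raised_usd" and "team_size" fields, matching the sample output; in the actual project the comparison presumably runs against the PostgreSQL tables rather than Python dicts.

```python
def weekly_diff(previous: dict, current: dict) -> dict:
    """Compare two snapshots and report new startups, funding events, team changes."""
    new_startups = sorted(current.keys() - previous.keys())
    funding_events = []
    team_changes = []
    # Only companies present in both weeks can have changed.
    for name in sorted(current.keys() & previous.keys()):
        old, cur = previous[name], current[name]
        if cur["total_raised_usd"] != old["total_raised_usd"]:
            funding_events.append((name, old["total_raised_usd"], cur["total_raised_usd"]))
        if cur["team_size"] != old["team_size"]:
            team_changes.append((name, old["team_size"], cur["team_size"]))
    return {
        "new_startups": new_startups,
        "funding_events": funding_events,
        "team_changes": team_changes,
    }
```

Each change is reported as a (name, old value, new value) tuple, which keeps the report easy to filter against a per-vertical alert threshold.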
Sample output.
{
  "company_name": "Optera Climate",
  "vertical": "Climate Tech",
  "founded_year": 2023,
  "funding_stage": "Seed",
  "total_raised_usd": 3200000,
  "team_size": 18
}
The results.
Startups tracked
Target verticals
Funding events captured
Investments sourced from data
Tech stack.