Built by Junaid Cheema

Designing and building the fully automated pipeline that takes each record in a raw purchased database from its URL to a prioritized, briefed call task, removing research, lead prioritization and task creation from the SDR’s day entirely.


→ Introduction

The client is a German digital therapeutics (DiGA) company. Growth depends on one thing: getting doctors to prescribe the app. The SDR team works through German orthopedic and musculoskeletal practices and converts them into prescribers.

The buyer here is the practice and its doctors. But the SDR’s daily reality isn’t the practice; it’s the CRM. And the bottleneck was never finding practices. There are tens of thousands of them. The bottleneck was that every lead arrived blank, and the only way to know which practice was worth a call, and what to say on it, was to research it by hand.

✕ Challenge

The CRM had been loaded years earlier from a purchased database. The records were unenriched and unreliable, so before an SDR could pick up the phone, they had to become a research analyst.

For every lead, a rep opened the HubSpot company, clicked out to the practice website, and manually worked out:

All of that lived in the rep’s head. None of it went back into the CRM. It took five to twenty minutes per lead, the quality varied from rep to rep, and it set a hard ceiling on how many practices the team could ever reach.

The real problem wasn’t the calling. It was that the team spent the front half of every day doing research instead of selling.


✓ Solution

We built a fully automated pipeline that takes a raw HubSpot record from URL to a queued call task without an SDR touching it. The moment a record is flagged, the pipeline reads the practice website the way a rep would, extracts everything needed to qualify the lead, writes it back onto the HubSpot company and its contacts, then prioritizes the lead and creates the call task.

The enrichment engine runs on three components. Prioritization and task creation sit on top of it.

1. Site ingestion and classification. Pulls the homepage from the company URL, strips it to clean text, scores how modern and credible the site is on a 1 to 10 scale, and decides whether the practice is actually orthopedic. This is the first filter: is the lead even in scope.

2. AI extraction across the practice. Gemini 2.5 Pro reads the homepage, the services pages, the team and individual doctor pages, and the Impressum, each against a locked rubric. It pulls a practice summary, opening hours, the online-booking provider, email and fax, every doctor with their clinical focus, the lead MFA (Medizinische Fachangestellte, the practice’s medical assistant), the number of locations, and the languages spoken.

3. CRM write-back and contact graph. Every field writes back to the HubSpot company. Each doctor is matched against existing contacts with an AI dedup step, then created or updated and associated to the company, so the rep opens a record already populated with the right people and context instead of a blank shell.
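To make the three components concrete, here is a minimal Python sketch of the data that moves between them. It is an illustration under stated assumptions, not the production code: every name in it (html_to_text, PracticeRecord, Doctor, normalize_name, plan_contact_writes) is hypothetical, the field names are our own shorthand for the fields listed above, and the deterministic name normalization stands in for the AI dedup step, which also handles fuzzier matches. No HubSpot or Gemini calls are shown.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
import re
import unicodedata


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Component 1 (sketch): strip a fetched homepage down to clean text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


@dataclass
class Doctor:
    name: str
    clinical_focus: str = ""


@dataclass
class PracticeRecord:
    """Component 2 (sketch): hypothetical shape of one extraction result."""
    summary: str = ""
    site_score: int = 0            # 1 to 10, how modern and credible the site is
    is_orthopedic: bool = False    # the in-scope filter
    opening_hours: str = ""
    booking_provider: str = ""
    email: str = ""
    fax: str = ""
    lead_mfa: str = ""
    locations: int = 1
    languages: list[str] = field(default_factory=list)
    doctors: list[Doctor] = field(default_factory=list)


# Assumption: a small, incomplete set of German academic titles to ignore.
_TITLES = re.compile(r"\b(dr|prof|med|dipl)\b\.?")


def normalize_name(name: str) -> str:
    """Fold case, umlauts and titles so 'Dr. med. Anna Müller' ~ 'Anna Müller'."""
    folded = unicodedata.normalize("NFKD", name.casefold())
    folded = "".join(c for c in folded if not unicodedata.combining(c))
    folded = _TITLES.sub(" ", folded)
    return " ".join(re.sub(r"[^a-z ]", " ", folded).split())


def plan_contact_writes(doctors, existing):
    """Component 3 (sketch): decide create vs. update per extracted doctor.

    `existing` maps a CRM contact id to that contact's display name.
    """
    by_name = {normalize_name(n): cid for cid, n in existing.items()}
    plan = []
    for doc in doctors:
        cid = by_name.get(normalize_name(doc.name))
        action = "update" if cid is not None else "create"
        plan.append((action, cid, doc.name))
    return plan
```

Keeping the write plan separate from the write itself means the create-or-update decision can be reviewed or logged before anything touches the CRM.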


★ The extraction model: what an SDR decides before they dial

The thinking behind the model matters more than the field list, so here is the logic first.

The design question was never “what can we scrape off the page.” It was “what does an SDR work out in their head before they call, and can we put each of those judgments onto the record automatically.” Every judgment falls into one of three buckets, and every extracted field maps to one of them.

The full field set, and what each one is really for: