Dataset
Professional Profiles Extraction
Directory pages turned into searchable professional records.
Exhibit
Source → structured output
Public professional directory pages
Before · source

After · sample records
[
{
"name": "Marty DiMarzio",
"email": "mdimarzio@deloitte.com",
"title": "Global Account Leader & Board Member| Life Sciences",
"telephone": "+1 617 437 3730",
"linkedIn": "https://www.linkedin.com/pub/marty-dimarzio/0/287/11b",
"description": "Marty is a leader in Deloitte Consulting LLP, with a track record of architecting and delivering complex global transformations for some of our largest life sciences clients. He serves as the Global Lead Client Service Partner for a Fortune 500 client.",
"long_description": "Marty is a leader in Deloitte Consulting LLP, with a track record of architecting and delivering complex global transformations for some of our largest life sciences clients. He serves as the Global Lead Client Service Partner for a Fortune 500 client. Marty has spent his entire career with Deloitte, working closely with Fortune 500 clients on a range of strategy, business transformation, acquisition and divestiture, and systems implementation projects. He is well known for developing strong relationships across Deloitte’s network globally to provide a full breadth of capabilities and delivery to clients. In addition to his leadership servi
…
Pipeline log
- Target: Profile templates varied by practice group.
- Parse: Email and phone validated with format checks.
- Ship: Shipped explorer-backed samples for due diligence review.
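The format checks in the Parse step might look roughly like the sketch below. The patterns, function names, and placeholder values are assumptions for illustration, not the checks used in the project; the phone pattern only covers the space-separated international style seen in the sample record.

```python
import re

# Hypothetical patterns: deliberately strict, tuned to the formats in the exhibit.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
PHONE_RE = re.compile(r"\+\d{1,3}(?: \d{2,4}){2,4}")


def is_valid_email(value: str) -> bool:
    """True when the whole string looks like local@domain.tld."""
    return EMAIL_RE.fullmatch(value.strip()) is not None


def is_valid_phone(value: str) -> bool:
    """True for '+<country code>' followed by 2 to 4 space-separated digit groups."""
    return PHONE_RE.fullmatch(value.strip()) is not None


def validate_record(record: dict) -> dict:
    """Map each non-empty contact field to the result of its format check."""
    checks = {"email": is_valid_email, "telephone": is_valid_phone}
    return {
        field: check(record[field])
        for field, check in checks.items()
        if record.get(field)
    }


# Placeholder record, not taken from the dataset.
print(validate_record({"email": "jane.doe@example.com", "telephone": "n/a"}))
```

Records that fail a check would presumably be flagged for review rather than dropped, since a rejected value often points to a template variant the parser has not seen yet.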
The problem
Professional directory listings buried contact details, titles, and LinkedIn references in inconsistent HTML. The client needed a clean, queryable dataset for outreach and market mapping.
Approach
- 01 Identified stable DOM selectors and fallback parsing rules per profile template.
- 02 Extracted name, email, title, phone, and social links with validation passes.
- 03 Staged data in PostgreSQL for column-level exploration via the Data Explorer.
- 04 Published anonymized samples in the Lab for review.
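The selector-plus-fallback idea from step 01 can be sketched with the Python standard library alone. The class names (`profile-name`, `bio-heading`, and so on) and the helper names are hypothetical; the real templates and selectors are not documented here, and a production scraper would more likely use a full CSS selector engine.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be tracked as open.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Collect text found inside elements, keyed by CSS class name."""

    def __init__(self):
        super().__init__()
        self.text_by_class = {}
        self._open = []  # class lists of the currently open elements

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open.append((dict(attrs).get("class") or "").split())

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            for classes in self._open:
                for cls in classes:
                    self.text_by_class.setdefault(cls, []).append(text)


# Hypothetical rules: candidates are tried in order, first match wins.
FIELD_SELECTORS = {
    "name": ["profile-name", "bio-heading"],
    "title": ["profile-title", "bio-role"],
}


def extract_fields(html: str, selectors=FIELD_SELECTORS) -> dict:
    """Return one record; a field is None when no candidate class matched."""
    collector = ClassTextCollector()
    collector.feed(html)
    collector.close()
    record = {}
    for field, candidates in selectors.items():
        record[field] = None
        for cls in candidates:
            found = collector.text_by_class.get(cls)
            if found:
                record[field] = " ".join(found)
                break
    return record


template_b = '<section><h2 class="bio-heading">John Roe</h2><br><span class="bio-role">Director</span></section>'
print(extract_fields(template_b))
```

Keeping the candidate lists in one table means a new practice-group template can be supported by appending a class name instead of writing a new parser.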
Deliverables
- Cleaned profile records with normalized contact fields
- Interactive explorer for column selection and CSV sample export
- Documented schema for downstream CRM import
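As a rough illustration of the column selection and CSV sample export deliverable, the sketch below writes only the chosen columns for the first few records. The function name, the `limit` parameter, and the placeholder records are assumptions; the explorer itself reads from PostgreSQL, which is replaced here by an in-memory list.

```python
import csv
import io


def export_sample_csv(records, columns, limit=100):
    """Return CSV text with only `columns`, for at most `limit` records."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop fields that were not selected
        restval="",             # leave missing fields empty
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records[:limit]:
        writer.writerow(record)
    return buffer.getvalue()


# Placeholder records, not taken from the dataset.
records = [
    {"name": "Jane Doe", "email": "jane.doe@example.com", "title": "Partner"},
    {"name": "John Roe", "title": "Director"},
]
print(export_sample_csv(records, ["name", "email"]))
```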