Dataset

Professional Profiles Extraction

Directory pages turned into searchable professional records.

Exhibit

Source → structured output

Public professional directory pages

Before · source

[Image: source page from a public professional directory]

After · sample records

[
  {
    "name": "Marty DiMarzio",
    "email": "mdimarzio@deloitte.com",
    "title": "Global Account Leader & Board Member| Life Sciences",
    "telephone": "+1 617 437 3730",
    "linkedIn": "https://www.linkedin.com/pub/marty-dimarzio/0/287/11b",
    "description": "Marty is a leader in Deloitte Consulting LLP, with a track record of architecting and delivering complex global transformations for some of our largest life sciences clients. He serves as the Global Lead Client Service Partner for a Fortune 500 client.",
    "long_description": "Marty is a leader in Deloitte Consulting LLP, with a track record of architecting and delivering complex global transformations for some of our largest life sciences clients. He serves as the Global Lead Client Service Partner for a Fortune 500 client. Marty has spent his entire career with Deloitte, working closely with Fortune 500 clients on a range of strategy, business transformation, acquisition and divestiture, and systems implementation projects. He is well known for developing strong relationships across Deloitte’s network globally to provide a full breadth of capabilities and delivery to clients. In addition to his leadership servi
…

Pipeline log

  • Target: Profile templates varied by practice group.
  • Parse: Email and phone validated with format checks.
  • Ship: Shipped explorer-backed samples for due diligence review.

The problem

Professional directory listings buried contact details, titles, and LinkedIn references in inconsistent HTML. The client needed a clean, queryable dataset for outreach and market mapping.

Approach

  1. Identified stable DOM selectors and fallback parsing rules per profile template.
  2. Extracted name, email, title, phone, and social links with validation passes.
  3. Staged data in PostgreSQL for column-level exploration via the Data Explorer.
  4. Published anonymized samples in the Lab for review.
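
The first two steps can be illustrated with a minimal sketch. It is not the project's actual code: the selector paths, class names, and regular expressions below are hypothetical, and it uses the standard library's ElementTree, which only accepts well-formed markup (real directory pages would likely need a tolerant HTML parser). Each field tries an ordered list of selectors and takes the first non-empty match, then email and phone values that fail a simple format check are cleared.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical selector chains per field: the first path that yields text wins,
# later paths act as fallbacks for other profile templates.
FIELD_PATHS = {
    "name": [".//h1[@class='profile-name']", ".//h1"],
    "email": [".//a[@class='email']", ".//span[@class='contact-email']"],
    "telephone": [".//span[@class='phone']", ".//li[@class='tel']"],
}

# Deliberately loose format checks (assumed, not the project's real rules).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")


def first_match(root, paths):
    """Return the stripped text of the first path that matches, else None."""
    for path in paths:
        node = root.find(path)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None


def extract_profile(markup):
    """Extract one profile record from well-formed markup and validate it."""
    root = ET.fromstring(markup)
    record = {field: first_match(root, paths) for field, paths in FIELD_PATHS.items()}
    # Validation pass: drop values that do not look like an email or phone.
    if record["email"] and not EMAIL_RE.match(record["email"]):
        record["email"] = None
    if record["telephone"] and not PHONE_RE.match(record["telephone"]):
        record["telephone"] = None
    return record
```

Keeping the selector chains in a single table means a new profile template only adds a fallback path, while the validation pass stays the same for every template.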

Deliverables

  • Cleaned profile records with normalized contact fields
  • Interactive explorer for column selection and CSV sample export
  • Documented schema for downstream CRM import
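
As an illustration of the column selection and CSV sample export deliverable, the sketch below writes a chosen subset of record fields to CSV text. The function name export_sample and the limit parameter are hypothetical; the field names follow the sample records shown earlier (name, email, title, telephone, linkedIn, description, long_description).

```python
import csv
import io


def export_sample(records, columns, limit=100):
    """Return CSV text containing only the selected columns for up to `limit` records."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields that were not selected;
    # restval="" fills in columns a record does not have.
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    for record in records[:limit]:
        writer.writerow(record)
    return buffer.getvalue()
```

Calling export_sample(records, ["name", "email"]) would produce a header row followed by one row per record with just those two columns, which is roughly the shape a CRM import would expect.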
