This article explores a practical AI use case grounded in insights from real-world implementations, conversations with B2B teams, and experience working with large-scale data.
The goal is to share how modern sales organizations are evolving their ICP strategies using public web data — and how tools like Coresignal can support that shift with reliable company, employee, and jobs data.
What’s built
A self-optimizing AI engine that continuously refines your Ideal Customer Profile (ICP) based on real-world sales performance from your CRM. It evolves as your GTM strategy shifts — identifying lookalike prospects, surfacing unexpected verticals, and minimizing wasted outreach.
What B2B data is used
- Firmographics and technographics: Company size, revenue, industry, tech stack (e.g., uses Salesforce + HubSpot + Snowflake).
- Hiring signals: Open roles that indicate growth stage or operational bottlenecks (e.g., hiring Revenue Ops → prepping for scale).
- Employee graphs: Team composition scraped from public profiles — who works there and in what roles.
- Historical CRM outcomes: Closed-won/lost results and pipeline progression feed back into the model for ongoing learning.
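As a concrete illustration of the hiring-signal idea, the sketch below counts Revenue Ops openings from a list of job titles. The titles and keyword list are hypothetical; a production version would read titles from a jobs dataset such as Coresignal's.

```python
# Hypothetical job-posting titles for one account.
postings = [
    "Revenue Operations Manager",
    "Senior Account Executive",
    "Data Engineer",
    "RevOps Analyst",
]

# Simple keyword rule for one hiring signal: Revenue Ops hiring
# often precedes a push to scale the GTM motion.
REVOPS_KEYWORDS = ("revenue operations", "revenue ops", "revops")

def is_revops_role(title: str) -> bool:
    """Return True if the title matches any RevOps keyword."""
    t = title.lower()
    return any(keyword in t for keyword in REVOPS_KEYWORDS)

revops_openings = sum(is_revops_role(t) for t in postings)
print(f"RevOps openings: {revops_openings}")  # → 2
```

Real pipelines typically replace the keyword rule with a title classifier, but a transparent rule like this is a reasonable starting point for a single signal.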
How to implement the AI-based ICP engine using Coresignal
Implementing this system requires combining your internal CRM data with rich external data and signals. Below is a step-by-step breakdown of how Coresignal’s datasets can be used to power each stage of the pipeline.
1. Data collection (via Coresignal APIs or datasets)
Company data:
- Source from Coresignal’s Company dataset.
- Extract: industry, size, location, founding date, employee count, funding info, tech tags.
- Useful for defining firmographic profiles and filtering segments.
Job postings:
- Use Coresignal’s Jobs dataset.
- Extract: job title, department, seniority, skills required, number of postings per month.
- Key for detecting intent signals (e.g., scaling sales team = ICP match).
Employee data:
- Pull from the Employee dataset.
- Extract: title graph, job history, location, tenure, education.
- Helps model internal org structure, seniority mix, skill focus.
- Useful for building relationship graphs and preventing data silos.
These datasets can be ingested into a unified data warehouse or processing layer, where entities are matched by domain, professional network URL, or other identifiers. This creates a consolidated view of each company enriched with up-to-date hiring, org structure, and role-level intelligence, forming the foundation for downstream modeling.
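A minimal sketch of the entity-matching step using pandas, with the company domain as the shared join key. The records and field names here are illustrative, not the actual Coresignal schema.

```python
import pandas as pd

# Hypothetical sample records; real Coresignal payloads carry many more fields.
companies = pd.DataFrame([
    {"domain": "acme.com", "industry": "Software", "employee_count": 240},
    {"domain": "globex.io", "industry": "Logistics", "employee_count": 1200},
])

jobs = pd.DataFrame([
    {"domain": "acme.com", "title": "Revenue Operations Manager", "department": "GTM"},
    {"domain": "acme.com", "title": "Data Engineer", "department": "Tech"},
    {"domain": "globex.io", "title": "Warehouse Lead", "department": "Ops"},
])

# Aggregate job postings per company, then join onto the company table
# by domain to build the consolidated per-company view.
job_counts = (
    jobs.groupby("domain")
    .size()
    .rename("open_roles")
    .reset_index()
)
unified = companies.merge(job_counts, on="domain", how="left")
print(unified)
```

The same pattern extends to the employee dataset; when domains are missing, a professional network URL or another stable identifier serves as the fallback key.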
2. Model training: closed-won pattern recognition
The goal of this step is to learn what types of companies convert most successfully – and to use those insights to surface similar accounts going forward.
- Join your CRM data (closed-won/lost, lead score, opportunity stage) with Coresignal firmographic/enrichment data using domain/professional network ID as a unique identifier.
- Engineer features:
- Team composition (e.g., % GTM vs. tech roles).
- Hiring velocity (e.g., 10+ GTM roles added in 2 months).
- Tech stack proxies (e.g., job mentions: Snowflake, Salesforce).
- Train a classification model (e.g., XGBoost, CatBoost) or a Siamese network for similarity scoring.
This model becomes the backbone of your adaptive ICP engine – continuously refined by new outcomes in the sales cycle.
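A toy version of the training step, using scikit-learn's gradient boosting as a stand-in for XGBoost/CatBoost. The feature values and outcome labels are fabricated purely for illustration; a real training set would come from the CRM join described above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical engineered features per account:
# [pct_gtm_roles, gtm_roles_added_90d, mentions_salesforce]
X = np.array([
    [0.40, 12, 1],
    [0.35, 9, 1],
    [0.05, 0, 0],
    [0.10, 1, 0],
    [0.45, 15, 1],
    [0.08, 0, 1],
])
# CRM outcome labels: 1 = closed-won, 0 = closed-lost.
y = np.array([1, 1, 0, 0, 1, 0])

model = GradientBoostingClassifier(random_state=42)
model.fit(X, y)

# Score a new account: heavy GTM hiring, Salesforce mentioned in job posts.
candidate = np.array([[0.38, 11, 1]])
fit_score = model.predict_proba(candidate)[0, 1]
print(f"ICP fit score: {fit_score:.2f}")
```

The predicted probability of the closed-won class doubles as the ICP fit score used downstream.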
3. Real-time ICP scoring engine
- On a scheduled basis (weekly/daily), pull new company/job/employee data from Coresignal.
- Apply your trained model to rank companies by ICP fit score.
- Integrate the results into:
- CRM (e.g., via Salesforce custom objects).
- Outreach tools (Apollo, Salesloft).
- A lightweight internal dashboard.
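The scheduled scoring pass can be sketched as below. The Coresignal pull and CRM push are stubbed out, the field names are hypothetical, and the stand-in scorer's weights are purely illustrative; in practice the score function would wrap the trained model's `predict_proba`.

```python
def extract_features(company: dict) -> list[float]:
    """Map a raw company record to the model's feature vector (hypothetical fields)."""
    return [
        company.get("pct_gtm_roles", 0.0),
        float(company.get("gtm_roles_added_90d", 0)),
        1.0 if company.get("mentions_salesforce") else 0.0,
    ]

def rank_accounts(companies: list[dict], score_fn) -> list[tuple[str, float]]:
    """Score each account and return (domain, fit_score) pairs, best first."""
    scored = [(c["domain"], score_fn(extract_features(c))) for c in companies]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def toy_score(features: list[float]) -> float:
    """Stand-in for the trained model; linear weights chosen for illustration only."""
    pct_gtm, velocity, has_salesforce = features
    return min(1.0, 0.5 * pct_gtm + 0.03 * velocity + 0.2 * has_salesforce)

# Freshly pulled records (stubbed; a real run would call the Coresignal API).
fresh = [
    {"domain": "acme.com", "pct_gtm_roles": 0.4,
     "gtm_roles_added_90d": 12, "mentions_salesforce": True},
    {"domain": "globex.io", "pct_gtm_roles": 0.05,
     "gtm_roles_added_90d": 1, "mentions_salesforce": False},
]
ranking = rank_accounts(fresh, toy_score)
print(ranking)  # acme.com ranks first
```

The ranked output is what would be written to CRM custom objects or pushed to outreach tools on each scheduled run.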
Once the scoring engine is deployed, its value depends on how well it integrates into day-to-day sales workflows. Scores alone aren’t enough – they need to be contextually surfaced in the tools reps already use, and dynamically updated based on real engagement. The next step is to close the loop by capturing how the model performs in practice, enabling it to learn from both successful and failed outreach.
4. Continuous feedback loop
As SDRs and AEs interact with prospects, their actions become implicit feedback for the model. Each record – whether a lead, contact, or account – moves through defined states that help validate or challenge your ICP assumptions.
Sales feedback signals to capture:
- Manual adjustments (e.g., marking “not ICP” when the firm doesn’t match sales criteria).
- Engagement milestones (e.g., email opens, replies, demo booked).
- Outcome data (e.g., SQL created, opportunity won/lost, closed reason).
All of this data can be exported and joined back into your modeling environment for retraining on a monthly or quarterly cadence. For higher-frequency learning, you can apply online learning or reinforcement learning techniques, where scores adjust based on downstream outcomes.
Tools/stack you might use
Below are some ideas for how to structure the stack for this process.
- Data Storage: Snowflake or Postgres for joined datasets.
- Modeling: Python (scikit-learn, XGBoost, or PyTorch if going deep).
- Serving: FastAPI or Flask microservice.
- Orchestration: Airflow for pulling + updating data.
- Visualization: Streamlit or a React dashboard embedded in CRM.