Developer community and repository data
- GitHub, Docker Hub, and more sources
- Parsed, clean & accurate data
- Delivery in JSON
- Organization, projects, location and many other data points
- Valuable for investment, market research, and HR intelligence

4 total data sources
874M total data records
Always fresh, updated data
Top repository data sources

GitHub
GitHub data consists of over 866M records and provides you with data points such as the developer's username, location, hireability, projects' information, and more.

Docker Hub
Docker Hub data consists of over 2M records and provides you with data points such as the developer's name, location, company, repository information, and more.
Stay ahead of the game with fresh web data
Coresignal's data helps companies achieve their goals
Why do you need community and repository data?
Community and repository data lets you find software projects and top IT talent working with programming languages such as Python, Java, C++, and more. We offer data from coding, web development, app development, and other software development communities. Use this information to source IT talent or to find investment opportunities by analyzing software projects.
The data is collected from public web sources such as GitHub, Docker Hub, Kaggle, and Stack Exchange.
Main data fields
Here are some examples of the data fields you will find in our community and repository data.
Information | Description | Example values* |
---|---|---|
company_name | Name of the company where the person is employed | X Software |
location | Location of the person | Netherlands |
repos_summary | Data block on repositories including names, descriptions, sizes, watcher counts, licenses, programming languages, etc. | NA |
scripts_summary | Data block on scripts by the user | NA |
datasets_summary | Data block on datasets by the user | NA |
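
As a rough illustration of how these fields might be used once delivered, the sketch below parses a few JSON records and filters them by location. Only the field names `company_name`, `location`, and `repos_summary` come from the table above. The sample values, the inner structure of `repos_summary` (shown here as a list of objects with `name` and `language` keys), and the helper names `developers_in` and `languages_used` are assumptions made for illustration, not the actual schema.

```python
import json

# Hypothetical sample records. Field names follow the table above;
# the values and the shape of repos_summary are assumed for illustration.
SAMPLE = """
[
  {"company_name": "X Software", "location": "Netherlands",
   "repos_summary": [{"name": "demo-api", "language": "Python"}]},
  {"company_name": "Y Labs", "location": "Germany",
   "repos_summary": [{"name": "demo-ui", "language": "Java"}]},
  {"company_name": null, "location": "Netherlands", "repos_summary": []}
]
"""


def developers_in(records, location):
    """Return the records whose location field equals the given location."""
    return [r for r in records if r.get("location") == location]


def languages_used(record):
    """Collect the distinct languages listed in a record's repos_summary."""
    repos = record.get("repos_summary") or []
    return sorted({repo["language"] for repo in repos if repo.get("language")})


records = json.loads(SAMPLE)
dutch = developers_in(records, "Netherlands")
print(len(dutch))                # number of matching records
print(languages_used(dutch[0]))  # languages in the first match's repositories
```

Real records may nest these blocks differently or use other key names, so check a sample dataset before relying on any particular structure.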
Hire the top tech talent
Find the best IT specialists for your company by leveraging our community and repository data. Sourcing tech talent can be difficult in today's competitive environment, so save time and resources by finding talent in developer communities.

Find the fastest-growing software companies
Once you find a talented and motivated developer, you can see which company they work for. If the developer keeps up their pace, this may indicate that the company has the potential for fast growth, which in turn can signal an investment opportunity. For a more in-depth view of the company, you could also opt for firmographic data.

Largest tech communities data
The data is gathered from the world's largest repositories and communities of developers, data scientists, programmers, and other IT professionals. In this dataset, you can find top tech talent and new software projects, and even discover new investment opportunities.

Community and repository data use cases
Data delivery
1. Tell us what you need
First, we discuss your specific needs. Optionally, we can offer a sample dataset. Then, you can either request the full dataset or data specific to selected countries and regions.
2. Get the requested data
The requested data is then delivered in CSV or JSON format, either as a web link or as a file uploaded directly to your preferred data storage.
3. Keep it fresh
Outdated data loses relevance. With Coresignal, get monthly or quarterly data updates.
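
Since delivery is described above as CSV or JSON, here is a minimal sketch of how a received file could be loaded into a uniform list of records. The function name `load_records`, the choice to dispatch on the file extension, and the assumption that a JSON file holds either a list of records or a single record are all illustrative. The actual file layout is not specified here and may differ.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load records from a .json or .csv file into a list of dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: the file holds either a list of records or one record.
        return data if isinstance(data, list) else [data]
    if suffix == ".csv":
        # Each CSV row becomes a dict keyed by the header row.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"Unsupported file type: {suffix}")
```

Note that values loaded from CSV are always strings, while JSON preserves numbers and nested blocks, so downstream code may need to handle the two formats differently.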
But don't take us at our word. Listen to our clients.
Venture capital client
"Coresignal has strong demographic and firmographic datasets both on quality and volume while keeping the data as fresh as it can be. We've been using Coresignal for years and we can only speak highly about the product and team behind it. Highly recommended."
Lead generation client
"We are using Coresignal to enrich our AI platform for Sales Pipeline Growth. We proactively recommend sales-ready opps, interested buyers, warm intros, and trusted actions, which results in +25% in net new pipeline in 2 months, and +40% after 6 months."
Venture capital client
"Before we started working with Coresignal, the percentage of investments that we made that had data influence was around 2% and currently it’s around 65%."
Why Coresignal?
In the market since 2016
Our team includes some of the most experienced web data extraction professionals.
Discover new and promising tech projects
In community and repository data you can find innovative projects that may redefine existing standards.
Follow open-source projects
If you discover a new open-source project that may bring you profit, you can buy and monetize it.
Find content gaps in developer topics
Developers share a lot of information in the communities that may help you discover certain content gaps.
Monitor software project growth
Regularly check the developers that caught your interest and see how their projects progress over time.
Track trending topics among developers
See what developers are talking about and what topics or projects seem to be gaining traction.
Frequently asked questions
What is a developer community?
A developer community is a place where developers share their projects, knowledge, progress, and advice, among other things.
What are Coresignal's developer community and repository data sources?
Coresignal's developer community and repository data sources include GitHub, Docker Hub, Kaggle, and Stack Exchange.
Where to find tech talent?
You can find tech talent in community and repository data or employee data.
How is community and repository data collected?
We collect community and repository data from various public web sources and store it in several databases. Each data source has its own separate dataset of community and repository records.
Who uses community and repository data?
Coresignal’s community and repository data is used by investors and HR platforms to generate investment signals and source talent.
How secure is the data?
Data security is one of our main priorities. We store data in a protected dataset to avoid breaches and leaks of sensitive information.