Building a Data Team to Work with Web Data

How to assemble an efficient data team is a frequently debated question among data experts. If you're planning to build a data-driven product or improve your existing business with the help of public web data, you will need data specialists.

In this article, I will cover some key principles I have observed throughout my experience working in the public web data industry that may help you build an efficient data team.

Although we have yet to find a universal recipe for that, the good news is that there are various ways to approach this subject and still get the desired results. Here, I will explore the process of building a data team from the perspective of business leaders who are just starting with public web data.

What is a data team?

A data team is responsible for collecting, processing, and providing data to stakeholders in the format needed for business processes. This team can be incorporated into a different department, such as the marketing department, or be a separate entity.

The term data team can describe a team of any size, from one to two specialists to an extensive multilevel team managing and executing all aspects of data-related activities at the company.

Common roles in data teams may include data scientists, engineers, analysts, and managers. Later, I will discuss each in more detail.

7 questions to ask yourself before building a data team

Just like any business solution, data team creation has to be a well-thought-out process. From my experience, some managers want one because they read that everyone is into big data today. Unfortunately, they do not know what to do after hiring data specialists.

To avoid unpleasant surprises and, most importantly, financial repercussions, every company CEO should ask themselves the following questions:

What is your product? Is there a way it can benefit from public web data?
What data will you be using? There are many datasets to choose from, and you certainly don't need all of them. While most businesses seek company and employee data, others see more value in job posting datasets or technographic information. Therefore, you should check what data fields are available in each dataset and how many records are in total before committing.
What are the critical components of the product that involve data? This question helps you filter out unwanted data and keep only what is relevant, which saves resources. If you sell B2B software, knowing a company's tech stack might be enough to build a lead list.
What are the expected results from different project stages involving data? Before jumping to lead generation, you may want to do thorough market research using data. This way, you may find opportunities that your competitors have overlooked. In some instances, it can even lead to account-based selling.
What tech stack will be required for that? That's a job for your CTO, provided you have one. If not, you may want to let your data team lead choose. Of course, I advise checking the most popular data software and its prices to see how it fits the budget.
Who are the stakeholders? Is it you or the Board? Convincing yourself that your plan will work may be easier than convincing a group of people who look at big data with big doubts.
What indicators will help you evaluate if your current data team meets your business needs? Last but not least, you need a way to quantify your data team's success. This can range from time to insight to the number of integrated data sources.
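To make the question about which data you will use more concrete, here is a minimal Python sketch of checking a dataset sample before committing to it: it counts the records and how many of them actually contain each field. It assumes the sample is delivered as JSON Lines (one JSON object per line); the field names and records are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def profile_dataset(lines: Iterable[str]) -> tuple[int, dict[str, int]]:
    """Count records and how many records contain each field.

    Assumes JSON Lines input (one JSON object per line); adapt the
    parsing step if the dataset is delivered as CSV or Parquet.
    """
    record_count = 0
    field_counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        record_count += 1
        field_counts.update(record.keys())
    return record_count, dict(field_counts)


# Hypothetical sample of company records.
sample = [
    '{"company_name": "Acme", "industry": "Software", "employees": 120}',
    '{"company_name": "Globex", "industry": "Retail"}',
    '{"company_name": "Initech", "technologies": ["Python", "PostgreSQL"]}',
]
total, fields = profile_dataset(sample)
print(total)   # 3
print(fields)  # {'company_name': 3, 'industry': 2, 'employees': 1, 'technologies': 1}
```

Comparing the per-field counts with the total shows at a glance whether the fields your product depends on are populated often enough to be worth paying for.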

A guide to building your data team

When you have all the answers to the questions above, it's time to start building your data team.

I recommend that businesses working with public web data follow a straightforward principle: an efficient data team is one that works in alignment with your business needs. It all starts with what product you will be building and what data will be needed for that.

Simply put, every company planning to start working with web data needs specialists who can ingest and process large amounts of data and those who can transform data into information valuable for the business.

Transformation stage

Usually, the transformation stage is where the data starts to create value for its downstream users.

A small business can even start with one specialist to get to this stage. The first hire can be a data engineer with analytical skills or a data analyst with experience working with big data and light data engineering. When you're building something more complex, it's important to understand that public web data is essentially used to get answers to business questions, and web data processing is all about iterations.

From data to information

No matter the complexity of your product, you always start by acquiring a large amount of data. Further iterations may include aggregating the data or enriching it with data from additional sources. Then, you process it to extract information, such as specific insights, that can be used in the processes that follow, for example, supporting business decision-making, building a new platform, or providing insights to clients.
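The acquire, enrich, and aggregate steps described above can be sketched in a few lines of Python. This is an illustrative sketch only: the company records, the second "technology" source keyed by domain, and the question being answered (which industries use a given technology) are all hypothetical.

```python
from collections import Counter

# Step 1: hypothetical raw records acquired from a public web data source.
raw_companies = [
    {"domain": "acme.com", "industry": "Software"},
    {"domain": "globex.com", "industry": "Retail"},
    {"domain": "initech.com", "industry": "Software"},
]

# Hypothetical additional source: technologies detected per company domain.
tech_by_domain = {
    "acme.com": {"Salesforce", "AWS"},
    "initech.com": {"AWS"},
}


def enrich(companies, tech_lookup):
    """Step 2: attach the technology set from the second source to each company."""
    return [
        {**c, "technologies": tech_lookup.get(c["domain"], set())}
        for c in companies
    ]


def industries_using(companies, technology):
    """Step 3: aggregate enriched records into an insight (companies per industry using a technology)."""
    return Counter(
        c["industry"] for c in companies if technology in c["technologies"]
    )


enriched = enrich(raw_companies, tech_by_domain)
print(industries_using(enriched, "AWS"))  # Counter({'Software': 2})
```

In a real pipeline each step would run over far larger volumes and be repeated as new questions come up, which is why the processing is iterative.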

3 ways to build your data team

From a product perspective, the data team you need depends on the tools you will be using, which in turn depend on the volume of data you will handle and how it will be transformed. From this perspective, I split building a data team into three scenarios:

Scenario 1. You work with semi-automated or fully automated tools that don't require customization and specific skills. Junior-level data specialists may even handle some tasks.
Scenario 2. Some operations or data transformation processes require development work outside of the tools you're using.
Scenario 3. You cannot use the above-mentioned options because your product requires full customization. In this case, you could use open-source software and build everything from scratch based on your exact product needs.

Ultimately, the size of your data team and the specialists you need depend on your product and vision. Our experience building Coresignal's data team taught us that the key principle is to match the team's capabilities with product needs, regardless of the seniority level of the specialists.

How many data team roles are there?

The short answer is, "It depends." When it comes to classifying data roles, there are many ways to approach this question. New roles emerge, and the lines between existing ones sometimes overlap.

Let's cover the most common roles in teams working with public web data. In my experience, the structure of data teams is tied to the process of working with web data, which consists of the following components:

Getting data from the source system
Data Engineering
Data Analytics
Data Science

In her article published in 2017, Monica Rogati, a well-known data scientist, introduced the concept of the hierarchy of data science needs in an organization. The hierarchy shows that most of an organization's data science needs sit at the bottom of the pyramid: collecting, moving, storing, exploring, and transforming the data.

These tasks also create a solid data foundation in an organization. The top layers include analytics, machine learning (ML), and artificial intelligence (AI).

However, all these layers are important in an organization working with web data and require specialists with a specific skill set.

Data engineers

Data engineers are responsible for developing, implementing, and maintaining the processes and tools used for raw data ingestion. Their goal is to produce information for downstream use, such as analysis or ML.
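As an illustration of the kind of ingestion logic data engineers maintain, here is a minimal Python sketch that parses raw JSON Lines records, rejects malformed or incomplete ones, and normalizes a field. The required fields (`id`, `name`) are a hypothetical schema chosen for the example, not a standard one.

```python
import json
from typing import Iterable

REQUIRED_FIELDS = ("id", "name")  # hypothetical schema for this sketch


def ingest(lines: Iterable[str]) -> tuple[list[dict], int]:
    """Parse raw JSON Lines, drop malformed or incomplete records, normalize names.

    Returns the clean records and the number of rejected lines, so the
    rejection rate can be monitored over time.
    """
    clean, rejected = [], 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1  # not valid JSON
            continue
        if not isinstance(record, dict) or any(
            record.get(f) in (None, "") for f in REQUIRED_FIELDS
        ):
            rejected += 1  # wrong shape or missing a required field
            continue
        # Collapse stray whitespace that often comes with scraped text.
        record["name"] = " ".join(str(record["name"]).split())
        clean.append(record)
    return clean, rejected


raw = [
    '{"id": 1, "name": "  Acme   Corp "}',
    '{"id": 2}',          # missing name
    'not json at all',    # malformed
]
records, rejected = ingest(raw)
print(records)   # [{'id': 1, 'name': 'Acme Corp'}]
print(rejected)  # 2
```

Returning the rejection count rather than silently dropping records keeps ingestion problems visible to the rest of the team.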

When hiring data engineers, overall experience working with web data and specialization in specific tools are usually at the top of the priority list. You need a data engineer in scenarios 2 and 3 mentioned above, and in scenario 1 if you decide to start with one specialist.

Data (or business) analysts

Data analysts mostly focus on existing data to evaluate business performance and provide insights for improving it. You will need data analysts as early as scenarios 1 and 2 mentioned above.

The most common skills companies seek when hiring data analysts are SQL, Python, and other programming languages (depending on the tools used).

Data scientists

Data scientists are primarily responsible for advanced analytics focused on future predictions or insights. Analytics are considered "advanced" if you use them to build data models, for example, if you plan to run machine learning or natural language processing operations.

Let's say you want to work with data about companies by analyzing their public profiles. You want to identify the percentage of fake business profiles in your database. Through multiple multi-layer iterations, you want to create a mathematical model that will allow you to estimate the likelihood that a profile is fake and categorize the profiles you're analyzing based on specific criteria. For such use cases, companies often rely on data scientists.
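To show what such a model might look like in its simplest form, here is a Python sketch of a logistic scoring function that turns a few red-flag features into a probability and then categorizes each profile. The features, weights, bias, and category thresholds are all hypothetical and hand-picked for illustration; in practice a data scientist would choose features through analysis and fit the weights on labeled examples.

```python
import math

# Hypothetical binary red-flag features with hand-picked weights (not fitted).
WEIGHTS = {
    "missing_website": 1.5,
    "no_employees_listed": 1.2,
    "duplicate_description": 2.0,
}
BIAS = -3.0  # hypothetical baseline: most profiles are assumed genuine


def fake_probability(profile: dict) -> float:
    """Logistic model: sigmoid of the bias plus the weights of the flags present."""
    z = BIAS + sum(w for feature, w in WEIGHTS.items() if profile.get(feature))
    return 1.0 / (1.0 + math.exp(-z))


def categorize(p: float) -> str:
    """Bucket profiles by likelihood; the thresholds are assumptions."""
    if p >= 0.7:
        return "likely fake"
    if p >= 0.3:
        return "needs review"
    return "likely genuine"


profiles = [
    {"name": "Acme"},
    {"name": "Shell Co", "missing_website": True, "duplicate_description": True},
    {"name": "Ghost Ltd", "missing_website": True, "no_employees_listed": True,
     "duplicate_description": True},
]
for prof in profiles:
    p = fake_probability(prof)
    print(prof["name"], round(p, 2), categorize(p))

# Share of the database flagged as likely fake.
fake_share = sum(
    categorize(fake_probability(p)) == "likely fake" for p in profiles
) / len(profiles)
print(f"Share flagged as likely fake: {fake_share:.0%}")
```

The "needs review" bucket reflects a common design choice: rather than forcing a yes/no answer, borderline cases are routed to a human or a further iteration of the model.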

Essential skills for a data scientist are mathematics and statistics, which are needed for building data models, and, in scenario 3 mentioned above, programming skills (Python, R).

Analytics engineer

This relatively new role is becoming increasingly popular, especially among companies working with public web data. As the title suggests, the analytics engineer role sits between an analyst focusing on analytics and a data engineer focusing on infrastructure.

Analytics engineers are responsible for preparing ready-to-use datasets for data analysis, which is usually performed by data analysts or data scientists, and making sure that the data is ready for analysis in a timely manner.
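A typical analytics engineering task is turning raw, repeatedly collected records into a clean table that analysts can query directly. The Python sketch below keeps only the latest version of each job posting and only the columns needed for analysis; the records, the `posting_id` key, and the column list are hypothetical.

```python
from datetime import date

# Hypothetical raw job-posting records; the same posting may be collected more than once.
raw_postings = [
    {"posting_id": "a1", "title": "Data Engineer", "company": "Acme",
     "scraped_on": date(2024, 1, 5), "html": "<div>raw page</div>"},
    {"posting_id": "a1", "title": "Senior Data Engineer", "company": "Acme",
     "scraped_on": date(2024, 1, 9), "html": "<div>raw page</div>"},
    {"posting_id": "b2", "title": "Data Analyst", "company": "Globex",
     "scraped_on": date(2024, 1, 7), "html": "<div>raw page</div>"},
]

# Columns analysts need; raw HTML is deliberately left out.
ANALYSIS_COLUMNS = ("posting_id", "title", "company", "scraped_on")


def build_analysis_table(rows):
    """Keep the most recent version of each posting and only the analysis columns."""
    latest = {}
    for row in rows:
        key = row["posting_id"]
        if key not in latest or row["scraped_on"] > latest[key]["scraped_on"]:
            latest[key] = row
    return sorted(
        ({col: r[col] for col in ANALYSIS_COLUMNS} for r in latest.values()),
        key=lambda r: r["posting_id"],
    )


for row in build_analysis_table(raw_postings):
    print(row)
```

In practice this logic would usually live in SQL models inside the data warehouse, but the deduplicate-then-select pattern is the same.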

SQL, Python, and experience with tools needed to extract, transform, and load data are among the essential skills for analytics engineers. An analytics engineer would be useful in scenarios 2 and 3 mentioned above.

3 things to keep in mind when assembling a data team

As there are many different approaches to the classification of data roles, there's also a variety of frameworks that can help you assemble and grow your data team. Let's simplify it for an easy start and say that there are different lenses through which a business can evaluate what team will be needed to get started with web data.

Data lens

The web data that I'm referring to in this article is big data. It consists of large numbers of data records, which are usually delivered to you in large files and in raw format. It is best to have data specialists who have experience working with large data volumes and the tools used to process them.
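One practical consequence of large raw files is that they should be streamed rather than loaded into memory at once. Here is a minimal Python sketch that reads a gzip-compressed JSON Lines stream in fixed-size batches using only the standard library. The compression format and batch size are assumptions; for a file on disk you would pass something like `open("companies.jsonl.gz", "rb")`, where the filename is hypothetical.

```python
import gzip
import io
import json
from itertools import islice
from typing import IO, Iterator


def iter_batches(stream: IO[bytes], batch_size: int = 10_000) -> Iterator[list[dict]]:
    """Yield parsed records in batches from a gzip-compressed JSON Lines stream.

    Only one batch of lines is held in memory at a time, so the file
    size is not limited by available RAM.
    """
    with gzip.open(stream, mode="rt", encoding="utf-8") as text:
        while True:
            lines = list(islice(text, batch_size))
            if not lines:
                return  # end of file
            yield [json.loads(line) for line in lines if line.strip()]


# Demo with an in-memory compressed payload of 25 records.
records = [{"id": i} for i in range(25)]
payload = gzip.compress("\n".join(json.dumps(r) for r in records).encode("utf-8"))
sizes = [len(batch) for batch in iter_batches(io.BytesIO(payload), batch_size=10)]
print(sizes)  # [10, 10, 5]
```

Dedicated tools (distributed engines, columnar formats) go much further, but the batch-streaming idea is what makes large deliveries manageable on ordinary machines.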

Tech stack lens

When it comes to tools, keep in mind that the tools your organization will use for handling specific types of data will also shape which specialists you need. If you are not familiar with the required tools, consult an expert before hiring a data team, or hire professionals who can help you select the right tools for your business needs.

Organizational lens

You may also start building a data team by evaluating which stakeholders the data specialists will work closely with and deciding how this new team will fit into your vision of your organizational structure. For example, will the data team be a part of the engineering team? Will this team mainly focus on the product? Or will it be a separate entity in the organization?

Organizations that have a more advanced data maturity level and are building a product powered by data will look at this task through a more complex lens. This involves the company's future vision, aligning on the definition of data across the organization, deciding who will manage it and how, and determining what the overall data infrastructure will look like as the business grows.

What makes a data team efficient?

The data team is considered efficient as long as it meets your business's needs, and in almost every case, the currency of data team efficiency is time and money.

You can rely on metrics like the amount of data processed in a specific period or the amount of money you spend. Once you track these metrics at regular intervals, the next thing to watch is how they change over time. Simply put, if your team is processing more data for the same amount of money, the team is becoming more efficient.
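The "more data for the same money" idea can be expressed as a single ratio tracked over time. The Python sketch below computes records processed per dollar for each period and the percentage change from the previous period. The monthly figures are hypothetical, not benchmarks, and records per dollar is just one possible efficiency metric.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    period: str
    records_processed: int
    cost_usd: float

    @property
    def records_per_dollar(self) -> float:
        return self.records_processed / self.cost_usd


def efficiency_changes(history: list[PeriodStats]) -> list[tuple[str, float]]:
    """Percentage change in records processed per dollar versus the previous period."""
    changes = []
    for prev, cur in zip(history, history[1:]):
        pct = (cur.records_per_dollar / prev.records_per_dollar - 1) * 100
        changes.append((cur.period, round(pct, 1)))
    return changes


# Hypothetical monthly figures.
history = [
    PeriodStats("2024-01", 40_000_000, 8_000),
    PeriodStats("2024-02", 45_000_000, 8_000),  # more data, same cost
    PeriodStats("2024-03", 45_000_000, 9_000),  # same data, higher cost
]
print(efficiency_changes(history))  # [('2024-02', 12.5), ('2024-03', -11.1)]
```

A positive change means the team processed more data per dollar than in the previous period; a negative one is a signal to look into what changed.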

Another efficiency indicator, which ties into both time and money, is how well your team writes code: you may have plenty of resources and iterate quickly, but every error means more resources spent.

Beyond the metrics that are easy to track, one of the most common problems companies face is trust in data. Trust in data is precisely what it sounds like: even though you can track how long data-related tasks take and how much they cost, stakeholders may still question the reliability of these metrics and of the data itself. This trust can be undermined by past incidents or by a lack of communication and information from data owners.

What's more, working with large volumes of data means that spotting errors is a complex task. Still, the organization should be able to trust the quality of the data it uses and the insights it produces using this data.

It is useful to perform statistical tests, allowing the data team to evaluate the quantitative metrics related to data quality, such as fill rates. By doing this, the organization can also accumulate historical data that will allow the data team to spot issues or negative trends in time. Another essential principle to apply in your organization is listening to client feedback regarding the quality of your data.
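Here is a minimal Python sketch of such a check: it computes the fill rate of each field in a new data delivery and flags fields whose fill rate falls well below their historical mean. The z-score rule with a threshold of 3 standard deviations is one simple choice among many statistical tests, and the records and historical fill rates are hypothetical.

```python
from statistics import mean, stdev


def fill_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of records in which each field is present and non-empty."""
    if not records:
        return {f: 0.0 for f in fields}
    n = len(records)
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "", [])) / n
        for f in fields
    }


def flag_drops(current: dict[str, float], history: list[dict[str, float]],
               z_threshold: float = 3.0) -> list[str]:
    """Flag fields whose fill rate is more than z_threshold standard
    deviations below their historical mean (a simple z-score check)."""
    flagged = []
    for field, rate in current.items():
        past = [h[field] for h in history if field in h]
        if len(past) < 2:
            continue  # not enough history to estimate variability
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            if rate < mu:
                flagged.append(field)
        elif (mu - rate) / sigma > z_threshold:
            flagged.append(field)
    return flagged


# Hypothetical fill rates from previous deliveries.
fill_history = [
    {"industry": 0.91, "website": 0.52},
    {"industry": 0.90, "website": 0.49},
    {"industry": 0.92, "website": 0.50},
]
# Hypothetical new delivery.
batch = [
    {"industry": "Software", "website": "acme.com"},
    {"industry": "Retail", "website": ""},
    {"industry": "Retail"},
    {"industry": None, "website": "initech.com"},
]
current = fill_rates(batch, ["industry", "website"])
print(current)                            # {'industry': 0.75, 'website': 0.5}
print(flag_drops(current, fill_history))  # ['industry']
```

Note that a low fill rate is not flagged by itself: the website field is only half filled, but that matches its history, whereas the industry field dropped sharply compared with previous deliveries.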

To sum up, it all comes down to having talented specialists on your data team who can work quickly and precisely and build trust in their work.

Conclusion

I hope this article helped you gain a better understanding of different data roles that are common in organizations working with public web data, why they are important, which metrics help companies measure the success of their data teams, and finally, how it is all connected to the way your organization thinks about the role of data.

This article was originally published on Readwrite.
