August 30, 2023
How to assemble an efficient data team is a frequently debated question among data experts. If you're planning to build a data-driven product or improve your existing business with the help of public web data, you will need data specialists.
In this article, I will cover some key principles I have observed throughout my experience working in the public web data industry that may help you build an efficient data team.
Although we have yet to find a universal recipe for this, the good news is that there are various ways to approach the subject and still get the desired results. Here I will explore the process of building a data team from the perspective of business leaders who are just getting started with public web data.
What is a data team?
A data team is responsible for collecting, processing, and providing data to stakeholders in the format needed for business processes. This team can be incorporated into a different department, such as the marketing department, or be a separate entity in the company.
The term data team can describe a team of any size, from one or two specialists to an extensive multilevel team managing and executing all aspects of data-related activities at the company.
Where to start?
There's a straightforward principle that I recommend businesses working with public web data follow: an efficient data team is one that works in alignment with your business needs. It all starts with what product you will be building and what data will be needed for that.
Simply put, every company planning to start working with web data needs specialists who can ingest and process large amounts of data and those who can transform data into information valuable for the business. Usually, the transformation stage is where the data starts to create value for its downstream users.
To get to this stage, a small business can even start with one specialist. The first hire can be a data engineer with analytical skills or a data analyst with experience working with big data and light data engineering. When you're building something more complex, it's important to understand that public web data is essentially used for getting answers to business questions, and web data processing is all about iterations.
No matter the complexity of your product, you always start with acquiring a large amount of data. Further iterations may include aggregating data or enriching it with data from additional sources. Then, you process it to get information, like specific insights. As a result, you get information that can be used in the processes that follow, for example, supporting business decision-making, building a new platform, or providing insights to clients.
From a product perspective, the data team you need depends on the tools you will be using, which in turn depend on the volume of data you will work with and how it will be transformed. With that in mind, I can split building a data team into three scenarios:
- Scenario 1. You work with semi-automated or fully automated tools that don't require customization and specific skills. Some tasks may even be handled by junior-level data specialists.
- Scenario 2. Some operations or data transformation processes require development work outside of the tools you're using.
- Scenario 3. You are not able to use the above-mentioned options because your product requires full customization. In this case, you could use open-source software and build everything from scratch based on your exact product needs.
Ultimately, the size of your data team and the specialists you need depend on your product and your vision for it. Our experience building Coresignal's data team taught us that the key principle is to match the capabilities of the team with product needs, regardless of the seniority level of the specialists.
How many data roles are there?
The short answer to this question is "It depends." There are many ways to classify data roles. New roles emerge, and sometimes the lines between existing ones blur.
Let's cover the most common roles in teams working with public web data. In my experience, the structure of data teams is tied to the process of working with web data, which consists of the following components:
- Getting data from the source system;
- Data engineering;
- Data analytics;
- Data science.
In her article published in 2017, the well-known data scientist Monica Rogati introduced the concept of the hierarchy of data science needs in an organization. It shows that most data science-related needs in an organization sit at the bottom of the pyramid: collecting, moving, storing, exploring, and transforming the data.
These tasks are also something that makes a solid data foundation in an organization. The top layers consist of analytics, machine learning (ML), and artificial intelligence (AI).
However, all these layers are important in an organization working with web data, and they all require specialists with a specific skill set.
Data engineers
Data engineers are responsible for managing the development, implementation, and maintenance of the processes and tools used for raw data ingestion with the goal of producing information for downstream use, for example, analysis or machine learning (ML).
When hiring data engineers, overall experience working with web data and specialization in specific tools are usually at the top of the priority list. You need a data engineer in scenarios 2 and 3 mentioned above, and also in scenario 1 if you decide to start with one specialist.
Data (or business) analysts
Data analysts mostly focus on data that already exists in order to evaluate how a business is performing and to provide insights for improving it. You already need data analysts in scenarios 1 and 2 mentioned above.
The most common skills that companies are looking for when hiring data analysts are SQL, Python, and other programming languages (depending on the tools that will be used).
Data scientists
Data scientists are primarily responsible for advanced analytics focused on making predictions or generating insights. Analytics are considered "advanced" if you use them to build data models, for example, when your product involves machine learning or natural language processing operations.
Let's say you want to work with data about companies by analyzing their public profiles. You want to identify the percentage of the business profiles in your database that are fake. Through multiple multi-layer iterations, you want to create a mathematical model that will allow you to identify the likelihood of a fake profile and categorize the profiles you're analyzing based on specific criteria. For such use cases, companies often rely on data scientists.
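To make the idea of such a model more concrete, here is a minimal sketch of a logistic score for profile records. Everything in it is an assumption made for illustration: the feature names, the weights, the bias, and the category thresholds are hypothetical, and in a real project a data scientist would learn the weights from labeled profiles rather than set them by hand.

```python
import math

# Hypothetical weights; in practice these would be learned from labeled profiles.
WEIGHTS = {
    "missing_website": 1.5,
    "no_employees_listed": 1.2,
    "description_too_short": 0.8,
}
BIAS = -2.0  # hypothetical baseline: most profiles are assumed to be genuine


def fake_likelihood(features: dict[str, bool]) -> float:
    """Logistic score in (0, 1); a higher value means the profile looks more suspicious."""
    z = BIAS + sum(w for name, w in WEIGHTS.items() if features.get(name, False))
    return 1.0 / (1.0 + math.exp(-z))


def categorize(score: float, review_at: float = 0.3, fake_at: float = 0.7) -> str:
    """Map a score to one of three categories using hypothetical thresholds."""
    if score >= fake_at:
        return "likely fake"
    if score >= review_at:
        return "needs review"
    return "likely genuine"


def share_flagged(profiles: list[dict[str, bool]]) -> float:
    """Fraction of profiles categorized as likely fake."""
    if not profiles:
        return 0.0
    flagged = sum(1 for p in profiles if categorize(fake_likelihood(p)) == "likely fake")
    return flagged / len(profiles)


profiles = [
    {"missing_website": False, "no_employees_listed": False, "description_too_short": False},
    {"missing_website": True, "no_employees_listed": False, "description_too_short": False},
    {"missing_website": True, "no_employees_listed": True, "description_too_short": True},
    {"missing_website": False, "no_employees_listed": False, "description_too_short": True},
]

for profile in profiles:
    score = fake_likelihood(profile)
    print(f"{score:.2f} -> {categorize(score)}")
print(f"Share flagged as likely fake: {share_flagged(profiles):.0%}")
```

Each iteration on a model like this would typically revisit the features and thresholds as more labeled examples become available.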
Essential skills for a data scientist are mathematics and statistics, which are needed for building data models, and programming skills (Python, R). It's very likely that you will need to have data scientists in scenario 3 mentioned above.
Analytics engineers
This is a relatively new role that is becoming increasingly popular, especially among companies working with public web data. As the title suggests, the role of an analytics engineer sits between an analyst who focuses on analytics and a data engineer who focuses on infrastructure. Analytics engineers are responsible for preparing ready-to-use datasets for data analysis, which is usually performed by data analysts or data scientists, and making sure that the data is prepared for analysis in a timely manner.
SQL, Python, and experience with tools needed to extract, transform, and load data are among the essential skills required for analytics engineers. Having an analytics engineer would be useful in scenarios 2 and 3 mentioned above.
3 things to keep in mind when assembling a data team
As there are many different approaches to the classification of data roles, there's also a variety of frameworks that can help you assemble and grow your data team. Let's try to simplify it for an easy start and say that there are different lenses through which a business can evaluate what team will be needed to get started with web data.
The first lens is data volume. The web data that I'm referring to in this article is big data: large amounts of data records, which are usually delivered to you in large files and in raw format. It would be best to have data specialists who have experience working with large data volumes and the tools that are used for processing them.
Tech stack lens
The tools that your organization will use for handling specific types of data will also shape what specialists you will need. If you are not yet familiar with the required tools, consult an expert before hiring a data team or hire professionals who can help you select the right tools depending on your business needs.
You may also start building a data team by evaluating which stakeholders the data specialists will work closely with and deciding how this new team will fit into your vision of your organizational structure. For example, will the data team be a part of the engineering team? Will this team mainly focus on the product? Or will it be a separate entity in the organization?
Organizations that have a more advanced data maturity level and are building a product that is powered by data will look at this task through a more complex lens, which involves the company's future vision, aligning on the definition of data across the organization, deciding who will manage it and how, and planning how the overall data infrastructure will look as the business grows.
What makes a data team efficient?
The data team is considered efficient as long as it meets the needs of your business, and in almost every case, the currency of data team efficiency is time and money.
So, you can rely on metrics like the amount of data processed during a specific time or the amount of money you spend. Once you track these metrics at regular intervals, the next thing to watch is how they change over time. Simply put, if your team is managing to process more data with the same amount of money, it means the team is becoming more efficient.
Another efficiency indicator that ties time and money together is the quality of the code your team writes because, ultimately, you can have a lot of resources and perform iterations quickly, but errors mean more resources spent.
Besides the metrics that are easy to track, one of the most common problems that companies experience is a lack of trust in data. Although there is a way to track the time it takes to perform data-related tasks or see how much it costs, stakeholders may still question the reliability of these metrics and the data itself. This trust can be damaged by previous incidents or simply by a lack of communication and information from data owners.
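As a rough illustration of tracking these dynamics, the sketch below computes records processed per dollar for each reporting period and the relative change between consecutive periods. The `PeriodStats` structure, the monthly granularity, and all figures are hypothetical; your own cost and volume data would come from your billing and pipeline logs.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """One reporting period for a data team (all values are hypothetical)."""
    label: str
    records_processed: int
    cost_usd: float


def records_per_dollar(period: PeriodStats) -> float:
    """Throughput per unit of spend: a higher value means more data for the same money."""
    if period.cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return period.records_processed / period.cost_usd


def efficiency_changes(periods: list[PeriodStats]) -> list[tuple[str, float]]:
    """Relative change in records per dollar from each period to the next."""
    rates = [records_per_dollar(p) for p in periods]
    return [
        (periods[i].label, (rates[i] - rates[i - 1]) / rates[i - 1])
        for i in range(1, len(periods))
    ]


history = [
    PeriodStats("2023-01", 1_000_000, 5_000.0),
    PeriodStats("2023-02", 1_200_000, 5_000.0),  # more data, same spend
    PeriodStats("2023-03", 1_200_000, 6_000.0),  # same data, higher spend
]

for label, change in efficiency_changes(history):
    print(f"{label}: {change:+.1%} records per dollar vs. previous period")
```

A positive change means the team processed more data for the same money than in the previous period, and a negative change signals the opposite.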
What's more, working with large volumes of data means that spotting errors is a complex task. Still, the organization should be able to trust the quality of the data it uses and the insights it produces using this data.
It is useful to perform statistical tests allowing the data team to evaluate the quantitative metrics related to data quality, such as fill rates. By doing this, the organization can also accumulate historical data that will allow the data team to spot issues or negative trends in time. Another essential principle to apply in your organization is listening to client feedback regarding the quality of your data.
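A fill rate is simply the share of records in which a given field has a value. The sketch below, written under the assumption that records arrive as plain dictionaries, computes fill rates for two snapshots and flags fields whose fill rate dropped noticeably. The field names, the sample records, and the 5-percentage-point tolerance are hypothetical choices for illustration only.

```python
from typing import Any

EMPTY_VALUES = (None, "", [], {})  # what counts as "not filled" in this sketch


def fill_rates(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Share of records in which each field holds a non-empty value."""
    if not records:
        return {field: 0.0 for field in fields}
    rates = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in EMPTY_VALUES)
        rates[field] = filled / len(records)
    return rates


def flag_drops(
    previous: dict[str, float], current: dict[str, float], tolerance: float = 0.05
) -> list[str]:
    """Fields whose fill rate fell by more than `tolerance` since the previous snapshot."""
    return [f for f in current if previous.get(f, 0.0) - current[f] > tolerance]


FIELDS = ["name", "website", "industry"]

previous_snapshot = [
    {"name": "Acme", "website": "acme.example", "industry": "Retail"},
    {"name": "Globex", "website": "globex.example", "industry": None},
    {"name": "Initech", "website": "initech.example", "industry": "Software"},
    {"name": "Umbrella", "website": "umbrella.example", "industry": "Pharma"},
]
current_snapshot = [
    {"name": "Acme", "website": "acme.example", "industry": "Retail"},
    {"name": "Globex", "website": "", "industry": None},
    {"name": "Initech", "website": None, "industry": "Software"},
    {"name": "Umbrella", "website": "umbrella.example", "industry": "Pharma"},
]

previous_rates = fill_rates(previous_snapshot, FIELDS)
current_rates = fill_rates(current_snapshot, FIELDS)
print(current_rates)
print("Fields to investigate:", flag_drops(previous_rates, current_rates))
```

Storing the fill rates of every delivery gives the team the historical baseline needed to tell a one-off dip from a negative trend.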
In short, efficiency comes down to having talented specialists in your data team who can work quickly and with precision, and who build trust around the work they are doing.
To sum everything up, here are helpful questions to help you assemble a data team:
- What is your product?
- What data will you be using?
- What are the key components of the product that involve data?
- What are the results expected from different project stages involving data?
- What tech stack will be required for that?
- Who are the stakeholders?
- What indicators will help you evaluate if your current data team meets your business needs?
I hope this article helped you gain a better understanding of different data roles that are common in organizations working with public web data, why they are important, which metrics help companies measure the success of their data teams, and finally, how it is all connected to the way your organization thinks about the role of data.
This article was originally published on Readwrite.