
Data Curation: Benefits, Goals, and Best Practices

Coresignal

Updated on Sep 16, 2021
Published on Sep 16, 2021

Businesses and other organizations store incredible volumes of data, and there is generally more access to information today than ever before. But the more data you have, the harder it may be to actually use it. Hence the paradox: greater access to information can lead to less knowledge. So, what can be done? We need to take care of our data by reviewing and managing it carefully, and one of the most important procedures for that is data curation. The utility of this process when managing large pools of data has been observed in many fields. For business, this means a chance to salvage the value in data that may otherwise be overlooked.

What is data curation?

Data curation is generally defined as the active and ongoing management of data throughout its lifecycle. The data lifecycle lasts for as long as the data is of any interest to analysts and researchers, which is to say, for as long as it can be used and reused for added value. Just as curation in general is the process of preserving, taking care of, and presenting physical objects in a particular way, data curation does the same for digital objects.

Others may define data curation by referring to aspects of discovering, integrating, and duplicating data compiled from diverse sources. These and other procedures are in fact often included in or supported by data curation, as it aims for data reuse and novel presentation so that additional value can emerge. This means that curation is also a metadata management process that aims to identify specific kinds of data in various databases.

Data curation best practices and methods

Data curation originated in the management of scientific data. Scientists can reuse many sources of published data and find additional interest and usefulness for it in new research contexts. However, to make sure that scientists find what they need in the enormous oceans of published data, there’s a need for a research data management procedure that helps to organize data by its features and points of interest. That’s how various practices serving this purpose started to develop and unite under the term data curation.

Soon the procedures spread from research data management to various other fields, where they were developed further by data scientists, data analysts, and users. Data curation activities may vary depending on the different areas in which they are employed. However, below are some of the practices usually involved in data curation.

  • Data preservation. This generally means collecting, storing, and managing data to ensure that it doesn’t get lost.
  • Data discovery. This is the procedure of gathering data from different databases, cataloging, categorizing, and otherwise preparing it for further usage and analysis.
  • Data cleaning. This removes errors and inconsistencies from the data.
  • Data integration. This may involve data normalization and data transformation in order to integrate data from differently formatted databases.
  • Data sharing. This means making data available for further use by interested parties.
  • Metadata management. This means creating and managing data about the curated data, which makes it easier for users to filter the information and find the relevant data points.
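To make a few of these practices concrete, here is a minimal Python sketch of cleaning, integration, and metadata creation. The two sources, their field names, and the helper functions `normalize` and `integrate` are all hypothetical, made up for illustration. Real pipelines would typically rely on dedicated tooling.

```python
from datetime import datetime, timezone

# Hypothetical records from two differently formatted sources.
crm_rows = [
    {"Company": " Acme Corp ", "Employees": "120"},
    {"Company": "Globex", "Employees": ""},  # missing value
]
web_rows = [
    {"name": "acme corp", "headcount": 125},
    {"name": "Initech", "headcount": 40},
]


def normalize(name, employees, source):
    """Map one source row onto a shared schema and clean its values."""
    text = str(employees).strip()
    return {
        # Collapse stray whitespace and unify capitalization.
        "company": " ".join(name.split()).title(),
        # Keep a number only when the value is actually numeric.
        "employees": int(text) if text.isdigit() else None,
        "source": source,
    }


def integrate(*batches):
    """Merge normalized batches, keeping the first record per company."""
    merged = {}
    for batch in batches:
        for row in batch:
            merged.setdefault(row["company"].lower(), row)
    return list(merged.values())


crm = [normalize(r["Company"], r["Employees"], "crm") for r in crm_rows]
web = [normalize(r["name"], r["headcount"], "web") for r in web_rows]
curated = integrate(crm, web)

# Metadata: data about the curated data, recorded alongside it.
metadata = {
    "record_count": len(curated),
    "sources": sorted({row["source"] for row in curated}),
    "missing_employees": sum(row["employees"] is None for row in curated),
    "curated_at": datetime.now(timezone.utc).isoformat(),
}

print(curated)
print(metadata)
```

Keeping the first record per company is an arbitrary choice made here for brevity. In practice, a curator would decide which source to trust for each field.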

Data curation is also all about the presentation of data in a way that is most informative for the intended audience. Thus, data curators employ dashboards, charts, and various other tools and techniques to show the potential implications of the curated data.

Roles and responsibilities of data curators

Data curators are the individuals within an organization tasked with the aforementioned practices. Much like museum or gallery curators, they choose how to handle and present particular objects, in this case, sets of data, in a coherent way.

First of all, anyone working with information can help to curate data. There may be varying kinds of data curators with different tasks and degrees of responsibility. Such curators can help with collaborative curation, bringing together data from diverse data sources. Collaborative curators usually have many other data-related tasks and thus are not primarily responsible for curation.

Domain curators have a higher degree of responsibility as they work with the entire domain of information. They are responsible for recording and sharing domain knowledge of a specific data domain, for example, product or finance data.

At the highest level of responsibility, we have lead curators. Typically, an organization will opt to have one lead data curator overseeing all the curation practices. Lead curators are responsible for the content and quality of curated data and cataloged metadata.

Data curators are sometimes confused with data stewards, as both are metadata management officers with somewhat overlapping practices. But where data stewards are more concerned with improving data-related practices for direct business value, data curators focus more on tracking, evaluating, and presenting information about data quality and data handling within the organization.

Primary goals and importance of data curation

Big data has transformed every area that deals with data and its management. The business and finance industry has felt the issues arising from information overload just as well as it has felt the benefits of big data management. Data curation is crucial in dealing with this overload in an efficient way to make sure that data transforms into value. Its main goals can be specified as follows.

1. Managing data throughout its lifecycle

This means that as long as there is any potential interest and usefulness in a data set, data curators will be watching over it. Active and ongoing management of data ensures that data is kept at a high standard of quality and used properly. Hence the many procedures encompassing data curation – whatever is done with data, curators need to be aware of it and record the metadata when appropriate.



2. Dealing with data swamps

Many organizations store data even when they are not sure of its utility and have no current plans for it. This data is usually unstructured and stored together in what are known as data lakes. But when there is no data governance, no categorization, and no plan for useful data, data lakes turn into data swamps. In these swamps, any remaining valuable data may be lost forever.

The data curation process helps to turn data swamps back into data lakes. After curation, data becomes more organized, with metadata added to categorize it, thus making useful datasets more easily discoverable.
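As a rough illustration of how added metadata makes datasets discoverable, the following Python sketch keeps a small in-memory catalog. The `DatasetEntry` and `Catalog` classes, the file paths, and the tags are all hypothetical. An actual data catalog would normally live in a dedicated tool or database.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Descriptive metadata for one stored dataset (fields are illustrative)."""
    path: str
    description: str = ""
    owner: str = ""
    tags: set = field(default_factory=set)


class Catalog:
    """A tiny in-memory metadata catalog keyed by dataset path."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.path] = entry

    def find(self, tag):
        """Return the paths of datasets carrying the given tag, sorted."""
        return sorted(p for p, e in self._entries.items() if tag in e.tags)

    def uncurated(self):
        """Return paths that still lack a description, an owner, or tags."""
        return sorted(
            p for p, e in self._entries.items()
            if not (e.description and e.owner and e.tags)
        )


catalog = Catalog()
catalog.register(DatasetEntry(
    "lake/sales_2020.csv", "Monthly sales totals", "finance", {"sales", "finance"}
))
catalog.register(DatasetEntry("lake/dump_17.json"))  # no metadata yet
catalog.register(DatasetEntry(
    "lake/leads.parquet", "Inbound leads", "marketing", {"sales"}
))

print(catalog.find("sales"))
print(catalog.uncurated())
```

In this sketch, the `uncurated` list is the part of the lake that is still at risk of becoming a swamp, since nothing describes what those files contain.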

3. Organizing and reusing valuable information

Data discovery and other data curation activities can proceed once the data storage is no longer a swamp but rather a lake. The next goal is to salvage the still valuable data before it falls out of use and into digital obsolescence. For this, data curators use procedures such as data wrangling, which helps to organize, transform, and clean the raw data in preparation for further use. The newly curated data is then ready to be presented to those looking for valuable information on the particular questions they have at hand.

4. Supporting other data management procedures

As mentioned above, data curation is somewhat related to data stewardship, as these procedures support one another. In addition, both of these processes support the goals of data governance, which is the exercise of control over an organization’s data assets. Data governance includes strategic planning, monitoring, and creating policies for data management activities. Thus, data curation comes in handy when data governance tasks need to be carried out, as curated data is easier to find and track.

Business benefits of data curation

Ensured data quality

All the procedures involved in data curation ensure that the data owned by the organization is kept at a high level of quality. Curated data is data that has been cleaned, made free of redundancies, and organized so that it is easy to spot and correct leftover errors. Additionally, data is made more readable, which is also an important sign of quality. Clearly, the quality and usability of data owned by a business are directly related to how much of an asset it can become when furthering business goals.
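One simple way to spot leftover errors is a quality report that measures how complete each field is and how many records repeat a key. The sketch below is only an illustration. The `quality_report` function, the sample rows, and the choice of `id` as the key are assumptions, not part of any particular tool.

```python
def quality_report(rows, key):
    """Summarize field completeness and duplicate keys for a list of dicts."""
    fields = sorted({f for row in rows for f in row})
    total = len(rows)

    # Share of rows in which each field holds a non-empty value.
    completeness = {
        f: (sum(row.get(f) not in (None, "") for row in rows) / total)
        if total else 0.0
        for f in fields
    }

    # Count rows whose key value has already been seen.
    seen, duplicates = set(), 0
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates += 1
        seen.add(value)

    return {"rows": total, "completeness": completeness, "duplicates": duplicates}


# Hypothetical sample with one empty email, one missing email, one repeated id.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": None},
]
report = quality_report(rows, key="id")
print(report)
```

A report like this does not fix anything by itself, but it points curators to the fields and records that need attention first.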

Added value

Data curation also directly adds business value by processing data assets. In this sense, it’s much like fixing a broken or badly working tool. It may still have some value before fixing, but not as much as afterward. By fixing and newly combining data, curation brings new value to old assets. Furthermore, the metadata management and presentation of data saves time and effort, which can be redirected to where it could be used more beneficially.

Efficient usage of data assets

By keeping data that still has potential from disappearing into obscurity, data curation ensures that information assets are used efficiently. This means no waste taking up storage space, as everything in there can be brought back to life by curation. And thanks to the categorization of the data and the well-organized metadata that curation brings about, data can be used efficiently in terms of both cost and speed.

Improved machine learning practices

Machine learning is a crucial procedure in business and finance today, as well-trained algorithms can drastically improve the efficiency and results of many practices. But to train algorithms, one needs high-quality and well-prepared data. Data curation makes a lot of information that is stored by the business usable for the training of algorithms. Furthermore, when people in charge of training can better read and evaluate the data they’re using, it’s much easier for them to track the training process and make the right choices.

Boosted innovation

Data curation tends to boost innovation in data handling practices. Due to its collaborative nature, data curation promotes the exchange of ideas about data and its management. Furthermore, with constant tracking of all the processes, issues can be raised more freely, along with possible solutions and suggestions on how to fix them. All this leads to innovative data handling solutions and makes the business environment more data-savvy.

More accessible knowledge

Finally, data curation is all about presenting particular information to the people who have the skills to do something useful with it. By discovering and organizing data sets, data curators make the knowledge within them accessible to all other professionals within the organization. And presenting the right kind of knowledge to the right kind of people opens up nearly unlimited possibilities for advancing business goals.



The future of big data and public web data curation

Since the rise of big data, there can no longer be any question regarding the value of data in business. What remains to be seen is how this relationship between business and data analytics is going to develop in the future.

There’s good reason to believe that data handling techniques such as data curation will rise in prominence. A study headed by the University of Texas has shown that if Fortune 1000 companies were to raise the usability of their data by 10%, it would mean a $2.01 billion increase in total revenue per year. Given such benefits of using data more efficiently, companies are more than likely to invest in data curation and other practices that promote fully using and reusing informational assets.

Additionally, there is a clear trend towards data democratization and self-service tools which improve the results of every department from sales and marketing to HR. Through collaboration and data sharing, data curation promotes the building of such data culture throughout the organization.

Finally, augmented analytics, which automates most analysis procedures, is on the rise. Automated tools can help to collect, analyze, and present data, as well as to carry out other data curation tasks. Curation itself also helps to advance such automation by monitoring and evaluating data and metadata management practices, thus paving the way for improvements.

Summing up

One of the greatest transgressions of contemporary society is that we tend to stop using things too soon by either throwing them away or just leaving them unused. Mindful members of society, as well as responsible companies, aim to reduce such waste by recycling and reusing what still has value. This same concept should be applied to data as well. Therefore, businesses are turning to data curation to make sure that data consumers get the chance to benefit from data reuse. And the business advantages that go with it raise the motivation to keep developing this and other data management practices.