1.5 FAIR and structure evaluation of your data

Datasets vary widely in format and complexity, as well as in their degree of structure and FAIRness. It is therefore important to first evaluate which state your data is in and what steps you can take to further enrich it.

Even though a range of FAIR assessment tools is already available (see here for an overview) that can either automatically assess the FAIRness of your data or provide self-assessment questionnaires, we decided to develop our own tool for several reasons:

  • Existing tools tend to only assign a score to the data and provide little guidance on how to actually improve its FAIRness (Krans et al., 2022). In contrast, our tool directly provides suggestions on how to enrich the data, based on the given answers, and directs the user to the corresponding chapters of this guide.

  • We do not score the data for every letter of FAIR as other tools do, but on four properties: metadata, storage, standards and structure (more details below). While the evaluation questions for the first three properties are directly derived from the FAIR principles, the structure property is an extension of FAIR that is not included in other tools.

  • Automated tools often require the data to be available online already, which is not the stage at which we want to start our evaluation.

Our evaluation tool, the “FAIR + Structure Evaluation Tool”, provides you with a set of simple questions about four properties of your data: metadata, storage, standards and structure.

Disclaimer

The FAIR + Structure Evaluation Tool is currently still under development and therefore not yet accessible outside of the Netherlands Institute of Ecology (NIOO-KNAW). For now, you can answer the evaluation questions in a non-interactive way by going through the list of questions in section Questions for FAIR + Structure Evaluation Tool.

Metadata is data about your data. It contains information about the who, where, what, when and how of data collection, allowing another user to understand and reuse your data without prior knowledge of it.

Storage is about whether your data is stored persistently and in a way that makes it findable and accessible to others.

Standards describe uniform, community-accepted formats in which both data and metadata are stored and which enhance compatibility with other datasets.

Structure is about whether your data is organised in a consistent and logical way that makes the data easier to understand for others.

Apart from the questions about structure, the questions in the evaluation are all directly based on the FAIR principles. Figure 1.2 visualises how the data properties in the evaluation link to the letters of FAIR. Based on your answers, you will receive a circle diagram showing the status of your dataset in each of the properties, together with a list of chapters of this manual that you can work through to further enrich your data.

Figure 1.2: Link between data properties and FAIR. The letters on the left stand for the data properties used in the FAIR and structure evaluation (metadata, storage, standards, structure). Each plus indicates that the data property improves the maturity of the respective letter of FAIR. A blank space means that the data property has no direct influence on the respective letter of FAIR.
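
To make the idea of a per-property evaluation more concrete, the following is a minimal sketch in Python of how yes/no answers could be turned into a status per property and a list of chapters to work through. It is not the implementation of the FAIR + Structure Evaluation Tool: the questions, the chapter labels (marked as placeholders) and the scoring rule (fraction of questions answered with "yes") are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical questions, NOT the questions of the actual tool.
# Each entry: (property, question, chapter to suggest if the answer is "no").
# The chapter labels are placeholders, not real chapter titles of this guide.
QUESTIONS = [
    ("metadata", "Is it documented who collected the data, where and when?",
     "placeholder: metadata chapter"),
    ("metadata", "Is it documented how the data were collected?",
     "placeholder: metadata chapter"),
    ("storage", "Is the data stored persistently?",
     "placeholder: storage chapter"),
    ("storage", "Can others find and access the data?",
     "placeholder: storage chapter"),
    ("standards", "Is the data stored in a community-accepted format?",
     "placeholder: standards chapter"),
    ("standards", "Is the metadata stored in a community-accepted format?",
     "placeholder: standards chapter"),
    ("structure", "Is the data organised in a consistent way?",
     "placeholder: structure chapter"),
    ("structure", "Is the data organised in a logical way?",
     "placeholder: structure chapter"),
]


def evaluate(answers):
    """Score yes/no answers per property and collect suggested chapters.

    `answers` is a list of booleans, one per entry in QUESTIONS.
    Assumed scoring rule: the score of a property is the fraction of
    its questions that were answered with "yes" (True).
    """
    if len(answers) != len(QUESTIONS):
        raise ValueError("Expected exactly one answer per question.")

    totals = defaultdict(int)     # number of questions per property
    positives = defaultdict(int)  # number of "yes" answers per property
    suggestions = []              # chapters to work through, no duplicates

    for (prop, _question, chapter), answer in zip(QUESTIONS, answers):
        totals[prop] += 1
        if answer:
            positives[prop] += 1
        elif chapter not in suggestions:
            suggestions.append(chapter)

    scores = {prop: positives[prop] / totals[prop] for prop in totals}
    return scores, suggestions


if __name__ == "__main__":
    scores, suggestions = evaluate(
        [True, False, True, True, False, False, True, True]
    )
    for prop, score in scores.items():
        print(f"{prop}: {score:.0%}")
    print("Suggested chapters:", suggestions)
```

The per-property scores in this sketch correspond to what a circle diagram would display, and the suggested chapters are collected only for questions answered with "no", so that the output points to where the data can still be enriched.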

It will likely be difficult to reach the full score in all of the properties of your data, but that is not necessarily the aim of this evaluation or guide. Every improvement is already a great step! The components that link to interoperability are especially difficult to implement, as true interoperability is generally a big challenge within and across disciplines (Pagano et al., 2013). One step towards interoperability is to use a language for knowledge representation and linked data, such as RDF (Resource Description Framework). However, as this is more the expertise of data scientists or information scientists than of ecologists, we will not look into these topics in this guide. Hence, it is not possible to reach the full score in the evaluation of “Standards” by only following this guide. For more information on full interoperability see this section.

References

Krans, N. A., Ammar, A., Nymark, P., Willighagen, E. L., Bakker, M. I., & Quik, J. T. K. (2022). FAIR assessment tools: Evaluating use and performance. NanoImpact, 27, 100402. https://doi.org/10.1016/j.impact.2022.100402
Pagano, P., Candela, L., & Castelli, D. (2013). Data interoperability. Data Science Journal, 12(0), GRDI19–GRDI25. https://doi.org/10.2481/dsj.grdi-004