8.5 Tools to help you

Both the EML and the meta file are XML files. XML stands for Extensible Markup Language and is a human- and machine-readable format for storing and transporting data. An XML file is structured as a hierarchical tree: a root element contains any number of sub-elements at different levels, and elements can carry attributes and text. For an extensive description of how to build XML files, see the W3Schools XML tutorial.
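
As an illustration of this tree structure, the following minimal sketch builds a small XML document with xml2. The element names and values are invented for the example and do not form a valid EML file.

```r
library(xml2)

# Root element of the tree (all names and values here are made up)
doc <- xml_new_root("dataset")

# Sub-element with text content
xml_add_child(doc, "title", "Bud burst of oak trees")

# Sub-element that has sub-elements of its own (a second level)
creator <- xml_add_child(doc, "creator")
xml_add_child(creator, "givenName", "Jane")
xml_add_child(creator, "surName", "Doe")

# Attribute on an element
xml_set_attr(creator, "id", "creator-1")

# Print the resulting XML
cat(as.character(doc))
```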

To create the two XML files needed for the Darwin Core Archive, you do not have to be an expert in XML. In R, several packages facilitate the creation of XML files and, more specifically, EML files, for example emld (Boettiger et al., 2020), EML (Boettiger & Jones, 2022) and xml2 (Wickham et al., 2023).

Additionally, the website of the LivingNorwayR package gives a good overview of packages and helpful websites for building EML files in R.

8.5.1 EML.xml file

Lists are the key to creating the EML.xml file. The content of every EML term has to be stored as a character string in a list. If a term consists of subterms, these have to be stored in nested lists within the list of the parent term (see the example box below). For a full example, you can also browse our GitHub repository.

Bud burst:

# Provide keywords and their thesaurus for the EML term "keywordSet"
keywordSet <- list(list(keyword = list("bud burst", "trees", "ecology", "plant phenology"),
                      keywordThesaurus = "envThes"),
                 list(keyword = "oak",
                      keywordThesaurus = "GEMET"))

keyword and keywordThesaurus are the EML subterms of keywordSet. In XML, the two keyword sets look like this:

<keywordSet>
  <keyword>bud burst</keyword>
  <keyword>trees</keyword>
  <keyword>ecology</keyword>
  <keyword>plant phenology</keyword>
  <keywordThesaurus>envThes</keywordThesaurus>
</keywordSet>
<keywordSet>
  <keyword>oak</keyword>
  <keywordThesaurus>GEMET</keywordThesaurus>
</keywordSet>
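
To see this mapping from lists to XML yourself, the sketch below wraps the keyword lists in a minimal dataset, writes it with EML::write_eml() and reads it back with xml2. The title, the person and the packageId are placeholders, and the example assumes that the EML and xml2 packages are installed.

```r
# Keyword lists as in the bud burst example
keywordSet <- list(
  list(keyword = list("bud burst", "trees", "ecology", "plant phenology"),
       keywordThesaurus = "envThes"),
  list(keyword = "oak",
       keywordThesaurus = "GEMET")
)

# Hypothetical minimal dataset around the keywords (placeholder values)
person <- list(individualName = list(givenName = "Jane", surName = "Doe"))
eml <- list(dataset = list(title = "Placeholder title",
                           creator = person,
                           keywordSet = keywordSet,
                           contact = person),
            packageId = "placeholder-id",
            system = "uuid")

# Write to a temporary file and read the result back as plain XML
eml_file <- tempfile(fileext = ".xml")
EML::write_eml(eml, eml_file)
doc <- xml2::read_xml(eml_file)

# Each inner list should appear as one <keywordSet>,
# each keyword as one <keyword> element
length(xml2::xml_find_all(doc, "//keywordSet"))
xml2::xml_text(xml2::xml_find_all(doc, "//keyword"))
```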

The lists of all terms have to be combined into a final list, which can then be converted into an XML file with the function write_eml() of the EML package. Some EML terms require XML attributes (see EML terms). These can be assigned with the xml2 package by first identifying all nodes for which the attribute should be set (xml2::xml_find_all()) and then setting the attribute (xml2::xml_set_attr()).

Bud burst:

The following example shows how the final list is converted into the EML file and how to set the attribute “provider” for the taxonId nodes.

# Combine all components in one list
eml <- list(dataset =
              list(title = title,
                   creator = creator,
                   pubDate = publication_date,
                   language = language,
                   abstract = abstract,
                   keywordSet = keywords,
                   licensed = licensed,
                   coverage = coverage,
                   contact = contact_person,
                   methods = methods,
                   maintenance = maintenance,
                   intellectualRights = intellectualRights),
            packageId = packageId, 
            system = "uuid")

# Write EML file
EML::write_eml(eml, file = "cricket_EML.xml")


# Add attributes for specific nodes -------------------------------

# Read EML file as XML file
eml_xml <- xml2::read_xml("cricket_EML.xml")

# Identify all taxonId nodes for which attribute shall be set
taxonId_node <- xml2::xml_find_all(eml_xml, xpath = "//taxonId")

# Set "provider" attribute for taxonId nodes
xml2::xml_set_attr(taxonId_node, attr = "provider", value = "https://www.gbif.org/")

# Save the modified XML, otherwise the attributes exist only in memory
xml2::write_xml(eml_xml, file = "cricket_EML.xml")

Once you have set all required attributes, you should check whether your EML file is schema-valid. This can be done with the function eml_validate() of the emld package.
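
A minimal sketch of such a check, assuming that the EML and emld packages are installed. It writes a small placeholder dataset to a temporary file; in practice you would pass the path of your own EML file instead. eml_validate() returns a logical value and stores any problems it found in the "errors" attribute.

```r
# Hypothetical minimal dataset (placeholder values, not the cricket data)
person <- list(individualName = list(givenName = "Jane", surName = "Doe"))
eml <- list(dataset = list(title = "Placeholder title",
                           creator = person,
                           contact = person),
            packageId = "placeholder-id",
            system = "uuid")

# Write the EML file to a temporary location
eml_file <- tempfile(fileext = ".xml")
EML::write_eml(eml, eml_file)

# Validate the file against the EML schema
valid <- emld::eml_validate(eml_file)

# Report the outcome; the error messages are stored as an attribute
if (isTRUE(as.logical(valid))) {
  message("The EML file is schema-valid.")
} else {
  print(attr(valid, "errors"))
}
```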

8.5.2 meta.xml file

In contrast to the EML file, whose metadata are specific to the dataset and have to be filled in by hand, the meta file always has the same structure; its content depends only on the file and column names of the individual Darwin Core Archive. The root element <archive> contains the subelement <core> and one <extension> subelement for every extension file. Within these elements, there is a <field> element for every column, linking to the URI of the corresponding Darwin Core term. For a more detailed description of the components of the meta.xml file, see the Darwin Core text guide.

Because the meta.xml file is always structured in the same way and its contents can be derived directly from the column headers of the Darwin Core Archive files, its creation can be automated. Here you can find a function that you can use to do this in R.

Crickets:

A shortened example of the meta.xml file of the cricket data.

<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://rs.tdwg.org/dwc/text/ http://rs.tdwg.org/dwc/text/tdwg_dwc_text.xsd">
  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Event">
    <files>
      <location>crickets_event.csv</location>
    </files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/eventID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/samplingProtocol"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/sampleSizeValue"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/sampleSizeUnit"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
    <field index="5" term="http://rs.tdwg.org/dwc/terms/year"/>
    [...]
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact">
    <files>
      <location>crickets_extendedmeasurementorfact.csv</location>
    </files>
    <coreid index="1"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/measurementID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/measurementType"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/measurementValue"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/measurementUnit"/>
    [...]
  </extension>
</archive>
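
The structure shown in the example above can be generated from the column headers alone. The following is a minimal sketch with xml2, not the function linked above: add_data_file() is a hypothetical helper, and it assumes that every column name is a Darwin Core term in the http://rs.tdwg.org/dwc/terms/ namespace, which does not hold for terms from other namespaces (for example Dublin Core terms). The column vectors are shortened placeholders.

```r
library(xml2)

# Assumption: all columns map to terms in this namespace
dwc_terms <- "http://rs.tdwg.org/dwc/terms/"

# Hypothetical helper: add a <core> or <extension> element for one data file.
# `columns` are the column headers in file order; `id_column` is the 1-based
# position of the column that holds the core identifier.
add_data_file <- function(archive, element, file_name, columns, row_type,
                          id_column = 1) {
  node <- xml_add_child(archive, element,
                        encoding = "UTF-8",
                        fieldsTerminatedBy = ",",
                        linesTerminatedBy = "\\n",
                        fieldsEnclosedBy = "",
                        ignoreHeaderLines = "1",
                        rowType = row_type)
  files <- xml_add_child(node, "files")
  xml_add_child(files, "location", file_name)

  is_core <- element == "core"
  # Indices in meta.xml are 0-based
  xml_add_child(node, if (is_core) "id" else "coreid",
                index = as.character(id_column - 1))

  for (i in seq_along(columns)) {
    # In an extension, the coreid column is not repeated as a <field>
    if (!is_core && i == id_column) next
    xml_add_child(node, "field",
                  index = as.character(i - 1),
                  term = paste0(dwc_terms, columns[i]))
  }
  invisible(node)
}

# Root element with the Darwin Core text namespace
meta <- xml_new_root("archive")
xml_set_attr(meta, "xmlns", "http://rs.tdwg.org/dwc/text/")

add_data_file(meta, "core", "crickets_event.csv",
              columns = c("eventID", "samplingProtocol", "sampleSizeValue"),
              row_type = "http://rs.tdwg.org/dwc/terms/Event")

add_data_file(meta, "extension", "crickets_extendedmeasurementorfact.csv",
              columns = c("measurementID", "eventID", "measurementType"),
              row_type = "http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact",
              id_column = 2)

# Write the result to a temporary directory
meta_file <- file.path(tempdir(), "meta.xml")
write_xml(meta, meta_file)
```

Calling cat(as.character(meta)) prints the document, so that it can be compared with the shortened example before it is added to the archive.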

References

Boettiger, C., & Jones, M. B. (2022). EML: Read and write ecological metadata language files. https://docs.ropensci.org/EML/
Boettiger, C., Jones, M. B., & Mecum, B. (2020). emld: Ecological metadata as linked data. https://docs.ropensci.org/emld/
Wickham, H., Hester, J., & Ooms, J. (2023). xml2: Parse XML. https://xml2.r-lib.org/