Policies

Data Collections Development

Data Types

We accept engineering and social and behavioural sciences datasets derived from research conducted in the context of natural hazards. In the area of engineering the primary focus is on data generated through simulation, hybrid simulation, experimental and field research methods regarding the impacts of wind, earthquake, storm surge, wildfires, extreme heat hazards and sustainable materials management. As the field and the expertise of the community evolve, we have expanded our focus to include datasets related to COVID-19. We also accept data reports, publications of Jupyter notebooks, research software, scripts, presentations, and learning materials. In social and behavioural sciences (SBE), accepted datasets and data instruments encompass the study of the human dimensions of hazards and disasters. Users who deposit datasets that do not correspond to the accepted data types will be alerted, when possible prior to publication, so they can remove their data. If a dataset that does not comply with the accepted data types is published with a DOI, we will abide by the Tombstone policy. A curator will work with the user to find a repository adequate for their needs.

Dataset Size

Given the nature of research in natural hazards, which involves large-scale experiments, simulations, and field research projects, we currently do not impose restrictions on the size of the datasets that can be published. The largest published datasets in DDR are ~5 TB. This approach recognizes the necessity of comprehensive data collection and the importance of making this data available for future research and analysis. However, we recommend that researchers be selective and publish data that is relevant to research reproducibility. Importantly, the dataset should be adequately organized and described so that other researchers interested in reusing the data can find what they need. Thus, publishing large datasets implies significant curation work. The Curation and Publication Best Practices include recommendations to achieve quality dataset publications.

File Formats

Acknowledging that the natural hazards research community utilizes diverse research methods to generate and record data in both open and proprietary formats, and that the equipment used in the field is continuously updated, we do not have a hard file format restriction policy. However, we encourage our users to convert proprietary file formats to open formats when possible. The DDR follows the Library of Congress Recommended Format Statement and has best practices in place to convert proprietary formats to open formats for long term preservation; see the Recommended Data Formats. Consider that conversion from proprietary to open formats can present challenges. MATLAB, for example, allows saving complex data structures, yet not all of the files stored can be converted to a CSV or a text file without losing some clarity and simplicity for handling and reusing the data. In addition, some proprietary formats such as JPEG and Excel have been considered standards for research and teaching for the last two decades. For this reason, we allow users to publish the data in both proprietary and open formats. We keep file format identification information of all the datasets in the Fedora repository.

Data Curation

Data curation involves the organization, description, representation, permanent publication, and preservation of datasets in compliance with community best practices and the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles (https://www.go-fair.org/fair-principles/). Our goal is to enable researchers to curate their data from the beginning of a research project and turn it into publications through interactive pipelines and consultation with data curators. The DDR has invested and continues to invest efforts in developing and testing curation and publication pipelines based on data models designed with input from the NHERI community. In the DDR, data curation is a joint responsibility between the researchers that generate data and the DDR team. Researchers understand better the logic and functions of the datasets they create, and our team's role is to help them make these datasets FAIR-compliant.

Responsibilities:

Researchers:

  • Organize and describe their datasets using one of the DDR's data models.
  • Consult the DDR Curation and Publication Guides, the Best Practices and Policies to comply with curation requirements.
  • Write and publish documentation such as instruments, data reports, and data dictionaries, for long term usability of their published data.

DDR Team:

  • Provides curation guidance during virtual office hours, through help tickets and via email, assisting researchers in achieving FAIR-compliant datasets.
  • Maintains and enhances the DDR infrastructure for interactive curation of datasets.
  • Reviews datasets pre and post-publication and suggests changes and improvements.

Data Models

The DDR team worked with NHERI experts to develop data models to curate the datasets generated in the natural hazards field. Based on the Core Scientific Metadata Model, the models represent the structure and provenance of experimental, simulation, field research/interdisciplinary, and hybrid simulation datasets. A data model type "other" was also developed for datasets that do not correspond to the research methods mentioned above and for other products such as posters, presentations, reports, check sheets, benchmarks, etc. In the DDR interface users select one of these models as project type at the beginning of their interactive curation process. Implemented as interactive curation pipelines in the DDR curation interface, the models allow users to organize their datasets in relation to research method and natural hazard type. This allows for a uniform curation experience and representation of published datasets.

To facilitate data curation of the diverse and large datasets generated in the fields associated with natural hazards, we worked with experts in natural hazards research to develop five data models that encompass the following types of datasets: experimental, simulation, field research, hybrid simulation, and other data products (See: 10.3390/publications7030051; 10.2218/ijdc.v13i1.661) as well as lists of specialized vocabulary. Based on the Core Scientific Metadata Model, these data models were designed considering the community's research practices and workflows, the need for documenting these processes (provenance), and using terms common to the field. The models highlight the structure and components of natural hazards research projects across time, tests, geographical locations, provenance, and instrumentation. Researchers in our community have presented on the design, implementation and use of these models broadly.

In the DDR web interface the data models are implemented as interactive functions with instructions that guide the researchers through the curation and publication tasks. As researchers move through the curation pipelines, the interactive features reinforce data and metadata completeness and thus the quality of the publication. The process will not move forward if requirements for metadata are not in place (See Metadata in Best Practices), or if key files are missing.
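As a rough illustration of this kind of gate, the sketch below checks a draft project for empty required metadata fields and for required categories with no files, and reports what would stop publication. The field names, category names, and the `publication_blockers` function are hypothetical and are not taken from the DDR; the actual requirements are those listed under Metadata in Best Practices.

```python
from typing import Dict, List

# Hypothetical required items, for illustration only.
REQUIRED_METADATA = ["title", "description", "authors"]
REQUIRED_CATEGORIES = ["model_configuration", "sensor_information", "event"]


def publication_blockers(metadata: Dict[str, object],
                         files_by_category: Dict[str, List[str]]) -> List[str]:
    """Return the problems that would stop the pipeline; an empty list means ready."""
    problems = []
    for field_name in REQUIRED_METADATA:
        # Missing keys and empty values are both treated as incomplete.
        if not metadata.get(field_name):
            problems.append(f"missing metadata: {field_name}")
    for category in REQUIRED_CATEGORIES:
        # A category counts as complete only if at least one file is assigned.
        if not files_by_category.get(category):
            problems.append(f"no files in category: {category}")
    return problems


draft = {"title": "Shake table test", "description": "", "authors": ["A. Researcher"]}
files = {
    "model_configuration": ["model.pdf"],
    "sensor_information": [],
    "event": ["run1.csv"],
}
blockers = publication_blockers(draft, files)
print(blockers)
```

Returning the full list of problems, rather than stopping at the first one, lets a researcher fix everything in one pass.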

Metadata

DDR developed metadata to describe natural hazards datasets through a combination of data models, standard metadata schemas and expert-contributed vocabularies. Embedded in the data models are categories and vocabularies as metadata elements that experts in the NHERI network contributed and deemed important for data explainability, reuse, and discoverability. Categories reflect the structure of the different research project types, and the expert vocabularies describe their components. The structure and components of each published dataset are graphically represented when expanding View Data on the dataset's landing page and through the View Data Diagram. For purposes of metadata exchange and interoperability, the elements and tags in the data models are mapped to widely-used standard metadata schemas. These are: Dublin Core for description of the dataset project, DDI (Data Documentation Initiative) for social and behavioral science datasets, DataCite for DOI assignment and citation, and PROV-O to record the structure of the datasets. Metadata mapping is substantiated during the data publication process when metadata is ingested to Fedora. Users can download the standardized metadata in JSON format from the dataset's landing page. To further describe datasets, the curation interface offers the possibility to add both predefined and custom file tags. Predefined file tags are specialized terms provided by the natural hazard community. While use of tags is optional, it is highly recommended as tags improve the data browsing experience and are searchable. The lists of tags are evolving for each data model, continuing to be expanded, updated, and corrected as we gather feedback and observe how researchers use them in their dataset publications.

(To date, there is no standard metadata schema to describe natural hazards data. In DDR we follow a combination of standard metadata schemas and expert-contributed vocabularies to help users describe and find data.

Embedded in the DDR data models are categories and terms as metadata elements that experts in the NHERI network contributed and deemed important for data explainability and reuse. Categories reflect the structure and components of the research dataset, and the terms describe these components. The structure and components of the published datasets are represented on the dataset landing pages and through the Data Diagram presented for each dataset.

Due to variations in their research methods, researchers may not need all the categories and terms available to describe and represent their datasets. However, we have identified a core set of metadata that allows proper data representation, explainability, and citation. These sets of core metadata are shown for each data model in our Metadata Requirements in Best Practices.

To further describe datasets, the curation interface offers the possibility to add both predefined and custom file tags. Predefined file tags are specialized terms provided by the natural hazard community; their use is optional, but highly recommended. The lists of tags are evolving for each data model, continuing to be expanded, updated, and corrected as we gather feedback and observe how researchers use them in their publications.

For purposes of metadata exchange and interoperability, the elements and tags in the data models are mapped to widely-used, standardized metadata schemas. These are: Dublin Core for description of the dataset project, DDI (Data Documentation Initiative) for social science data, DataCite for DOI assignment and citation, and PROV-O to record the structure of the dataset. Metadata mapping is substantiated as the data is ingested into Fedora. Users can download the standardized metadata from the publication's landing page.)

Metadata and Data Quality

The diversity and quantity of research methods, instruments, and data formats in natural hazards research is vast and highly specialized. For this reason, we conceive of data quality as a collaboration between the researchers and the DDR. In consultation with the larger NHERI network we are continuously observing and defining best practices that emerge from our community's understanding and standards.

We address data quality from a variety of perspectives:

Metadata quality: Metadata is fundamental to data explainability and reuse. To support metadata quality we provide onboarding descriptions of all metadata elements, indicate which ones are required, and suggest how to complete them. Requirements for core metadata elements are automatically reinforced within the publication pipeline and the dataset will not be published if those are not fulfilled. Metadata can be accessed by users in standardized formats on the projects’ landing pages.

Data content quality: Different groups in the NHERI network have developed benchmarks and guidelines for data quality assurance, including StEER, CONVERGE and RAPID. In turn, each NHERI Experimental Facility has methods and criteria in place for ensuring and assessing data quality during and after experiments are conducted. Most of the data curated and published along NHERI guidelines in the DDR are related to peer-reviewed research projects and papers, speaking to the relevance and standards of their design and outputs. Still, the community acknowledges that for very large datasets the opportunity for detailed quality assessment emerges after publication, as data are analyzed and turned into knowledge. Because work in many projects continues after publication, both for the data producers and reusers, the community has the opportunity to version datasets.

Data completeness and representation: We understand data completeness as the presence of all relevant files that enable reproducibility, understandability, and therefore reuse. This may include readme files, data dictionaries and data reports, as well as data files. The DDR supports data completeness by recommending and requesting that users include the data needed to fulfill the categories that each data model requires as indispensable for a publication's understandability and reuse. During the publication process the system verifies that those categories have data assigned to them. The Data Diagram on the landing page reflects which relevant data categories are present in each publication. A similar process happens for metadata during the publication pipeline; metadata is automatically vetted against the research community’s Metadata Requirements before moving on to receive a DOI for persistent identification.

We also support citation best practices for datasets reused in our publications. When users reuse data from other sources in their data projects, they have the opportunity to include them in the metadata through the Related Works and Referenced Data fields.

Data publications review: Once a month, data curators meet to review new publications. These reviews show us how the community is using and understanding the models, and allow us to verify the overall quality of the data publications. When we identify curation problems (e.g. insufficient or unclear descriptions, file or category misplacement, etc.) that could not be automatically detected, we contact the researchers and work on solving these issues. Based on the feedback, users have the possibility to amend/improve their descriptions and to version their datasets (See Amends and Version Control).

Curation and Publication Assistance

We believe that researchers are best prepared to tell the story of their projects through their data publications; our role is to enable them to communicate their research to the public by providing flexible and easy to use curation resources and guidance that abide by publication best practices. To support researchers organizing, categorizing and describing their data, we provide interactive pipelines with onboarding instructions, different modes of training and documentation, and one-on-one help.

Interactive pipelines: The DDR interface is designed to facilitate large scale data curation and publication through interactive step-by-step capabilities aided by onboarding instructions. This includes the possibility to categorize and tag multiple files in relation to the data models, and to establish relations between categories via diagrams that are intuitive to data producers and easy to understand for data consumers. Onboarding instructions including vocabulary definitions, suggestions for best practices, availability of controlled terms, and automated quality control checks are in place.

One-on-one support: We hold virtual office hours twice a week during which a curator is available for real-time consulting with individuals and teams. Other virtual consulting times can be scheduled on demand. Users can also submit Help tickets, which are answered within 24 hours, as well as send emails to the curators. Users also communicate with curatorial staff via the DesignSafe Slack channel. The curatorial staff includes a natural hazards engineer, a data librarian, and a USEX specialist. Furthermore, developers are on call to assist when needed.

Guidance on Best Practices: Curatorial staff prepares step-by-step instructions and video tutorials, including special training materials for Undergraduate Research Experience students and for Graduate Students working at Experimental Facilities.

Webinars by Researchers: Various researchers in our community contribute to our curation and publication efforts by conducting webinars in which they relay their data curation and publication experiences. Some examples are webinars on curation and publication of hybrid simulations, field research and social sciences datasets.

Data Publication and Usage

Protected Data

Protected data are information subject to regulation under relevant privacy and data protection laws, such as HIPAA, FERPA and FISMA, as well as human subjects data containing Personally Identifiable Information (PII) and data involving vulnerable populations and/or containing sensitive information.

Publishing protected data in the DDR involves complying with the requirements, norms, and procedures approved by the data producer's Institutional Review Board (IRB) or equivalent body regarding human subjects data storage and publication, and managing direct and indirect identifiers in accordance with accepted means of data de-identification. In the DDR, protected data issues are considered at the onset of the curation and publication process and before storing data. Researchers working with protected data in DDR can communicate this to the curation team when they select a project type in DDR; a curator then gets in touch with them to discuss options and procedures.

Unless approved by an IRB, most forms of protected data cannot be published in DesignSafe. No direct identifiers and only up to three indirect identifiers are allowed in published datasets. However, data containing PII can be published in the DDR with proper consent from the subject(s) and documentation of that consent in the project's IRB paperwork. In all publications involving human subjects, researchers should include and publish their IRB documentation showing the agreement.
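The identifier limits above can be used for a first-pass screening of a tabular dataset before it is submitted. The sketch below is only an illustration under stated assumptions: the lists of column names treated as direct or indirect identifiers are hypothetical, the `identifier_issues` function is not part of the DDR, and a check of this kind does not replace IRB review or documented consent.

```python
from typing import Iterable, List, Set

MAX_INDIRECT_IDENTIFIERS = 3  # limit stated in the policy above

# Hypothetical classification of column names, for illustration only.
DIRECT_IDENTIFIERS: Set[str] = {"name", "email", "phone", "street_address"}
INDIRECT_IDENTIFIERS: Set[str] = {"age", "zip_code", "occupation", "gender", "ethnicity"}


def identifier_issues(columns: Iterable[str]) -> List[str]:
    """Flag columns that would conflict with the identifier limits."""
    cols = [c.lower() for c in columns]
    direct = sorted(c for c in cols if c in DIRECT_IDENTIFIERS)
    indirect = sorted(c for c in cols if c in INDIRECT_IDENTIFIERS)
    issues = []
    if direct:
        issues.append("direct identifiers present: " + ", ".join(direct))
    if len(indirect) > MAX_INDIRECT_IDENTIFIERS:
        issues.append(
            f"{len(indirect)} indirect identifiers found; "
            f"at most {MAX_INDIRECT_IDENTIFIERS} allowed"
        )
    return issues


survey_columns = ["Email", "age", "zip_code", "occupation", "gender", "response"]
for issue in identifier_issues(survey_columns):
    print(issue)
```

Matching on column names only catches obvious cases; identifiers embedded in free-text responses or file names still need manual review.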

If, as a consequence of data de-identification, the data loses meaning, it is possible to publish a description of the data, the corresponding IRB documents, and the data instruments if applicable, and to obtain a DOI and a citation for the dataset. In this case, the dataset will be shown with Restricted Access. In addition, authors should include information on how to reach them in order to gain access or discuss the dataset further. The responsibility for maintaining the protected dataset in compliance with the IRB commitments and for the long term lies with the authors, and they can use TACC's Protected Data Services if they need to. For more information on how to manage this case see our Protected Data Best Practices.

It is the user’s responsibility to adhere to these policies and the procedures and standards of their IRB or other equivalent institution, and DesignSafe will not be held liable for any violations of these terms regarding improper publication of protected data. User uploads that we are notified of that violate this policy may be removed from the DDR with or without notice, and the user may be asked to suspend their use of the DDR and other DesignSafe resources. We may also contact the user’s IRB and/or other respective institution with any cases of violation, which could result in an active audit (See 24) of the research project, so users should review their institution’s policies regarding publishing with protected data before using DesignSafe and DDR.

For any data that is not subject to IRB oversight but may still contain PII, such as Google Earth images containing images of people not studied in the scope of the research project, we recommend blocking out or blurring any information that could be considered PII before publishing the data in the DDR. Researchers who are interested in seeing the raw data can still contact the PI of the research project to request access. See our Protected Data Best Practices for information on how to manage protected data in DDR.

Tombstone

A tombstone is a landing page that describes a dataset that has been removed from public access. A dataset can be removed because of research retraction, because the data does not comply with the accepted Data Types, or, upon curation review, because it does not meet one or more Curation Policies or Best Practices. In the latter case the curator reviewing the dataset will first alert the author/s so they can improve their publication within 30 days; if the issues are not addressed in that time, the dataset will be tombstoned. A tombstoned landing page contains the data citation and the DOI, but the dataset is not accessible.

Subsequent Publishing

Attending to the needs expressed by the community, we enable the possibility to publish data and other products subsequently within a project, each with a DOI. This need arises from the longitudinal and/or tiered structure of some research projects, such as experiments and field research missions, which happen at different time periods, may involve multiple distinct teams, and may need to publish different types of materials or to release information promptly after a natural hazards event and later publish related products. Subsequent publishing is enabled in the My Project interface, where users and teams manage and curate their active data throughout their projects' lifecycle.

Timely Data Publication

Although no firm deadline requirements are specified for data publishing, as an NSF-funded platform we expect researchers to publish in a timely manner, so we provide recommended timelines for publishing different types of research data in our Timely Data Publication Best Practices.

Peer Review

Users who need to submit their data for review prior to publishing and assigning a DOI have the opportunity to do so by: a) adding reviewers to their My Project, when there is no need for anonymous review, or b) contacting the DesignSafe data curator through a Help ticket to obtain a Public Accessibility Delay (See below). Note that the data must be fully curated prior to requesting a Public Accessibility Delay.

Public Accessibility Delay

Many researchers request a DOI for their data before it is made publicly available, so that they can include it in papers submitted to journals for review. In order to assign a DOI in the DDR, the data has to be curated and ready to be published. Once the DOI is in place, we provide services to researchers with such commitments to delay the public accessibility of their data publication in the DDR, i.e. to make the user’s data publication, via their assigned DOI, not web indexable through DataCite and/or not publicly available in DDR's data browser until the corresponding paper is published in a journal, or for up to one year after the data is deposited. The logic behind this policy is that once a DOI has been assigned, the data will inevitably be published, so this delay can be used to provide reviewers access to a data publication before it is broadly distributed. Note that data should be fully curated, and that it will eventually be indexed by search engines. Users who need to amend/correct their publications will be able to do so via version control. See our Data Delay Best Practices for more information on obtaining a public accessibility delay.
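One way to read the delay window described above is that public access begins when the paper is published or one year after deposit, whichever comes first. The sketch below computes that date under this assumed reading; the `delay_end` function and its handling of a February 29 deposit date are illustrative choices, not DDR behaviour.

```python
from datetime import date
from typing import Optional


def add_one_year(d: date) -> date:
    """Same calendar day one year later; February 29 falls back to February 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def delay_end(deposited: date, paper_published: Optional[date] = None) -> date:
    """Date on which the delay ends under the assumed reading of the policy."""
    limit = add_one_year(deposited)
    if paper_published is None:
        # No paper yet: the one-year limit applies.
        return limit
    return min(paper_published, limit)


print(delay_end(date(2024, 3, 1)))
print(delay_end(date(2024, 3, 1), paper_published=date(2024, 9, 15)))
```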

Data Licenses

DDR provides users with five licensing options to accommodate the variety of research outputs generated and how researchers in this community want to be attributed. The following licenses were selected after discussions within our community. In general, DDR users are keen on sharing their data openly but expect attribution. In addition to data, our community issues reports, survey instruments, presentations, learning materials, and code. The licenses are: Creative Commons Attribution (CC-BY 4.0), Creative Commons Public Domain Dedication (CC-0 1.0), Open Data Commons Attribution (ODC-BY 1.0), Open Data Commons Public Domain Dedication (ODC-PDDL 1.0), and GNU General Public License (GNU-GPL 3). During the publication process users have the option of selecting one license per publication with a DOI. More specifications of these license options and the works they can be applied to can be found in Licensing Best Practices.

DDR also requires that users reusing data from others in their projects do so in compliance with the terms of the data's original license.

The expectations of DDR and the responsibilities of users in relation to the application of and compliance with licenses are included in the DesignSafe Terms of Use, the Data Usage Agreement, and the Data Publication Agreement. As clearly stated in those documents, in the event that we note or are notified that the licensing policies and best practices are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

Data Citation

DDR abides by and promotes the Joint Declaration of Data Citation Principles amongst its users.

We encourage and facilitate researchers using data from the DDR to cite it using the DOI and citation language available in the dataset's landing page. The DOI relies on the DataCite schema for citation and accurate access.

For users publishing data in DDR, we enable referencing works and/or data reused in their projects. For this we provide two fields, Related Work and Referenced Data, for citing data and works in their data publication landing page.

The expectations of DDR and the responsibilities of users in relation to the application and compliance with data citation are included in the DesignSafe Terms of Use, the Data Usage Agreement, and the Data Publication Agreement. As clearly stated in those documents, in the event that we note or are notified that citation policies and best practices are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

However, given that it is not feasible to know with certainty if users comply with data citation, our approach is to educate our community by reinforcing citation in a positive way. For this we implement outreach strategies to stimulate data citation. Through diverse documentation, FAQs, webinars, and emails, we regularly train our users on data citation best practices. And, by tracking and publishing information about the impact and science contributions of the works they publish citing the data that they use, we demonstrate the value of data reuse and further stimulate publishing and citing data.

Data Publication Agreement

This agreement is read and has to be accepted by the user prior to publishing a dataset.

This submission represents my original work and meets the policies and requirements established by the DesignSafe Policies and Best Practices. I grant the Data Depot Repository (DDR) all required permissions and licenses to make the work I publish in the DDR available for archiving and continued access. These permissions include allowing DesignSafe to:

  1. Disseminate the content in a variety of distribution formats according to the DDR Policies and Best Practices.
  2. Promote and advertise the content publicly in DesignSafe.
  3. Store, translate, copy, or re-format files in any way to ensure their future preservation and accessibility.
  4. Improve usability and/or protect respondent confidentiality.
  5. Exchange and/or incorporate metadata or documentation in the content into public access catalogues.
  6. Transfer data and metadata with their respective DOI to another institution for long-term accessibility if needed for continuous access.

I understand the type of license I choose to distribute my data, and I guarantee that I am entitled to grant the rights contained in it. I agree that when this submission is made public with a unique digital object identifier (DOI), this will result in a publication that cannot be changed. If the dataset requires revision, a new version of the data publication will be published under the same DOI.

I warrant that I am lawfully entitled and have full authority to license the content submitted, as described in this agreement. None of the above supersedes any prior contractual obligations with third parties that require any information to be kept confidential.

If applicable, I warrant that I am following the IRB agreements in place for my research and following Protected Data Best Practices.

I understand that the DDR does not approve data publications before they are posted; therefore, I am solely responsible for the submission, publication, and all possible confidentiality/privacy issues that may arise from the publication.

Data Usage Agreement

Users who access, preview, download or reuse data and metadata from the DesignSafe Data Depot Repository (DDR) agree to the following policies. If these policies are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

  • Use of the data includes, but is not limited to, viewing parts or the whole of the content; comparing with data or content in other datasets; verifying research results and using any part of the content in other projects, publications, or other related work products.
  • Users will not use the data in any way prohibited by applicable laws, distribution licenses, and permissions explicit in the data publication landing pages.
  • The data are provided “as is,” and their use is at the users' risk. While the DDR promotes data and metadata quality, the data authors and publishers do not guarantee that:
    1. the materials are accurate, complete, reliable or correct;
    2. any defects or errors will be corrected;
    3. the materials and accompanying files are free of viruses or other harmful components; or
    4. the results of using the data will meet the user’s requirements.
  • Use of data in the DDR abides by the DesignSafe Privacy Policy.
  • Users are responsible for abiding by the restrictions outlined by the data author in their publications' landing pages and by the DDR in this agreement, but they are not responsible for any restrictions not otherwise explicitly described here or in the landing pages.
  • Users will not obtain personal information associated with DDR data that results in directly or indirectly identifying research subjects, individuals, or organizations with the aid of other information acquired elsewhere.
  • Users will not in any event hold the DDR or the data authors liable for any and all losses, costs, expenses, or damages arising from use of DDR data or any other violation of this agreement, including infringement of licenses, intellectual property rights, and other rights of people or entities contained in the data.
  • We do not gather the IP addresses of public users who preview or download files from the DDR.
  • Our system logs file actions completed by registered users in the DDR including previewing, downloading or copying published data to My Data or My Projects. We only use this information in aggregate for metrics purposes and do not link it to the user’s identity.

Amends and Version Control

Users can amend and version their data publications. Since the DDR came online, we have helped users correct and/or improve the metadata applied to their datasets after publication. Most requests involve improving the text of the descriptions, changing the order of the authors, and adding references to papers published using the data in the project; users also requested the possibility to version their datasets. Our amends and version control policy derives from meeting our users' needs.

Changes allowed during amends are:

  • Add Related Works, such as a paper published after the data.
  • Correct typos and/or improve the abstract and the keyword list.
  • Correct or add an award.
  • Change the order of the authors.

If users need to add or delete files or change the content of the files, they have the opportunity to version their data publication. The following applies to versions:

  • Versions will have the same DOI, and the title will indicate the version number. The decision to maintain the same DOI was agreed upon by our community to facilitate DOI management for data publishers and users.
  • Users will be able to view all existing versions in the publication's landing page.
  • The DOI will always resolve to the latest version of the publication.
  • Versions are documented by data publishers so other users understand what changed and why. The documentation is publicly displayed.

Documentation of versions requires including the name of the file(s) changed, removed or added, and identifying within which category they are located. We include guidance on how to document versions within the curation and publication onboarding instructions.
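As an illustration only, the documentation requirement above can be sketched as a small record that lists each affected file, the category it sits in, and whether it was changed, removed or added. The class names, field names, file names and category labels below are hypothetical and do not reflect the DDR's actual publication form or metadata schema.

```python
from dataclasses import dataclass, field

# Actions named in the policy text: a file is changed, removed, or added.
ALLOWED_ACTIONS = {"changed", "removed", "added"}


@dataclass
class FileChange:
    filename: str  # name of the affected file
    category: str  # category of the publication in which the file is located
    action: str    # one of ALLOWED_ACTIONS


@dataclass
class VersionNote:
    version: int
    reason: str  # why the new version was created
    changes: list = field(default_factory=list)

    def add(self, filename: str, category: str, action: str) -> None:
        """Record one file-level change, rejecting unknown actions."""
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        self.changes.append(FileChange(filename, category, action))

    def render(self) -> str:
        """Return a plain-text summary suitable for public display."""
        lines = [f"Version {self.version}: {self.reason}"]
        for ch in self.changes:
            lines.append(f"- {ch.action}: {ch.filename} (category: {ch.category})")
        return "\n".join(lines)


# Hypothetical example values, for illustration only.
note = VersionNote(version=2, reason="Replaced a file that had a wrong header row")
note.add("run3_accel.csv", "Sensor Information", "changed")
note.add("readme_v2.txt", "Report", "added")
print(note.render())
```

Keeping the reason and the per-file list together is one way to make sure readers can see both what changed and why in a single place.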

The Fedora repository manages all amends and versions so there is a record of all changes. The version number is passed to DataCite as metadata.

More information about the reasons for amends and versioning is in Publication Best Practices.

Leave Data Feedback

Users can click a “Leave Feedback” button on the projects’ landing pages to provide comments on any publication. This feedback is forwarded to the curation team for any needed actions, including contacting the authors. In addition, users can message the authors directly, as their contact information is available via the authors field in the publication landing pages. We encourage users to provide constructive feedback and suggest themes they may want to discuss about the publication in our Leave Data Feedback Best Practices.

Tombstone

A tombstone is a landing page describing a dataset that has been removed from public access in a data repository. Creation and maintenance of the tombstone is a responsibility of the repository. In DDR, curators review published datasets regularly. If data creators deposit data that does not meet the accepted data types, or if their dataset does not comply with DDR's Policies and Best Practices, they will be alerted after publication to improve their dataset presentation via amends or versioning. Users have 30 days from notification before tombstoning so they have enough time to implement improvements, and the curator will work with them. In DDR, the data will still be available to the creator in My Project.

Data Impact

We understand data impact as a strategy that includes complementary efforts at the crossroads of data discoverability, usage metrics, and scholarly communications.

Search Engine Optimization (SEO)

We have SEO methods in place to enhance the web visibility of the data publications. To increase discoverability and indexing of our publications we follow guidance from Google Search Console and Google Dataset Search.

Data Usage Metrics

Our metrics follow the Make Data Count COUNTER Code of Practice for Research Data.

Below are the definitions for each metric:

File Preview: Examining data in the portal, for example by clicking on a file name, which brings up a modal window that allows previewing the file. Not all document types can be previewed. Those that can include text, spreadsheets, graphics, and code files (example extensions: .txt, .doc, .docx, .csv, .xlsx, .pdf, .jpg, .m, .ipynb). Those that can't include binary executables, MATLAB containers, compressed files, and video (e.g., .bin, .mat, .zip, .tar, .mp4, .mov).

File Download: Copying a file to the machine the user is running on, or to a storage device that machine has access to. This can be done by ticking the checkbox next to a document and selecting "Download" at the top of the project page. With documents that can be previewed, clicking "Download" at the top of the preview modal window has the same effect. Downloads are counted per project and per individual file. We also count copies of a file from the published project to the user's My Data, My Projects, or to Tools and Applications in DesignSafe or one of the connected spaces (Box, Dropbox, Google Drive). To copy, tick the checkbox next to a document and select "Copy" at the top of the project page.

File Requests: Total file downloads + total file previews.

Project Downloads: Total downloads of a compressed entire project to a user's machine.
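The definitions above can be sketched as a small aggregation over usage events. This is a minimal illustration, not the DDR's reporting code: the event format, the action names and the publication identifiers are hypothetical, and counting copies together with downloads is our reading of the File Download definition.

```python
from collections import Counter

# Hypothetical usage events as (publication_id, action) pairs.
# The action names are illustrative, not the DDR's actual log vocabulary.
events = [
    ("pub-A", "preview"),
    ("pub-A", "download"),
    ("pub-A", "copy"),
    ("pub-A", "project_download"),
    ("pub-B", "preview"),
]


def summarize(events):
    """Aggregate per-publication counts following the metric definitions."""
    counts = {}
    for pub_id, action in events:
        counts.setdefault(pub_id, Counter())[action] += 1

    report = {}
    for pub_id, c in counts.items():
        previews = c["preview"]
        # Assumption: copies to My Data, My Projects, or connected spaces
        # are counted together with file downloads.
        downloads = c["download"] + c["copy"]
        report[pub_id] = {
            "file_previews": previews,
            "file_downloads": downloads,
            # File Requests = total file downloads + total file previews.
            "file_requests": downloads + previews,
            # Downloads of the whole compressed project are tracked separately.
            "project_downloads": c["project_download"],
        }
    return report


report = summarize(events)
print(report["pub-A"])
```

Only aggregate counts per publication leave the function; no user identity is part of the event tuples, in line with the aggregate-only use of this information described in this policy.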

We report the metrics in the publications' landing pages. To provide context to the metrics, we indicate the total number of files in each publication.

We started counting on May 17, 2021. We update the reports on a monthly basis and we report data metrics to NSF every quarter. Currently we are in the process of formatting the reports to participate in the Make Data Count initiative.

Data Reuse Stories Case Studies

Since 2020 we have conducted Data Reuse Case Studies. For this, we identify published papers and interview researchers that have reused data published in DDR. In this context, reuse means that researchers are using data published by others for purposes different from those intended by the data creators. During the interviews we use a semi-structured questionnaire to discuss the academic relevance of the research, the ease of access to the data in DDR, and the understandability of the data publication in relation to metadata and documentation clarity and completeness. We feature the data stories on the DesignSafe website and use the feedback to make changes and to design new reuse strategies.

Dataset Awards

In 2021 we launched the DesignSafe Dataset Awards to encourage excellence in data publication and to stimulate reuse. Data publications are nominated by our designated user community based on contribution to scientific advancement and curation.

Data Privacy

This policy explains what information DesignSafe collects through your use of DDR and how we treat that information. The DDR website may contain links to other websites, applications and/or software. We are not responsible for the privacy practices of these third parties, and you should read through their practices before clicking or using them.

The DesignSafe DDR is hosted at the University of Texas at Austin, Texas Advanced Computing Center (TACC). A DesignSafe user account is a TACC user account. When registering for an account, TACC collects your name, email address, institution, and country of citizenship. Additionally, after account approval and subsequent login to DesignSafe, DesignSafe collects your Natural Hazard Interests, Technical Domain, Professional Level and Research Activities.

When a user accesses DesignSafe, our web server software generates log files of the IP address of their computer and the user-agent, which contains minimal information about their browser and operating system. If the user is logged in to DesignSafe, their username is also recorded. When a user downloads a file from the DDR, our software collects the aforementioned data and accompanying download data such as the time of the download and the files downloaded. This data is available to DesignSafe web programmers and data analysts to help diagnose problems, manage the repository, respond to users' requests, and provide services.

When users publish data or other products as authors and co-authors, their names, email addresses, and institutions become publicly available in the dataset's landing page. This facilitates establishing contact with authors about the particulars of their dataset publications.

Data collected during downloads, previews, or copies of files as well as views of published datasets are used solely in aggregate to comply with Make Data Count usage reporting standards. This information is processed and made publicly available in the datasets' landing pages as Views and Downloads for purposes of showing the impact of the datasets. All data is retained in the logs and DDR's internal database as needed for business purposes. We do not share any personally identifiable information we collect or develop about our users with any third parties for any purpose, unless required by law. Any reports we may share externally use unidentifiable, aggregated data.

DesignSafe only uses first-party cookies for authentication. We use cookies so that users don't have to re-authenticate every time they refresh the page, and no PII is stored in those cookies. There are Google Analytics cookies that collect metrics about visitors, but the personally identifying pieces like IP addresses are anonymized. Users' browsing habits are not tracked for advertising or marketing purposes.

Users are required to use Multi-Factor Authentication (MFA) as an additional security measure when logging in to DDR. DDR has security measures to prevent loss of the data and information. See the DesignSafe Cyber Security Policy for more details.

Data Preservation

Data preservation encompasses diverse activities carried out by all the stakeholders involved in the lifecycle of data, from data management planning to data curation, publication and long-term archiving. Once data is submitted to the Data Depot Repository (DDR), we have functionalities and guidance in place to address the long-term preservation of the submitted data.

The DDR has been operational since 2016 and is currently supported by the NSF from October 1, 2020 through September 30, 2025. During this award period, the DDR will continue to preserve the natural hazards research data published since its inception, and to support preservation of and access to legacy data and the accompanying metadata from the Network for Earthquake Engineering Simulation (NEES), a NHERI predecessor, dating from 2005. The legacy data, comprising 33 TB and 5.1 million files, and their metadata were transferred to DesignSafe in 2016 as part of the conditions of the original grant. See NEES data here.

Data in the DDR is preserved according to state-of-the-art digital library standards and best practices. DesignSafe is implemented within the reliable, secure, and scalable storage infrastructure at the Texas Advanced Computing Center (TACC), which has over 20 years of experience and innovation in High Performance Computing. TACC and its predecessors have operated a digital data archive continuously since 1986, currently implemented in the Corral Data Management system and the Ranch tape archive system, with capacity of approximately half an exabyte. Corral and Ranch hold the data for DesignSafe and hundreds of other research data collections. For details about the digital preservation architecture and procedures for DDR go to Data Preservation Best Practices.

Within TACC’s storage infrastructure a Fedora repository, considered a standard for digital libraries, manages the preservation of the published data. Through its functionalities, Fedora assures the authenticity and integrity of the digital objects, manages versioning, identifies file formats, records preservation events as metadata, maintains RDF metadata in accordance with standard schemas, conducts audits, and maintains the relationships between data and metadata for each published research project and its corresponding datasets. Each published dataset in DesignSafe has a Digital Object Identifier, whose maintenance we understand as a firm commitment to data persistence.

While at the moment DDR is committed to preserving data in the format in which it is submitted, we procure the necessary authorizations from users to conduct further preservation actions as well as to transfer the data to other organizations if applicable. These permissions are granted through our Data Publication Agreement, which authors acknowledge and have the choice to agree to at the end of the publication workflow and prior to receiving a DOI for their dataset.

Data sustainability is a continuous effort that DDR accomplishes along with the rest of the NHERI partners. In the natural hazards space, data is central to new advances, which is evidenced by the data reuse record of our community and the following initiatives:

Continuity of Access

As part of the requirements of the current award we have a contingency plan in place to transfer all the DDR data, metadata and corresponding DOIs to a new awardee (should one be selected) without interruption of services and access to data. Fedora has export capabilities for transfer of data and metadata to another repository in a complete and validated fashion. The portability plan is confirmed and updated in the Operations Project Execution Plan that we present annually to the NSF.

If the NSF and/or the other stakeholders involved in this community decide not to continue the NHERI program or a subsequent data repository, we will continue to preserve the published data and provide access to it through TACC, DesignSafe’s host at the University of Texas at Austin. TACC has formally committed to preserving the data with landing pages and corresponding DOIs indefinitely. TACC has on permanent staff a User Services team as well as curators that will attend to users’ requests and help tickets related to the data. Because TACC is constantly updating its high-performance storage resources and security mechanisms, data will be preserved at the same preservation level that is currently available. Considering that DOIs are supported through the University of Texas Libraries and that the web services and the data reside within TACC’s managed resources, access to data will not be interrupted. Fedora is now part of TACC’s software suite and we will continue its maintenance as our preservation repository. As with all systems at TACC, we will revisit its versioning and continuity and make decisions based on state-of-the-art practices. Should funding constraints ever make this no longer possible, TACC will continue to keep an archive copy on Ranch (with landing pages on online storage) for as long as TACC remains a viable entity.

Changes to DDR Policies

DDR regularly revisits its policies and will post changes and/or alert the community. Please check this page regularly for current information. For any questions about the policies, send a help ticket or join curation office hours.