Recommended Informed Consent Language for Data Sharing
Language to Avoid
Promises in the informed consent can appear to limit an investigator's ability to share data with the research community. In reality, investigators can inform study participants that they are scientists with an obligation to protect confidentiality and still share the study data with the broad scientific community. Many effective means exist to create public-use data files or share restricted-use data files under controlled conditions. That is, data can be modified to reduce the risk of disclosure or shared with additional safeguards while preserving their value for science.
Model Language
Here are two model statements investigators may use in informed consents to describe protection of confidentiality that also allows data sharing.
Sample 1. Study staff will protect your personal information closely so no one will be able to connect your responses and any other information that identifies you. Federal or state laws may require us to show information to university or government officials (or sponsors), who are responsible for monitoring the safety of this study. Directly identifying information (e.g. names, addresses) will be safeguarded and maintained under controlled conditions. You will not be identified in any publication from this study.
Sample 2. The information in this study will be used only for research purposes and in ways that will not reveal who you are. Federal or state laws may require us to show information to university or government officials (or sponsors) who are responsible for monitoring the safety of this study. You will not be identified in any publication from this study.
Known Concerns and Recommended Alternatives
Concern 1
Terms such as "anonymous" and "de-identified" are often left undefined and open to interpretation. Some data are collected anonymously, meaning that directly identifying information is never obtained. De-identification may involve more than removing direct identifiers: indirect identifiers that remain in the file can still be used in combination to isolate a subject who is unique on certain characteristics. Even the "safe harbor" method of de-identification, which removes 18 specified elements, still requires the covered entity to affirm it has no "actual knowledge that the remaining information alone or in combination with other information can be used to possibly identify the subject" (source).
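The point about indirect identifiers can be illustrated with a minimal sketch. It assumes a file whose direct identifiers are already removed and counts how many records share each combination of indirect identifiers. The field names (age_group, zip3, occupation) and the records are hypothetical, and this is only one simple screening check, not a full disclosure review.

```python
from collections import Counter

def find_unique_records(records, indirect_identifiers):
    """Return the records whose combination of indirect identifiers occurs only once."""
    def key(record):
        return tuple(record[field] for field in indirect_identifiers)

    counts = Counter(key(record) for record in records)
    return [record for record in records if counts[key(record)] == 1]

# Hypothetical records; names and addresses have already been removed.
records = [
    {"age_group": "30-39", "zip3": "481", "occupation": "teacher"},
    {"age_group": "30-39", "zip3": "481", "occupation": "teacher"},
    {"age_group": "70-79", "zip3": "481", "occupation": "judge"},
]

# Records that stand alone on these three characteristics taken together.
at_risk = find_unique_records(records, ["age_group", "zip3", "occupation"])
print(at_risk)
```

In this made-up file, no single field isolates anyone on its own terms when only zip3 is checked, but the combination of all three fields singles out one record. Such records are the ones that may need to be modified or shared only under restricted-use conditions.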
Recommendations:
Use descriptive sentences that state what information will not be shared:
- "Any personal information that could identify you will be removed or changed before files are shared with other researchers or results are made public."
- "Your answers to the questions I ask will be anonymous. That is, I will not ask for your name, and we will not attach your name or jail number to your answers."
- "The Personally Unidentified Study Data does not include your name, address, telephone or social security number."
Use descriptive sentences that state what may be retained in data if shared with other researchers:
- "Personally Unidentified Study Data may include your date of birth, initials, and dates you received medical care. Personally Unidentified Study Data also may include the health information used, created, or collected in the research study."
Concern 2
It is often unclear whether the language refers to identifiable subject information or to research data that are kept separate from subject contact information or other direct identifiers.
Recommendations:
Establish a term or phrase that identifies the identifiable information (i.e., contact information or other direct identifiers) that will not be shared.
Establish another term or phrase for the research data (i.e., the "coded" information or "your answers" that does not contain the contact information or other direct identifiers but still may include indirect identifiers).
Use these terms consistently throughout the form. Avoid indefinite language, such as "your data," "your study information," "all information collected about you," or "study results."
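A draft consent form can be screened for the indefinite phrases listed above. The following is a minimal sketch, assuming the draft is available as plain text; the function name and the sample draft are hypothetical, and the phrase list contains only the four examples named in this section.

```python
import re

# The indefinite phrases named in the recommendation above.
INDEFINITE_PHRASES = [
    "your data",
    "your study information",
    "all information collected about you",
    "study results",
]

def find_indefinite_phrases(consent_text):
    """Return (phrase, count) pairs for each indefinite phrase found, ignoring case."""
    found = []
    for phrase in INDEFINITE_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        count = len(re.findall(pattern, consent_text, flags=re.IGNORECASE))
        if count:
            found.append((phrase, count))
    return found

# Hypothetical draft wording.
draft = (
    "Your data will be kept in locked files. "
    "Only the research team will see your study information."
)
print(find_indefinite_phrases(draft))
```

A check like this only flags wording for a human reviewer. It does not confirm that the replacement terms are defined or used consistently, and it will miss phrases that are split across line breaks.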
Concern 3
The consent form promises that the data will be seen or accessed only by the research team.
Recommendations:
Explain the form that the identifying information will take and who has access:
- "Your identifying information will be replaced with codes. Only the research team will have access to information that identifies you to carry out this research study. Your identifying information will not be shared with others outside this research study."
If no "research data" will be released to persons in official or unofficial capacities, make sure that sharing data with other researchers cannot be interpreted as falling into this category, and allow for it explicitly through a statement such as:
- "Any personal information that could identify you will be removed or changed before files are shared with other researchers or results are made public."
- "During the project, information from this study will be kept in locked files that only the research staff can open. Any personal information that could identify you will be removed or changed before files are shared with other researchers or results are made public."
- "This [Certificate of Confidentiality] allows for answers that you give during the surveys to be kept secret during this project. Any personal information that could identify you will be removed or changed before files are shared with other researchers or results are made public."
State it explicitly when personally identifying information will be destroyed, removed, or changed:
- "All personally identifying information collected about you will be destroyed once it is no longer needed for the study. Any personal information that could identify you will be removed or changed before files are shared with other researchers or results are made public."
Concern 4
Descriptions of how the information will be stored during the project can imply that the data are kept only for the duration of the project. Such language does not allow any data to be stored beyond the research project, which prevents the research portion of the data from being shared.
Directly identifying elements need to be stored separately from the "research data" (i.e., the data for analysis) and must be destroyed within a specified period after the end of the research project. The research data can be shared if appropriately de-identified or as a limited dataset (aka restricted-use dataset).
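The separation described above can be sketched as follows. This is a minimal illustration, assuming each subject's record is a simple mapping of field names to values. The field names, the split_record helper, and the use of a random hexadecimal code are all assumptions made for the example, not a prescribed procedure.

```python
import secrets

# Hypothetical names of the directly identifying fields in the raw records.
DIRECT_IDENTIFIERS = ("name", "address")

def split_record(record, direct_fields=DIRECT_IDENTIFIERS):
    """Split one record into (link_entry, research_row) joined only by a random code."""
    code = secrets.token_hex(8)  # random code, not derived from the identifiers
    link_entry = {"code": code}    # to be stored separately and later destroyed
    research_row = {"code": code}  # the data for analysis
    for field, value in record.items():
        if field in direct_fields:
            link_entry[field] = value
        else:
            research_row[field] = value
    return link_entry, research_row

# Hypothetical raw record.
raw = {"name": "Jane Doe", "address": "1 Main St", "q1": 4, "q2": "yes"}
link_entry, research_row = split_record(raw)
print(sorted(research_row))
```

The research row produced this way may still contain indirect identifiers, so it would still need disclosure review before being shared as a public-use or restricted-use file.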
Recommendation:
Explain the duration of the data storage and what happens to directly identifying information:
- "If you decide to be in this study, the study researchers will get information that identifies you and your personal health information. This may include information that might directly identify you, such as your name and address. This information will be kept for the length of the study (five years). After that time it will be destroyed or de-identified, meaning we will replace your identifying information with a code that does not directly identify you. The principal investigator will keep a link that identifies you to your coded information, but this link will be kept secure and available only to the principal investigator or selected members of the research team. Any information that can identify you will remain confidential. Any personal information that could identify you will be removed or changed before files are shared with other researchers or results are made public."
Concern 5
The phrase "shared anonymously" may prohibit sharing data using a limited-use (aka restricted-use) dataset if the data cannot be completely anonymized or de-identified.
Recommendation:
If the personally identifying elements were collected (i.e., the data were not collected anonymously), use a statement that acknowledges the directly identifiable information will be removed but does not promise all indirect identifiers will be removed:
- "Any personal information that could identify you will be removed or changed before files are shared with other researchers or results are made public."
Concern 6
The informed consent, and therefore the authorization to use the information (i.e., the research data), is given an expiration, e.g., "Only valid while the study is being done."
Recommendations:
Do not mention an expiration time period.
State that the authorization never ends unless the subject revokes it.
State that the retention of personally identifiable information expires but that the data without the personally identifiable information may be used for future research.
Concern 7
Subjects may undermine their informed consent by discussing participation or granting consent to others.
Recommendations:
Researchers should caution their subjects about discussing their participation in the project with others and recommend that they keep their copy of the consent form in a secure place.
The informed consent may include an explicit statement such as this:
- "If you voluntarily give your written consent for anyone to receive information about your participation in the research, then we may not use the [Certificate of Confidentiality] to withhold this information."
The following information also is good to acknowledge in the informed consent as appropriate:
- "Confidential information may accidentally be revealed to others not associated with the project," along with a description of the researchers' role in keeping that from happening.
- "Identifying information may be shared for government auditing of the research project."
- "Once your Personal Health Information is released, it may be re-disclosed and not protected by HIPAA."
- Subjects understand they are authorizing "use and release" of their research data (i.e., their answers that do not include personal identifiers) as described in the informed consent.
Conventional Language Used in the Past
Conventional language that most studies in the past used should not be a barrier to sharing a public-use or restricted-use file. For example:
"Your answers will be held in strict confidentiality and will be used only for the purposes of this study. The results will be reported in aggregate form only, and cannot be identified individually."
The first statement of this generic consent language ensures nothing other than that the researcher will not release the subject's "identity"; this is standard practice and means the researcher will protect the subject from direct identification. ICPSR disclosure control steps do the same when data are evaluated for public release.
The second statement says the answers (i.e., the data) will be used only for the purpose of the study, not who will fulfill the purpose of the study. Very often, the purpose of a study is broadly defined in substantive terms that are appropriate for almost all forms of research conducted through secondary analysis of the data.
In the second sentence, the statement that the results will be released in aggregate form introduces some ambiguity. However, the ICPSR Terms of Use binds secondary analysts to exactly this type of provision: secondary analysts are to use either statistical or tabular representations of the data in published or analytic form. The provision does not bar the release of de-identified subject-level data used for substantive research analysis where the user has acknowledged the ICPSR Terms of Use.
Unless the informed consent names the members of the research team specifically, an amended Institutional Review Board application that includes a plan for data protection and dissemination can be filed with the lead institution to define the research team as those persons known to the original researchers. The language in the informed consent may prevent the release of data as public-use, but does not preclude the possibility of a research team that is defined by a group of restricted- or limited-use agreement holders. With such agreements, the researchers using the data are known to ICPSR and to the original research team.