dissertation

I am a doctoral candidate in the Department of Biomedical Informatics at the University of Pittsburgh.

phd dissertation overview

Many initiatives encourage research data sharing in hopes of increasing research efficiency and quality, but the effectiveness of these early initiatives is not well understood.  Reusing research data has many benefits for the scientific community: new research hypotheses can be tested more quickly and inexpensively when duplicate data collection is reduced.  Shared data can be aggregated to study otherwise-intractable issues, and a more diverse set of scientists can become involved when analysis is opened beyond those who collected the original data.  Publicly available data helps to identify errors, discourages fraud, and is useful for training new researchers.

Funders, publishers and academic organizations — eager to realize such benefits — have developed tools, resources and policies to encourage and require data-producing investigators to make their datasets publicly available.  Despite these investments of time and money, we do not have a firm grasp on the prevalence or patterns of data sharing and reuse, the effectiveness of initiatives, or the costs, benefits, and impact of repurposing biomedical research data.

Previous methods for assessing data sharing prevalence have included manual curation and investigator self-reporting.  Models of knowledge sharing have emerged from the information science and management of information systems communities, usually derived from case studies or survey instruments.  These approaches provide insight into motivation, but they are subject to an intention-action gap and are labor-intensive to repeat across multiple subdisciplines and over time to monitor changes in behavior.

The proposed research will build on and supplement previous work through an analysis of observed variables, thereby providing an alternative perspective for understanding and monitoring data sharing behavior.

My research questions:

  1. Does data sharing have benefit for those who share?
  2. Can data sharing and withholding be systematically and automatically measured?
  3. How often is data shared?  What predicts sharing?  How can we model sharing behavior?

proposal

Various versions of my proposal:

status

Aim 1: Does data sharing have benefit for those who share?
Completed and published.

Aim 2: Can data sharing and withholding be systematically and automatically measured?
Completed.  I am currently writing up the results of Aim 2a.  A paper on Aim 2b is under review at the journal Discovery and Collaboration.

Aim 3: How often is data shared?  What predicts sharing?  How can we model sharing behavior?

A pilot study for Aim 3 was presented at the recent Symposium on Informetrics and Scientometrics (my slides), and has been accepted for a special issue of the Journal of Informetrics.
Data collection for the full implementation of Aim 3 is underway.

anticipated completion

I’m hoping to graduate in Spring 2010.