Abstract
Pharmacy schools are generating significant amounts of data across the training continuum, including data about student selection, performance, and job placement. However, current data practices limit the Academy’s ability to effectively leverage the vast amounts of data available within and across pharmacy institutions. To improve data practices and promote the quality and reusability of data, a set of guiding principles for data management and stewardship was developed and published in 2016. The FAIR principles state that digital objects should be findable (ie, data have a unique identifier and are registered in a searchable resource), accessible (ie, data are retrievable by their identifier using an open, free, standardized protocol), interoperable (ie, data use a formal, accessible, shared, and broadly applicable language, and include qualified references to other data), and reusable (ie, data are described with accurate and relevant attributes, released with a data usage license, and meet domain-relevant community standards). This commentary advocates for improved data practices and provides recommendations for advancing FAIR data principles in pharmacy education.
INTRODUCTION
Pharmacy schools generate various types of data, including data about student selection (eg, recruitment, interviews), performance (eg, observations, examinations), and job placement (eg, career placement, residency matching). While some data are generated primarily for the purpose of operational decision-making, such as selecting students or evaluating student skills, other data are generated primarily for education research, such as surveying underrepresented students about their career interests.1 In some cases, data collected for operational purposes might also be used for research, such as in examining residency attainment by pharmacy graduates from US schools.2
Pharmacy educators and educational researchers increasingly rely on computational tools to act on a wide range of data types, formats, and protocols that support empirical research and decision-making. If you consider a student’s entire educational record, for example, there might be numeric data related to admissions, course grades, class attendance, objective structured clinical examination (OSCE) ratings, experiential grades, Pharmacy Curriculum Outcomes Assessment (PCOA) scores, a North American Pharmacist Licensure Examination (NAPLEX) score, and demographic characteristics. There might also be non-numeric (or unstructured text/image) data such as admissions statements or recommendations, instructor and preceptor comments, photographs, and portfolios. Further, these data may reside in numerous systems or software that use varied database structures and processes for access, analysis, and reporting, such as learning management systems, human resources systems, testing or course evaluation software, and admissions or experiential databases. While these types of data can provide insight into critical questions about learning and development, current data practices limit our ability to effectively access, federate, process, and analyze the vast amounts of data available within and across educational institutions. If one wanted to examine multiple years of admissions data and their relationship to student performance, for example, one would be faced with issues related to aggregating data from multiple databases, understanding how admissions interview scores and student grades are measured, addressing changes to admissions and grade methodologies over time, operationalizing demographic variables, and handling missing data. Attempting to aggregate data across multiple institutions might further amplify these types of challenges since institutions may use different approaches to define, collect, process, and analyze their data. 
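The linkage problem described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two extracts, the field names (`interview_score`, `gpa`), the identifier formats, and the values are invented, and real systems would add further complications such as changing score scales over time.

```python
# Hypothetical extracts from two separate systems. All names and values are
# made up for illustration; note the inconsistent ID casing across systems.
admissions = {
    "S001": {"interview_score": 4.0, "cohort": 2018},
    "S002": {"interview_score": None, "cohort": 2018},  # score never recorded
    "S003": {"interview_score": 3.5, "cohort": 2019},
}
grades = {
    "s001": {"gpa": 3.6},
    "s002": {"gpa": 3.4},
    "s003": {"gpa": 3.1},
    "s004": {"gpa": 3.9},  # no matching admissions record
}


def normalize_id(raw_id):
    """Bring identifiers from both systems into one shared format."""
    return raw_id.strip().upper()


def link_records(admissions, grades):
    """Join the two sources on student ID and report what could not be linked."""
    adm_by_id = {normalize_id(k): v for k, v in admissions.items()}
    grd_by_id = {normalize_id(k): v for k, v in grades.items()}
    linked, problems = [], []
    for sid in sorted(set(adm_by_id) | set(grd_by_id)):
        adm, grd = adm_by_id.get(sid), grd_by_id.get(sid)
        if adm is None:
            problems.append((sid, "no admissions record"))
        elif grd is None:
            problems.append((sid, "no grade record"))
        elif adm["interview_score"] is None:
            problems.append((sid, "missing interview score"))
        else:
            linked.append({"student_id": sid, **adm, **grd})
    return linked, problems


linked, problems = link_records(admissions, grades)
print(len(linked), "linked;", len(problems), "excluded:", problems)
```

Even in this toy case, half of the students drop out of the analysis, and the reason for each exclusion has to be documented for the result to be interpretable by others.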
These examples illustrate possible inconsistencies in data practices, such as discrepancies in protocols for data acquisition, curation, and validation; lack of clear data standards and definitions; increasing volume and complexity of data and analytic tools; and limited data sharing. Inconsistent data practices can further manifest themselves in publications as insufficient descriptions of methods and results, selective data presentation, inconsistent ontologies, poor data management, and incorrect analytic techniques, all potentially leading to concerns regarding data reproducibility and trustworthiness.3
The FAIR data principles for scientific data management and stewardship provide guidelines aimed at improving data practice and promoting the quality and reusability of data.4 Briefly, FAIR principles state that data should be findable (eg, data have a unique identifier and are registered in a searchable resource), accessible (eg, data are retrievable by their identifier using an open, free, standardized protocol), interoperable (eg, data use a formal, accessible, shared, and broadly applicable language and include qualified references to other data), and reusable (eg, data are described with accurate and relevant attributes, released with a data usage license, and meet domain-relevant community standards).4 These principles inform options for data generation, tool development, object (re)use, and long-term stewardship, effectively enabling researchers to advance their digital ecosystems and promote discovery, innovation, and post-publication integration.4,5
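As a rough illustration of how the four principles translate into concrete metadata, the following sketch describes a dataset with a small record and lists which principles it does not yet address. The field names, the placeholder identifier, and the checklist itself are assumptions made here for illustration; this is a simplification and not an official FAIR assessment instrument.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: Optional[str] = None        # unique, persistent identifier
    registry: Optional[str] = None          # searchable resource holding the record
    access_protocol: Optional[str] = None   # open, standardized protocol, eg "https"
    vocabulary: Optional[str] = None        # shared, formal language for the variables
    related_identifiers: List[str] = field(default_factory=list)  # qualified references
    description: Optional[str] = None       # accurate, relevant attributes
    license: Optional[str] = None           # data usage license


def fair_gaps(record):
    """Return the principles for which required metadata are still missing."""
    checks = {
        "findable": [record.identifier, record.registry],
        "accessible": [record.identifier, record.access_protocol],
        "interoperable": [record.vocabulary],
        "reusable": [record.description, record.license],
    }
    return [name for name, values in checks.items() if not all(values)]


# A dataset that has been deposited but not yet fully described.
record = DatasetRecord(
    identifier="example-id-0001",           # placeholder, not a real identifier
    registry="institutional repository",
    access_protocol="https",
    description="De-identified OSCE ratings, one row per student per station",
)
print(fair_gaps(record))
```

In this example the record is findable and accessible but still lacks a shared vocabulary and a usage license, so the interoperable and reusable principles are flagged.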
The FAIR principles are particularly well suited for education given their emphasis on improving the rigor and reuse of data to advance knowledge and the increasing ability of computers to enable access to and analysis of data. In health professions education, a small yet growing body of literature advocates for the improvement of our data practices and uptake of FAIR principles.6-10 Schwartz and colleagues, for example, argue for digital repositories in medical education to promote replicability of studies, enable secondary data analysis to answer new scientific questions, and overcome the limitations associated with small sample single-site studies.9 To that end, utilizing FAIR principles might also empower experiential educators to overcome the statistical challenges associated with small sample sizes by enabling them to aggregate data across multiple practice sites or institutions.
Further, some health professions education journals now encourage or require data sharing, data citations, and/or data statements.11,12 Medical Education, for example, encourages authors to archive study data and other artifacts in a public repository and asks them to include a data accessibility statement with the link to the repository.11 BMC Medical Education requires data availability statements that describe where data supporting the results can be found, noting that this should include, “the minimal dataset that would be necessary to interpret, replicate and build upon the findings reported in the article.”12
Utilizing FAIR data principles can enable reproducibility and replication of research, facilitate secondary data analyses that answer new research questions (eg, additional analyses, meta-analysis), address increasing requirements from funding sources and publishers, advance science, and promote scholar reputation through citations and visibility.6-8,10 It is both timely and critical for pharmacy education to adopt data strategies aimed at promoting the quality and trustworthiness of our educational practice and research. To that end, we offer the following recommendations for moving toward the adoption of the FAIR guiding principles in pharmacy education.
Recommendations
We must conduct FAIR needs assessments. Assessing data practices within pharmacy education will elucidate current practices, needs, infrastructure, and barriers, and enable educators to identify gaps between current and FAIR data practices. This would provide a comprehensive understanding of the current state of data practices, including the requirements and infrastructures provided by journals, funding agencies, data repositories, and other institutions; the met and unmet data needs of key stakeholders of FAIR data practices; and the real and perceived barriers to FAIR data practices for various stakeholder groups. As a starting point, this type of needs assessment could be conducted for an individual research team/laboratory, an entire pharmacy school, or across a specific pharmacy education sub-discipline (eg, OSCE data). Ultimately, this deep understanding of the current state of data practices and related gaps will provide the foundation for an evidence-informed roadmap to FAIR data practices for the scholarly community and help craft specific recommendations for educational and research endeavors tailored to individuals and institutions.
We must train educators and researchers in FAIR data principles. Educators and researchers must be equipped with the knowledge and skills needed to teach and apply FAIR data principles. As noted by Mons and colleagues, “FAIR compliant data stewardship will require many different skills that are not traditionally covered by the research curricula of contemporary students and researchers. Therefore, extensive training capacity and training materials are needed, and in need of development.”5 Furthermore, FAIR practices rely heavily on humans to format and communicate data, tools, and findings in a way that enables understanding and mitigates ambiguity associated with natural language barriers (eg, jargon). Although some researchers believe access to shared data will advance research, training will need to address strategies for overcoming identified FAIR barriers, such as: unfamiliarity with data sharing; strategies for handling the effort, time, and cost of preparing data to share; uncertainty regarding benefits to oneself or others; and ethical and legal issues associated with FAIR practices.9,13
We must build FAIR communities of practice in pharmacy. Since the FAIR principles were first described in 2016,4 an estimated 1,000 communities of practice have implemented them across various disciplines.5 These communities have explored ontology mapping, machine learning, automation, and annotation and curation, among other FAIR issues.11 Within pharmacy education, several strategies could be used to identify and connect with those interested in building communities aimed at advancing these efforts. Professional organizations, such as the American Association of Colleges of Pharmacy (AACP) and Research Data Alliance (RDA), often include special interest groups (SIGs) with shared interests related to assessment, data, and use of FAIR principles (eg, AACP Assessment SIG, RDA Raising FAIRness in Health Data and Health Research Performing Organisations Group, RDA Health Data Interest Group). As an increasing number of funding agencies adopt and advocate for improved data practices, such as the National Science Foundation’s Harnessing the Data Revolution,14 individuals may also find potential collaborators among lists of funded projects.
Pharmacy educators might also find individuals and/or committees within their departments, schools, or universities focused on relevant data issues, such as data governance, data science, and information science. Because student pharmacists are taught by experts from various disciplines, such as biomedical science, education, library science, and healthcare practice, it is natural and necessary to create interprofessional FAIR communities that embody relevant expertise. Together, pharmacy educators and researchers must leverage this diversity of expertise to identify and engage collaborators who can identify and implement FAIR strategies.
We must integrate FAIR practices into educational practice and research. Studies of first-generation implementation of the FAIR data principles revealed disciplinary differences, including legal and workflow issues.15 The use of FAIR data principles in education could include navigating legal restrictions associated with accessing and sharing data (eg, Family Educational Rights and Privacy Act), adhering to ethical principles (eg, deidentifying data), and consistently operationalizing commonly used constructs. For example, a single construct, such as wellness, may be measured differently by different institutions, challenging the interoperable nature of the data.
The strategies and related effort required to implement FAIR principles can vary widely depending on the goals of the individual or institution, and the complexity of the data. Some strategies may be achieved with minimal additional effort, such as making a full survey instrument and related de-identified data available. Others might require more effort, such as publishing qualitative transcripts, given the unique processing required to ensure confidentiality of participants.16 Even more complex would be data subject to FERPA restrictions, in which case data instruments may be easily shareable yet the data itself could not be shared. As noted by BMC Medical Education, “it is not always possible to share research data publicly, for instance when individual privacy could be compromised.”12
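To make the de-identification step more concrete, the sketch below removes direct identifiers and replaces the student ID with a keyed pseudonym before data are shared. The column names, the key, and the records are hypothetical. This is only one possible approach, and removing direct identifiers may not by itself satisfy FERPA or institutional review requirements, since small cohorts can sometimes be re-identified from the remaining variables.

```python
import hashlib
import hmac

# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(student_id, secret_key):
    """Derive a stable pseudonym from the ID using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def deidentify(rows, secret_key):
    """Drop direct identifiers and swap the student ID for a pseudonym."""
    cleaned = []
    for row in rows:
        kept = {
            key: value
            for key, value in row.items()
            if key not in DIRECT_IDENTIFIERS and key != "student_id"
        }
        kept["pseudonym"] = pseudonymize(row["student_id"], secret_key)
        cleaned.append(kept)
    return cleaned


# Made-up records; the key would be kept private by the data steward.
rows = [
    {"student_id": "S001", "name": "Student A", "email": "a@example.org", "osce": 82},
    {"student_id": "S001", "name": "Student A", "email": "a@example.org", "osce": 88},
    {"student_id": "S002", "name": "Student B", "email": "b@example.org", "osce": 75},
]
shared = deidentify(rows, secret_key=b"replace-with-a-private-key")
print(shared)
```

Because the pseudonym is stable for a given key, repeated measurements for the same student can still be linked in the shared file, while the mapping back to the original ID stays with whoever holds the key.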
As a discipline, we must elucidate how FAIR principles intersect with education and how to best position pharmacy educators to integrate them into their research and practice. A survey of Society of Directors of Research in Medical Education members, for example, indicated that data sharing was strongly supported, yet few reported actually using data repositories.9 Clearly, the challenges facing implementation of the FAIR principles in education are not trivial and will likely require sustained effort and strategy across a wide range of expertise.
We must leverage machine learning in pharmacy. Data-intensive research and decision-making rely on the ability of humans to discover, access, integrate, and analyze task-appropriate scientific data.4 As the amount and diversity of data related to pharmacy education (such as the aforementioned data on student selection, performance, and job placement) grow and become available in FAIR-compliant datasets, manual, “eyeballing” analysis of such data becomes increasingly unattainable. Thus, computational tools such as machine learning are growing in use for education data analysis and trend elucidation.17
Machine learning refers to a large group of computational and statistical algorithms and respective software tools that can explore the relationships and correlations between various characteristics of digital objects. For instance, machine learning models can help predict student performance in pharmacy curricula based on student admission data.18 Data that are compliant with FAIR principles are a “lame substrate” without workflows that empower the formatting, communication, and consumption of that data.5 Thus, we should continue to leverage machine learning tools to make the most efficient use of education data to improve student selection, advance students’ education, and help graduates find the most satisfying employment. Achieving these goals is, arguably, a common objective of all educational institutions, and the use of bigger datasets typically improves the accuracy and reliability of machine learning models. Thus, promotion of FAIR principles of data use in pharmacy education will enable data standardization and integration across multiple schools and, consequently, growth of respective educational databases. We surmise that these advances will catalyze data-driven decision support and improve the efficiency of pharmacy education.
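The idea of predicting performance from admissions data can be sketched with the simplest possible model, a one-variable least-squares line. This is a stand-in chosen for brevity and is not the method used in the cited work; the variable names and the four data points are synthetic values invented purely for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for a single predictor: y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Apply the fitted line to a new predictor value."""
    return slope * x_new + intercept


# Synthetic toy data (not real student records): interview score vs later GPA.
interview_scores = [2.0, 3.0, 4.0, 5.0]
later_gpa = [2.8, 3.1, 3.4, 3.7]

slope, intercept = fit_line(interview_scores, later_gpa)
print(round(slope, 3), round(intercept, 3))
print(round(predict(3.5, slope, intercept), 3))
```

With only four points the fit says little, which is the practical argument made above: pooling standardized data across schools gives such models enough observations to be estimated and validated with some confidence.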
CONCLUSION
Current data practices limit our ability to leverage the vast amount of data across pharmacy education. Adoption of data strategies that promote quality and trustworthiness in pharmacy education research is timely and critical. Given the complexity of data, data systems, and computational tools across pharmacy education, efforts to improve data practices must be intentional, strategic, and pursued with sustained commitment and multidisciplinary expertise. As first steps, pharmacy educators and researchers must conduct needs assessments, invest in FAIR training, build communities of practice, integrate FAIR principles, and pursue advanced data-analytical machine learning approaches that can inform and improve the training of healthcare practitioners. These efforts will help to ensure that data are utilized with accuracy and efficiency across the continuum of education.
- Received March 30, 2021.
- Accepted June 15, 2021.
- © 2022 American Association of Colleges of Pharmacy