Abstract
Objective. To determine the feasibility of using a validated set of assessment rubrics to assess students’ critical-thinking and problem-solving abilities across a doctor of pharmacy (PharmD) curriculum.
Methods. Trained faculty assessors used validated rubrics to assess student work samples for critical-thinking and problem-solving abilities. Assessment scores were collected and analyzed to determine student achievement of these 2 ability outcomes across the curriculum. Feasibility of the process was evaluated in terms of time and resources used.
Results. One hundred sixty-one samples were assessed for critical thinking, and 159 samples were assessed for problem solving. Rubric scoring allowed assessors to evaluate four 5- to 7-page work samples per hour. The analysis indicated that overall critical-thinking scores improved over the curriculum. Although a low yield of complete problem-solving records precluded meaningful data analysis, the assessment process was informative for identifying potentially needed curricular improvements.
Conclusions. Use of assessment rubrics for program ability outcomes was deemed authentic and feasible. Problem solving was identified as a curricular area that may need improvement. This assessment method has great potential to inform continuous quality improvement of a PharmD program.
INTRODUCTION
While Accreditation Council for Pharmacy Education standards and guidelines stipulate the need for a comprehensive assessment plan that includes information about the attainment of student learning outcomes,1 few of the many assessment efforts reported in the pharmacy literature track progressive student attainment of ability outcomes over the entire doctor of pharmacy (PharmD) program. In a 2006 survey of programmatic curricular outcomes in US colleges and schools of pharmacy, 18% of the responding institutions reported having a complete assessment program in place, while 57% reported only partial assessment development. Further, among the colleges and schools with established assessment programs, a wide array of assessment instruments was reported, including standardized tests such as the North American Pharmacist Licensure Examination (85% of responding colleges and schools) and the Multistate Pharmacy Jurisprudence Examination (52% of responding colleges and schools); alumni, employer, and student survey instruments (most commonly, end-of-course summative evaluations); and course examinations and grades.2 While these assessments can provide valuable information about an academic program, they provide no direct evidence of student growth and achievement of ability outcomes across an entire curriculum.
One potentially useful set of learning outcomes for measuring educational progress has been developed by the Association of American Colleges & Universities (AAC&U). In September 2009, AAC&U published the Valid Assessment of Learning in Undergraduate Education (VALUE) project, which produced a validated set of assessment rubrics based on the Liberal Education and America's Promise (LEAP) outcomes.3,4 LEAP outcomes, such as critical thinking, problem solving, communication, and ethical reasoning, are widely regarded by both the academy and employers as keys to economic opportunity and democratic citizenship.5 Likewise, participants at the American Association of Colleges of Pharmacy (AACP) 2009 Curricular Change Summit determined that critical thinking and problem solving were among the most essential abilities of pharmacists entering the profession.6 As described by the AACP Commission to Implement Change in Pharmaceutical Education and recommended by the 2009-2010 AACP Academic Affairs Standing Committee, professionals in particular need critical-thinking and problem-solving abilities to acquire, evaluate, and synthesize information to make informed decisions.7,8
Despite the desire to develop critical-thinking and problem-solving abilities of college students, a crucial finding in AAC&U’s 2005 Preliminary Report on Student Achievement in College was that liberal education outcomes are not sufficiently addressed in higher education by reliable, cumulative assessments of students’ gains from their college studies. Persuasive evidence regarding how well college students are actually doing in terms of critical thinking, problem solving, and other liberal education outcomes is lacking.5
Subsequently, the VALUE project was developed to promote a national dialogue on assessment of liberal education outcomes among colleges and universities. VALUE places emphasis on authentic assessment of student work and shared understanding of student learning outcomes rather than reliance on standardized tests administered to samples of students outside of their required courses. The guiding philosophies behind VALUE include the following: (1) valid assessment data are needed to guide planning, teaching, and improved learning to achieve high-quality education; (2) learning and achievement of abilities develop over time and should grow in complexity and sophistication as students move through their educational pathway toward degree attainment; and (3) good practice in outcomes assessment requires multiple assessments over time.4 Working with teams of faculty experts representing colleges and universities across the United States, AAC&U developed a set of validated assessment rubrics through a process that examined many existing campus rubrics and related documents for each learning outcome. The rubrics articulate fundamental criteria for each learning outcome, with performance descriptors demonstrating progressively more sophisticated levels of attainment from a benchmark level to a capstone level.
Though several institutions of higher learning in the United States have begun using VALUE rubrics for programmatic assessment,9,10 a search of the pharmacy education literature reveals that PharmD programs have not. Recognizing this unique opportunity, the need for programmatic assessment, and the validity of the VALUE rubrics for assessing program-level ability outcomes, faculty members at St. Louis College of Pharmacy (STLCOP) initiated efforts to develop a set of program-level ability outcomes (14 general outcomes based on the LEAP outcomes3 and 3 professional outcomes based on the 2004 Center for the Advancement of Pharmaceutical Education outcomes11) and to delineate an authentic methodological approach to their assessment throughout the curriculum. The aim was to test the feasibility of using the VALUE rubrics as a means of program assessment, selecting 2 ability outcomes, critical thinking and problem solving, to assess across a 6-year PharmD curriculum comprising 2 preprofessional years and 4 professional years. This paper describes the inaugural effort with respect to establishing a baseline assessment, elaborating assessment methodologies, and establishing methods for data management and analysis, with experiences illustrated using the critical-thinking and problem-solving ability outcomes.
METHODS
The project was approved by the STLCOP Institutional Review Board. Commencing with the 2010-2011 academic year and led by the Curriculum and Curricular Assessment Committee (CCAC), faculty members began working with the VALUE rubrics for the 2 ability outcomes prioritized for initial exploration: critical thinking and problem solving. Over a series of facilitated workshops in the fall of 2010, faculty members gained an understanding of the VALUE rubrics for critical thinking and problem solving by using the rubrics to assess samples of student work taken from various courses. Because the VALUE rubrics had been previously validated, only minor wording modifications were made based on these initial exercises, to better define terminology in the rubrics and to improve their applicability to the PharmD program at STLCOP (Appendix 1, Appendix 2).
In January 2011, faculty members teaching required courses were asked to review the modified VALUE rubrics to determine whether any student assignments or practice opportunities in their courses addressed critical thinking, problem solving, or both. To achieve a baseline assessment of the curriculum, faculty members did not proactively design any practice opportunities or assignments in their courses to specifically address the VALUE rubrics. Instead, they were asked to submit assignments they thought already addressed either of these ability outcomes. From February 2011 through April 2011, student work samples from the 2010-2011 academic year were solicited from faculty members on a voluntary basis. Faculty members were asked to select 1 assignment from their respective courses, specify which ability outcome the assignment addressed, remove all student- and course-identifying information from the student work samples, and submit all samples to the primary investigator and the CCAC. Because this was the first experience with submitting student work samples, all samples were initially accepted, regardless of the type or length of assignment. However, because of the volume of work samples received, the primary investigator worked with the CCAC to determine (1) selection criteria for work-sample inclusion, based on the feasibility of assessing the work sample (eg, page length of each work sample, number of faculty assessors needed to complete assessments, and the amount of assessment time allotted for each program assessment day), and (2) whether the work sample seemed to adequately address the selected ability outcome. From the initial pool, a subset of work samples was selected using a random numbers table. Selected samples were slated for assessment at the academic year-end program assessment days in May 2011.
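For readers who wish to reproduce this kind of selection step programmatically, the brief sketch below draws a random subset of submitted samples. It is only an illustration, since the original selection used a printed random numbers table; the pool size, subset size, and seed shown here are hypothetical.

```python
import random

# Hypothetical pool of de-identified work sample codes submitted by faculty members.
submitted_samples = [f"WS-{i:03d}" for i in range(1, 201)]

rng = random.Random(2011)                               # fixed seed so the draw can be repeated
selected_samples = rng.sample(submitted_samples, k=40)  # hypothetical subset size
print(sorted(selected_samples))
```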
Student work samples reflected both preprofessional and professional curricula and included assignments from required courses only. Samples were coded using an alpha-numeric coding system so that only the researchers knew the ability outcome being assessed and the course identification. In an attempt to limit bias, faculty assessors were blinded to the course identification.
In May 2011, volunteer faculty assessors representing a wide array of disciplines participated in 2 days of assessment and were provided training on using the VALUE rubrics for assessing student work samples. Faculty assessors participated in a calibration process in which scoring with the VALUE rubrics was reviewed and several test student work samples were assessed and discussed in order to resolve assessment discrepancies and reduce inter-assessor variability. Following the calibration process, each de-identified student work sample was then evaluated independently by at least 2 faculty assessors. Student work samples were assigned to the assessors in a quasi-random manner (ie, upon completing the assessment of a student work sample, the assessor then selected the next available student work sample).
Because work samples were assessed and not graded for a correct or incorrect answer, no stipulations were placed on assessors in terms of which work samples they could assess. For example, assessors were not limited to assessing samples only from their respective disciplines. Using the VALUE rubrics, faculty assessors evaluated each work sample by assigning a score for each criterion included in the rubric. Each criterion was assessed on a 4-point scale, on which 1 represented a benchmark (or introductory level) performance and 4 represented a capstone (or mastery level) performance. Faculty assessors were also permitted to give a score of zero for student work samples that did not achieve the benchmark level or a score of “NA” (not applicable) for student work samples that did not address the rubric criterion at all.
Each student work sample was assessed independently by at least 2 faculty assessors, and the assessment scores were compared. When the 2 assessors' scores differed by more than 1 point on any rubric criterion, a third faculty assessor made an independent assessment, which was compared with the first 2. Of the 3 assessments, the 2 that most closely concurred were used. Typically, the third assessment resolved the discrepancy; however, if all 3 assessments differed, either a fourth assessor completed an independent assessment of the work sample and agreement was sought, or the assessment-session facilitator worked with the assessors to reach consensus on the assessment of the student work sample.
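The reconciliation rule described above (a third assessment when any criterion differs by more than 1 point, retention of the 2 assessments that most closely concur, and escalation when all 3 differ) can be summarized in the following sketch. This is an illustrative reconstruction rather than the procedure actually implemented at STLCOP; the function names and the dictionary-based score representation are assumptions.

```python
def assessments_agree(first, second):
    """Two assessments agree if no rubric criterion differs by more than 1 point."""
    return all(abs(first[c] - second[c]) <= 1 for c in first)

def total_difference(first, second):
    """Sum of absolute score differences across all rubric criteria."""
    return sum(abs(first[c] - second[c]) for c in first)

def reconcile(first, second, request_assessment):
    """Return the pair of assessments retained for one work sample.

    first, second: dicts mapping criterion name -> numeric score (0-4).
    request_assessment: callable that returns one additional independent assessment.
    NA ratings are handled later, at the data-quantification step.
    """
    if assessments_agree(first, second):
        return first, second

    third = request_assessment()  # independent third assessment
    pairs = [(first, second), (first, third), (second, third)]
    agreeing = [p for p in pairs if assessments_agree(*p)]
    if agreeing:
        # Keep the 2 assessments that most closely concur.
        return min(agreeing, key=lambda p: total_difference(*p))

    # All 3 assessments differ: a fourth assessor or a facilitated
    # consensus discussion resolves the sample, as described above.
    raise RuntimeError("Escalate to a fourth assessor or facilitated consensus")
```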
At the conclusion of each assessment day, the faculty assessors engaged in a guided debriefing discussion of their reflections on the validity, reliability, and feasibility of the process and the value of the assessment findings. Assessors discussed additional minor wording revisions to be made to the rubrics, the types of student work samples that were most feasible and usable for assessing with the rubrics, and ways to improve the overall assessment process to make it more efficient and effective.
Assessment scores for the student work samples were recorded in a Microsoft Excel spreadsheet. For work samples that received 2 assessments, the scores for each criterion of the rubric were averaged. If there were 3 assessments of a work sample, the average of the 2 closest (ie, agreeing) scores was calculated. If only 1 numeric score was available for a criterion (ie, if the other score was NA), the work sample record was considered incomplete and was removed from the data analysis. Therefore, the analysis reported here includes only work sample records with complete assessment information for all criteria in the VALUE rubric. Quantifying the data in this way gave each analyzed work sample a score for every criterion in the VALUE rubric, from which an aggregate score reflecting the overall extent of mastery of either critical thinking or problem solving could be calculated.
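The quantification rules above (averaging the 2 retained scores per criterion, excluding records with an NA rating, and rolling criterion averages up into an aggregate score) could be implemented along the lines of the following sketch. The long-format table, the column names, and the use of the mean of criterion averages as the aggregate score are assumptions made for illustration, not the actual spreadsheet layout used.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format export of the scoring spreadsheet: one row per
# (work sample, rubric criterion, assessor), with NaN standing in for "NA".
scores = pd.DataFrame({
    "sample_id": ["A1", "A1", "A1", "A1", "A2", "A2", "A2", "A2"],
    "criterion": ["issues", "issues", "evidence", "evidence"] * 2,
    "assessor":  [1, 2, 1, 2, 1, 2, 1, 2],
    "score":     [2, 3, 2, 2, 3, 3, np.nan, 2],
})

# Average the 2 retained assessments for each criterion of each work sample.
per_criterion = (
    scores.groupby(["sample_id", "criterion"])["score"]
          .agg(["mean", "count"])
          .reset_index()
)

# A record is complete only if every criterion carries 2 numeric scores.
complete_ids = (
    per_criterion.groupby("sample_id")["count"].min().loc[lambda s: s == 2].index
)
complete = per_criterion[per_criterion["sample_id"].isin(complete_ids)]

# Aggregate ability-outcome score per sample (here, the mean of the criterion averages).
overall = complete.groupby("sample_id")["mean"].mean()
print(overall)  # A1 -> 2.25; A2 is excluded because one criterion was rated NA
```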
Data were analyzed using the following approaches. First, yield was defined as the percentage of student work samples that provided complete ability outcome scores; yield was further analyzed in terms of the proportion of work samples requiring more than 2 assessments. Second, an analysis of variance model with Bonferroni post-hoc tests was used to characterize differences in overall ability outcome scores as a function of year of study in the academic program. Third, differences among the rubric criteria scores were analyzed using an analysis of variance model, with Bonferroni post-hoc tests used to identify changes in criterion scores across the curriculum. Finally, the frequency distributions of the ability outcome and criterion scores were compared between the second and sixth years of the academic program using the Wilcoxon rank-sum test (ie, the Mann-Whitney U test). All statistical tests were conducted at an alpha level of 0.05.
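These analyses map onto standard routines in any statistics package. The minimal sketch below uses SciPy purely as an illustration, since the software actually used for the analysis is not reported here; the scores shown are made up, and the Bonferroni post-hoc comparisons are approximated with pairwise t tests whose p values are multiplied by the number of comparisons.

```python
from itertools import combinations
from scipy import stats

# Hypothetical overall critical-thinking scores grouped by year of the program.
scores_by_year = {
    2: [2.1, 2.4, 1.9, 2.3, 2.2],
    3: [1.6, 1.8, 2.0, 1.7],
    6: [2.9, 3.1, 2.7, 3.0, 2.8],
}

# One-way analysis of variance across years of study (alpha = 0.05).
f_stat, p_anova = stats.f_oneway(*scores_by_year.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

# Bonferroni post-hoc comparisons between pairs of years.
pairs = list(combinations(scores_by_year, 2))
for year_a, year_b in pairs:
    t, p = stats.ttest_ind(scores_by_year[year_a], scores_by_year[year_b])
    p_adjusted = min(p * len(pairs), 1.0)  # Bonferroni correction
    print(f"year {year_a} vs year {year_b}: adjusted p = {p_adjusted:.4f}")

# Wilcoxon rank-sum (Mann-Whitney U) test comparing second- and sixth-year
# score distributions.
u_stat, p_mw = stats.mannwhitneyu(
    scores_by_year[2], scores_by_year[6], alternative="two-sided"
)
print(f"Mann-Whitney U = {u_stat}, p = {p_mw:.4f}")
```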
RESULTS
Of the 159 student work samples submitted for assessment of problem solving, only 5 (3%) records achieved complete scores and qualified for inclusion in the analysis. The last 2 criteria on the problem-solving VALUE rubric, implement solution and evaluate outcomes, were often scored as NA and were the most common reason for an incomplete record.
Of the 161 work samples submitted for assessment of critical thinking, 69 (43%) achieved complete scores after 2 assessments and 42 (26%) after referral to a third assessor, for an overall yield of 69% (Table 1). The yield varied widely across the years of the academic program, from a low of 35% in the third year to a high of 100% in both the second and sixth years.
Table 1. Pharmacy Student Work Samples Reviewed for the Critical-Thinking Ability Outcome
Table 1 shows the average overall ability outcome scores and standard deviations across the 6-year PharmD program. The maximum critical-thinking score was observed in the sixth year (ie, professional year 4) and the minimum in the third year (ie, professional year 1). Given that the data point for the third year was based on only 8 work samples, it should be interpreted with caution. The scores of students in the sixth year exceeded those of students in all other years (p<0.05) except the fifth year. Further, the scores of the sixth-year students displayed the least variability, as evidenced by a coefficient of variation (standard deviation/mean × 100) of 13.9%, compared with an average coefficient of variation of 21.9% across all years of study. None of the overall critical-thinking scores approached the maximum capstone score of 4.0 in any year of the academic program.
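For reference, the coefficient of variation reported here is simply the standard deviation expressed as a percentage of the mean; a brief illustration with made-up scores follows.

```python
import statistics

overall_scores = [2.9, 3.1, 2.7, 3.0, 3.3]  # hypothetical sixth-year overall scores
cv = statistics.stdev(overall_scores) / statistics.mean(overall_scores) * 100
print(f"coefficient of variation = {cv:.1f}%")
```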
Consistent with these data, the highest criterion scores in each category were achieved by sixth-year students and the lowest by third-year students. Although the 2 rubric criteria of explanation of issues and evidence were scored significantly higher than the rubric criteria of context, position, and conclusion (p<0.05), the absolute values of the scores hovered around the middle value of 2.0, failing to approach the maximum attainable score of 4.0.
The frequency distribution of the overall scores for the critical-thinking ability outcome is presented in Figure 1 for students in the second and sixth years of the curriculum. For second-year students, about 50% of the samples were scored in the middle range (with a modal score of 2), about 16% were below the middle value, and 34% were above it. In contrast, almost 60% of the work samples of sixth-year students exceeded the middle value of 2 (p<0.05), with a modal score of 3.0. The superior scores of the sixth-year students were accompanied by higher scores for all rubric criteria with the exception of influence of context and assumptions (Figure 2).
Figure 1. Distribution of Critical-Thinking Ability Outcome Scores for Students in the Second and Sixth Years of the Academic Program. Rubric scores: 1=benchmark performance, 2=milestone performance, 3=milestone performance, 4=capstone performance.
Figure 2. Average Rubric Criteria Scores for the Critical-Thinking Ability Outcome for Students in the Second and Sixth Years of the Academic Program. Rubric scores: 1=benchmark performance, 2=milestone performance, 3=milestone performance, 4=capstone performance. *p<0.05 for the comparison between sixth-year and second-year students.
Feasibility was assessed in terms of time and resources used. The primary investigator was largely responsible for planning the assessment days and collecting student work samples; this planning work required approximately 120 hours. Workload varied among the faculty volunteers who collected and de-identified student work samples. Many work samples were stored in, and readily retrievable from, the institution's online course-management system, while others, available only as hard copies, had to be retrieved from students and photocopied. On the assessment days, after the training session, it took 17 faculty assessors approximately 4 hours to assess the 159 student problem-solving work samples, with each sample assessed at least twice and a third assessment performed if there was no concordance between the first 2. For critical thinking, 20 faculty assessors spent approximately 4 hours assessing 161 student work samples. On average, a faculty assessor could assess four 5- to 7-page work samples per hour.
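As a rough consistency check on the throughput figure, the reported numbers can be combined as in the short calculation below; the assumption of exactly 2 assessments per sample is a simplification, since some samples also received a third assessment.

```python
samples = 161               # critical-thinking work samples
assessments_per_sample = 2  # minimum; a subset also received a third assessment
assessors, hours = 20, 4

throughput = samples * assessments_per_sample / (assessors * hours)
print(f"~{throughput:.1f} work samples read per assessor-hour")  # roughly 4
```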
After assessment, faculty members discussed the following as part of a guided debriefing: (1) how their conception of the abilities had evolved during the process, if at all; (2) the ways in which learning activities might be strengthened in order to improve student growth in problem-solving and critical thinking; and (3) how assessments of ability outcomes might be enhanced to better capture evidence for the process components of students’ abilities.
DISCUSSION
Several higher education institutions have used VALUE rubrics to assess the academic progress of their students. For example, after reviewing more than 300 work samples from introductory courses, faculty at the University of North Carolina at Wilmington concluded that students' critical thinking at both the general education and major-concentration levels was in greatest need of improvement.9 At the University of Delaware, an improvement in critical-thinking scores was reported from the freshman to the senior year of study. Faculty members noted that obtaining authentic samples of students' coursework caused minimal disruption of class time, and the use of the VALUE rubric proved helpful in isolating specific areas of weakness in students' critical-thinking abilities, such as making assumptions and drawing conclusions.10
Prompted by these successes, we initiated a process to adapt AAC&U learning outcomes and VALUE rubrics to the education of pharmacy students. The first steps, recruiting and training faculty members and pilot testing the VALUE rubrics, were time-consuming but yielded several benefits essential to the success of the project. Through the calibration exercise, faculty members reported gaining a common perspective on the VALUE rubrics and acquiring a common language and shared understanding of problem solving and critical thinking. This promoted agreement among faculty assessors, thus increasing the consistency (inter-assessor reliability) of the assessment measures. However, an area for improvement noted by assessors was the lack of diversity in the student work samples tested during the calibration process. Having a more diverse set of work samples discussed during calibration was considered one way to give assessors a better understanding of when to use NA vs "0" ratings in their assessments. A more robust understanding of this difference has the potential to increase the yield of complete assessment records on future assessment days.
Another essential activity was the debriefing session held at the end of each assessment day. This session allowed assessors not only to evaluate the assessment process and need for further VALUE rubric or student assignment revisions, but also to engage in a broader discussion of assessment as a whole at the institution. This discussion was also shared with the full faculty membership in hopes that all would use the assessment data and insights gained to improve critical-thinking and problem-solving opportunities in their respective courses. This was the first time faculty members assessed ability outcomes across the curriculum in a comprehensive and systematic way. We believe this method of program assessment helped contribute to our developing culture of assessment at STLCOP by promoting dialogue among faculty members within and across disciplines, and allowing faculty members to more clearly appreciate how their courses and assignments can contribute to students’ progress toward achievement of ability outcomes throughout the entire academic program.
Results of this process for the assessment of ability outcomes were mixed, in that critical-thinking ability was more successfully evaluated than was problem-solving ability. One reason for this difference was the paucity of records providing complete assessment information for problem solving. Two of the problem-solving rubric criteria, implement solution and evaluate outcomes, were often marked as NA by faculty assessors, suggesting that assignments and practice opportunities need to be enhanced to provide students with more opportunities to demonstrate these aspects of problem solving. Additionally, because the problem-solving VALUE rubric is designed primarily to assess a student's problem-solving process, it is best suited to open-ended problems with more than 1 viable solution. Thus, work samples in which only the student's final answer was requested or provided did not supply enough evidence of the student's thought process, making it impossible to assess all elements of the rubric. Asking students to report their thought process on the work sample would likely increase the yield of analyzable data for problem-solving ability and could also strengthen students' own metacognition. While this now seems self-evident, it was an important insight for the assessment-day participants to realize how often the problem-solving process was being inferred from the correctness of the answer rather than stated explicitly by students in their responses. Helping students more overtly examine and express their thinking processes can also be tied to gains in the development of self-regulated learning, in which students are capable of self-assessing what they know, what is still confusing to them, and the gains they have made over the course of their studies.12
The results of the assessment of critical thinking, in contrast, provided evidence that the outcome was practiced in each academic year of our curriculum, and performance levels increased as students progressed from matriculation to graduation. This latter finding is similar to that reported among undergraduates at the University of Delaware.10 Further, the scores for each criterion of the critical-thinking VALUE rubric, with the exception of the criterion related to the student's position (ie, perspective, thesis/hypothesis), increased across the PharmD program. The student's position is thus an area to develop further through improved course assignments and practice opportunities. Ratings of the critical-thinking ability outcome also became less variable and more consistent in later years of the curriculum. Frequency data indicated that this improvement in scores included all students and not just a small segment whose large gains drove the mean performance of the group.
With both ability outcomes, there was little evidence of students achieving the capstone level. While this might be attributable to several factors, a major consideration is that the work samples were not designed a priori to reflect the rubrics for these ability outcomes. Additionally, expectations of performance levels at different stages of the curriculum had not been delineated. However, having gained a sense of performance levels in the current curriculum, we will ensure that future teaching and learning activities concentrate on honing these abilities and that future assessments demonstrate progressive development toward mastery.
In addition to these specific findings, several more general benefits were derived from these assessment activities. First, pilot testing use of the VALUE rubrics as a means of program assessment helped faculty members at STLCOP: (1) assess student progress on critical-thinking and problem-solving ability outcomes across the curriculum and determine how to improve the assessment process; (2) identify strengths and areas for improvement in our curriculum specific to developing students’ problem-solving and critical-thinking abilities; and (3) foster an institutional culture of assessment. Despite a long-standing history of ability-based education at our institution, a shared understanding of ability outcomes and how to assess them has been lacking among faculty members. As a result of participating in this feasibility study, faculty members reported gaining insight into how the VALUE rubrics can be used to provide evidence of continuous quality improvement of the academic program.
Second, debriefing at the conclusion of each assessment day allowed faculty assessors to review the experience to identify strengths, areas for improvement, and insights for this means of program assessment. Areas of strength identified included the feasibility of conducting this method of program assessment. Selecting only 2 ability outcomes to assess per year was manageable. The time and workload commitment was worthwhile because faculty members gained a shared understanding of critical-thinking and problem-solving abilities. Additionally, participating in the assessment process either as an assessor or a submitter of student work samples subsequently prompted faculty members to re-examine their teaching and the practice opportunities available for students to hone their abilities. Assessment data collected were easily distributed back to faculty members who submitted student work samples, and analysis of the overall assessment data was presented to the entire faculty membership for discussion as part of a regular faculty meeting.
Third, there was a realization that continued improvements in the assessment process would require providing faculty members with additional forums to discuss and act on the assessment data and to share personal experiences regarding how the assessment data were used to improve course assignments. Doing so could especially enhance understanding of this method of program assessment. Another needed improvement that emerged was related to the processes for collecting student work samples. Faculty members could minimize the time needed to collect student work samples by proactively collecting and de-identifying samples of student work throughout the academic year. Likewise, faculty assessors participating in this inaugural assessment process gained insight into the types of student work samples that are most suitable for assessing with the rubrics, which could make for more effective use of future assessment days. For example, essays that were 2 to 4 pages in length tended to work well for assessing critical thinking. The experience gained from this initial feasibility test will improve the collection of usable work samples for subsequent program assessment days. An additional improvement noted for future iterations of the assessment process was to reduce the probability of faculty assessor fatigue by scheduling multiple assessment days non-consecutively.
Finally, faculty members gained an appreciation of the difference between using a VALUE rubric for its intended purpose as an assessment instrument and using it as a grading instrument. When assessing a student work sample with a VALUE rubric, an assessor was merely looking for evidence that a criterion described in the rubric had been demonstrated, rather than grading the accuracy of the student's answer or determining how correctly a particular assignment format was followed. Moving into assessment mode and away from the more familiar habit of grading was somewhat more difficult if the faculty assessor had expertise in the topic area of the student work sample.
Despite the benefits obtained, there are limitations to interpreting the significance of our collected assessment data for problem solving and critical thinking across the curriculum. One major issue is the lack of a control group. Changes seen in critical-thinking abilities over time could have been a result of other factors, such as student maturation or activities occurring outside the classroom. Also, students may have developed critical-thinking and problem-solving abilities as the result of courses, homework assignments, and practice opportunities that were not accounted for in the pool of collected work samples. Thus, in subsequent years, faculty members may need to consider collecting student work samples from a broader range of courses, such as electives, to demonstrate additional contexts and potentially provide a more complete picture of students’ critical-thinking and problem-solving abilities across the academic program.
CONCLUSION
An assessment approach using the VALUE rubrics was effective in identifying and documenting student growth and achievement of critical-thinking and problem-solving abilities across a PharmD program. This approach enabled faculty members to evaluate aggregate achievement of these abilities over the academic program and to isolate specific areas of strength and weakness in students' abilities by examining assessments for each individual criterion on the rubrics. The analysis indicated that although overall critical-thinking scores improved over the curriculum, students never reached capstone levels of performance. While a low yield of complete problem-solving records precluded meaningful data analysis, the assessment process was informative for indicating areas of potential curricular improvement.
Faculty members considered use of the VALUE rubrics for assessing program ability outcomes to be a worthwhile, authentic, and feasible means of program assessment to drive continuous quality improvement of the academic program. Given the success of this feasibility study, this method of program assessment will be conducted annually at STLCOP, with 2 program ability outcomes selected per year, providing a promising line of further research to systematically explore development of the college's complete set of program ability outcomes.
Appendix 1. Critical-Thinking Rubric Based on the VALUE Rubrics From the Association of American Colleges & Universities
Definition: Critical thinking is a habit of mind characterized by the comprehensive exploration of issues, ideas, artifacts, and events before accepting or formulating an opinion or conclusion.
Evaluators are encouraged to assign a zero to any work sample or collection of work that does not meet benchmark (cell one) level performance or an NA if the criterion is not applicable to the sample.
Appendix 2. Problem-Solving Rubric Based on the VALUE Rubrics From the Association of American Colleges & Universities
Definition: Problem-solving is the process of designing, evaluating, and implementing a strategy to answer an open-ended question or achieve a desired goal.
Evaluators are encouraged to assign a zero to any work sample or collection of work that does not meet benchmark (cell one) level performance or an NA if the criterion is not applicable to the sample.
Received March 1, 2013.
Accepted April 17, 2013.
© 2013 American Association of Colleges of Pharmacy