## Abstract

**Objective.** To investigate the effects of multicourse, composite examinations on student performance in a pharmacokinetics course.

**Methods.** A linear, mixed-effects model was used to analyze student performance on identical daily quiz and examination questions in a pharmacokinetics course at two pharmacy schools. The same instructor taught the entire course at both institutions. The only difference between the two courses was the method by which examinations were administered to the two cohorts.

**Results.** Students’ scores on identical daily quizzes, which were administered in the same manner at both schools, were the same. However, student grades on the multicourse examinations were significantly lower than grades on the same questions administered as individual course examinations at the other school. The effect size was 1.15, indicating a large difference between the two cohorts in examination scores. The mixed-effects model revealed a negligible difference (0.622%) between the two student cohorts in academic ability but showed a substantial effect (9.40%) of examination format in favor of single-course assessment.

**Conclusion.** Compared with traditional, individual course examinations, multicourse, composite examinations may significantly lower student grades in a pharmacokinetics course.

**Keywords.**

- composite examination
- multicourse examination
- integrated examination
- student performance
- basic pharmacokinetics

## INTRODUCTION

Most pharmacy schools in the United States assess student performance in different courses using separate, course-specific examinations during each semester. In addition to within-semester examinations, many schools use some form of cumulative, progress assessments annually or prior to the start of experiential education to demonstrate that students retain the required foundational knowledge and skills and are prepared for advanced pharmacy practice experiences (APPE).^{1-3} The Accreditation Council for Pharmacy Education (ACPE) Standards 2016 require that all US pharmacy schools administer the Pharmacy Curriculum Outcomes Assessment (PCOA) at the conclusion of the didactic curriculum. Developed by the National Association of Boards of Pharmacy, PCOA is a nationally standardized examination that provides an assessment of student performance in foundational knowledge in various domains of biomedical, pharmaceutical, social/administrative/behavioral, and clinical sciences.^{4}

Traditionally, course-specific assessments include administration of one or more mid-term (or within term) examinations plus a final examination for each course throughout and at the end of each semester, respectively. However, the effectiveness of this method of assessment with regard to content retention and integration of materials across different courses has been questioned. Efforts have been made to improve the assessment methods used in pharmacy schools. For example, Medina and colleagues incorporated a biannual integrated examination during the first three years of the pharmacy curriculum.^{5} The integrated examinations were administered twice a year during the final examination periods and consisted of questions from all the required courses during that semester. The integrated examinations were embedded in the final examination of the pharmacy practice course series offered in each semester and accounted for 10% of a student’s final course grade. These authors concluded that the integrated examinations improved the culture of assessment and faculty’s understanding of the curriculum.^{5} However, further studies are needed to determine whether such integrated examinations would affect student learning outcomes, performance, or retention of content.

To mitigate the scheduling challenges associated with multiple examinations for several courses during the semester and the apparent limitations of traditional course-specific assessments, assessments at our institution were based on simultaneous, multicourse examinations at regular intervals. Under this plan, a composite examination (CE), consisting of questions from all the courses in that trimester, would be administered every 2-3 weeks, resulting in five examinations for the entire trimester. There are theoretical and conceptual arguments in favor of CE versus individual course examinations, including ease of scheduling, accommodation for more frequent testing, improved study habits, increased content retention, and similarity to board examinations. However, to date, there are no studies that assess potential differences between the two methods in learning and performance of students in individual courses. Other than McDonough and colleagues’ article that reported on student attitudes and perceptions toward a CE implemented at the University of Tennessee Health Science Center College of Pharmacy, the literature on this subject is scant.^{6} The hypothesis of this study was that the examination format (individual versus multicourse) would not affect the performance of students in learning basic pharmacokinetics principles.

## METHODS

Basic Pharmacokinetics was offered as a three-credit hour course to students in two pharmacy schools, one with individual course examinations (Texas Tech; 2013-2014 academic year) and the other with CE (Chapman; 2015-2016 academic year). At both schools, the entire course was taught by the same instructor. The format of the course is based on the principles of active learning and substantial engagement of students before, during, and after the class sessions.^{7} The course contents, sequence of topics, total number of lectures per course, number of lectures per topic, length of class sessions (75 minutes), and course resources (eg, online tools for assignments, quizzes and examinations, and simulations) were identical for both classes.^{8-10} Additionally, daily quizzes were administered in an identical manner to both cohorts at the end of each class session. However, the method of administration of the five examinations was different between the two cohorts. At Texas Tech, students received traditional, single course examinations throughout the semester on days when no other major examinations were scheduled. At Chapman, there were five examination days throughout the trimester (every 2-3 weeks) when a composite examination, consisting of questions from all the courses during the trimester, would be administered during one block of time.

There were 162 students at Texas Tech and 79 students at Chapman enrolled in the course in 2013-2014 and 2015-2016 academic years, respectively. The quiz and examination questions were based on an online program that creates individualized questions for each student by incorporating some random parameters in a question with identical structure for all students, but with different pharmacokinetic and/or dosing parameters.^{10} For example, a question asking all students to estimate the plasma half-life of a drug after intravenous administration would have different plasma concentration-time courses and, hence, a different half-life value for each student. Additionally, multiple-choice, conceptual questions are drawn from several dynamic scenarios where all the choices are possible depending on the scenario randomly selected for each student.^{10} The questions are drawn from an examination question bank consisting of ∼500 dynamic (individualized) questions. Whereas some of the quiz and examination questions were the same for both Texas Tech and Chapman students, other questions were different. Therefore, for both daily quizzes and examinations, only questions that were the same for both cohorts were identified and used for comparison of performance between the two cohorts. There were 55 quiz questions and 47 examination questions that were the same in both cohorts and were included in the analysis.
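The individualized-question mechanism described above can be sketched as follows. This is a minimal illustration only: the `make_half_life_question` helper, the seeding scheme, and the parameter ranges are hypothetical, not the actual question bank's implementation.

```python
import math
import random

def make_half_life_question(seed):
    """Generate an individualized IV half-life question: the same structure
    for every student, but different randomly drawn parameters (hypothetical
    ranges; the real question bank's ranges are not published)."""
    rng = random.Random(seed)            # one seed per student, for reproducibility
    k = rng.uniform(0.05, 0.5)           # elimination rate constant (1/h)
    c0 = rng.uniform(5.0, 50.0)          # initial plasma concentration (mg/L)
    times = [0, 1, 2, 4, 8]              # sampling times (h)
    concentrations = [c0 * math.exp(-k * t) for t in times]
    answer = math.log(2) / k             # correct half-life (h) for this student
    return {"times": times, "concentrations": concentrations, "half_life": answer}

# Two students receive structurally identical questions with different numbers
q1 = make_half_life_question(seed=1)
q2 = make_half_life_question(seed=2)
```

Because each student's concentration-time course is generated from a different rate constant, the correct half-life differs per student even though the question stem is identical.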

To statistically compare the performance of students in quiz and examination questions between Texas Tech and Chapman students, a stepwise, linear, mixed-effects model was used. Mixed-effects model analysis is a preferred method of analysis when there are correlated data due to grouping of subjects (eg, Texas Tech and Chapman students) or repeated measurements on each subject over time (daily quizzes and regular examinations throughout the semester/trimester). This model controls for differences in the academic abilities of students. The model uses both fixed and random effects in the same analysis. The fixed effects in this model were school (*X*_{School}, assigned 0 for Chapman and 1 for Texas Tech), type of test (*X*_{Type}, assigned 0 for quiz and 1 for examination), and the interaction of school and type of test (*X*_{School} × *X*_{Type}, assigned 1 for Texas Tech examination and 0 for others). The random effect was individual students’ intercept shift as a result of their academic differences from the group (*b*_{i}). The full model, incorporating all the fixed and random effects, is shown in this equation:

*Y*_{ij} = *β*_{0} + *β*_{School}*X*_{School} + *β*_{Type}*X*_{Type} + *β*_{School × Type}(*X*_{School} × *X*_{Type}) + *b*_{i} + *ε*_{ijk} (Equation 1)

where *Y*_{ij} is the score of the i^{th} student at the j^{th} measurement; *β*_{0} is the population intercept; *β*_{School}, *β*_{Type}, and *β*_{School × Type} are the fixed-effect coefficients for school, type of test, and the school-type of test interaction, respectively; and *ε*_{ijk} refers to the unaccounted (residual) error. This method uses individual student performance on each of the 55 quiz and 47 examination questions to create a separate intercept for each individual in order to control for each student’s academic starting point.

For determination of the best model to describe the data, a stepwise forward model selection method was used. The simplest model (containing intercept only without any of the fixed effects) was used as the starting point and was statistically compared with progressively more complex models with the addition of one fixed effect at a time to finally arrive at the full model. Variations of the tested models are listed in Table 1.

Progressively complex models were compared using both likelihood ratios and pairwise comparisons based on Pearson’s chi-square analysis. The likelihood ratio test compares the likelihood of two models, which are different from each other only by the presence or absence of one factor, to predict the observed data. For example, the ability of a model with intercept only (Model 1, Table 1) to predict the observed results is compared with a model that incorporates both intercept and School (Model 2, Table 1), with School being the only differentiating factor. In the next step, the likelihood of accurate prediction of the observed data using a model that incorporates intercept and School (Model 2, Table 1) is compared with a model that incorporates intercept, School, and Test Type (Model 3, Table 1), with the Test Type being the differentiating factor. This process continues until the likelihood of all the possible factors (including School-Test Type interaction) on the predictability of the model is determined. In addition to the likelihood ratio test, Pearson’s chi-square analysis compares the probability (*p* value) of the pairwise comparison between any of the above two models when the effect of addition or removal of only one factor is tested. The chi-square analysis of two models determines whether the addition or removal of a factor significantly affects the ability of the model to predict the observed values. For likelihood ratios, Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) were used as penalized likelihood criteria, and the model with the lowest AIC or BIC (highest likelihood) was selected. For chi-square analysis a *p* value of <.05 was considered significant.
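The selection criteria above can be illustrated with a short sketch. The log-likelihood values and observation count below are hypothetical placeholders; 3.841 is the standard chi-square critical value for 1 degree of freedom at *p*=.05.

```python
import math

def aic(log_lik, n_params):
    # Akaike Information Criterion: 2k - 2*ln(L); lower is better
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    # Bayesian Information Criterion: k*ln(n) - 2*ln(L); lower is better
    return n_params * math.log(n_obs) - 2 * log_lik

def likelihood_ratio_significant(log_lik_simple, log_lik_complex, crit=3.841):
    """Compare two nested models differing by one fixed effect (1 df).
    The LR statistic 2*(lnL_complex - lnL_simple) follows chi-square(1);
    3.841 is its critical value at p = .05."""
    lr = 2 * (log_lik_complex - log_lik_simple)
    return lr > crit

# Hypothetical log-likelihoods: Model 1 (intercept only) vs Model 2 (+ School)
ll_m1, ll_m2, n_obs = -5000.0, -4990.0, 1000
print(aic(ll_m1, 2), aic(ll_m2, 3))                 # the lower AIC wins
print(likelihood_ratio_significant(ll_m1, ll_m2))   # is adding School justified?
```

In the stepwise procedure, this comparison is repeated for each added factor (School, Test Type, then their interaction), retaining a factor only when it lowers AIC/BIC and yields a significant chi-square result.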

After the best model fit was obtained based on the above criteria, post-hoc, two-tailed t-tests were also used to compare differences between the following groups using a Bonferroni correction for *p* values to control for type I error: Chapman Quizzes vs Texas Tech Quizzes, Chapman Examinations vs Texas Tech Examinations, Chapman Quizzes vs Chapman Examinations, and Texas Tech Quizzes vs Texas Tech Examinations. In addition to *p* values, the effect sizes were estimated when comparing the groups. Recently, it has been argued in the education literature that merely citing a *p* value for determination of significance is not enough, and *p* values should be accompanied by an appropriate measure of the magnitude of the difference between the groups, which is called effect size.^{11} However, absolute effect sizes (absolute differences between two groups) do not account for the variability in the data. Therefore, one of the commonly used effect size measures, Cohen’s *d*, is estimated by dividing the mean difference between the two groups by the standard deviation of the data. This method transforms the absolute differences into standard deviation units. Here, Cohen’s *d* effect sizes were estimated to compare the magnitude of the effect across groups. Analysis of data was performed using the “lmer” function in the “lme4” package by Bates in the R Project for statistical computing.^{12}

The study was screened and deemed exempt from formal review by the Texas Tech University Health Sciences Center Institutional Review Board (IRB) for the Protection of Human Subjects and was also approved by the Chapman University IRB.

## RESULTS

The likelihood of the five models used for description of data and the pairwise comparisons of the progressively more complex models are presented in Tables 2 and 3, respectively. As shown in Table 2, addition of School, Test Type, and their interactions progressively decreased both AIC and BIC values (ie, an increase in likelihood), indicating that the full model (Model 5, Table 1) best describes the data. This conclusion was also confirmed by the pairwise comparison of progressively more complex models, shown in Table 3, as addition of each of the fixed effects to the model significantly improved the model predictability of the observed data. Overall, based on both statistical methods, the full model (Equation 1) was chosen for description of data.

The estimates of the intercept and coefficients of the fixed effects and their variabilities (SE and coefficient of variation, or CV) for School, Test Type, and School × Test Type interaction, based on the full model (Model 5), are shown in Table 4. To provide numeric examples of Equation 1 and demonstrate how these estimates fit the observed data, the predicted grades for quizzes and examinations for Texas Tech and Chapman students were calculated, and are listed below:

- Chapman quiz: *Y* = *β*_{0} + *b*_{i} = 74.7 + *b*_{i}
- Texas Tech quiz: *Y* = *β*_{0} + *β*_{School} + *b*_{i} = 74.1 + *b*_{i}
- Chapman examination: *Y* = *β*_{0} + *β*_{Type} + *b*_{i} = 81.8 + *b*_{i}
- Texas Tech examination: *Y* = *β*_{0} + *β*_{School} + *β*_{Type} + *β*_{School × Type} + *b*_{i} = 90.6 + *b*_{i}

where *b*_{i} is the individual student’s shift from the intercept, and the numbers 74.7, 74.1, 81.8, and 90.6 are the predicted average grades for the Chapman quiz, Texas Tech quiz, Chapman examination, and Texas Tech examination, respectively. These predicted values are identical to the observed averages. The analysis also generated *b*_{i} values for individual students, which are not shown here.
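The fixed-effects arithmetic can be checked with a small sketch (random effect *b*_{i} set to 0). Only *β*_{School} (−0.622) and the interaction (9.40) are quoted directly in the text; the intercept and *β*_{Type} values below are inferred from the reported predicted means (74.7 for the Chapman quiz and 81.8 for the Chapman examination), so treat them as reconstructions.

```python
def predict(school, exam,
            beta0=74.7,        # inferred: predicted Chapman quiz mean
            beta_school=-0.622,  # reported School coefficient
            beta_type=7.1,     # inferred: 81.8 - 74.7
            beta_inter=9.40):  # reported School x Type interaction
    """Fixed-effects part of the full model.
    school: 0 = Chapman, 1 = Texas Tech; exam: 0 = quiz, 1 = examination."""
    return beta0 + beta_school * school + beta_type * exam + beta_inter * school * exam

print(round(predict(0, 0), 1))  # Chapman quiz        -> 74.7
print(round(predict(1, 0), 1))  # Texas Tech quiz     -> 74.1
print(round(predict(0, 1), 1))  # Chapman examination -> 81.8
print(round(predict(1, 1), 1))  # Texas Tech exam     -> 90.6
```

Note how the interaction term contributes only to the Texas Tech examination cell, which is exactly the pattern that separates an examination-format effect from a between-cohort ability difference.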

The model-generated interaction plots, demonstrating mean (SE) values of grades, are shown in Figure 1, and the pairwise comparisons of different groups and their associated effect sizes are shown in Table 5. There were no significant differences (*p*=1.00) between the two cohorts of students in terms of their performance in 55 quiz questions (Figure 1 and Table 5); the mean (SE) of quiz scores were 74.1 (0.8) and 74.7 (1.3) for the Texas Tech and Chapman students, respectively. However, the grades of students on the 47 examination questions in the Chapman group (82.0 (1.1)) were significantly (*p*<.0001) lower than those in the Texas Tech group (90.6 (0.5)) (Figure 1 and Table 5). This difference amounted to a Cohen’s *d* effect size of 1.15 (Table 5). Additionally, there were significant differences between quiz and examination grades for both Texas Tech (effect size of 1.92) and Chapman (effect size of 0.704) groups (Table 5).

Figure 2 shows plots of differences between the Texas Tech and Chapman students’ grades in each of the 55 quiz and 47 examination questions. Whereas the differences in the quiz scores were randomly distributed around the line of zero (no statistical difference), a large majority of examination score differences were negative, indicating a significant (*p*<.0001) bias toward lower examination grades for the Chapman students.

## DISCUSSION

Composite examination (CE) is a new assessment method used to test student performance in individual courses using a single test administered at regular intervals, which contains questions from all the courses offered during that semester. Composite examination is different from, and does not replace, progress or milestone assessments, which are normally administered at the end of each semester, academic year, and/or before students start their APPEs.^{1,2} Additionally, CE is different from integrated assessments, where questions integrate different disciplines or courses together.^{5} Although CE mixes questions from several courses, each question remains specific to a particular course. One potential advantage of CE is that it allows more frequent testing during the semester by reducing the total number of individual examinations necessary for traditional, single course examinations. Research has shown that testing can decrease the normal memory decline and improve retention of newly learned material.^{13,14} Additionally, it has long been established that studying at spaced intervals, as opposed to cramming in one session, improves long-term retention or memory, when the total study time is the same.^{15-17} A recent study showed that although sleep restriction significantly reduced the recall of learned materials after cramming, it did not negatively affect retention of materials learned over spaced intervals.^{17} Therefore, compared with traditional, single course examinations, CE may improve learning and/or increase the retention of content in individual courses by potentially allowing more frequent testing and spaced learning.

Despite theoretical advantages of CE over single course examinations, there is no report in the literature investigating how CE might affect the performance of students in various courses. A 2016 study on student perception regarding the application of CE in a pharmacy school reported that 41%-44% of students thought CE increased knowledge retention, while only 12%-19% of students believed that separate course examinations led to greater content retention.^{6} Pre- and post-CE surveys also revealed a significant decrease in the number of students who described their study habit as cramming (29% before CE and 11% after CE). However, taking a CE examination on several subjects together, as opposed to a single course examination, may negatively affect student performance on questions related to each course, a subject that has not yet been addressed. The results of this study show that CE has a substantial negative effect on student grades in a pharmacokinetics course (Figures 1 and 2 and Table 5). The effect size for the difference between the Chapman (CE) and Texas Tech (individual course examination) students in terms of performance was 1.15. This means the performance of Chapman students was on average 1.15 standard deviations lower than that of Texas Tech students. Cohen suggested that effect sizes of 0.2, 0.5, and 0.8 represent small, medium, and large effects, respectively.^{18} Therefore, an effect size of 1.15 is substantial and can have a significant impact on the overall grades of students in the course, depending on the weight of CE in the overall course grades.

Comparison of groups also showed substantial effect sizes for the differences between the quiz and examination grades (Table 5), with higher grades achieved on examinations for both Texas Tech (effect size of 1.92) and Chapman (0.704) students. This observation is consistent with the fact that quizzes were administered at the end of each class session and were related to the topic presented on the same day. Therefore, quizzes reflect the students’ first exposure to the topic. However, for examinations, students had additional opportunities to study the topics after the class session and before taking the examinations. The much lower quiz-examination effect size of Chapman students (0.704), compared with that of Texas Tech students (1.92) (Table 5), is due to the School × Test Type interaction detected by the mixed-effects model (Tables 3 and 4), indicating poorer performance of Chapman students in the composite examinations.

An ideal study design to compare CE and course-specific examinations would divide the same student cohort into two subgroups, with identical treatments for both subgroups in terms of instruction but with one subgroup subjected to CE and the other to individual examinations. That type of study, however, is very difficult, if not impossible, to implement both logistically and academically. Instead, this study used two student cohorts at two pharmacy schools who were subjected to an identical instructional method in a pharmacokinetics course taught by the same instructor and identical instructional resources, but with different examination methods. Because two different student cohorts were used, one may argue that the lower performance of students in the CE at Chapman, compared with the performance of Texas Tech students in the individual course examinations, might be due to a difference in the academic abilities of the two student cohorts. However, the data clearly refute this argument because the performances of students in daily quizzes, which were administered in an identical fashion to both cohorts, were nearly identical for the two cohorts (74.1 (0.82) and 74.7 (1.26) for the Texas Tech and Chapman students, respectively). Furthermore, the mixed-effects model is capable of accounting for differences in the academic abilities of students between the two cohorts. This is determined by the School coefficient (*β*_{School}), which was –0.622 (Table 4), indicating that the academic abilities of the Texas Tech students were marginally (0.622%) lower than those of Chapman students. If the differences in the quiz and examination grades were only due to differences in the academic abilities of the students, one would expect parallel slopes for the two cohorts with different starting points, as opposed to a lower slope for the Chapman students observed in Figure 1. 
Indeed, the significant (*p*<.0001) interaction coefficient (*β*_{School × Type}) of 9.40% detected by the model (Table 4) is an indication of a lower performance of Chapman students in the composite examinations, regardless of possible differences between the two student cohorts in their academic abilities, which in this case were marginal.

The findings of this study need to be interpreted in the context of some limitations. First, this study had a relatively small sample size. Additionally, caution should be exercised when extrapolating the negative effects of CE on the performance of students on pharmacokinetics examination questions to other courses. This is because a major perceived advantage of CE is that by allowing more frequent testing, CE improves study habits, which is expected to improve learning.^{6} However, this was not the case in this study because the number of examinations at both institutions was the same for the pharmacokinetics course. It is likely, though, that for most courses, CE would allow for more frequent testing, thus potentially mitigating the negative effects of CE on student performance observed in this study. Because the Chapman program was launched with CE already in place, testing frequency before and after CE implementation could not be compared for the other, newly developed courses. However, McDonough and colleagues reported that the frequency of testing for all of their courses was increased after implementation of CE.^{6} Therefore, although the results with the pharmacokinetics course cannot be directly extrapolated to other courses, one may expect that in the absence of more frequent testing, CE may have a negative effect on student performance.

Because of its theoretical and perceived advantages over traditional, single-course examinations, composite examination may be attractive to many pharmacy schools that are in the process of revising their curricula, including their delivery and scheduling.^{6} However, these perceived advantages have yet to be documented using actual performance data in future studies. Major questions remaining to be answered are student performance in other courses, which are included in the CE, and the actual effects of CE on long-term content retention. The true advantage of CE over traditional examinations in terms of learning outcomes, aside from scheduling preferences, would be if they indeed increase retention of learned material over time. In that case, even a reduction in individual course grades, as observed here with pharmacokinetics, may be justified because in the long term, learning would be improved. However, a reduction in the course grade because of CE, if observed with other courses as well, may require adjustments to the current standards for passing individual courses or overall student progression, which are established within the context of individual course examinations.

## CONCLUSION

Performances of two cohorts of students in daily pharmacokinetics quizzes, which were administered in a similar manner to both cohorts, were identical. However, student performance on the examinations was significantly lower in the cohort that received multicourse, composite examinations as opposed to the cohort that received traditional, individual course examinations. These data indicate that when compared to traditional, individual course examination, composite examination significantly reduces the grades of students in a basic pharmacokinetics course. Further studies are needed to determine whether these results may be extrapolated to other disciplines in pharmacy curricula.

## ACKNOWLEDGMENTS

The authors would like to thank Dr. Siu Fun Wong from the Chapman University School of Pharmacy for her assistance in scheduling and managing the composite examinations.

- Received January 19, 2017.
- Accepted April 24, 2017.

- © 2018 American Association of Colleges of Pharmacy

## REFERENCES

- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.