Research Article

Differences in Multiple-Choice Questions of Opposite Stem Orientations Based on a Novel Item Quality Measure

Samuel Olusegun Adeosun
High Point University, Fred Wilson School of Pharmacy, High Point, North Carolina

American Journal of Pharmaceutical Education March 2023, 87 (2) ajpe8934; DOI: https://doi.org/10.5688/ajpe8934

Abstract

Objective. To determine whether there are differences in the performance and quality of multiple-choice items with opposite stem orientations (positive or negative), based on a novel item quality measure and conventional psychometric parameters.

Methods. A retrospective study was conducted on multiple-choice assessment items used in years two and three of pharmacy school for pharmacotherapy and related courses administered between August 2018 and December 2019. Conventional psychometric parameters (difficulty and discrimination indices), average response time, nonfunctional distractor percentage, and a novel measure of item quality of negatively worded items were compared with those of control items, namely positively worded items (n=103 each). This novel measure uses difficulty and discrimination in tandem for the decision to reject, review, or retain items in an assessment. Statistical analyses were performed on continuous and categorical variables, on the relationship between difficulty and discrimination, and on differences in correlation coefficients between positively and negatively worded items.

Results. Stem orientation was not significantly associated with the novel measure of item quality. Also, there were no significant differences between positively and negatively worded items in any of the psychometric parameters. There were significant, negative correlations between difficulty and discrimination indices in both groups, and the correlation coefficients were significantly stronger in positively versus negatively worded items.

Conclusion. Items with opposite stem orientations show no differences in either the novel item quality measure or the conventional measures of performance and quality, except in their difficulty-discrimination relationships. This suggests that negatively worded items should be used when necessary, but cautiously.

Keywords
  • psychometrics
  • item analysis
  • negatively worded items
  • multiple-choice questions
  • assessments

INTRODUCTION

Multiple-choice questions are one of the most commonly used assessment methods in medical and pharmacy schools because of their versatility, ease of construction, and efficiency (high reliability per hour of testing).1,2 Accordingly, guidelines have been published on best practices for writing multiple-choice questions.2,3 Violating one or more of these guidelines may result in flawed questions, which can adversely affect both student performance and item performance.2-5

A common item-writing flaw is the negative orientation of the stem.4,6 Such items are characterized by keywords such as not, except, or false,3,7,8 which ask the test taker to identify the option that is wrong rather than, as in positively worded items, the option that is right. Negatively worded items are often necessary when it is important for the student to know what not to do.9 However, answering a negatively worded item requires an additional thinking stage compared to a positively worded item,10,11 and such items introduce the risk of double negatives when answer options also include negatively worded statements.9 Therefore, negatively worded items are thought to increase difficulty and negatively impact test takers’ performance.12,13 Furthermore, Chiavaroli9 suggested that negatively worded items behave anomalously, primarily because high-performing students get those questions wrong; however, across several studies the effects of negatively worded items have largely been inconclusive.9

Limitations of previous studies include the fact that negatively worded items have been analyzed jointly with other item-writing flaws,4,5,13 and when they have been studied or analyzed separately, sample sizes were usually small (N=5 to N=37).7,12,14-17 In addition, previous studies have compared the quality of positively and negatively worded items based on the conventional method using one or both of the major item analysis parameters (difficulty and discrimination) in isolation.3,4,7,12,14-17 Difficulty is the proportion of examination takers who get an item right, ranging from 0 to 1, representing the hardest to easiest questions, respectively.18,19 Discrimination measures how well an item differentiates between high and low scorers. Discrimination indices are calculated as either upper minus lower (U-L) or point biserial, using either a percentage of the top and bottom scorers or all test takers, respectively. Both U-L and point biserial discrimination indices are interpreted the same way and range from -1 to +1.18-21 However, because of the complex relationship between difficulty and discrimination,18,22,23 experts have recommended using both parameters in tandem rather than in isolation to gauge the quality of items in an assessment.19,24
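To make these parameters concrete, the sketch below computes difficulty, the U-L discrimination index, and the point biserial discrimination index from a matrix of scored responses. It is a minimal illustration under stated assumptions (a 27% upper/lower split; an uncorrected point biserial against the total score), not the implementation used by any examination software; all names are invented for this example.

```python
import numpy as np

def item_statistics(responses: np.ndarray, item: int, frac: float = 0.27):
    """Difficulty, U-L, and point biserial discrimination for one item.

    responses: 2D array (test takers x items) of 0/1 scores.
    item: column index of the item of interest.
    frac: fraction of top/bottom scorers used for the U-L index.
    """
    totals = responses.sum(axis=1)   # total score per test taker
    correct = responses[:, item]     # 0/1 result on this item

    # Difficulty: proportion of all test takers who got the item right
    # (0 = hardest, 1 = easiest).
    difficulty = correct.mean()

    # U-L index: proportion correct in the top group minus the bottom group.
    n = max(1, int(round(frac * len(totals))))
    order = np.argsort(totals)
    ul = correct[order[-n:]].mean() - correct[order[:n]].mean()

    # Point biserial: correlation of the 0/1 item score with the total
    # score over all test takers (uncorrected for the item's own
    # contribution to the total, for brevity).
    pbs = np.corrcoef(correct, totals)[0, 1]
    return difficulty, ul, pbs
```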

Given that the quality, validity, and reliability of assessments depend on the quality of items,25-28 the aim of this study was to test the null hypothesis that negatively worded items are not different from positively worded items. We used a novel measure of item quality that has not been previously used to address this question, along with the conventional psychometric parameters (difficulty and discrimination). This novel measure uses the coordinates of an item in a difficulty-discrimination matrix to inform the recommendation to either reject, review, or retain an item within an assessment. We also explored other potentially distinguishing features of negatively worded items, including average response time, distractor functionality, and correlation between difficulty and discrimination.

METHODS

This was a retrospective analysis of items used in summative assessments at the High Point University Fred Wilson School of Pharmacy. Data were collected in July 2021 following institutional review board approval as an exempt study. All pharmacotherapy courses and the companion integrated pharmaceutical sciences courses taken in pharmacy students’ second and third years were included. These course series were selected because they represent core components of the didactic curriculum, make up a large group of related courses from which an adequate sample size of negatively worded items was obtainable, and are assessed primarily with multiple-choice questions. We included only items used in midsemester and final examinations administered between August 2018 and December 2019, before the disruption due to the COVID-19 pandemic, when all lectures and examinations were still conducted live and in person.

The school of pharmacy’s ExamSoft database (ExamSoft Worldwide LLC) was searched using the negatively worded item keywords except, false, and not,3,7,8 one at a time, within the specified courses and date range. Inclusion criteria were a negatively worded multiple-choice question stem, one correct answer (key), and at least three distractors. Bonus items, or items for which all test takers were given credit, were excluded because they might have been considered anomalous.24 Items with multiple correct options were also excluded because they interfere with distractor analysis. An item analysis report was generated for the most recent assessment in which each item was used. For each included negatively worded item, item analysis data were collected into a data collection sheet in Microsoft Excel. Average response time and distractor analysis data, including the number of options and the number of nonfunctional distractors (ie, distractors selected by <5% of test takers29,30), were also collected. Lastly, assessment-level data were recorded. The completed data collection sheet was cross-checked against the item analysis reports to identify and correct any data entry errors.

Control items (positively worded items) were selected for each negatively worded item during the data collection. A control item was the multiple-choice question item nearest to the negatively worded item in the item analysis report if the item was not a negatively worded item, was not a bonus/excluded item, was not already selected as a control item, was not a multiple-key multiple-choice question, and had four or more options. While this approach does not necessarily guarantee a paired match for each negatively worded item, it was used to ensure that control items were comparable. The order in which items appeared in the report was such that adjacent items addressed the same course content and were written by the same writer.

For each assessment, the percentage of negatively worded multiple-choice questions was calculated as follows: (number of negatively worded items × 100%) ÷ (number of items on the assessment − number of non–multiple-choice question items on the assessment). The percentage of nonfunctional distractors was calculated as follows: (number of nonfunctional distractors × 100%) ÷ (number of options − 1). A lower percentage of nonfunctional distractors is better.22,29
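These two percentages are simple enough to transcribe directly; the sketch below (function and variable names invented) reproduces them with a worked example: 3 negatively worded items on a 52-item assessment containing 2 non-multiple-choice items gives 6%, and 1 nonfunctional distractor on a four-option item gives 33.3%.

```python
def pct_nwi(n_nwi: int, n_items: int, n_non_mcq: int) -> float:
    """Percentage of negatively worded items among the MCQs on an assessment."""
    return n_nwi * 100.0 / (n_items - n_non_mcq)

def pct_nfd(option_counts: dict, n_takers: int, key: str,
            threshold: float = 0.05) -> float:
    """Percentage of nonfunctional distractors for one item.

    option_counts: mapping of option label -> number of takers selecting it.
    key: label of the correct option (excluded from the distractor count).
    A distractor is nonfunctional if selected by <5% of test takers.
    """
    distractors = {opt: n for opt, n in option_counts.items() if opt != key}
    nonfunctional = sum(1 for n in distractors.values()
                        if n / n_takers < threshold)
    return nonfunctional * 100.0 / len(distractors)

print(pct_nwi(3, 52, 2))                                      # 6.0
print(pct_nfd({"A": 20, "B": 18, "C": 60, "D": 2}, 100, "C"))  # 33.3
```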

The spread in values may mask differences when psychometric parameters are analyzed as continuous variables. Therefore, for further analysis of the conventional approach (difficulty and discrimination in isolation), difficulty was categorized as difficult/hard (<.30), good (.30-.80), or easy (>.80),31 while discrimination (27% U-L and point biserial) indices were categorized as weak (<.20), fair (.20-.29), good (.30-.39), or very good (≥.40).24,28,32 We also categorized the items by their percentage of nonfunctional distractors into high- or low-quality distractor items. Since most of the items (86%) had four options and, consequently, three distractors, we categorized items with high-quality distractors as those with zero to one (0%-33.3%) nonfunctional distractors and items with low-quality distractors as those with two to three (66.7%-100%) nonfunctional distractors.
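The cutoffs above translate into simple banding functions; the treatment of values falling exactly on a boundary follows the stated ranges and is otherwise an assumption of this sketch.

```python
def difficulty_band(p: float) -> str:
    """Categorize difficulty: hard (<.30), good (.30-.80), easy (>.80)."""
    if p < 0.30:
        return "hard"
    return "good" if p <= 0.80 else "easy"

def discrimination_band(d: float) -> str:
    """Categorize a U-L or point biserial discrimination index:
    weak (<.20), fair (.20-.29), good (.30-.39), very good (>=.40)."""
    if d < 0.20:
        return "weak"
    if d < 0.30:
        return "fair"
    return "good" if d < 0.40 else "very good"
```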

For the novel approach, items were categorized based on difficulty and discrimination in tandem. A guideline that has been shared among the school of pharmacy’s clinical sciences faculty was adapted for this study (Table 1). These criteria originated from the Lincoln Memorial University-DeBusk College of Osteopathic Medicine and provide suggestions on keeping, reviewing, or eliminating questions within an assessment based on the difficulty and discrimination indices considered in tandem. For vocabulary consistency, we maintained the review designation, while eliminate and OK designations were renamed to reject and retain, respectively. Item quality ranked best to worst was retain>review>reject. The cutoff points of this guideline generally align with other published conventions.20,22,31,33,34 However, for the current study, we modified the recommendation for cell B1 (Table 1) from review to OK/retain. This is because difficulty of .3-.5 falls within the ideal/good range and is above the guess rate for items with four to five options (.25 or .2, respectively).27,33 Also, a discrimination index >.3 is considered good to very good by these standards.21,24,27,33
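Table 1’s full cell-by-cell recommendations are not reproduced in the text, so the following sketch is illustrative only: the cell B1 rule (difficulty .3-.5 with discrimination >.3, retained per the modification described above) comes from this section, while every other threshold is a hypothetical placeholder for the adapted guideline.

```python
def tandem_designation(difficulty: float, discrimination: float) -> str:
    """Reject/review/retain an item using difficulty and discrimination
    in tandem. Only the cell B1 branch is taken from the text; the other
    branches are hypothetical stand-ins for the Table 1 guideline.
    """
    # Cell B1, as modified in this study: moderately hard but well
    # discriminating items are retained rather than reviewed.
    if 0.3 <= difficulty <= 0.5 and discrimination > 0.3:
        return "retain"
    # Hypothetical: very hard items that also discriminate poorly.
    if difficulty < 0.3 and discrimination < 0.1:
        return "reject"
    # Hypothetical: easy, non-negatively discriminating items are kept
    # (eg, need-to-know concepts with difficulty near 1).
    if difficulty > 0.8 and discrimination >= 0:
        return "retain"
    # Everything else is flagged for instructor review.
    return "review"
```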

Table 1. Suggested Guidelines for Reviewing and Eliminating Question Items

Statistical analysis was done in SPSS version 27 (IBM Corp). Continuous variables were assessed for normality using the Shapiro-Francia test, a variation of the Shapiro-Wilk test for sample sizes greater than 50.35 None of the continuous variables met the condition for normal distribution (p<.05); therefore, the Mann-Whitney test was used to compare the medians of control items and negatively worded items for each of the psychometric parameters. For categorical variables, the chi-square test was used to test null hypotheses about associations/differences36; eg, that item quality designations (retain, review, or reject) do not differ between opposite stem orientations (positive or negative). The Spearman correlation was used to test correlations between difficulty and discrimination. Lastly, to determine whether correlations between the variables for the control positively worded items were different from those of the negatively worded items, we used the Fisher r-to-z transformation method,37 which is also considered appropriate for nonparametric Spearman rho correlation values.38 In all statistical tests, α was set at .05.
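The Fisher r-to-z comparison is the least common of these tests, so a minimal sketch follows. It uses SciPy rather than SPSS, and the function name is invented; with n=103 per group it reproduces the z values reported in the Results.

```python
import numpy as np
from scipy import stats

def compare_correlations(r1: float, n1: int, r2: float, n2: int):
    """Fisher r-to-z test for the difference between two independent
    correlation coefficients."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)        # Fisher transformation
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of z1 - z2
    z = (z1 - z2) / se
    p_one_tailed = stats.norm.sf(abs(z))
    return z, p_one_tailed

# U-L comparison from the Results: r = -.837 vs -.708, n = 103 each.
z, p = compare_correlations(-0.837, 103, -0.708, 103)
print(round(z, 3), round(p, 3))  # z ≈ -2.319, p ≈ .010
```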

RESULTS

The initial keyword search returned 334 items (the remaining 1903 items contained none of the keywords). After review and removal of false positives (eg, keywords appearing in options rather than in stems), 110 negatively worded items were identified, of which seven were excluded (four bonus items, two items with multiple correct options, and one item with an unusually long average response time). Table 2 shows other details of the final 206 items (103 each of negatively worded items and control positively worded items) and the 36 assessments from which they were obtained. The number of negatively worded items per assessment ranged from one to 11 (mean=2.9, SD=2.5), or 1%-19.6% of items.

Table 2. Sources of Negatively Worded Items and Descriptive Statistics of Assessments Included in the Study

Analysis of all psychometric parameters as continuous variables showed no significant differences between control and negatively worded items (p>.05; Table 3). Chi-square was used to test the null hypotheses that there are no associations of the stem orientation (positive or negative) with difficulty (hard, good/moderate, or easy), discrimination indices (weak, fair, good, or very good), or distractor quality (high or low). There were no significant associations between stem orientation and difficulty (χ2=.20, p>.05), U-L discrimination index (χ2=.78, p>.05), point biserial discrimination index (χ2=1.93, p>.05), or item distractor quality (χ2=.02, p>.05).

Table 3. Continuous Variable Analysis of Psychometric Parameters of Control and Negatively Worded Items

Using chi-square, we then tested the null hypothesis that there is no association between stem orientation (positive or negative) and the novel measure of item quality (reject, review, or retain) (Table 1). There were no significant associations between stem orientation and item quality when using difficulty in tandem with either U-L (χ2=2.31, p>.05; Figure 1A) or point biserial (χ2=3.24, p>.05; Figure 1B) discrimination indices.
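For readers reproducing this association test outside SPSS, a minimal sketch follows, using scipy.stats.chi2_contingency; the 2 × 3 counts are invented for illustration and are not the study’s data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows are stem orientations
# (positive, negative); columns are quality designations
# (reject, review, retain). Counts are invented for illustration.
table = np.array([[2, 30, 71],
                  [4, 38, 61]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3f}")
```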

Figure 1. Scatterplot of control and negatively worded items in a difficulty-discrimination matrix, using item difficulty in tandem with (A) the upper-lower (U-L) discrimination index and (B) the point biserial (PBS) discrimination index. Filled circles represent the control items (positively worded items); unfilled circles represent negatively worded items (NWIs). The solid and dotted trend lines represent the Spearman correlation lines for the control items and NWIs, respectively. Correlation coefficients are -.837 and -.708 in A and -.609 and -.397 in B for control items and NWIs, respectively (p<.001 in all four cases). In both A and B, the asterisk (*) indicates a significant difference between the difficulty-discrimination correlation coefficients of control items versus NWIs. Dark gray, light gray, and white cells mark the locations of items to be rejected, reviewed, or retained, respectively, based on their difficulty and discrimination indices considered in tandem. n=103 items in each group. Difficulty ranges from 0 to +1; discrimination indices range from -1 to +1.

Among all 206 items, there was a significant negative correlation of U-L and point biserial discrimination with difficulty (r=-0.774, p<.001 and r=-0.504, p<.001, respectively). When testing to determine whether there were differences in correlations based on the stem orientation, the Fisher r-to-z transformation analysis showed that the difficulty-discrimination correlation coefficients in control items were significantly stronger (p<.05) than in negatively worded items using U-L (-0.837 vs -0.708, z=-2.319, p=.01) and point biserial (-0.609 vs -0.397, z= -2.031, p=.02) discrimination indices, respectively. The distribution of the items in the matrix and the correlation trend lines are shown in Figure 1.

DISCUSSION

Previous studies designed to determine whether negatively worded items are different in quality compared to positively worded items have used small sample sizes and conventional measures of item quality (difficulty and discrimination in isolation) and have produced mixed results.7,9,12 The current study employed a larger sample size and a novel measure of item quality that considered difficulty and discrimination in tandem. None of the individual item analysis parameters were significantly different between control and negatively worded items. More importantly, the novel measure also showed that item quality, represented by the proportions of items to be rejected, reviewed, or retained, was not significantly different between control and negatively worded items. However, the Fisher r-to-z analysis suggests that there is a significantly stronger negative correlation between difficulty and discrimination in control versus negatively worded items.

Despite the usefulness of item analysis parameters,22,31,32 they are frequently not sensitive enough to the detrimental effects of negatively worded items.9 Relying solely on discrimination, which has limitations such as its irrelevance at both extremes of difficulty, can lead to discarding good questions.18,21,22 Also, a wide mix of item difficulties is needed for an assessment that appropriately reflects test takers' competencies.39 It is, therefore, reasonable to leverage the potential synergy of item difficulty and discrimination, used contextually and in tandem, when making decisions on the quality of items on an assessment.19 As evidence of the validity of this novel measure, variations of this approach have been used to evaluate the effect of faculty training on the quality of examination items written,40 the effect of item complexity on item quality,13 the impact of automatic item elimination based on item quality,26 and the relationship between distractor efficiency and item quality.41 However, to the best of our knowledge, this is the first time that this measure has been used to investigate the potential difference in quality between items of opposite stem orientation.

Although we arrived at the same conclusion using both conventional and novel approaches, the novel approach is inherently different and a more pragmatic measure of item quality. For example, if discrimination were used in isolation, 40% (83 items) of the 206 items would be considered weak (using the less stringent cutoff of discrimination <.15) and, therefore, subject to deletion/rejection, while only 1% (two items) would be rejected if the novel item quality measure were used (Figure 1A). Item deletions reduce assessment quality, validity, and reliability because of reduced content coverage and mix of item difficulties.25,26,39,42 This novel item quality measure is particularly advantageous as course content in pharmacy (like most clinical disciplines) is usually large, and assessments require a broad content coverage with many need-to-know concepts (with difficulty ≈1 and discrimination ≈0). Furthermore, given that test items are often banked for subsequent use in pharmacy school in-house assessments, this novel item quality measure also has implications for evidence-based item banking processes. Including a matrix similar to Figure 1 in examination software reports may be a helpful quick guide for instructors to review and interpret item analysis data.

Average response time is a less frequently used psychometric parameter. However, we included this parameter in the analysis because it correlates with difficulty43; is associated with difficulty, complexity, and cognitive domain44; and it may be a surrogate for test takers' effort or motivation.45 Considering the claims that negatively worded items are more difficult because of the extra time and effort needed to correctly read and interpret those questions,10,11 average response time is an appropriate multidimensional measure that should be sensitive to stem orientation. However, average response time was not significantly different between control and negatively worded items and, therefore, further supports the difficulty results.

Also, it has been suggested that one reason negatively worded items are used is that it is easier for item writers to come up with many plausible distractors for negatively worded items than for positively worded items.9,46 Considering that the items were written by the same set of writers, our results did not support this assumption, as there was no difference between control and negatively worded items in how well their distractors functioned. Additionally, Tarrant and colleagues30 previously showed that items with a lower percentage of nonfunctional distractors are more difficult and more discriminating. Therefore, since the current study showed no differences in either difficulty or discrimination, the lack of difference in the percentage of nonfunctional distractors between control and negatively worded items was not surprising.

Another study23 showed a similar negative correlation between difficulty and discrimination (Figure 1). Although this coefficient was significantly different between control and negatively worded items, the practical significance of this difference may be limited, given that negatively worded items constitute a minor proportion of most assessments. While others have reported 11%-20%,5,7,14 we found an average of 4.5% negatively worded items in our study, suggesting that in a typical assessment, the relative impact of a few negatively worded items on the assessment would be negligible. That notwithstanding, this correlation difference is further evidence for perhaps limiting the use of negatively worded items, as previously suggested.3,14,15,24 Therefore, item writers in pharmacy education should use negatively worded items when necessary,21 for example, in a question to identify a drug that should not be recommended in certain comorbid conditions (eg, a nonselective beta-adrenergic antagonist in a patient with hypertension and asthma). To avoid negatively worded item overuse, one could, in this case, use the term contraindicated in the stem, which would effectively invert the orientation to a positively worded item without needing to change the multiple-choice question format (from single- to multiple-answer format) or the distractors used. However, such necessary scenarios are not limited to drug contraindications. Guidelines suggest switching the stem to the opposite (positive) orientation,2,3 but this invariably necessitates changing the multiple-choice question format15 and/or the options.14 Given that negatively worded items are one of many types of item-writing flaws, this study provides evidence that settling for negatively worded items, which will have a neutral impact on quality, is appropriate when the alternative is another item-writing flaw. For example, rather than asking test takers to “select all that apply,” or, worse still, including implausible distractors2,16 to invert a negatively worded item, the negatively worded item format would be the better alternative.

The current results are consistent with previous studies that also found no differences between positively and negatively worded items in their difficulty indices, discrimination indices, or both in isolation (Table 3).12,16,47 This includes a recent study that used 111 negatively worded items written by the same instructor across seven courses and seven years.48 Another study also showed no difference in the psychometric properties of negatively worded items following revisions to meet multiple-choice question writing best practices.17 However, other studies have reported differences between positively and negatively worded items, ranging from lower to higher difficulty or discrimination.6-8,10 Notable caveats to those studies include negatively worded items limited to one keyword,8 lack of inferential statistics,10 limited generalizability because differences were specific to certain Bloom taxonomy levels,7 and low contribution of the stem orientation relative to the effects of other interacting factors.6 Regardless of consistency with previous studies, the large sample size and the arguably more robust item quality measure used alongside the conventional measures make the current study a stronger body of evidence for the lack of differences in the performance and quality of negatively worded versus positively worded items. Well-designed studies are needed to verify and further demonstrate the apparent robustness advantage of the novel method over the conventional method.

Limitations of this study include that even though control items were systematically selected to be comparable to the negatively worded items, the items were still independent and not the ideal positively worded item versions of each negatively worded item. Also, even though all questions were written by the same set of writers, and the main difference between the control and negatively worded items was the stem orientation, items of both orientations might have other item-writing flaws, albeit equally. Consequently, the combined effects of these flaws may have been enough to mask the effects of stem orientation. Another inherent limitation of item analysis, when used alone or perhaps also in tandem, is its dependence on the test takers’ cohort.3,21,24,26,41,49 Lastly, instructors’ judgement is always required, as relying on psychometric parameters alone can lead to assessments with poor validity and/or reliability.21,24 Therefore, these results should be applied with caution. Future studies should use positively worded item versions of the same negatively worded items as controls in a within-subject design.

CONCLUSION

Bearing the limitations of this study in mind, the results suggest that negatively worded items are not different from control positively worded items in the novel and conventional measures of performance and quality, except in difficulty-discrimination relationships. Therefore, in line with previous suggestions,3,14,15,48,50 negatively worded items should be used when necessary, albeit cautiously. For several reasons, negatively worded items have been called an unnecessary threat to validity,9 but based on these results, if these “when necessary” and “cautiously” provisos are considered, negatively worded items are rather benign, except, by consensus, when double negatives are not absent.

ACKNOWLEDGMENTS

I wish to acknowledge the Fred Wilson School of Pharmacy Academic Affairs team (Ms Amber Belvin, who wrote the ExamSoft data collection guide; Ms Gail Strickland and Peter Gal, PharmD) for their contributions and suggestions during the study planning process. I also wish to acknowledge Courtney Bradley, PharmD, Mary Jayne Kennedy, PharmD, and Peter Gal, PharmD, for reviewing various versions of this manuscript, and for their invaluable suggestions.

  • Received October 18, 2021.
  • Accepted April 13, 2022.
  • © 2023 American Association of Colleges of Pharmacy

REFERENCES

1. Schuwirth LWT, van der Vleuten CPM. ABC of learning and teaching in medicine: written assessment. BMJ. 2003;326(7390):643. doi:10.1136/bmj.326.7390.643
2. Dell KA, Wantuch GA. How-to-guide for writing multiple choice questions for the pharmacy instructor. Curr Pharm Teach Learn. 2017;9(1):137-144. doi:10.1016/j.cptl.2016.08.036
3. Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Educ. 2002;15(3):309-333. doi:10.1207/s15324818ame1503_5
4. Downing SM. The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ. 2005;10(2):133-143. doi:10.1007/s10459-004-4019-5
5. Tarrant M, Ware J. Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med Educ. 2008;42(2):198-206. doi:10.1111/j.1365-2923.2007.02957.x
6. Pais J, Silva A, Guimarães B, et al. Do item-writing flaws reduce examinations psychometric quality? BMC Res Notes. 2016;9(1):399. doi:10.1186/s13104-016-2202-4
7. Klender S, Ferriby A, Notebaert A. Differences in item statistics between positively and negatively worded stems on histology examinations. 2019;23. Accessed May 17, 2021. www.turningtechnologies.com
8. Casler L. Emphasizing the negative: a note on “not” in multiple-choice questions. Teach Psychol. 1983;10(1):51. doi:10.1207/s15328023top1001_15
9. Chiavaroli N. Negatively-worded multiple choice questions: an avoidable threat to validity. Pract Assess Res Eval. 2017;22:3. doi:10.7275/5vvy-8613
10. Cassels JRT, Johnstone AH. The effect of language on student performance on multiple choice tests in chemistry. J Chem Educ. 1984;61(7):613-615. doi:10.1021/ed061p613
11. Tamir P. Positive and negative multiple choice items: how different are they? Stud Educ Eval. 1993;19(3):311-325. doi:10.1016/S0191-491X(05)80013-6
12. Caldwell DJ, Pate AN. Effects of question formats on student and item performance. Am J Pharm Educ. 2013;77(4). doi:10.5688/ajpe77471
13. Rush BR, Rankin DC, White BJ. The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value. BMC Med Educ. 2016;16(1):250. doi:10.1186/s12909-016-0773-3
14. Harasym PH, Price PG, Brant R, Violato C, Lorscheider FL. Evaluation of negation in stems of multiple-choice items. Eval Health Prof. 1992;15(2):198-220. doi:10.1177/016327879201500205
15. Harasym PH, Doran ML, Brant R, Lorscheider FL. Negation in stems of single-response multiple-choice items: an overestimation of student ability. Eval Health Prof. 1993;16(3):342-357. doi:10.1177/016327879301600307
16. Pham H, Besanko J, Devitt P. Examining the impact of specific types of item-writing flaws on student performance and psychometric properties of the multiple choice question. Published online 2018. doi:10.15694/mep.2018.0000225.1
17. McBrien S. Effects of structural flaws on the psychometric properties of multiple-choice questions. Theses, Student Research, and Creative Activity: Department of Teaching, Learning and Teacher Education. 93. Published online July 1, 2018. Accessed June 10, 2021. https://digitalcommons.unl.edu/teachlearnstudent/93
18. Sim SM, Rasiah RI. Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper. Ann Acad Med Singap. 2006;35(2):67-71.
19. ExamSoft. Exam quality through the use of psychometric analysis. Published 2023. Accessed February 20, 2023. https://examsoft.com/wp-content/uploads/2022/12/Exam-Quality-Through-the-Use-of-Psychometric-Analysis.pdf
20. Loudon C, Macias-Muñoz A. Item statistics derived from three-option versions of multiple-choice questions are usually as robust as four- or five-option versions: implications for exam design. Adv Physiol Educ. 2018;42(4):565-575. doi:10.1152/advan.00186.2016
21. Burton RF. Do item-discrimination indices really help us to improve our tests? Assess Eval High Educ. 2001;26(3):213-220. doi:10.1080/02602930120052378
22. Quaigrain K, Arhin AK. Using reliability and item analysis to evaluate a teacher-developed test in educational measurement and evaluation. Cogent Educ. 2017;4(1). doi:10.1080/2331186X.2017.1301013
23. Al Muhaissen SA, Ratka A, Akour A, AlKhatib HS. Quantitative analysis of single best answer multiple choice questions in pharmaceutics. Curr Pharm Teach Learn. 2019;11(3):251-257. doi:10.1016/j.cptl.2018.12.006
24. Rudolph MJ, Daugherty KK, Ray ME, Shuford VP, Lebovitz L, DiVall MV. Best practices related to examination item construction and post-hoc review. Am J Pharm Educ. 2019;83(7):1492-1503. doi:10.5688/ajpe7204
25. Downing SM. Reliability: on the reproducibility of assessment data. Med Educ. 2004;38(9):1006-1012. doi:10.1111/j.1365-2929.2004.01932.x
26. Muntinga JHJ, Schuil HA. Effects of automatic item eliminations based on item test analysis. Adv Physiol Educ. 2007;31:247-252. doi:10.1152/advan.00019.2007
27. Slepkov AD, Van Bussel ML, Fitze KM, Burr WS. A baseline for multiple-choice testing in the university classroom. SAGE Open. 2021;11(2). doi:10.1177/21582440211016838
28. Ebel RL, Frisbie DA. Essentials of Educational Measurement. 5th ed. Prentice-Hall Inc; 1991.
29. Rodriguez MC. Three options are optimal for multiple-choice items: a meta-analysis of 80 years of research. Educ Meas Issues Pract. 2005;24(2):3-13. doi:10.1111/j.1745-3992.2005.00006.x
30. Tarrant M, Ware J, Mohammed AM. An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis. BMC Med Educ. 2009;9(1):1-8. doi:10.1186/1472-6920-9-40
31. Tavakol M, Dennick R. Post-examination analysis of objective tests. Med Teach. 2011;33(6):447-458. doi:10.3109/0142159X.2011.564682
32. Chiavaroli N, Familari M. When majority doesn’t rule: the use of discrimination indices to improve the quality of MCQs. Biosci Educ. 2011;17(1):1-7. doi:10.3108/beej.17.8
33. Haladyna TM, Downing SM. How many options is enough for a multiple-choice test item? Educ Psychol Meas. 1993;53(4):999-1010. doi:10.1177/0013164493053004013
34. Kehoe J. Basic item analysis for multiple-choice tests. Pract Assess Res Eval. 1994;4:10. doi:10.7275/07zg-h235
35. Shapiro SS, Francia RS. An approximate analysis of variance test for normality. J Am Stat Assoc. 1972;67(337):215-216. doi:10.1080/01621459.1972.10481232
36. McHugh ML. The chi-square test of independence. Biochem Med. 2013;23(2):143. doi:10.11613/BM.2013.018
37. Lenhard W, Lenhard A. Hypothesis tests for comparing correlations. Psychometrica. doi:10.13140/RG.2.1.2954.1367
38. Myers L, Sirois MJ. Spearman correlation coefficients, differences between. Encycl Stat Sci. Published online August 15, 2006. doi:10.1002/0471667196.ess5050.pub2
39. Royal KD. Using the nudge and shove methods to adjust item difficulty values. J Vet Med Educ. 2015;42(3):239-241. doi:10.3138/jvme.0115-008R
40. Caldwell DJ, Sampognaro L, Pate AN. Collaborative examination item review process in a team-taught, self-care sequence. Am J Pharm Educ. 2015;79(6). doi:10.5688/ajpe79687
41. Puthiaparampil T, Rahman M. How important is distractor efficiency for grading best answer questions? BMC Med Educ. 2021;21(1):1-6. doi:10.1186/s12909-020-02463-0
42. Downing SM, Haladyna TM. Validity threats: overcoming interference with proposed interpretations of assessment data. Med Educ. 2004;38(3):327-333. doi:10.1046/j.1365-2923.2004.01777.x
43. Yang CL, O’Neill TR, Kramer GA. Examining item difficulty and response time on perceptual ability test items. J Appl Meas. 2002;3(3):282-299. Accessed June 15, 2021. https://europepmc.org/article/med/12147914
44. Zenisky AL, Baldwin P. Using item response time data in test development and validation: research with beginning computer users. 2006. Accessed February 20, 2023. https://www.umass.edu/remp/Papers/NCME06_AZPB_final.pdf
45. Chae YM, Park SG, Park I. The relationship between classical item characteristics and item response time on computer-based testing. Korean J Med Educ. 2019;31(1):1-9. doi:10.3946/kjme.2019.113
46. Chéron M, Ademi M, Kraft F, Löffler-Stastka H. Case-based learning and multiple choice questioning methods favored by students. BMC Med Educ. 2016;16(1). doi:10.1186/s12909-016-0564-x
47. Violato C, Marini AE. Effects of stem orientation and completeness of multiple-choice items on item difficulty and discrimination. Educ Psychol Meas. 1989;49(1):287-295. doi:10.1177/0013164489491032
48. Wise MJ. The effective use of negative stems and “all of the above” in multiple-choice tests in college courses. J Educ Teach Soc Stud. 2020;2(4):47. doi:10.22158/jetss.v2n4p47
49. De Champlain AF. A primer on classical test theory and item response theory for assessments in medical education. Med Educ. 2010;44(1):109-117. doi:10.1111/j.1365-2923.2009.03425.x
50. Karegar Maher MH, Barzegar M, Gasempour M. The relationship between negative stem and taxonomy of multiple-choice questions in residency pre-board and board exams. Res Dev Med Educ. 2016;5(1):32-35. doi:10.15171/rdme.2016.007