Exploring Attributes of Successful Machine Learning Assessments for Scoring of Undergraduate Constructed Response Assessment Items
Abstract
Content-based computer scoring models (CSMs) have successfully automated the scoring of constructed response assessments, thereby increasing their use in multiple educational settings. However, CSM development remains time-intensive, as little is known about the model, item, and training set features that expedite it. Herein, we examined a large set of holistic CSMs for text classification to determine the relationship between scoring accuracy and different assessment item, CSM, and training set features. We found that the number of rubric and CSM bins, item question structure, and item context significantly influenced CSM accuracy. Applying novel text diversity metrics, we found that most did not correlate with CSM accuracy. However, fewer shared words across responses correlated with increased CSM accuracy, both overall and within individual bins. Finally, we applied ordination techniques to visualize constructed response corpora based on shared language among responses and found that these techniques aided decision-making during CSM development.