Post-mortem: that time I created a DSL for grading student projects

24 February 2024

In an academic setting, we aim to grade student projects as objectively as possible. In the Master of Computer Science Engineering program at Ghent University, the machine learning course traditionally featured a significant project component that contributed to the final grade. Throughout the course, students delivered three interim presentations, each evaluated by three assessors: the professor and two teaching assistants (TAs). To ensure consistency and fairness, we used a rubric system in which the assessors defined specific criteria up front. To streamline grading, we automated it with a custom-built tool, which also included a domain-specific language (DSL).

The grading tool was aptly named after the ancient Greek goddess of justice, Themis (here, envisioned by Bing Copilot Designer).

To speed up live grading sessions and eliminate manual data entry, I created a web application. It used a Spring and PostgreSQL backend (with Hibernate ORM) alongside an Angular frontend. I chose these technologies because I was already familiar with them as a software engineer. The interface allowed assessors to create projects, set up rubrics, and evaluate student work efficiently during the presentations.

After grading, a score overview for the different student groups was generated automatically. The overview was based on the criteria that the assessors had selected and on the weights that the professor and TAs had assigned to every criterion a priori.

In cases where the default weighted sum of scores proved inadequate, a script could be devised to compute scores using an alternative formula. We opted for a straightforward arithmetic language due to constraints, such as limited resources on the VM running the web application. A sample script might resemble:

c1r1 = [Validation approach/Splitting and validation]
c1r2 = [Validation approach/Avoid data leakage]
c1 = 2 * $c1r1 + 1 * $c1r2
1 * $c1

When we run this script for a group, it first extracts the values for the criteria "Splitting and validation" and "Avoid data leakage" into two variables. The values in these variables correspond to the a priori weights of the tick boxes that graders could select. For example, if the students did not account for data leakage, the value of $c1r2 could be 0, indicating that they did not get a score for that criterion. In the above example, setting up a correct train-validate-test split is twice as important as accounting for data leakage, because c1 = 2 * $c1r1 + 1 * $c1r2. The final line in the script must be an expression that computes the final score for a group.
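To make the semantics concrete, here is a minimal Python sketch that evaluates the sample script. It is not the original implementation (which used ANTLR), and the names `run_script`, `evaluate`, and the `criteria` dictionary with its tick-box values are assumptions made for illustration. Comment lines are not handled, since their syntax is not shown here.

```python
import ast
import operator
import re

# Arithmetic operators allowed in this sketch (an assumption; the real
# language may support more or fewer).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node, env):
    """Evaluate a restricted arithmetic AST against the variables in env."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    raise ValueError("unsupported expression")


def evaluate(expr, env):
    # "$c1r1" reads a variable; drop the sigil so Python's parser accepts it.
    tree = ast.parse(expr.replace("$", ""), mode="eval")
    return _eval(tree.body, env)


def run_script(script, criteria):
    """Run assignments line by line; the last line yields the final score."""
    env = {}
    lines = [ln.strip() for ln in script.strip().splitlines() if ln.strip()]
    for line in lines[:-1]:
        name, rhs = (part.strip() for part in line.split("=", 1))
        lookup = re.fullmatch(r"\[(.+)\]", rhs)
        # "[Category/Criterion]" fetches the value the grader selected.
        env[name] = criteria[lookup.group(1)] if lookup else evaluate(rhs, env)
    return evaluate(lines[-1], env)


script = """
c1r1 = [Validation approach/Splitting and validation]
c1r2 = [Validation approach/Avoid data leakage]
c1 = 2 * $c1r1 + 1 * $c1r2
1 * $c1
"""

# Hypothetical tick-box values for one group: a correct split, but data
# leakage was not accounted for.
criteria = {
    "Validation approach/Splitting and validation": 1.0,
    "Validation approach/Avoid data leakage": 0.0,
}

score = run_script(script, criteria)
print(score)  # 2.0
```

With these made-up values the group scores 2 * 1.0 + 1 * 0.0 = 2.0 for the criterion group, and the final line passes that value through unchanged.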

This scripting language is implemented as an interpreted, stack-based language on top of an ANTLR v4 grammar. Here is a small excerpt of the grammar:

grammar ScoreArithmetic;

file : line* expression EOF;

line
   : COMMENT
   | assignment
   ;

assignment
   : avariable '=' expression
   ;

After computing the scores, an Excel spreadsheet was generated with the scores for every criterion, to allow for a final check of the score computations. We also checked the variance of the scores across assessors and whether objectively true or false statements were graded consistently, re-evaluating student groups if required. Alongside the spreadsheet, a PDF with feedback was generated. These files were created automatically with iText.
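The variance check could look roughly like the following Python sketch. The group names, scores, and threshold are invented for illustration, and `groups_to_review` is a hypothetical helper, not part of Themis.

```python
from statistics import pvariance


def groups_to_review(scores_by_group, threshold):
    """Return the groups whose assessor scores vary more than the threshold."""
    return [
        group
        for group, scores in scores_by_group.items()
        if pvariance(scores) > threshold
    ]


# Hypothetical scores from the three assessors for two groups.
scores = {
    "group 1": [14.0, 15.0, 14.5],
    "group 2": [9.0, 15.0, 12.0],
}

flagged = groups_to_review(scores, threshold=2.0)
print(flagged)  # ['group 2']
```

Here the assessors largely agree on group 1 (variance of about 0.17), while group 2 has a variance of 6.0 and would be a candidate for re-evaluation.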

Over the past several years, the course has changed and the Computer Science Engineering program has been redesigned. As a result, Themis is no longer used.