At the end of each dressage competition, riders’ results are posted, and each rider can see how their performance has been assessed and how their score compares with others in the class. It is usually fairly easy for them to see whether or not they have met their expectations.
The other participants in the assessment process are, of course, the judges. At the end of a class each judge will have awarded a range of marks across the entire class, but how easy is it for the judges to know how closely aligned they have been with one another in making their assessments? They might remark to each other that “those marks look pretty consistent” or “we’ve had a few differences,” but they have few objective measures of just how consistent or divergent their marks and/or rankings are. A detailed analytical element does form part of the process of appointing or upgrading judges, but it is not practical to apply that approach to day-to-day judging.
This is where some statistical analysis can help. Statistics can be used in a number of ways: as a basis for forecasting, for inferring the properties of populations from sample data, or simply for describing or summarising sets of raw data. It is this last application that I will be using in this blog. I will calculate statistics that provide an objective “summary” of each set of judges’ scores.
I do not intend to delve deeply into statistical theory or techniques in this blog; there are plenty of resources on the internet for anyone who wants such information. Of course, some explanation will be provided so that readers know how to interpret the statistics being presented.
This blog will adhere to a number of key principles:
- Analysis will only be based on data that is already in the public domain. The statistics are simply another way of presenting existing data; they do not create new data.
- This blog will not voice any value judgements on the statistics provided. Most importantly, the analysis cannot and will not imply whether any judges might be “right” or “wrong” in their assessments. The analysis is only designed to cast some light on the degree of alignment between judges in their scoring.
- Judges’ names will not be displayed. Clearly, since the underlying data is publicly available, readers could undertake their own investigations, but this blog will not explicitly provide that information.
- There is no intention of giving undue focus to any particular judge or judges.
- The blog will attempt to be comprehensive in scope. However, classes to be analysed must have at least two judges and a sufficient number of actual riders (I have opted for a minimum of seven) to make the results meaningful. The extent of the coverage may also depend upon the time that I can devote to this blog.
Above all, the purpose of this blog is simply to provide an objective view of judging data which hopefully will be of use and interest to competitors, judges and judge administrators alike.
A description of the methodology used for analysing judge data can be found here.
Some worked examples illustrating the methodology can be found here.