It is no secret that top collegiate rowing teams like to recruit internationally. By broadening the pool of recruited athletes, teams can theoretically achieve greater roster depth, more effective internal competition, and ultimately more fruitful showings at championship regattas.
But does international recruiting always lead to better results? Using Sparks Consulting’s extensive collection of roster and performance data, we decided to take a look. For this investigation, we measured the effect of international athletes on IRA men’s heavyweight and NCAA women’s Division I championship performance between 2015 and 2017. We chose this scope for a number of reasons (see footnote 1). For the non-stat-heads out there, don’t worry. We’ll do our best to explain everything clearly and in practical terms.
At First Glance
Nine out of 22 eligible heavyweight men’s teams (see footnote 2) on the points table at the 2017 IRA Championship Regatta had at least 20% international representation on their rosters, with six of these teams filling out the top six places. For openweight women, the trend is less pronounced. Of the 22 Division I teams represented at the 2017 NCAA Championship Regatta, six had at least 20% international representation, with only two of these teams in the top six.
While these basic numbers are suggestive, they don’t tell us anything definitive or quantifiable about the effect of international athletes on performance. To quantify that relationship, the Sparks research team used mixed effects multiple regression modelling in R.
Digging Into The Numbers
Multiple regression models measure the relationship between a certain dependent variable (in this case total points) and a set of independent variables that we believe may influence the dependent variable. In this case, we believe that international athletes, as well as points earned at the previous year’s regatta, number of boats entered in the regatta, and roster size, may influence points earned. The prior year points variable was included to account for the possibility that a team’s success in the previous year could lead to success in the current year. The number of boats entered by each team at the regatta was used to account for any correlation between number of entries and number of points earned. This variable was dropped in the NCAA model, as all teams at the NCAA regatta entered three boats. Roster size was used to account for team depth. A team with 48 athletes competing for 24 spots could potentially be more successful than a team with 28 athletes competing for 24 spots.
Mixed effects models generate “handicaps” for each subject to control for inherent differences between the subjects. In this case, the model theoretically accounts for persistent differences between teams, such as team culture, coaching, and even weather. It is important to note that these handicaps do not account for variables that change significantly within a team during this timeframe. Indeed, this series of articles will attempt to study these elusive variables. For more information about mixed effects models, check out this informative tutorial.
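For readers who prefer notation, one plausible way to write down such a model is sketched below. This is not necessarily the exact specification the research team fitted: it assumes a single random intercept per team (the “handicap”), and the variable names are illustrative shorthand rather than names taken from the actual analysis.

```latex
\[
\text{points}_{it} = \beta_0
  + \beta_1\,\text{intl}_{it}
  + \beta_2\,\text{points}_{i,t-1}
  + \beta_3\,\text{boats}_{it}
  + \beta_4\,\text{roster}_{it}
  + u_i + \varepsilon_{it}
\]
```

Here i indexes teams and t indexes years, u_i is the team-specific handicap, and ε_it is the leftover variation the model cannot explain. The coefficient β_1 is the quantity of interest: the predicted change in points for each additional international athlete, holding the other variables fixed. In the NCAA version the boats term would be omitted, since every team entered three boats.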
After building and running a set of statistical models, the Sparks research team ran a series of tests to determine the statistical significance of the results. Most statistical procedures, from electoral polls to TV ratings, use a sample that represents a small percentage of the total population. As a result, statisticians must determine whether the results found in the sample can reasonably be inferred to apply to the population as a whole. Statistical significance is expressed as a “P-value”: the probability of seeing a result at least as strong as ours through random variation or sampling error alone, if the variable in question actually had no effect. If our model has a P-value of 0.04, random variation alone would produce a result this strong only 4% of the time. This result would be considered “significant at the 5% level”. The lower the P-value, generally speaking, the stronger the evidence that the effect is real rather than noise. By convention, the 5% level is the most commonly used cutoff for calling a result statistically significant.
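To make the idea of a P-value concrete, here is a small Python sketch of an exact permutation test. It is not the test used for the models in this article (those were mixed effects regressions run in R); it is a simpler technique chosen because its logic is easy to follow. The points totals are invented purely for illustration, and the function name `permutation_p_value` is hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the gap between two group means.

    Asks: if the group labels were meaningless, how often would a random
    relabeling produce a gap at least as large as the one observed?
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    pooled_sum = sum(pooled)

    def mean_gap(sum_a):
        # Absolute difference between the two group means for a given split.
        return abs(sum_a / n_a - (pooled_sum - sum_a) / n_b)

    observed = mean_gap(sum(group_a))
    # Enumerate every way of assigning n_a of the pooled values to group A.
    gaps = [mean_gap(sum(combo)) for combo in combinations(pooled, n_a)]
    extreme = sum(1 for gap in gaps if gap >= observed - 1e-12)
    return extreme / len(gaps)


# Hypothetical points totals, NOT real regatta data.
more_internationals = [120, 135, 150, 160, 175]
fewer_internationals = [60, 75, 90, 100, 110]

p = permutation_p_value(more_internationals, fewer_internationals)
print(f"P-value: {p:.4f}")  # prints "P-value: 0.0079"
```

With these made-up numbers, only 2 of the 252 possible relabelings produce a gap as large as the observed one, so the P-value is about 0.008 and the difference would be called significant at the 5% level.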
The Results
So what did our models and significance tests show?
Interestingly, international athletes have had a more statistically significant effect on IRA men’s heavyweight performance than on NCAA women’s Division I performance over the past three years. However, the effect in both cases is modest.
Our IRA men’s heavyweight model predicts a 1.88 point increase for each international athlete on the roster.
The difference between each place in the IRA varsity eight is worth four points in the standings. In practical terms, a boat of eight international recruits corresponds to a predicted gain of about 15 points (8 × 1.88), slightly less than four places’ worth of points and more than enough to bridge the eight-point gulf between third and first place in the varsity eight. This model was found to be significant at the 5% level (see footnote 3). The effect is statistically significant, then, but fairly modest on a per-athlete basis.
Our NCAA Division I women’s model predicts a slightly stronger relative effect. For each international athlete, our model predicts a 1.68 point increase. In the NCAA National Championship Regatta, the difference between each place in the varsity eight is three points.
This means that, according to our model, a boat of eight international athletes corresponds to a predicted gain of about 13 points (8 × 1.68), more than enough to make up the nine-point difference between fourth and first place in the varsity eight.
However, this model was not found to be statistically significant (see footnote 4) at the 5%, or even the 10%, level. This means that a relationship this strong would appear more than 10% of the time through random variation or sampling error alone, even if international recruits had no real effect. As such, in the case of NCAA women’s Division I rowing, we cannot confidently attribute the predicted increase in points to international athletes.
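The back-of-the-envelope arithmetic behind these comparisons can be reproduced in a few lines of Python. This is a rough sketch using only the figures quoted in this article and its footnotes; it assumes the per-athlete effect scales linearly across eight rowers, and the names `boat_effect` and `is_significant` are hypothetical helpers, not part of the original analysis.

```python
# Figures quoted in the article and its footnotes.
IRA = {"points_per_athlete": 1.88, "points_per_place": 4, "p_value": 0.01389}
NCAA = {"points_per_athlete": 1.68, "points_per_place": 3, "p_value": 0.11}

BOAT_SIZE = 8  # assumption: eight rowers, coxswain not counted


def boat_effect(model, athletes=BOAT_SIZE):
    """Return (predicted point gain, that gain expressed in varsity-eight places)."""
    gain = model["points_per_athlete"] * athletes
    return gain, gain / model["points_per_place"]


def is_significant(model, alpha=0.05):
    """True when the model's P-value falls below the chosen threshold."""
    return model["p_value"] < alpha


for name, model in (("IRA men", IRA), ("NCAA women", NCAA)):
    gain, places = boat_effect(model)
    print(f"{name}: +{gain:.2f} points, roughly {places:.2f} places; "
          f"significant at 5%: {is_significant(model)}, "
          f"at 10%: {is_significant(model, alpha=0.10)}")
```

The conversion to places is only a linear extrapolation, and the NCAA figure in particular should be read with caution, since that estimate did not reach statistical significance.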
What Does It Mean? And What’s Next?
Our model suggests that international athletes have a statistically detectable effect on results only in collegiate men’s rowing. It is not entirely clear why this difference exists, but it may be due to any of the many differences between men’s and women’s collegiate rowing in the United States. These differences include funding, number of varsity teams, and relative competitiveness at the international level. Importantly, our model seems to indicate that the effect of international athletes in collegiate rowing may be slightly overstated in popular discourse, as the magnitude of the effect is relatively small.
If international athletes aren’t the “magic bullet” for collegiate rowing success, could another variable fill this spot?
As we continue this multi-part study, we will evaluate how a variety of factors affect rowing performance at the IRA and Division I levels. Our exploration of the determinants of success and failure in collegiate rowing will create the opportunity for us to compare these variables side by side.
Next up, we will explore the role of coaching experience in collegiate rowing performance. We hope to see you there.
Footnotes:
1. We chose the IRA and NCAA Championship Regattas because both regattas had a limited number of entries, varsity-only participation, and a full slate of lower level finals. Varsity-only regattas allowed us to compare apples to apples, as many club teams do not boast the recruiting resources and funding that many varsity teams do. Lower level finals allow for a full ranking of all boats in each boat class. We only used heavyweight teams because lightweight participation at the IRA National Championship did not constitute a large enough sample size. For women, we only used Division I NCAA teams because the number of internationally recruited Division II and III athletes was not large enough. We chose 2015-2017 as a starting point because we wanted to gauge the effect in recent years.
2. We define eligible teams as teams that fielded at least one heavyweight men’s eight and are thus eligible for heavyweight Ten Eyck Trophy points.
3. The P-value of the IRA men’s heavyweight model was 0.01389.
4. The P-value of the NCAA women’s Division I model was 0.11.