Embedding realistic experience of research software development in doctoral training

The shortage of skilled research software engineers is well recognised in both academia and industry, but while many courses teach either research skills or software engineering skills, few manage to integrate teaching in both areas. The SABS R3 CDT has developed a novel practical teaching approach in which groups of students learn by taking part in a challenging real-life software project mentored by academics, research software engineers and representatives from industry. The project has a cyclical structure with embedded peer-to-peer learning: each student gains from mentorship and teaching by students in previous cohorts and, in subsequent years, has the paid opportunity to pass their knowledge on to future cohorts.

The SABS R3 CDT (also known as the EPSRC Sustainable Approaches to Biomedical Science: Responsible and Reproducible Research Centre for Doctoral Training) is a four-year doctoral training course run by the Department of Computer Science in Oxford. The first year is devoted entirely to training. Initially this is a standard mix of graduate-level taught courses, providing the necessary background in the application domain (biomedical sciences) as well as intermediate-to-advanced training in software engineering. Then, once the essential groundwork has been covered, the real learning begins!

The students are split into small groups and each group is expected to work collaboratively on a real-life research software project. Every step of the project experience has been carefully designed to closely mimic the experience of working in a research software environment, but with extensive support so students learn the real challenges (and excitement!) of research software engineering.

Impactful projects: The student projects are proposed by collaborators from the pharmaceutical and biotechnology industry; these are not theoretical learning exercises but real software development challenges that meet a real industrial need. Unlike many projects where students are tasked with writing new code from scratch, the SABS R3 students are given existing code that requires further development; they must learn the code and understand how the proposed developments integrate into the existing structure. The projects have a clear impact on health research, so students experience the satisfaction of developing solutions for real problems. Current examples include epidemiological modelling in collaboration with Roche, and quality control of medical images in collaboration with GE Healthcare.

Students are expected to embed best software practices throughout the project; all projects are required to take a test-driven development approach (with full test coverage a stated goal). All projects are fully open source from the outset under a permissive (BSD 3-Clause) license, and available on GitHub (see, for example, https://github.com/SABS-R3-Epidemiology/epiabm). Each project is managed by the students following an Agile approach with regular meetings and short, achievable tasks. Although the course leaders do not nominate or require one student to act as the group leader, they note that in every group so far one or sometimes two students have adopted leadership roles and worked to drive the project forward.

Gaining from peer-to-peer support: The CDT cohort of approximately 15-20 students is divided into groups of 4 to 5 students. This is done carefully by the course leaders to ensure that each group contains students with a mix of skills (e.g. students who already have strong mathematical skills might be placed with someone with leadership skills and at least one student with a strong programming or software engineering background). This replicates a normal working team environment, and helps to ensure the students learn to recognise their own (and others') strengths and weaknesses.

Passing knowledge between cohorts: One key aspect of the CDT is that each cohort of students is encouraged to support the next; each year a selection of students from previous years is paid to provide dedicated support to the first-year students. Both parties gain from this experience: the first-year students get extensive support from more experienced peers who can dedicate significant time to supervision, and the later cohorts gain experience of teaching and passing their knowledge on. It works in much the same way that more established research software engineers are often required to onboard new staff members.

Interdisciplinary guidance is essential: These projects are challenging, and even with extensive peer support the learning curve is steep for many. For most students this is their first experience of working in a research software engineering environment, and extensive guidance is essential both for the students' learning outcomes and for the success of the project.

Each project has guidance on three fronts:

  • A professional research software engineer to guide the technical aspects,
  • An industry representative to ensure the development meets the needs of the industry, and
  • An academic expert (e.g. in medical imaging, or epidemiology) to guide the scientific direction.

This interdisciplinary supervisory team provides expert guidance that oversees the work and helps students overcome the obstacles that arise along the way.

Learning from experience: With the program now in its fourth year, its leaders have had time to reflect on the design of the CDT project. They highlight that peer-to-peer learning is a key aspect of the program’s success, with students gaining from both receiving and providing guidance. The inter-cohort training mimics many work environments where RSEs are responsible for training within their team.