Some selfish and not-so-selfish reasons for sharing your code¶

Lucy Whalley / lucydot.github.io¶


Who am I?¶

📖 VCF at Northumbria University

  • computational materials science: physics, chemistry, software engineering

🖥️ Fellow of the Software Sustainability Institute

🍏 Ex-school/college teacher

⚠️ (Full disclosure!) Topic editor for The Journal of Open Source Software

In today's talk I want to persuade you that sharing your code is a Good Idea $^\mathrm{TM}$ that can bring several benefits.¶

I think it is a good idea because of personal experience: sharing my research code transformed the way I use and develop software.

Case Study: Effmass.py¶

  • This was my first research software project
  • Effmass calculates the effective mass of electrons in a particular material
  • But the domain-specific details aren't important.

Effmass does one thing, and does it well.

Effmass circa 2016 🍝¶

  • One module contains (long) methods for data parsing, analysis and print out
  • In a private Github repo (with me as sole contributor)
  • Has the functionality needed for my own data analysis
  • No testing or documentation

Effmass circa 2018 🖥️¶

  • Re-factored into six modules
  • In a public Github repo
  • Testing and continuous integration
  • Documentation website
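To make the "testing" step concrete, here is a minimal sketch of a unit test for a small analysis function. The function name `curvature` and the test are hypothetical and are not taken from the Effmass code base. The idea is simply that a test checks the code against a case where the answer is already known.

```python
def curvature(k, energy):
    """Estimate d2E/dk2 at the middle of three evenly spaced points.

    Hypothetical helper for illustration only; not part of Effmass.
    """
    if len(k) != 3 or len(energy) != 3:
        raise ValueError("expected exactly three points")
    step = k[1] - k[0]
    # Central finite-difference formula for the second derivative.
    return (energy[0] - 2.0 * energy[1] + energy[2]) / step**2


def test_curvature_of_parabola():
    # For E = a * k**2 the second derivative is exactly 2 * a.
    a = 1.5
    k = [-0.1, 0.0, 0.1]
    energy = [a * x**2 for x in k]
    assert abs(curvature(k, energy) - 2.0 * a) < 1e-9
```

A test runner such as pytest would typically collect functions named `test_*` like this one, and a continuous integration service can then run them on every change.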

Effmass now 💖¶

  • 6 contributors
  • Can parse data from multiple sources
  • 14,000 downloads (PyPI stats), 4 citations (a discussion around that ratio is a whole other talk...)

JOSS transformed the way I look at software¶

  • It justified the time spent on learning new skills: documentation, testing, packaging (I'll get a journal publication!)
  • The peer-review process forced me to share my code and this built my confidence (I'm not the worst programmer in the world!)


Some selfish reasons for sharing code¶

  • If other people can use your code --> you can re-use your code --> more efficient working
  • Valuable feedback through the peer review process
  • Research credit through citation counts
  • Career progression: the RSE career path, funding opportunities (e.g. EPSRC)
  • Appreciation from your colleagues and self-promotion

What about the not-so-selfish reasons?¶

  • Other people can use your code and the field will progress more rapidly
  • To ensure scientific reproducibility

We have publication processes to root out error for research that is done without a computer. Once you introduce a computer, the materials section in a typical scientific paper doesn’t come close to providing the information that you need to verify the results. Analysing complicated data by computer requires instructions consisting of script and code. Hence we need the code, and we need the data.

Victoria Stodden, Editor, Journal of the American Statistical Association

Top-tier journals, such as Science and Nature, require you to share the code needed to reach a given result.

Code sharing options¶

Public repo + citation file:

✔️ Straightforward to implement: GitHub + Citation File Format
✖️ No peer review
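As a rough illustration of how little is needed for a citation file, the sketch below builds the text of a minimal `CITATION.cff` using the four keys that, to my understanding, the Citation File Format 1.2.0 schema requires (`cff-version`, `message`, `title`, `authors`). The helper name `build_citation_cff` and the metadata are placeholders made up for this example, and the naive quoting assumes the values contain no double quotes.

```python
def build_citation_cff(title, authors, message=None):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (given_names, family_names) tuples.
    Hypothetical helper; values must not contain double quotes.
    """
    if not authors:
        raise ValueError("at least one author is required")
    if message is None:
        message = "If you use this software, please cite it as below."
    lines = [
        "cff-version: 1.2.0",
        f'message: "{message}"',
        f'title: "{title}"',
        "authors:",
    ]
    for given, family in authors:
        # Each author is one YAML list item with two keys.
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder metadata; replace with your own project details.
    print(build_citation_cff("my-analysis-code", [("Ada", "Example")]))
```

Saving the printed text as `CITATION.cff` in the top level of a public repository is usually all that is required. In practice you may prefer to write the file by hand, since it is only a few lines of YAML.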

Code sharing options¶

Code review with a colleague or community member:

✔️ A good option if you are nervous about releasing your code to the wild!
✔️ Several on-line initiatives if there is no-one in your immediate research circle
✖️ No citation

Notes on confidence: I used to work in a prison teaching maths to adult men. These are people up to their 60s who have very little confidence in their academic ability and no formal qualifications, and it may be 40 years since they were last in school. So they're nervous. I have seen much the same behaviour teaching PhD students across several universities in the UK. Programming is one of those things that people tend to find disproportionately scary, and that very intelligent people can still, for some reason, have very low confidence with. An important first step in building confidence is sharing your code. You will find that yes, your code isn't perfect, but that's fine: no-one else's is perfect either.

Code sharing options¶

Executable paper (e.g. a Jupyter Notebook) as Supplementary Information:

✔️ Citeable
✖️ Code is usually not peer-reviewed  
✖️ Limited to smaller pieces of code  
✖️ Requires a corresponding full length article   

Publishing code in a traditional journal¶

Publishing in a traditional computational journal (e.g. Journal of Computational Electronics)

✔️ Citeable
✖️ Code is not necessarily peer-reviewed  
✖️ Requires mapping your code to a written journal article (*is this a good use of resources?*)  


"...the basic means of communicating scientific results hasn’t changed for 400 years. Papers may be posted online, but they’re still text and pictures on a page."

From The Scientific Paper Is Obsolete by James Somers

If software is a valid research output, why don't we publish our software in journals as standard? Traditional academic journals weren't designed with computational work in mind, and they haven't adapted to the emergence of the computational sciences. In 1665, paper journals summarised work as static text, pictures and mathematical symbols. In 2021, paper and online journals are still summarising work as static text, pictures and mathematical symbols. Even though software is central to much of science, I'd argue that it's almost impossible to accurately describe the work you did in this format. We could spend precious time describing the code as text and images, but is this a good use of our time? And why are we not peer-reviewing the output itself: the software?

Why don't we publish our code in journals as standard?¶

  • Publishing is dominated by the pdf paper format
  • The pdf cannot easily capture/describe the essence of the code
  • ...unless we spend precious time describing the code as text (and is this a good use of our time?)
  • The code itself is not reviewed

Publishing code in a developer friendly journal¶

Publishing in a developer friendly journal (e.g. The Journal of Open Source Software, The Journal of Open Research Software)

✔️ A citeable journal publication
✔️ Code is peer-reviewed
✔️ Time efficient: The paper can be prepared in less than an hour

A summary of our code sharing options¶

| Publishing method | Example | Citation? | Software peer-review? | Journal publication? | Time-efficient? |
|---|---|---|---|---|---|
| Public repo + citation file | Citation File Format | ✔️ | ✖️ | ✖️ | ✔️ |
| Community peer review | rOpenSci, pyOpenSci | ✔️ | ✔️ | ✖️ | ✔️ |
| Executable paper as Supplementary Information | Jupyter Notebook | ✔️ | ✖️ | ✔️ | ✔️ |
| Software paper | Journal of Computational Electronics | ✔️ | ✖️ | ✔️ | ✖️ |
| Software meta-paper | JOSS, JORS | ✔️ | ✔️ | ✔️ | ✔️ |

List of software journals: https://www.software.ac.uk/which-journals-should-i-publish-my-software

But my code isn't good enough to share¶

Yes it is. It doesn't need to be perfect. Sharing your poorly documented, untested, messy code is better than sharing no code. If you want to see an example of bad code that is being shared publicly, feel free to visit my Github (username: lucydot) 😬

JOSS: making it as easy as possible to write a software paper¶

  • Paper preparation and submission for well-documented software should take less than an hour:
    • around two pages
    • describes the high level functionality
    • does not need to contain novel results
  • Open access and no publishing charge

JOSS scope 🔭¶

The software must...

  • be open source
  • have an obvious research application: allows new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler)
  • result from a substantial scholarly effort (rule of thumb: 3 months minimum)

JOSS peer review process ✏️¶

  • Completely open, Github-based peer review process
  • Reviewing the software itself
  • Focused on improving the submission through dialogue - JOSS is an open collaboration between author, editor, reviewers (and robot!)

The review criteria can be used as a curriculum for self-directed learning, which is how I used them.

JOSS fills a gap in the market¶

🎈 1000th paper published in 2020 🎈

Sharing bad code is better than sharing no code¶

✨ Code review can build confidence (SSI, pyOpenSci, rOpenSci....)
✨ Code publishing enables quantifiable credit for your work
✨ If you use software, cite it
✨ We are always looking for new JOSS reviewers

✨ Thanks ✨¶

  • Software Sustainability Institute
  • JOSS authors, reviewers, editors
  • The developers and maintainers of the Python SciPy ecosystem

These slides will be made available on the event page.