John Lees' blog

Pathogens, informatics and modelling at EMBL-EBI

One hundred days and one hundred lines of code

Last week I attended ‘100 days and 100 lines of code’, organised by the Epiverse team at LSHTM. The idea was to think about what the first 100 lines of code written would be when the next pandemic happens (I think more as a cute reference to similar thinking about vaccine development than as a totally serious concept).

The format was over three days:

  1. Talks from academics, public health and field epidemiologists on their thoughts and experiences with epidemiology software.
  2. Exercise: starting with data simulated from an outbreak, create problems in the data and a list of questions to answer. Then swap with another group and try to answer their questions. (You can find our exercise response on GitHub.)
  3. Summarise common experiences and problems with software from the second day.

I had only a couple of reservations about the event. No software developers or research software engineers spoke, which I found odd considering it was ostensibly an event about writing code. I think we missed their perspective, including on whether problems with epidemiology software are similar to those in other scientific fields. There was also more of a focus on outbreak response and field epi than on pandemic response, but maybe that’s reasonable.

mamba saved my CI

We moved from Azure to GitHub Actions to run the continuous integration tests in PopPUNK about a year ago. It’s been working pretty well, wasn’t too bad to set up, and integrates nicely into the pull requests.

However, in the past month two things happened:

  • joblib v1.2 introduced a breaking security change which meant that hdbscan errored. Solving a conda environment with joblib pinned to 1.1 takes about 12 hours (😱), longer than the 4 hour GitHub limit.
  • Even without the pin, environment resolution increased from about an hour to 3–4 hours, so the CI would only sometimes run.

About four or five years ago I’d tried using mamba as a replacement for conda. It worked really well and was much faster, but I’d since read that some of its techniques were being merged into conda, and in general I stopped seeing hugely long environment solve times. But times seemed to be getting longer again (especially on the CI, not sure why), and I think mamba actually remains fundamentally different to the conda solver.

Goodbye Wordpress

I originally set up a free WordPress site (https://leesjohn.wordpress.com) in 2013, which I updated slightly when I moved it to some cheap hosting on www.johnlees.me in 2019. The hosting was fairly unreliable, and at every annual renewal I thought about moving. Maintaining a WordPress site also takes a bit of effort, and I worried about eventually allowing it to lapse. Finally, I’d grown increasingly frustrated with the WordPress style of editing, which made it difficult to write code and embed HTML, and I really wanted to move to something more like plain text for writing posts.

Host/pathogen data for 'Joint sequencing of human and pathogen genomes reveals the genetics of pneumococcal meningitis' available on EGA

I have recently gotten round to adding the human data (and links to pathogen data, which has been available on the ENA since publication) to the managed access European Genome-Phenome Archive.

The sharing of human genotype data is a little more fraught than bacterial genome data due to patient ethics and other issues, but the EGA offers a good solution for protecting this while making the data as open as possible.

WE, Arcade Fire (2022) – A reverse in trajectory?

I first saw Arcade Fire performing Neon Bible at Glastonbury (though sadly only on the BBC broadcast). At the time it was released, I was working at the local supermarket at the weekends, pushing trolleys around in the car park. I had a cheap MP3 player (an iRiver, if I recall) with space for a handful of albums that I would illicitly listen to. I remember having Coldplay’s Viva la Vida, Magazine’s Real Life, and Neon Bible. Neon Bible, being clearly the best of these, I must have listened to all the way through hundreds of times. I still like it, and ‘My Body Is A Cage’ is probably my favourite of their less synth-heavy songs.

Using the new Microreact API

The excellent Microreact has recently had a major new release with a few breaking changes. One that hit me is that the API has changed. The previous API was pretty simple: it allowed anonymous POST requests with a blob of CSV, tree and optionally network data, and returned a stable URL.

The new API requires a token for authorisation and addition to your account (which seems sensible), and also adds deletion and updating of instances (which is also useful). There are some docs on migrating, and while creating the API token is easy enough, I had a bit of trouble converting my existing code.
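As a rough illustration of what a token-authorised request might look like, here is a minimal Python sketch using only the standard library. The endpoint URL and the `Access-Token` header name are my assumptions and should be checked against the Microreact API docs; `build_create_request`, `send` and the example payload are hypothetical names and placeholders, not part of Microreact itself.

```python
import json
import urllib.request

# Assumed values: confirm the endpoint and header name in the Microreact API docs.
CREATE_URL = "https://microreact.org/api/projects/create"
TOKEN_HEADER = "Access-Token"


def build_create_request(project, token, url=CREATE_URL):
    """Build (but do not send) an authorised POST request that creates a project.

    `project` is the JSON-serialisable project description; its schema is
    defined by Microreact and is not reproduced here.
    """
    body = json.dumps(project).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=UTF-8",
        TOKEN_HEADER: token,  # the token ties the new project to your account
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request):
    """Send the request and decode the JSON reply (needs network access and a valid token)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it possible to inspect the headers and body before anything is sent, which helps when converting code written for the old anonymous API.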