John Lees' blog
Pathogens, informatics and modelling at EMBL-EBI
A claim: 2.2e-16 is the most popular p-value in research papers, even more popular than 0.05 (or, if you’re being cynical, 0.049).
Why?
2.2e-16 happens to be the epsilon of a double-precision float (i.e. a number stored using 64 bits). Roughly, this means that if you try to calculate 1 - x for any x smaller than epsilon, the answer will come out as exactly 1: the difference is too small for the representation to capture.
In R, you can calculate this by running the following code (+2 due to convention):
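The code itself is cut off in this excerpt; as a minimal sketch, two ways to recover the value in R are the built-in constant and a halving loop:

```r
# Built-in: the machine epsilon for doubles
.Machine$double.eps
#> [1] 2.220446e-16

# Derived: halve eps until adding half of it no longer changes 1
eps <- 1
while (1 + eps / 2 != 1) {
  eps <- eps / 2
}
eps
#> [1] 2.220446e-16
```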
A review of ‘Honey Roast Parsnips’ - available from Iceland £1.75, 750g
This weekend I wanted to buy some parsnips to roast, but they were absent from the produce section (except in a pack coming with four unwanted carrots and one considerably more unwanted turnip). However, as I was shopping at Iceland, there was a handy pre-prepared frozen alternative:
These cost roughly double what raw parsnips would. I’d estimate there are around four large portions in this bag; you can probably get double that if you serve smaller amounts.
Why is it only in 2021 that I am listening to Primal Scream’s Screamadelica for the first time?
A lot of critically acclaimed music from the 1980s maintains a pop appeal that means it still gets radio play, is featured in club nights, and is heavily promoted on my YouTube home page. However, perhaps the post-rock, trip-hop and grunge of the early 1990s doesn’t have the same enduring commercial appeal. Whatever the reason, I’ve been missing out.
We recently released a beta version of PopPUNK-web (https://web.poppunk.net). This is a WebAssembly (WASM) version of pp-sketchlib which sketches a user-input genome assembly in the browser, transmits the sketch as JSON to a server running PopPUNK (via gunicorn and flask), runs query assignment against a large database of genomes from the GPS project, and returns a JSON containing the strain assignment, a tree and a network, which are then displayed by a react app.
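The server-side glue isn’t shown here; as a rough sketch (the /assign route and the assign_query helper are hypothetical stand-ins, not the real PopPUNK-web API), the flask layer could look something like:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def assign_query(sketch):
    # Stand-in for running PopPUNK query assignment against the GPS database;
    # returns dummy values so the sketch runs end-to-end.
    return {"strain": "example_strain",
            "tree": "(query,ref);",
            "network": {"nodes": [], "edges": []}}

@app.route("/assign", methods=["POST"])
def assign():
    sketch = request.get_json()    # sketch JSON produced in-browser by the WASM pp-sketchlib
    result = assign_query(sketch)
    # JSON payload of strain assignment, tree and network for the react app to display
    return jsonify(result)
```

Served behind gunicorn in the usual way, e.g. `gunicorn app:app` (assuming the file is app.py).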
I was happy to see that this paper, which originally appeared as a preprint back in April 2019 (!), was published earlier this month. I thought it was one of the most thought-provoking papers I’ve read recently, so suggested a journal club on the final version (it’s a long paper – over 80 pages).
There were some parts that I liked a lot, and some parts I didn’t like, which I wanted to summarise here.
I’ve recently ported one of my algorithms onto a GPU using CUDA. Here are some things I’ve learnt about the process (geared towards an algorithm dealing with genomic data).
Firstly, the documentation that helped me most:
Getting started:
- https://devblogs.nvidia.com/even-easier-introduction-cuda/
- https://devblogs.nvidia.com/easy-introduction-cuda-c-and-c/

Understanding device memory:
- https://devblogs.nvidia.com/unified-memory-cuda-beginners/
- https://devblogs.nvidia.com/how-access-global-memory-efficiently-cuda-c-kernels/
- https://devblogs.nvidia.com/using-shared-memory-cuda-cc/

Putting it all together:
- https://devblogs.nvidia.com/efficient-matrix-transpose-cuda-cc/

Optimising your own code:
- https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/

Start small, add complexity in slowly

I started off following the ‘even easier introduction to CUDA’ guide to get a basic version of my algorithm working.
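As a flavour of what that first working version looks like, here is a minimal sketch in the style of that guide (a vector add using unified memory, standing in for the actual genomics kernel):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per element: the canonical first kernel from the
// 'even easier introduction' guide.
__global__ void add(int n, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] += x[i];
    }
}

int main() {
    const int n = 1 << 20;
    float *x, *y;

    // Unified (managed) memory is accessible from both host and device,
    // so no explicit cudaMemcpy is needed to get started
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Round the grid size up so every element gets a thread
    const int blockSize = 256;
    const int numBlocks = (n + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(n, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 3.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compiled with nvcc, this gives a working baseline to profile before moving on to shared memory and coalesced access.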