John Lees' blog
Pathogens, informatics and modelling at EMBL-EBI
I recently read a pre-print from the Veening lab where they had reconstructed various (22 total) physiological conditions in vitro and then measured expression levels with RNA-seq. I thought it was a great bit of research, and would encourage you to read it here if you’re interested: https://doi.org/10.1101/283739
They’ve also done a really good job with data availability: they have released a browser for their data (PneumoExpress) and put their raw data on Zenodo.
I saw this phylogenetics package today, phyx: https://github.com/FePhyFoFum/phyx
To install without admin rights/sudo I needed to do the following (my software is installed in my home ~/software, rather than e.g. /usr, /usr/local):
Compile armadillo as follows:

    cmake -DCMAKE_INSTALL_PREFIX=$HOME/software .
    make
    make install

Compile nlopt as follows:

    ./configure --with-cxx --without-octave --without-matlab --prefix=$HOME/software
    make
    make install

Compile phyx as follows (slightly hacky, maybe there’s a ‘proper’ way):

    ./configure --prefix=$HOME/software

Then change line 11 of the Makefile (CPP_LIBS) to add the library path:
Trying to install PyVCF under a python (3) virtual environment gave me the following error:
    (venv)johnlees@hpc:~$ pip install pyvcf
    Downloading/unpacking pyvcf
      Downloading PyVCF-0.6.8.linux-x86_64.tar.gz (1.1MB): 1.1MB downloaded
      Saved /tmp/downloadcache/PyVCF-0.6.8.linux-x86_64.tar.gz
      Running setup.py egg_info for package pyvcf
        Traceback (most recent call last):
          File "", line 16, in
        FileNotFoundError: [Errno 2] No such file or directory: '~/venv/build/pyvcf/setup.py'
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "", line 16, in
        FileNotFoundError: [Errno 2] No such file or directory: '~/venv/build/pyvcf/setup.
Marco Galardini and I have recently reimplemented the bacterial GWAS software SEER in Python. As part of this I rewrote my C++ code for Firth regression in Python. Firth regression gives better estimates when the data in a logistic regression are separable or close to separable (for example, when a chi-squared contingency table has small entries).
Although there is an R implementation, logistf, I couldn’t find an equivalent in any other language, including in Python’s statsmodels.
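To illustrate the idea, here is a minimal sketch of Firth-penalised logistic regression using numpy. It is not the code from SEER or logistf: the function name `firth_logistic`, its arguments and the step cap are assumptions made for this example. It applies Newton steps to the Firth-modified score, X'(y - p + h(0.5 - p)), where h holds the diagonal of the hat matrix.

```python
import numpy as np


def firth_logistic(X, y, max_iter=50, tol=1e-8, max_step=5.0):
    """Hypothetical helper: Firth-penalised logistic regression via Newton steps.

    X is an (n, k) design matrix (include a column of ones for an intercept),
    y is a length-n vector of 0/1 outcomes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # logistic weights
        info = X.T @ (X * w[:, None])         # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^-1 X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: the usual score plus a bias-reducing term
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        # Cap very large steps (an assumption of this sketch, for stability)
        largest = np.max(np.abs(step))
        if largest > max_step:
            step *= max_step / largest
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Perfectly separated toy data: ordinary logistic regression has no finite
# maximum likelihood estimate here, but the penalised fit stays finite.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])
beta = firth_logistic(X, y)
```

On the separated toy data the slope estimate is finite and positive, which is the behaviour that makes the penalty useful for rare variants or small contingency tables.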
In GWAS the Bayesian Sparse Linear Mixed Model (BSLMM) is a hybrid of the LMM, which assumes all SNPs have an effect size drawn from a normal distribution (closer to ridge regression), and sparse regression, which finds a few SNPs with non-zero effect sizes.
In their paper on this model Zhou et al. show that this hybrid method can have better prediction accuracy than either model on its own (both are special cases of their model), and can also estimate the proportion of variance explained by polygenic and sparse effects.
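A toy simulation can make the hybrid structure concrete. The sketch below is not the BSLMM fitting procedure; it only generates a phenotype in which every SNP has a small normally distributed effect and a few SNPs carry an additional larger effect. The function name `simulate_bslmm_like` and all parameter names and default values are assumptions chosen for illustration.

```python
import numpy as np


def simulate_bslmm_like(n=200, m=500, pi=0.02, sigma_sparse=1.0,
                        sigma_poly=0.05, seed=0):
    """Hypothetical helper: simulate a phenotype with polygenic and sparse effects."""
    rng = np.random.default_rng(seed)
    # Genotypes coded 0/1/2, then mean-centred per SNP
    X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    X -= X.mean(axis=0)
    # Polygenic part: every SNP has a small effect (the LMM-like component)
    beta_poly = rng.normal(0.0, sigma_poly, m)
    # Sparse part: a fraction pi of SNPs get an extra, larger effect
    is_sparse = rng.random(m) < pi
    beta_sparse = np.where(is_sparse, rng.normal(0.0, sigma_sparse, m), 0.0)
    g_poly = X @ beta_poly
    g_sparse = X @ beta_sparse
    y = g_poly + g_sparse + rng.normal(0.0, 1.0, n)
    # Rough variance fractions (they ignore covariance between the components)
    total = np.var(y)
    fractions = {"polygenic": np.var(g_poly) / total,
                 "sparse": np.var(g_sparse) / total}
    return X, y, fractions


X, y, fractions = simulate_bslmm_like()
```

Setting `pi=0` leaves only the polygenic component (the LMM-like case), while setting `sigma_poly=0` leaves only the sparse component.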
Tanglegrams are a visual method to compare two phylogenetic trees with the same set of tip labels. This can be useful for comparing trees produced by different methods on the same alignment, or on different alignments of the same set of samples. Tanglegrams work by connecting the matching tips of the trees, then rotating subtrees to minimise the number of crossings. The algorithm was published in 2011, and continues to be used in a range of publications (for example, in genomic epidemiology).
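The crossing-minimisation step can be sketched in a few lines. This is not the published 2011 algorithm: it is a simplified version that holds the first tree fixed and only rotates nodes of the second, binary tree. Trees are written as nested tuples of tip names, and the function names `tips`, `crossings` and `untangle` are assumptions for this example.

```python
from itertools import combinations


def tips(tree):
    """Return tip labels in drawing order for a nested-tuple binary tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return tips(left) + tips(right)


def crossings(order_a, order_b):
    """Count pairs of connecting lines that cross between two tip orders."""
    pos_b = {name: i for i, name in enumerate(order_b)}
    ranks = [pos_b[name] for name in order_a]
    return sum(1 for i, j in combinations(range(len(ranks)), 2)
               if ranks[i] > ranks[j])


def untangle(fixed_order, tree):
    """Rotate nodes of `tree` to reduce crossings against a fixed tip order."""
    pos = {name: i for i, name in enumerate(fixed_order)}

    def rotate(node):
        if isinstance(node, str):
            return node
        left, right = rotate(node[0]), rotate(node[1])
        left_tips, right_tips = tips(left), tips(right)
        # Crossings between the two child clades if left is drawn first
        as_is = sum(1 for a in left_tips for b in right_tips if pos[a] > pos[b])
        flipped = len(left_tips) * len(right_tips) - as_is
        return (left, right) if as_is <= flipped else (right, left)

    return rotate(tree)


tree_a = (("A", "B"), ("C", "D"))
tree_b = (("D", "C"), ("B", "A"))   # same topology, every node rotated
before = crossings(tips(tree_a), tips(tree_b))
tree_b_rotated = untangle(tips(tree_a), tree_b)
after = crossings(tips(tree_a), tips(tree_b_rotated))
```

Whether a given pair of tips crosses depends only on the orientation of their most recent common ancestor in the rotated tree, so each node can be decided independently once the other tree is fixed. Rotating both trees at once is the harder problem.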