/images/jl11_lots.jpg

John Lees' blog

Pathogens, informatics and modelling at EMBL-EBI

Likelihood ratio test in SEER

I have added the likelihood ratio test (LRT) for logistic regression into seer, in addition to the existing Wald test as noted in issue 42. As this is likely to remain undocumented elsewhere, here are some brief notes: Both the p-value from the Wald test, and the p-value from the new LRT are in the output. The LRT is expected to be a more powerful test in some situations. I would recommend its use over the Wald test.

Sorting a massive file

I want to count the number of unique patterns in a vcf file. First I convert it to text with bcftools query: bcftools query -f '[%GT]\n' vcf_in.vcf.gz > patterns.txt The resulting patterns.txt is about 100Gb. The best way I found to count the unique patterns in this was with the following command: LC_ALL=C sort -u --parallel=4 -S 990M -T ~/tmp_sort_files patterns.txt | wc -l This used 1063Mb RAM, took 1521s and used a maximum of around 75Gb tmp space on my home (as the /tmp drive on the cluster ran out of space).

Removing helperamc (Advanced Mac Cleaner) OS X

I was working on an OS X system which kept getting annoying pop-ups about the system needing clean up, anti-virus software etc. I was able to see that the window was titled ‘helperamc’. It turns out this was a remnant from Advanced Mac Cleaner, the use of which I won’t comment on here. The user of the system had tried to remove it when upgrading OS X version, but the annoying advertising component remained.

Running bugwas + gen files

A recent paper by Earle et. al. nicely showed the use of linear mixed models to determine drug resistance related genetic variants. Part of the software provided is an R package called bugwas, which will make the nice plots in figure 1 for you. Here are some notes on how to get it to run, and correctly format the input files Getting gemma to work You’ll need to use the author’s modified version of gemma, which can be downloaded here.

R packages break after OS X upgrade

I recently upgraded from OS X 10.10 to 10.11. This has upgraded the version of the gfortran dynamic library from 2 to 3 (in /Library/Frameworks/R.framework/Resources/lib), which in turn causes problems in various R packages (msm, ape). For those which give an error along the lines of unable to load shared object the solution seems to be to use install.packages recursively. Use it on the package that failed. If a dependency fails, use it on that too.