John Lees' blog
Pathogens, informatics and modelling at EMBL-EBI
I have added the likelihood ratio test (LRT) for logistic regression into seer, in addition to the existing Wald test as noted in issue 42. As this is likely to remain undocumented elsewhere, here are some brief notes:
Both the p-value from the Wald test, and the p-value from the new LRT are in the output. The LRT is expected to be a more powerful test in some situations. I would recommend its use over the Wald test.
I want to count the number of unique patterns in a vcf file. First I convert it to text with bcftools query:
bcftools query -f '[%GT]\n' vcf_in.vcf.gz > patterns.txt The resulting patterns.txt is about 100Gb. The best way I found to count the unique patterns in this was with the following command:
LC_ALL=C sort -u --parallel=4 -S 990M -T ~/tmp_sort_files patterns.txt | wc -l This used 1063Mb RAM, took 1521s and used a maximum of around 75Gb tmp space on my home (as the /tmp drive on the cluster ran out of space).
I was working on an OS X system which kept getting annoying pop-ups about the system needing clean up, anti-virus software etc. I was able to see that the window was titled ‘helperamc’.
It turns out this was a remnant from Advanced Mac Cleaner, the use of which I won’t comment on here. The user of the system had tried to remove it when upgrading OS X version, but the annoying advertising component remained.
A recent paper by Earle et. al. nicely showed the use of linear mixed models to determine drug resistance related genetic variants. Part of the software provided is an R package called bugwas, which will make the nice plots in figure 1 for you.
Here are some notes on how to get it to run, and correctly format the input files
Getting gemma to work You’ll need to use the author’s modified version of gemma, which can be downloaded here.
Calculates the mean depth of mapped coverage over the whole genome from a mapped bam, using bedtools
bedtools genomecov -ibam -g bedtools_genome.txt \ To generate bedtools_genome.txt use
samtools faidx reference.fa cut -f 1-2 reference.fa.fai > bedtools_genome.txt
I recently upgraded from OS X 10.10 to 10.11. This has upgraded the version of the gfortran dynamic library from 2 to 3 (in /Library/Frameworks/R.framework/Resources/lib), which in turn causes problems in various R packages (msm, ape).
For those which give an error along the lines of
unable to load shared object the solution seems to be to use install.packages recursively. Use it on the package that failed. If a dependency fails, use it on that too.