Running bugwas + gen files
A recent paper by Earle et. al. nicely showed the use of linear mixed models to determine drug resistance related genetic variants. Part of the software provided is an R package called bugwas, which will make the nice plots in figure 1 for you.
Here are some notes on how to get it to run, and correctly format the input files
Getting gemma to work
You’ll need to use the author’s modified version of gemma, which can be downloaded here. This may not run on your system, due to the blas and lapack libraries being in unexpected places. You can easily solve this by making some symlinks.
First run ldd on the gemma executable to check which libraries cannot be found
ldd /nfs/users/nfs_j/jl11/installations/gemma.0.93b
linux-vdso.so.1 => (0x00007fff28000000)
libgsl.so.0 => /usr/lib/libgsl.so.0 (0x00007f9dfbcc0000)
libgslcblas.so.0 => /usr/lib/libgslcblas.so.0 (0x00007f9dfba78000)
libblas.so.3 => not found
liblapack.so.3 => not found
libstdc++.so.6 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libstdc++.so.6 (0x00007f9dfa8d0000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9dfa5d0000)
libgcc_s.so.1 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libgcc_s.so.1 (0x00007f9dfa3b8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9df9ff8000)
libgfortran.so.3 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libgfortran.so.3 (0x00007f9df9cd8000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9dfc120000)
libquadmath.so.0 => /software/hgi/pkglocal/gcc-4.9.1/lib/../lib64/libquadmath.so.0 (0x00007f9df9a98000)
In my case this was libblas and liblapack. Find the libraries using locate liblapack
etc. Above, I have created symlinks to these in a location in my LD_LIBRARY_PATH
variable /nfs/users/nfs_j/jl11/software/lib
(you can make a similar directory and export it using export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<new directory>
).
ln -s /usr/lib/lapack/liblapack.so /nfs/users/nfs_j/jl11/software/lib/liblapack.so.3
ln -s /usr/lib/libblas/libblas.so /nfs/users/nfs_j/jl11/software/lib/libblas.so.3
File formatting
The bugwas gen file format is as follows: ps sample_1_name sample_2_name 23 A C etc.
If you have a file readable into plink (stored as snps.bed, snps.bim and snps.fam) and bcftools, this is an easy conversion.
plink --bfile snps --recode vcf-iid --geno 0.05 --out bugwas
bcftools plugin missing2ref bugwas.vcf > bugwas2.vcf
bcftools query -f '%ID\t[%TGT\t]\n' bugwas2.vcf > bugwas_in.gen
sed -i -e 's/coor_//' -e 's/\/.//g' -e 's/\t$//' bugwas_in.gen
head -7 bugwas.vcf| tail -1 | cut -f 10- | sed 's/#/_/g'| awk '{print "ps\t" $0}' | cat - bugwas_in.gen > tmp.gen
mv tmp.gene bugwas_in.gen
As bugwas requires all sites to be imputed this code will take the major allele where any site is missing. An alternative would be to use the ancestral state, which the authors suggest using ClonalFrameML to do, though this will take longer to run.
If you’ve got a messy vcf rather than a bed as input a script I wrote may help you with missing/multiallelic sites.
Other pieces
You’ll also need gemma to generate the kinship matrix for the random effects:
gemma -bfile snps -gk 1 -o gemma_snps
You’ll need a phylogeny, which you can draw by standard methods (I’d recommend fasttree on your alignment if you’ve got > 1000 samples). If you’ve already got a tree make sure the tip labels match the samples in the gen file.
Run bugwas
The command you’ll need to run in R is of the form:
lin_loc(gen="bugwas_in.gen",
pheno="bugwas.pheno",
phylo="fasttree.tr",
prefix="gwas",
gem.path="/nfs/users/nfs_j/jl11/installations/gemma.0.93b",
relmatrix="output/gemma_snps.cXX.txt",
output.dir="./")
I’ve found for around 2 000 samples and 110 000 snps I needed around 15Gb of RAM and 4-5 hours to run.