Running bugwas + gen files

John Lees included in Bioinformatics Scripts

2016-09-08 485 words 3 minutes

Contents

A recent paper by Earle et. al. nicely showed the use of linear mixed models to determine drug resistance related genetic variants. Part of the software provided is an R package called bugwas, which will make the nice plots in figure 1 for you.

Here are some notes on how to get it to run, and correctly format the input files

Getting gemma to work

You’ll need to use the author’s modified version of gemma, which can be downloaded here. This may not run on your system, due to the blas and lapack libraries being in unexpected places. You can easily solve this by making some symlinks.

First run ldd on the gemma executable to check which libraries cannot be found

ldd /nfs/users/nfs_j/jl11/installations/gemma.0.93b
 linux-vdso.so.1 => (0x00007fff28000000)
 libgsl.so.0 => /usr/lib/libgsl.so.0 (0x00007f9dfbcc0000)
 libgslcblas.so.0 => /usr/lib/libgslcblas.so.0 (0x00007f9dfba78000)
 libblas.so.3 => not found
 liblapack.so.3 => not found
 libstdc++.so.6 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libstdc++.so.6 (0x00007f9dfa8d0000)
 libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9dfa5d0000)
 libgcc_s.so.1 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libgcc_s.so.1 (0x00007f9dfa3b8000)
 libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9df9ff8000)
 libgfortran.so.3 => /software/hgi/pkglocal/gcc-4.9.1/lib64/libgfortran.so.3 (0x00007f9df9cd8000)
 /lib64/ld-linux-x86-64.so.2 (0x00007f9dfc120000)
 libquadmath.so.0 => /software/hgi/pkglocal/gcc-4.9.1/lib/../lib64/libquadmath.so.0 (0x00007f9df9a98000)

In my case this was libblas and liblapack. Find the libraries using locate liblapack etc. Above, I have created symlinks to these in a location in my LD_LIBRARY_PATH variable /nfs/users/nfs_j/jl11/software/lib (you can make a similar directory and export it using export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<new directory>).

ln -s /usr/lib/lapack/liblapack.so /nfs/users/nfs_j/jl11/software/lib/liblapack.so.3
ln -s /usr/lib/libblas/libblas.so /nfs/users/nfs_j/jl11/software/lib/libblas.so.3

File formatting

The bugwas gen file format is as follows: ps sample_1_name sample_2_name 23 A C etc.

If you have a file readable into plink (stored as snps.bed, snps.bim and snps.fam) and bcftools, this is an easy conversion.

plink --bfile snps --recode vcf-iid --geno 0.05 --out bugwas
bcftools plugin missing2ref bugwas.vcf > bugwas2.vcf
bcftools query -f '%ID\t[%TGT\t]\n' bugwas2.vcf > bugwas_in.gen
sed -i -e 's/coor_//' -e 's/\/.//g' -e 's/\t$//' bugwas_in.gen
head -7 bugwas.vcf| tail -1 | cut -f 10- | sed 's/#/_/g'| awk '{print "ps\t" $0}' | cat - bugwas_in.gen > tmp.gen
mv tmp.gene bugwas_in.gen

As bugwas requires all sites to be imputed this code will take the major allele where any site is missing. An alternative would be to use the ancestral state, which the authors suggest using ClonalFrameML to do, though this will take longer to run.

If you’ve got a messy vcf rather than a bed as input a script I wrote may help you with missing/multiallelic sites.

Other pieces

You’ll also need gemma to generate the kinship matrix for the random effects:

gemma -bfile snps -gk 1 -o gemma_snps

You’ll need a phylogeny, which you can draw by standard methods (I’d recommend fasttree on your alignment if you’ve got > 1000 samples). If you’ve already got a tree make sure the tip labels match the samples in the gen file.

Run bugwas

The command you’ll need to run in R is of the form:

lin_loc(gen="bugwas_in.gen",
        pheno="bugwas.pheno",
        phylo="fasttree.tr",
        prefix="gwas",
        gem.path="/nfs/users/nfs_j/jl11/installations/gemma.0.93b",
        relmatrix="output/gemma_snps.cXX.txt",
        output.dir="./")

I’ve found for around 2 000 samples and 110 000 snps I needed around 15Gb of RAM and 4-5 hours to run.