
John Lees' blog

Pathogens, informatics and modelling at EMBL-EBI

Installing PEER executable peertool

PEER (probabilistic estimation of expression residuals) is a tool for inferring hidden factors from expression data, for use in genetic association studies such as eQTL mapping. The method was first described in a 2010 PLOS Computational Biology paper (http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000770), and a guide to its applications and use appeared in a 2012 Nature Protocols paper (http://www.nature.com/nprot/journal/v7/n3/pdf/nprot.2011.457.pdf). To install a command-line version of the tool, you can clone it from the GitHub page

Simulate disease state from a given odds ratio, fraction exposed, fraction affected

Problem: I have genetic data at a single variant site, where the minor allele frequency (MAF) is set and the prevalence of disease (Sr) is known. The variant is truly associated with the phenotype, at an odds ratio (OR) I want to set. How do I simulate the phenotypes given these three parameters and each sample's variant status (exposed or not)? This is analogous to simulating data in a 2x2 contingency table, as discussed on Stack Exchange here: http://stats.
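One way to set this up, as a minimal sketch rather than the method the post goes on to describe: write p0 and p1 for the disease risk in unexposed and exposed samples, and f for the fraction exposed. The two constraints are Sr = f*p1 + (1 - f)*p0 and OR = [p1/(1 - p1)] / [p0/(1 - p0)]. Eliminating p1 leaves a quadratic in p0 with exactly one root between 0 and 1, and each phenotype is then a Bernoulli draw with probability p1 or p0. The names `Risks`, `solveRisks` and `simulatePhenotypes` are hypothetical, and treating the fraction exposed as the fraction of samples carrying the variant is an assumption.

```cpp
#include <cmath>
#include <random>
#include <stdexcept>
#include <vector>

// Disease risk in unexposed (p0) and exposed (p1) samples.
// Hypothetical helper type, not taken from the post.
struct Risks {
    double p0;
    double p1;
};

// Solve for p0 and p1 given the odds ratio, the fraction of samples
// exposed (carrying the variant) and the overall prevalence Sr.
Risks solveRisks(double odds_ratio, double frac_exposed, double prevalence) {
    if (odds_ratio <= 0.0 || frac_exposed < 0.0 || frac_exposed > 1.0 ||
        prevalence <= 0.0 || prevalence >= 1.0) {
        throw std::invalid_argument("parameters out of range");
    }
    // Quadratic a*p0^2 + b*p0 - Sr = 0 obtained by eliminating p1.
    const double k = odds_ratio - 1.0;
    const double a = (1.0 - frac_exposed) * k;
    const double b = 1.0 + k * (frac_exposed - prevalence);
    const double disc = b * b + 4.0 * a * prevalence;
    // This form of the root stays valid when a == 0 (OR = 1 or f = 1).
    const double p0 = 2.0 * prevalence / (b + std::sqrt(disc));
    const double p1 = odds_ratio * p0 / (1.0 + k * p0);
    return {p0, p1};
}

// Draw one phenotype (affected or not) per sample from its exposure status.
std::vector<bool> simulatePhenotypes(const std::vector<bool>& exposed,
                                     const Risks& risks, std::mt19937& rng) {
    std::bernoulli_distribution draw_unexposed(risks.p0);
    std::bernoulli_distribution draw_exposed(risks.p1);
    std::vector<bool> affected;
    affected.reserve(exposed.size());
    for (bool e : exposed) {
        affected.push_back(e ? draw_exposed(rng) : draw_unexposed(rng));
    }
    return affected;
}
```

Calling `solveRisks(2.0, 0.3, 0.1)` and passing the result to `simulatePhenotypes` with a seeded `std::mt19937` gives reproducible phenotypes. In any finite sample the realised prevalence and odds ratio will scatter around the target values.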

Using ALF to simulate large, closely related populations of bacteria

I am currently trying to use ALF (the stand-alone version) to simulate data from a custom tree, with realistic parameters for SNP rate, INDEL rate, gene loss and recombination rates. This is a little different from what I think the program was originally designed for (small numbers of divergent organisms), but is probably an easier problem. ALF is a good fit because it includes many features of evolution that more naive models don't capture, and its output is useful for further simulation and testing work.

Threads, vectors and references in C++11 on OS X

I was trying to compile some C++ of the form

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.push_back(std::thread(logisticTest, kmer_lines[i], samples));
    }

with function prototype

    void logisticTest(Kmer& k, const std::vector<Sample>& samples);

on OS X 10.10 with clang++: Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn).

First error message: no matching function for call to __invoke

Solution: add -std=c++11 -stdlib=libc++ to CXXFLAGS in the Makefile (thanks to http://stackoverflow.com/questions/14115314/how-to-compile-c11-with-clang-3-2-on-osx-lion)
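A related point worth illustrating, since the excerpt stops at the first error: `std::thread` copies its arguments into the new thread, so a function that takes a non-const reference (like `Kmer& k` here) generally needs the argument wrapped in `std::ref`, and `std::cref` avoids copying the shared vector. The sketch below assumes placeholder `Kmer` and `Sample` types and a dummy function body. Only the prototype matches the post, and `runAll` is a hypothetical wrapper.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the post's Kmer and Sample types.
struct Sample { double phenotype; };
struct Kmer { double score = 0.0; };

// Same prototype as in the post; the body is a placeholder that
// writes a result into the Kmer passed by reference.
void logisticTest(Kmer& k, const std::vector<Sample>& samples) {
    for (const Sample& s : samples) {
        k.score += s.phenotype;
    }
}

// Launch one thread per k-mer, then wait for all of them.
void runAll(std::vector<Kmer>& kmer_lines, const std::vector<Sample>& samples) {
    std::vector<std::thread> threads;
    threads.reserve(kmer_lines.size());
    for (std::size_t i = 0; i < kmer_lines.size(); ++i) {
        // std::ref / std::cref pass references rather than copies.
        threads.emplace_back(logisticTest, std::ref(kmer_lines[i]),
                             std::cref(samples));
    }
    for (std::thread& t : threads) {
        t.join();
    }
}
```

Each thread writes to a different element of `kmer_lines` and only reads `samples`, so no locking is needed in this sketch. It needs `-std=c++11` (or later) to compile, and on Linux toolchains usually `-pthread` as well.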

Upgrade OS X 10.8 -> 10.10 (Yosemite) breaks perl, homebrew, gvim

After upgrading from OS X 10.8 (Mountain Lion) to 10.10 (Yosemite), I found that gvim no longer worked and exited with a cryptic dyld message beginning dyld: Symbol not found. The first thing I tried was uninstalling it with Homebrew and then reinstalling (brew uninstall macvim, followed by brew install macvim), but I got a Trace/BPT trap: 5 during make. Trying to fix this by following the suggestions from brew doctor and installing openssl gave me essentially the same errors.

A hierarchical Bayesian model using multinomial and Dirichlet distributions in JAGS

I am currently trying to model the state of a genetic locus in bacteria (which may take one of six values) using a hierarchical Bayesian model. This allows me to account for heterogeneity within a sample, as well as heterogeneity within a tissue type. The approach has two further advantages: I can incorporate results from existing papers as priors, and from the output I can quantify the uncertainties within samples and tissues and check for significantly different distributions between condition types.
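To make the two levels of heterogeneity concrete, here is a minimal generative sketch of one possible structure, which is an assumption and not necessarily the model in the post: each tissue has mean proportions over the six states, each sample's proportions are drawn from a Dirichlet centred on its tissue mean, and the observed counts are multinomial given the sample's proportions. It only simulates data forwards and does no inference, which is what JAGS would be used for. All names (`drawDirichlet`, `drawMultinomial`, `simulateSample`, `concentration`, `n_obs`) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <random>

constexpr std::size_t kStates = 6;  // the locus takes one of six values
using Probs = std::array<double, kStates>;
using Counts = std::array<int, kStates>;

// Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_i, 1)
// draws. Every alpha_i must be positive.
Probs drawDirichlet(const Probs& alpha, std::mt19937& rng) {
    Probs p{};
    double total = 0.0;
    for (std::size_t i = 0; i < kStates; ++i) {
        std::gamma_distribution<double> gamma(alpha[i], 1.0);
        p[i] = gamma(rng);
        total += p[i];
    }
    for (double& x : p) {
        x /= total;
    }
    return p;
}

// Draw Multinomial(n, p) counts as n independent categorical draws.
Counts drawMultinomial(int n, const Probs& p, std::mt19937& rng) {
    std::discrete_distribution<int> pick(p.begin(), p.end());
    Counts counts{};
    for (int i = 0; i < n; ++i) {
        ++counts[static_cast<std::size_t>(pick(rng))];
    }
    return counts;
}

// One sample: its state proportions scatter around the tissue mean
// (a larger concentration means less between-sample heterogeneity),
// then n_obs observations are drawn from those proportions.
Counts simulateSample(const Probs& tissue_mean, double concentration,
                      int n_obs, std::mt19937& rng) {
    Probs alpha{};
    for (std::size_t i = 0; i < kStates; ++i) {
        alpha[i] = concentration * tissue_mean[i];
    }
    return drawMultinomial(n_obs, drawDirichlet(alpha, rng), rng);
}
```

In a model of this shape, prior information from existing papers would enter as a Dirichlet prior on the tissue means. Simulating from it with known parameters is a cheap way to check that a fitted model recovers them.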