Over the last 50 years, the statistical methods used to calculate genetic predictions (EPDs) for livestock have evolved. What drove that evolution? Knowledge of statistical models? New methods? Data? Enabling computer technology? Golden believes the drive for better models has come from a desire to increase the accuracy of prediction.
In the past, Golden and Garrick wrote grants to fund the writing of genetic prediction software. This avenue appears to have dried up, so they started a company, Theta Solutions, to fund the development of genetic prediction software. The latest genetic prediction runs contained 46,000 animals with genomic data.
Theta Solutions uses graphics processing units (GPUs), originally built for video gaming, to get high-performance computing at a relatively low cost. The BOLT software focuses on custom turnkey analyses: once the system is set up, all one needs to do is feed it data.
Using non-GPU computing, Golden can solve 51 million equations in 1,649 seconds. The fastest GPU implementation took 78 seconds, roughly 21 times faster.
Why do we use a Bayesian sampler for solving mixed models?
- No accuracy approximation bias
- Can get PE covariance
- Can apply marker selection methods
- Can include prior information
With traditional methods, each sample took 23 seconds; with the new implementation, a sample takes 2 seconds. (Gibbs sampling is kind of like turning a statistical crank over and over to solve very complex equations; each sample is one turn of the crank.) They also parallelized the sampling, further speeding up the process. This parallelized processing is like working cattle with hundreds of chutes rather than a single chute.
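To make the "statistical crank" concrete, here is a minimal sketch of a single-site Gibbs sampler for a toy mixed model. It is not BOLT's implementation, which the talk did not describe in detail. The function name `gibbs_mme`, the toy data, and the assumption that the variance components are already known are all illustrative choices. Each pass through the inner loop is one turn of the crank, and the spread of the saved samples gives a posterior variance for every effect directly.

```python
import numpy as np

def gibbs_mme(X, Z, y, var_u, var_e, n_samples=2000, burn_in=500, seed=0):
    """Toy single-site Gibbs sampler for y = Xb + Zu + e.

    Assumes u ~ N(0, I*var_u) and e ~ N(0, I*var_e) with KNOWN variances
    (an illustrative simplification, not something stated in the talk).
    Returns the posterior mean and variance of every fixed and random effect.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)

    # Build the mixed model equations: C * theta = rhs
    W = np.hstack([X, Z])
    C = W.T @ W
    p = X.shape[1]
    C[p:, p:] += (var_e / var_u) * np.eye(Z.shape[1])  # shrink random effects
    rhs = W.T @ y

    theta = np.zeros(C.shape[0])
    kept = []
    for it in range(n_samples):            # one iteration = one turn of the crank
        for i in range(theta.size):
            # Conditional mean of effect i given the current value of all others
            adjusted = rhs[i] - C[i] @ theta + C[i, i] * theta[i]
            mean_i = adjusted / C[i, i]
            sd_i = np.sqrt(var_e / C[i, i])
            theta[i] = rng.normal(mean_i, sd_i)
        if it >= burn_in:                  # discard early samples
            kept.append(theta.copy())

    kept = np.array(kept)
    return kept.mean(axis=0), kept.var(axis=0)

# Hypothetical data: 3 animals with 4 records each and one overall mean
X = np.ones((12, 1))
Z = np.kron(np.eye(3), np.ones((4, 1)))
y = np.array([38, 40, 39, 41, 42, 44, 43, 45, 36, 37, 35, 38], dtype=float)

post_mean, post_var = gibbs_mme(X, Z, y, var_u=1.0, var_e=1.0)
print("posterior means:", post_mean)
print("posterior variances:", post_var)
```

Because the posterior variances come straight from the samples, no approximation of accuracy is needed, which is the first advantage in the list above. A production sampler would also sample the variance components and, as the talk notes, run many updates in parallel.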
There are three ways to combine genomics with traditional EPDs:
- blending Genomic BLUP (combine pedigree prediction with genomic prediction, two separate analyses)
- single-step Genomic BLUP (combine pedigree relationships and genomic relationships, one analysis)
- hybrid model (single step with marker effects)
Single-step genomic models outperform traditional EPDs, but the hybrid model outperforms both, especially for unproven animals. The purpose of the hybrid model is to squeeze more information out of the data.
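The "genomic relationships" used by single-step models are usually stored as a matrix built from SNP genotypes. The sketch below uses VanRaden's first method, a common choice, but the talk did not say which construction BOLT uses, so treat that as an assumption. The function name `genomic_relationship` and the tiny genotype table are hypothetical.

```python
import numpy as np

def genomic_relationship(genotypes):
    """Genomic relationship matrix from genotypes coded 0/1/2.

    Rows are animals, columns are SNP markers. Uses VanRaden's first
    method with allele frequencies estimated from the genotyped animals
    themselves (an assumption made here for illustration).
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0               # allele frequency at each marker
    Zc = M - 2.0 * p                       # centre each marker column
    scale = 2.0 * np.sum(p * (1.0 - p))    # puts G on the pedigree scale
    return Zc @ Zc.T / scale

# Hypothetical genotypes: 4 animals by 5 markers
genotypes = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 0],
    [2, 0, 1, 1, 1],
    [1, 2, 0, 2, 1],
])

G = genomic_relationship(genotypes)
print(np.round(G, 3))
```

In a single-step analysis, a matrix like `G` is combined with the pedigree relationship matrix so that genotyped and non-genotyped animals are evaluated together in one run.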
They are currently analyzing a data set with 6 million pedigree records, 4.8 million birth weight records, 1.9 million post-weaning gain records, and 46,402 genotyped animals, using 44,414 SNP markers.