Title: | Asymmetric Linkage Disequilibrium (ALD) for Polymorphic Genetic Data |
---|---|
Description: | Computes asymmetric LD measures (ALD) for multi-allelic genetic data. These measures are identical to the correlation measure (r) for bi-allelic data. |
Authors: | Richard M. Single |
Maintainer: | Richard M. Single <[email protected]> |
License: | GPL-2 |
Version: | 0.1 |
Built: | 2025-02-12 03:56:01 UTC |
Source: | https://github.com/cran/asymLD |
Computes asymmetric LD measures (ALD) for polymorphic genetic data. These measures are identical to the correlation measure (r) for bi-allelic data.
Package: | asymLD |
Type: | Package |
Version: | 0.1 |
Date: | 2015-03-13 |
License: | GPL-2 |
The function compute.ALD() calculates asymmetric LD for haplotype frequency data. The function compute.AShomz() calculates allele specific homozygosity values for haplotype frequency data.
Thomson G, Single RM. Conditional Asymmetric Linkage Disequilibrium (ALD): Extending the Bi-Allelic r^2 Measure. Genetics. 2014 198(1):321-331. PMID:25023400
A function to compute asymmetric Linkage Disequilibrium measures (ALD) for polymorphic genetic data. These measures are identical to the correlation measure (r) for bi-allelic data.
compute.ALD(dat, tolerance = 0.01)
dat |
A data.frame with 5 required variables named haplo.freq, locus1, locus2, allele1, and allele2 (the same layout as the hla.freqs and snp.freqs data sets). |
tolerance |
A threshold for the sum of the haplotype frequencies. If the sum of the haplotype frequencies is greater than 1+tolerance or less than 1-tolerance an error is returned. The default is 0.01. |
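As a rough illustration of the expected input, the sketch below builds a small data frame with the five required column names and checks that they are present. The frequencies and allele labels are made up, and the one-pair-of-loci check is an assumption based on how the package examples subset the data before calling the function.

```r
# Toy two-locus haplotype table; the frequencies and allele names are invented.
toy <- data.frame(
  haplo.freq = c(0.40, 0.10, 0.05, 0.25, 0.20),
  locus1     = "A",
  locus2     = "B",
  allele1    = c("A1", "A1", "A2", "A2", "A3"),
  allele2    = c("B1", "B2", "B1", "B2", "B2"),
  stringsAsFactors = FALSE
)

# The five variable names the documentation lists as required.
required <- c("haplo.freq", "locus1", "locus2", "allele1", "allele2")
missing.vars <- setdiff(required, names(toy))
stopifnot(length(missing.vars) == 0)

# Assumption: each call describes a single pair of loci.
stopifnot(length(unique(toy$locus1)) == 1,
          length(unique(toy$locus2)) == 1)
```

A data frame of this shape could then be passed as the dat argument.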
The return value is a dataframe with the following components:
locus1 |
The name of the first locus. |
locus2 |
The name of the second locus. |
F.1 |
Homozygosity (expected under HWP) for locus 1. |
F.1.2 |
Conditional homozygosity* for locus1 given locus2. |
F.2 |
Homozygosity (expected under HWP) for locus 2. |
F.2.1 |
Conditional homozygosity* for locus2 given locus1. |
ALD.1.2 |
Asymmetric LD for locus1 given locus2. |
ALD.2.1 |
Asymmetric LD for locus2 given locus1. |
*Overall weighted haplotype-specific homozygosity for the first locus given the second locus.
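The relationship between these components can be sketched in base R. This is not the package's internal code: it follows the definitions in Thomson and Single (2014), where the squared ALD is (F.1.2 - F.1) / (1 - F.1), and it assumes each allele1/allele2 combination appears on at most one row. The function name ald.sketch and the toy frequencies are hypothetical.

```r
# Hypothetical helper: homozygosity and ALD values from haplotype frequencies.
ald.sketch <- function(dat) {
  f  <- dat$haplo.freq / sum(dat$haplo.freq)       # normalize to sum to 1
  p1 <- tapply(f, dat$allele1, sum)                # allele frequencies, locus 1
  p2 <- tapply(f, dat$allele2, sum)                # allele frequencies, locus 2
  # frequency of the conditioning allele for each haplotype row
  p1.row <- as.vector(p1[as.character(dat$allele1)])
  p2.row <- as.vector(p2[as.character(dat$allele2)])
  F.1   <- sum(p1^2)                               # homozygosity, locus 1
  F.2   <- sum(p2^2)                               # homozygosity, locus 2
  F.1.2 <- sum(f^2 / p2.row)                       # locus 1 given locus 2
  F.2.1 <- sum(f^2 / p1.row)                       # locus 2 given locus 1
  data.frame(F.1 = F.1, F.1.2 = F.1.2, F.2 = F.2, F.2.1 = F.2.1,
             # max(0, .) guards against tiny negative rounding error
             ALD.1.2 = sqrt(max(0, (F.1.2 - F.1) / (1 - F.1))),
             ALD.2.1 = sqrt(max(0, (F.2.1 - F.2) / (1 - F.2))))
}

# Invented frequencies for a 3-allele locus and a 2-allele locus.
toy <- data.frame(
  haplo.freq = c(0.40, 0.10, 0.05, 0.25, 0.20),
  allele1    = c("A1", "A1", "A2", "A2", "A3"),
  allele2    = c("B1", "B2", "B1", "B2", "B2"),
  stringsAsFactors = FALSE
)
res <- ald.sketch(toy)
print(res)
```

In this toy table the two ALD values differ, which is the asymmetry the measure is designed to expose when the loci have different numbers of alleles.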
A warning message is given if the sum of the haplotype frequencies is greater than 1.01 or less than 0.99 (regardless of the tolerance setting). The haplotype frequencies that are passed to the function are normalized within the function to sum to 1.0 by dividing each frequency by the sum of the passed frequencies.
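The tolerance check and normalization described above might look roughly like the following. This is a sketch written from the documented behaviour, not the package source; the function name check.and.normalize and the message wording are assumptions.

```r
# Hypothetical helper mirroring the documented checks.
check.and.normalize <- function(freq, tolerance = 0.01) {
  total <- sum(freq)
  # Error when the sum falls outside 1 +/- tolerance.
  if (total > 1 + tolerance || total < 1 - tolerance) {
    stop("haplotype frequencies sum to ", total,
         ", outside 1 +/- ", tolerance)
  }
  # Warning when the sum falls outside [0.99, 1.01], whatever the tolerance.
  if (total > 1.01 || total < 0.99) {
    warning("haplotype frequencies sum to ", total, "; normalizing to 1")
  }
  freq / total                                     # normalized frequencies
}

normalized <- check.and.normalize(c(0.6, 0.4))
```

Under these assumptions, a vector summing to 0.97 stops with an error at the default tolerance, while the same vector with tolerance = 0.05 is normalized and only triggers the warning.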
library(asymLD)

# An example using haplotype frequencies from Wilson (2010)
data(hla.freqs)
hla.a_b <- hla.freqs[hla.freqs$locus1=="A" & hla.freqs$locus2=="B",]
compute.ALD(hla.a_b)
hla.freqs$locus <- paste(hla.freqs$locus1, hla.freqs$locus2, sep="-")
compute.ALD(hla.freqs[hla.freqs$locus=="C-B",])
# Note: additional columns on the input dataframe (e.g., "locus" above) are allowed, but
# ignored by the function.

# An example using genotype data from the haplo.stats package
require(haplo.stats)
data(hla.demo)
geno <- hla.demo[,5:8]  #DPB-DPA
label <- unique(gsub(".a(1|2)", "", colnames(geno)))
label <- paste("HLA*",label,sep="")
keep <- !apply(is.na(geno) | geno==0, 1, any)
em.keep <- haplo.em(geno=geno[keep,], locus.label=label)
hapfreqs.df <- cbind(em.keep$haplotype, em.keep$hap.prob)
#format dataframe for ALD function
names(hapfreqs.df)[dim(hapfreqs.df)[2]] <- "haplo.freq"
names(hapfreqs.df)[1] <- "allele1"
names(hapfreqs.df)[2] <- "allele2"
hapfreqs.df$locus1 <- label[1]
hapfreqs.df$locus2 <- label[2]
head(hapfreqs.df)
compute.ALD(hapfreqs.df)
# Note that there is substantially less variability (higher ALD) for HLA*DPA1
# conditional on HLA*DPB1 than for HLA*DPB1 conditional on HLA*DPA1, indicating
# that the overall variation for DPA1 is relatively low given specific DPB1 alleles

# An example using SNP data where results are symmetric and equal to the ordinary
# correlation measure (r)
data(snp.freqs)
snps <- c("rs1548306", "rs6923504", "rs4434496", "rs7766854")
compute.ALD(snp.freqs[snp.freqs$locus1==snps[2] & snp.freqs$locus2==snps[3],])
snp.freqs$locus <- paste(snp.freqs$locus1, snp.freqs$locus2, sep="-")
by(snp.freqs,list(snp.freqs$locus),compute.ALD)

# SNP1 & SNP2 : the r correlation & ALD measures are equivalent due to symmetry for
# bi-allelic SNPs
p.AB <- snp.freqs$haplo.freq[1]
p.Ab <- snp.freqs$haplo.freq[2]
p.aB <- snp.freqs$haplo.freq[3]
p.ab <- snp.freqs$haplo.freq[4]
p.A <- p.AB + p.Ab
p.B <- p.AB + p.aB
r.squared <- (p.AB - p.A*p.B)^2 / (p.A*(1-p.A)*p.B*(1-p.B))
sqrt(r.squared)  #the r correlation measure
compute.ALD(snp.freqs[snp.freqs$locus1==snps[1] & snp.freqs$locus2==snps[2],])
A function to compute allele specific homozygosity values for haplotype frequency data. The allele specific homozygosity is the homozygosity statistic computed for alleles at one locus that are found on haplotypes with a specific allele (the focal allele) at the other locus (the focal locus).
compute.AShomz(dat, tolerance = 0.01, sort.var = c("focal", "allele"), sort.asc = rep(TRUE, length(sort.var)))
dat |
A data.frame with 5 required variables named haplo.freq, locus1, locus2, allele1, and allele2 (the same layout as the hla.freqs and snp.freqs data sets). |
tolerance |
A threshold for the sum of the haplotype frequencies. If the sum of the haplotype frequencies is greater than 1+tolerance or less than 1-tolerance an error is returned. The default is 0.01. |
sort.var |
a vector of variable names specifying the "sort by" variables. The default is c("focal","allele"). |
sort.asc |
a vector of TRUE/FALSE values, with the same length as "sort.var", indicating whether sorting of each variable is in ascending order. The default order is ascending. |
The return value is a dataframe with the following components:
loci |
The locus names separated by "-". |
focal |
The name of the focal locus (locus conditioned on). |
allele |
The name of the focal allele (allele conditioned on). |
allele.freq |
The frequency of the focal allele. |
as.homz |
The allele specific homozygosity (on haplotypes with the focal allele). |
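To make the allele.freq and as.homz columns concrete, here is a base R sketch for one focal locus. It is written from the definition in the description (homozygosity of the non-focal alleles among haplotypes carrying the focal allele), not taken from the package source; the function name as.homz.sketch and the toy frequencies are hypothetical.

```r
# Hypothetical helper: allele specific homozygosity for one focal locus.
# freq         : haplotype frequencies
# focal.allele : allele at the focal locus for each haplotype row
as.homz.sketch <- function(freq, focal.allele) {
  f <- freq / sum(freq)                            # normalize to sum to 1
  allele.freq <- tapply(f, focal.allele, sum)      # focal allele frequencies
  # conditional frequency of each haplotype given its focal allele, squared
  cond.sq <- (f / as.vector(allele.freq[as.character(focal.allele)]))^2
  as.homz <- tapply(cond.sq, focal.allele, sum)
  data.frame(allele      = names(allele.freq),
             allele.freq = as.vector(allele.freq),
             as.homz     = as.vector(as.homz[names(allele.freq)]),
             stringsAsFactors = FALSE)
}

# Invented frequencies; the second locus (B1/B2) is the focal locus here.
freq    <- c(0.40, 0.10, 0.05, 0.25, 0.20)
allele2 <- c("B1", "B2", "B1", "B2", "B2")
res <- as.homz.sketch(freq, allele2)
print(res)
```

Weighting as.homz by allele.freq and summing gives the overall conditional homozygosity reported by compute.ALD() (F.1.2 when locus 2 is the focal locus), which is why focal alleles that are both frequent and have high as.homz contribute most to ALD.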
A warning message is given if the sum of the haplotype frequencies is greater than 1.01 or less than 0.99 (regardless of the tolerance setting). The haplotype frequencies that are passed to the function are normalized within the function to sum to 1.0 by dividing each frequency by the sum of the passed frequencies.
library(asymLD)

# An example using haplotype frequencies from Wilson (2010)
data(hla.freqs)
hla.dr_dq <- hla.freqs[hla.freqs$locus1=="DRB1" & hla.freqs$locus2=="DQB1",]
compute.ALD(hla.dr_dq)
compute.AShomz(hla.dr_dq, sort.var=c("focal","allele"), sort.asc=c(TRUE,TRUE))
compute.AShomz(hla.dr_dq, sort.var=c("focal","allele.freq"), sort.asc=c(FALSE,FALSE))
# Note that there is substantially less variability (higher ALD) for HLA*DQB1
# conditional on HLA*DRB1 than for HLA*DRB1 conditional on HLA*DQB1, indicating
# that the overall variation for DQB1 is relatively low given specific DRB1 alleles.
# The largest contributors to ALD{DQB1|DRB1} are the DRB1*0301 and DRB1*1501 focal
# alleles, which have high allele frequencies and also have high allele specific
# homozygosity values.
HLA haplotype frequencies for 21 pairs of HLA loci in a set of 300 controls from a study of myopericarditis incidence following smallpox vaccination.
data(hla.freqs)
A data frame with 3063 observations on the following 5 variables.
haplo.freq
a numeric vector
locus1
a character vector
locus2
a character vector
allele1
a character vector
allele2
a character vector
Wilson, C., 2010 Identifying polymorphisms associated with risk for the development of myopericarditis following smallpox vaccine. The Immunology Database and Analysis Portal (ImmPort), Study #26.
https://immport.niaid.nih.gov/immportWeb/clinical/study/displayStudyDetails.do?itemList=SDY26
A function to sort a data.frame on specific columns.
lsort(dat, by = 1:dim(dat)[2], asc = rep(TRUE, length(by)), na.last = TRUE)
dat |
a dataframe or a matrix ("dimnames" are used as the variable names for a matrix) |
by |
a vector or a list of variable names or column indices specifying the "sort by" variables, the default is to sort by all variables in the order they appear in the data set. |
asc |
a vector with the same length as "by" indicating whether the sorting of each "by" variable is in ascending order, the default order is ascending. |
na.last |
a flag indicating whether missing values are placed as the last elements in the data set, the default is TRUE |
The return value is a sorted dataframe.
The input dataframe is not modified. The code is adapted from code posted to the old S-News mailing list.
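For comparison, a mixed ascending/descending sort can be sketched with base R's order(). This is not the lsort() implementation, only an illustration of the same idea; the small data frame is invented, and xtfrm() is used so that a character column can be negated for descending order.

```r
# Invented data with the same column names as snp.freqs.
df <- data.frame(
  locus1     = c("s2", "s1", "s1", "s2"),
  allele1    = c("A", "G", "A", "G"),
  haplo.freq = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Sort by locus1 (ascending), then allele1 (descending).
# xtfrm() maps the character column to sortable numbers, so "-" reverses it.
ord <- order(df$locus1, -xtfrm(df$allele1), na.last = TRUE)
sorted <- df[ord, ]

# df itself is left unchanged; sorted is a new data frame.
print(sorted)
```

Because order() only returns row indices, the original data frame is untouched, which matches the behaviour noted above for lsort().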
## Not run:
library(asymLD)
data(snp.freqs)
# sort snp.freqs by "locus1" (ascending) and "allele1" (descending)
newdata <- lsort(snp.freqs, by=c("locus1","allele1"), asc=c(TRUE,FALSE))
head(newdata)
# sort snp.freqs by the fourth and the second variable (ascending)
newdata <- lsort(snp.freqs, by=c(4,2))
# sort "snp.freqs" by "locus1" and the 5th variable (ascending)
newdata <- lsort(snp.freqs, by=list("locus1",5))
## End(Not run)
HLA haplotype frequencies for 6 pairs of SNP loci from the de Bakker et al. 2006 data for 90 unrelated individuals with European ancestry (CEU) from the Centre d'Etude du Polymorphisme Humain (CEPH) collection obtained from the Tagger/MHC webpage.
data(snp.freqs)
A data frame with 20 observations on the following 5 variables.
locus1
a character vector
locus2
a character vector
allele1
a character vector
allele2
a character vector
haplo.freq
a numeric vector
For bi-allelic SNP data the ALD measures are symmetric and equivalent to the r correlation measure of LD.
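The symmetry can be checked numerically on a made-up 2x2 haplotype table. The sketch below uses the ALD definition from Thomson and Single (2014) rather than calling the package, and the four frequencies are invented for illustration. It compares against the absolute value of r, since ALD is defined as a non-negative square root.

```r
# Invented haplotype frequencies for two bi-allelic loci (alleles A/a and B/b).
f <- c(AB = 0.4, Ab = 0.1, aB = 0.2, ab = 0.3)
p.A <- f[["AB"]] + f[["Ab"]]
p.B <- f[["AB"]] + f[["aB"]]

# Ordinary correlation measure of LD (absolute value).
r <- abs(f[["AB"]] - p.A * p.B) / sqrt(p.A * (1 - p.A) * p.B * (1 - p.B))

# Homozygosity at each locus.
F.A <- p.A^2 + (1 - p.A)^2
F.B <- p.B^2 + (1 - p.B)^2

# Conditional homozygosity: sum of f_ij^2 divided by the conditioning allele frequency.
F.A.B <- (f[["AB"]]^2 + f[["aB"]]^2) / p.B + (f[["Ab"]]^2 + f[["ab"]]^2) / (1 - p.B)
F.B.A <- (f[["AB"]]^2 + f[["Ab"]]^2) / p.A + (f[["aB"]]^2 + f[["ab"]]^2) / (1 - p.A)

ald.A.B <- sqrt((F.A.B - F.A) / (1 - F.A))
ald.B.A <- sqrt((F.B.A - F.B) / (1 - F.B))

# Both ALD values should agree with r up to floating-point error.
print(all.equal(c(ald.A.B, ald.B.A), c(r, r)))
```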
de Bakker, P. I., G. McVean, P. C. Sabeti, M. M. Miretti, T. Green et al., 2006 A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat.Genet. 38: 1166-1172.
http://www.broadinstitute.org/mpg/tagger/mhc.html