I'm analyzing an RNA-seq dataset where a human cell line has been exposed to multiple chemical compounds at multiple doses. When running QC I have noticed the presence of a batch effect due to the different plates the cells were treated (not a strong one but would like to account for it). I have used both ComBat and removeBatchEffect from the limma package to look if any of the two methods was better in removing the batch but as you can see from the PCAs obtained on the control samples for each of the normalization steps (raw data, vst, ComBat and Limma) it seems that using either of the two methods increase the batch separation.
PCAs on control samples at the different normalization steps to highlight batch effect:
My feeling is that I may have made a mistake when specifying the arguments of the functions but I have come out with this piece of code when looking at similar request on StackOverflow. The code I have used for producing the different normalized dataset on which I run a PCA is:
raw data: raw<-counts(dds, normalized = TRUE)
vst: vst_counts<-vst(dds, blind=TRUE)
ComBat: com<-ComBat(assay(vst_counts),hash$Plate, mod = model.matrix(~1, data = hash))
Limma: lim<-removeBatchEffect(assay(vst_counts),hash$Plate,design=model.matrix(~hash$group))
Raw data and vst data are obtained from DESeq. The hash object is my metadata file containing information about the plates (batch) and the treatment conditions (group).
The code for running the PCA (for a single dataset) was:
ggplot(df_lim) + geom_point(aes(x=PC1, y=PC2, color = Plate),size=3) +
xlab(paste("PC1 (",summary(pca_lim)$importance[2,1]*100,"%)")) +
ylab(paste("PC2 (",summary(pca_lim)$importance[2,2]*100,"%)")) +theme_bw() +
coord_fixed()
Any help in addressing the issue here is highly appreciated.