I'm trying to produce a separate PDF of violin plots for each of many data sets, each stored in its own file in a directory. I've written a for loop to go through the files:
library(ggplot2)
library(tidyr)
library(Hmisc)
myfiles<-dir()
plot_list=list()
for (i in 1:length(myfiles)){
  dfx<-read.table(file=myfiles[i], header=TRUE, sep="\t", quote="")
  dfx %>% gather("gene","expression",2:8)
  #dfx2<-gather(dfx, "gene","expression",2:8)
  p<-ggplot(dfx,aes(x=gene,y=expression, fill=gene)) +
    geom_violin(scale="width", trim=FALSE) +
    stat_summary(fun.data="mean_sdl", mult=1, geom="pointrange",
                 color="black", size=0.3)
  plot_list[[i]]=p
}
However, I keep getting the following error after the first for loop:
Error: position must be between 0 and n
From experimenting with it, the problem appears to be in the gather function, which does not seem to recognize the second through eighth columns that it should turn into key-value pairs. Can anyone provide some insight into why this is happening?
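To test whether every file really has eight columns, a minimal diagnostic sketch could look like the one below. It uses only base R. The temporary directory and the two demo files (good.txt and short.txt) are made up for illustration and are not my real data.

```r
# Hypothetical demo data: one file with 8 columns, one with only 3.
demo_dir <- tempfile("violin_demo_")
dir.create(demo_dir)
good <- data.frame(Sample = c("s1", "s2"), A3A = 1:2, A3B = 3:4, A3C = 5:6,
                   A3D = 7:8, A3F = 9:10, A3G = 11:12, A3H = 13:14)
short <- good[, 1:3]
write.table(good, file.path(demo_dir, "good.txt"), sep = "\t",
            quote = FALSE, row.names = FALSE)
write.table(short, file.path(demo_dir, "short.txt"), sep = "\t",
            quote = FALSE, row.names = FALSE)

# Count the columns read.table() sees in each file.
demo_files <- dir(demo_dir, full.names = TRUE)
col_counts <- vapply(demo_files, function(f) {
  ncol(read.table(file = f, header = TRUE, sep = "\t", quote = ""))
}, integer(1))
names(col_counts) <- basename(demo_files)

# Files with fewer than 8 columns cannot satisfy the 2:8 selection.
too_narrow <- names(col_counts)[col_counts < 8]
print(col_counts)
print(too_narrow)
```

On my real directory I would point dir() at the working directory instead of demo_dir. Since dir() lists every file it finds, any entry that shows up in too_narrow (possibly a file that is not a data set at all) would be a candidate for triggering the error.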
Of note, when I test the script outside the for loop on a single file, my data frame (dfx) looks like this before the gather call:
>head(dfx, n=6L)
Sample A3A A3B A3C
1 00507d23-fbf3-4363-beff-aea03f9c5d2b 0.03121353 0.30252324 4.152817
2 008b8100-7bd6-4224-998c-700863de51da 0.03029060 0.12682751 1.783519
3 00bf9b15-1ee8-4083-aeca-7b01e2ebbf72 0.02288048 0.09821837 1.198759
4 030890e1-dcc7-4a16-9ff3-a7bfd259b471 0.14018837 0.25924818 2.843870
5 03248d19-cb6a-4578-9759-c0c4de048920 0.05629487 0.14414294 2.370515
6 03bc1d49-07fe-41ec-8064-28861c25eebb 0.02869719 0.13016301 3.834980
A3D A3F A3G A3H
1 0.49064339 0.5746080 1.36810941 0.33271714
2 0.03835540 0.1835935 0.14274570 0.04757876
3 0.02461852 0.1755424 0.03669695 0.04730084
4 0.19313735 0.5151350 1.00295535 0.20449874
5 0.34363224 0.2372394 0.39013512 0.08738450
6 0.19863243 0.4579626 0.47219715 0.10500037
And after the gather function (again not in the for loop):
>head(dfx %>% gather("gene","expression",2:8),n=6L)
Sample gene expression
1 00507d23-fbf3-4363-beff-aea03f9c5d2b A3A 0.03121353
2 008b8100-7bd6-4224-998c-700863de51da A3A 0.03029060
3 00bf9b15-1ee8-4083-aeca-7b01e2ebbf72 A3A 0.02288048
4 030890e1-dcc7-4a16-9ff3-a7bfd259b471 A3A 0.14018837
5 03248d19-cb6a-4578-9759-c0c4de048920 A3A 0.05629487
6 03bc1d49-07fe-41ec-8064-28861c25eebb A3A 0.02869719
I've also looked at a similar question, "gather with tidyr: position must be between 0 and n error", and tried changing the line dfx %>% gather("gene","expression",2:8) to dfx %>% gather("gene","expression",c(2:8)), but to no avail.
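Another variation I could try is selecting the columns by name rather than by position, and assigning the gathered result to a variable. The sketch below is only an illustration: the two-row demo data frame is invented, it assumes every file has a Sample column, and I have not confirmed that it resolves the error on my real files.

```r
library(tidyr)

# Hypothetical two-row stand-in for one of my files.
demo <- data.frame(Sample = c("s1", "s2"), A3A = c(0.03, 0.03),
                   A3B = c(0.30, 0.13), stringsAsFactors = FALSE)

# Select columns by name (everything except Sample) instead of 2:8,
# and assign the result so the long data frame is actually kept.
demo_long <- gather(demo, "gene", "expression", -Sample)
print(demo_long)
```

Selecting with -Sample does not depend on how many gene columns a given file has, so it would at least show whether the positional 2:8 range is what fails.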