I'm trying to produce an individual PDF violin plot for each of many distinct data sets, each stored in its own file in a directory. I've created a series of for loops to go through them:

library(ggplot2)
library(tidyr)
library(Hmisc)  # package name is case-sensitive; needed by mean_sdl

myfiles <- dir()
plot_list <- list()

for (i in seq_along(myfiles)) {
     dfx <- read.table(file=myfiles[i], header=TRUE, sep="\t", quote="")
     # assign the result; gather() does not modify dfx in place
     dfx <- dfx %>% gather("gene","expression",2:8)
     #dfx2<-gather(dfx, "gene","expression",2:8)
     p <- ggplot(dfx,aes(x=gene,y=expression, fill=gene)) + 
     geom_violin(scale="width", trim=FALSE) + 
     # extra arguments to mean_sdl go through fun.args
     stat_summary(fun.data="mean_sdl", fun.args=list(mult=1), geom="pointrange", 
     color="black", size=0.3)
     plot_list[[i]] <- p
}

However, I keep getting the following error after the first for loop: Error: position must be between 0 and n.

From experimenting with it, the problem appears to be that the gather function does not recognize the second through eighth columns as the ones to turn into key-value pairs. Can anyone provide some insight into why this is happening?

Of note, when I test the script outside the for loop on one file, my data frame (dfx) looks like this before the gather function:

>head(dfx, n=6L)

                                Sample        A3A        A3B      A3C
1 00507d23-fbf3-4363-beff-aea03f9c5d2b 0.03121353 0.30252324 4.152817
2 008b8100-7bd6-4224-998c-700863de51da 0.03029060 0.12682751 1.783519
3 00bf9b15-1ee8-4083-aeca-7b01e2ebbf72 0.02288048 0.09821837 1.198759
4 030890e1-dcc7-4a16-9ff3-a7bfd259b471 0.14018837 0.25924818 2.843870
5 03248d19-cb6a-4578-9759-c0c4de048920 0.05629487 0.14414294 2.370515
6 03bc1d49-07fe-41ec-8064-28861c25eebb 0.02869719 0.13016301 3.834980
         A3D       A3F        A3G        A3H
1 0.49064339 0.5746080 1.36810941 0.33271714
2 0.03835540 0.1835935 0.14274570 0.04757876
3 0.02461852 0.1755424 0.03669695 0.04730084
4 0.19313735 0.5151350 1.00295535 0.20449874
5 0.34363224 0.2372394 0.39013512 0.08738450
6 0.19863243 0.4579626 0.47219715 0.10500037

And after the gather function (again not in the for loop):

>head(dfx %>% gather("gene","expression",2:8), n=6L)

                                Sample gene expression
1 00507d23-fbf3-4363-beff-aea03f9c5d2b  A3A 0.03121353
2 008b8100-7bd6-4224-998c-700863de51da  A3A 0.03029060
3 00bf9b15-1ee8-4083-aeca-7b01e2ebbf72  A3A 0.02288048
4 030890e1-dcc7-4a16-9ff3-a7bfd259b471  A3A 0.14018837
5 03248d19-cb6a-4578-9759-c0c4de048920  A3A 0.05629487
6 03bc1d49-07fe-41ec-8064-28861c25eebb  A3A 0.02869719

I've also looked at a similar question, "gather with tidyr: position must be between 0 and n error", and changed the line dfx %>% gather("gene","expression",2:8) to dfx %>% gather("gene","expression",c(2:8)), but to no avail.

pogibas
Matt
  • Please see [how to create a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Verify where the error is by calling `traceback()` after you get the error to see the last call stack and share that here. Are you sure all your files have 8 columns? – MrFlick Sep 05 '17 at 21:32
  • 1. Have you tried commenting your code and finding exact line where error occurs?; 2. Have you tried running code only on one sample (don't loop before you make sure that it works)?; 3. If it's `gather` error you need to post example of `dfx`. – pogibas Sep 05 '17 at 21:33
  • If you think the problem is at the `gather` line, could you edit your question to include the output of `dput(dfx)`? We might be able to help, then. – lebelinoz Sep 05 '17 at 21:36
  • Possible duplicate of [gather with tidyr: position must be between 0 and n error](https://stackoverflow.com/questions/32512501/gather-with-tidyr-position-must-be-between-0-and-n-error) – pogibas Sep 05 '17 at 21:39
  • @PoGibas: 1. I tried commenting the code including the line with the `gather` function and incorporated printing the first three lines of each data file. So the files are able to be read into dataframes and all have 8 columns. 2. I have tried on one sample, not in a for loop but using the same above methods and it works fine. – Matt Sep 05 '17 at 21:55
  • @Matt if it works on one file, then it might be that one of your files is corrupt and has a different structure. Try `print(i); gather(...)` to see which sample might be bad. – pogibas Sep 05 '17 at 22:03
  • @PoGibas, you're completely right. There was a corrupt file in my list. Thanks for the help and for making me realize a simple answer to this. Stupid mistake on my end. I can go ahead and delete this thread. Although, as a novice on here, would y'all recommend I maintain it for whatever learning purpose? Thanks again for the input and guidance! – Matt Sep 05 '17 at 22:50
  • @Matt post your own answer (debugging) for other users to know :-) – pogibas Sep 05 '17 at 22:52
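
For anyone landing here with the same error: the comments point to one input file having a different structure from the rest, so that columns 2:8 do not exist when gather is called. Below is a minimal sketch of how such a file could be located before plotting. It uses base R only; the helper name `find_bad_files`, the `expected_cols` argument, and the two temporary demo files are assumptions made for illustration and are not from the original post.

```r
# Hypothetical helper (not from the original post): return the files whose
# column count differs from what gather(..., 2:8) needs.
find_bad_files <- function(files, expected_cols = 8) {
  bad <- character(0)
  for (f in files) {
    # tryCatch keeps the loop going if a file cannot be parsed at all
    dfx <- tryCatch(
      read.table(file = f, header = TRUE, sep = "\t", quote = ""),
      error = function(e) NULL
    )
    if (is.null(dfx) || ncol(dfx) != expected_cols) {
      n <- if (is.null(dfx)) NA else ncol(dfx)
      message("Unexpected structure in ", f, " (columns: ", n, ")")
      bad <- c(bad, f)
    }
  }
  bad
}

# Demo on two temporary files: one well-formed, one missing columns.
tmp <- tempfile()
dir.create(tmp)
good <- file.path(tmp, "good.txt")
broken <- file.path(tmp, "broken.txt")
write.table(
  data.frame(Sample = "s1", A3A = 1, A3B = 2, A3C = 3,
             A3D = 4, A3F = 5, A3G = 6, A3H = 7),
  good, sep = "\t", quote = FALSE, row.names = FALSE
)
write.table(
  data.frame(Sample = "s1", A3A = 1),
  broken, sep = "\t", quote = FALSE, row.names = FALSE
)

bad_files <- find_bad_files(c(good, broken))
print(basename(bad_files))
```

In the setup from the question, this would be called as `find_bad_files(myfiles)` before the plotting loop, and any file it reports could be inspected or skipped. Checking the column count is only one possible test; a file could also have 8 columns with the wrong names or types.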
