This question is almost the same as a previous question, but differs enough that the answers to that question don't work here. Like @chase in the last question, I want to write out a separate file for each split of a dataframe, in the following custom FASTA format.

#same df as last question

df <- data.frame(
    var1 = sample(1:10, 6, replace = TRUE)
    , var2 = sample(LETTERS[1:2], 6, replace = TRUE)
    , theday = c(1,1,2,2,3,3)
)    

#how I want the data to look
write(paste(">", df$var1,"_", df$var2, "\n", df$theday, sep=""), file="test.txt")

#whole df output looks like this:
#test.txt
>1_A
1
>8_A
1
>4_A
2
>9_A
2
>2_A
3
>1_A
3

However, instead of getting the output from the entire dataframe I want to generate individual files for each subset of data. Using d_ply as follows:

d_ply(df, .(theday), function(x) write(paste(">", df$var1,"_", df$var2, "\n", df$theday, sep=""), file=paste(x$theday,".fasta",sep="")))

I get the following error:

Error in file(file, ifelse(append, "a", "w")) : 
  invalid 'description' argument
In addition: Warning messages:
1: In if (file == "") file <- stdout() else if (substring(file, 1L,  :
  the condition has length > 1 and only the first element will be used
2: In if (substring(file, 1L, 1L) == "|") { :
  the condition has length > 1 and only the first element will be used

Any suggestions on how to get around this?

Thanks, zachcp

zach

1 Answer

There were two problems with your code.

  • First, in constructing the file name, you passed the vector x$theday to paste(). Since x$theday is a column of the subset data.frame, it has one element per row, and so usually more than one. Because paste() is vectorised, the result was several file names, and the error you saw was write() complaining when it received all of them in its file= argument. Using unique(x$theday) instead ensures that you only ever paste together a single file name.

  • Second, you didn't get far enough to see it, but you probably want to write the contents of x (the current subset of the data.frame) to each file, rather than the entire contents of df. As written, every file would contain all of the rows.
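
To make the first point concrete, here is a minimal sketch in base R. The two-row data frame x is a made-up stand-in for one of the pieces that d_ply() hands to the function; the values are illustrative only.

```r
# Hypothetical stand-in for one split of df (both rows have theday == 1)
x <- data.frame(var1 = c(3, 7), var2 = c("A", "B"), theday = c(1, 1))

# paste() is vectorised: one file name per row, so two names here
bad_name <- paste(x$theday, ".fasta", sep = "")

# unique() collapses the repeated value first, leaving a single name
good_name <- paste(unique(x$theday), ".fasta", sep = "")

print(length(bad_name))
print(good_name)
```

Only the single-element version is safe to pass as the file= argument.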

Here is the corrected code, which appears to work just fine.

library(plyr)  # provides d_ply() and .()

d_ply(df, .(theday), 
    function(x) {
        # x is the subset of df for a single value of theday
        write(paste(">", x$var1, "_", x$var2, "\n", x$theday, sep=""), 
              file=paste(unique(x$theday), ".fasta", sep=""))
    })
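
If you would rather not depend on plyr, the same idea can be sketched in base R with split() and a loop. This is an alternative, not what the question used. The fixed values in df (chosen to match the sample output in the question) and the out_dir temporary directory are assumptions made so the example is reproducible and does not write into your working directory.

```r
# Fixed data instead of sample(), so the result is reproducible (assumption)
df <- data.frame(
    var1 = c(1, 8, 4, 9, 2, 1),
    var2 = rep("A", 6),
    theday = c(1, 1, 2, 2, 3, 3)
)

out_dir <- tempdir()  # hypothetical output location

# split() returns a named list with one data frame per value of theday
pieces <- split(df, df$theday)

for (day in names(pieces)) {
    x <- pieces[[day]]
    # Build one two-line record per row: ">var1_var2" then theday
    records <- paste(">", x$var1, "_", x$var2, "\n", x$theday, sep = "")
    # The list name is already a single string, so the file name is too
    writeLines(records, file.path(out_dir, paste(day, ".fasta", sep = "")))
}
```

Because the file name is built from the name of each list element, there is no need for unique() here.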
Josh O'Brien
  • @zach: Thanks. I immediately put a `browser()` call in the body of the anonymous function (i.e. `function(x) {browser(); write.....}`), which makes debugging easy. That lets you examine each piece of the environment and each calculation at your leisure, and quickly see what has gone awry. – Josh O'Brien Jan 24 '12 at 23:18