
I have a function that labels strings in a dataset as spam. It works when I call it directly:

dtm_english.label <- getSpamLabel(comment$rawMessage, dictionary_english, 2) # 2 is the threshold level

But then when I call

dtm_english.label <- ddply(comment, .(rawMessage), getSpamLabel, dictionary_english, 2, .progress = "text")

ddply runs to completion without printing any output, and then I get:

Error in do.call("c", res) : variable names are limited to 10000 bytes

I can post the function if it is relevant.

CptNemo
  • The function is indeed relevant. We also need some example data to reproduce the error. Please read [this FAQ](http://stackoverflow.com/a/5963610/1412059). – Roland Sep 03 '13 at 10:54

1 Answer


I am not sure what you are attempting to do; next time, please describe exactly what you are trying to achieve. To me it looks like you are trying to apply a function to one column of your data.frame. ddply is meant for applying a function to subsets of the data. Its documentation describes it as "Split data frame, apply function, and return results in a data frame".
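To make "apply a function to subsets" concrete, here is a minimal sketch with made-up data. The `toy` data frame and the `describePiece` helper are hypothetical and not from the question. It shows that the function given to ddply receives a data frame for each group, not a bare character vector:

```r
library(plyr)

# Hypothetical toy data; not the asker's `comment` data frame.
toy <- data.frame(
  group = c("a", "a", "b"),
  rawMessage = c("buy now", "hello there", "free money"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reports what ddply hands to the function for each group.
describePiece <- function(piece) {
  data.frame(isDataFrame = is.data.frame(piece), nRows = nrow(piece))
}

# Split `toy` by `group`, call describePiece on each piece, bind the results.
pieces <- ddply(toy, .(group), describePiece)
print(pieces)
```

So a function written to take a vector of strings as its first argument, as getSpamLabel appears to, cannot simply be passed to ddply as is.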

If what you want is to split your column into sections before applying the function, you would need, for example, a factor in your data frame that tags the groups.

You would pass the "group" factor as the .variables argument to ddply, not the variable to which you want to apply the function. Then use .fun = summarize, followed by your function call:

dtm_english <- ddply(comment, .(group), summarize, 
                     label=getSpamLabel(rawMessage, dictionary_english, 2), 
                     .progress = "text")

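This returns a new data frame. It has one row per level of group if getSpamLabel returns a single value for each group; if it returns one label per message, you get one row per message, with the group in the first column.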
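Since the original function was not posted, here is a self-contained sketch of the same call under stated assumptions. The `getSpamLabel` below is a hypothetical stand-in (it flags a message when it contains at least `threshold` dictionary words), and the dictionary and the `comment` data frame, including its `group` column, are invented for illustration:

```r
library(plyr)

# Hypothetical stand-in for the asker's function, which was not posted:
# TRUE when a message contains at least `threshold` dictionary words.
getSpamLabel <- function(messages, dictionary, threshold) {
  hits <- sapply(messages, function(m) {
    words <- strsplit(tolower(m), "[^a-z]+")[[1]]
    sum(words %in% dictionary)
  })
  unname(hits >= threshold)
}

# Made-up dictionary and data; `group` is the factor that tags the subsets.
dictionary_english <- c("free", "money", "buy")
comment <- data.frame(
  group = factor(c("a", "a", "b")),
  rawMessage = c("buy free money now", "hello there", "free money"),
  stringsAsFactors = FALSE
)

# Split by `group`; summarize evaluates the call with each piece's columns.
dtm_english <- ddply(comment, .(group), summarize,
                     label = getSpamLabel(rawMessage, dictionary_english, 2))
print(dtm_english)
```

Because this stand-in returns one label per message, the result has one row per message, with `group` alongside `label`. The `.progress` argument is left out here only to keep the output short.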

Pedro J. Aphalo