I am trying to move some old code from a data.frame implementation to data.table. I read my data from a .csv file with fread; some cells contain arrays, which fread reads in as character strings, like so:

> mydata$sport[1]
[1] "[24, 18, 24, 18]"

I want to parse these strings into numeric arrays. Here's what I've got partly working as a first step, which strips the brackets and splits on commas (step 2, not shown here, is to convert the pieces to a numeric array):

> name = "ascent"
> paste0(name, ":=strsplit(gsub('^\\[|\\]$','',", name, "),',')")
[1] "ascent:=strsplit(gsub('^\\[|\\]$','',ascent),',')"
 #here I manually copy the result of paste0 into the datatable command
 #I want to automate this setup, so this all can be put in a for loop
 #for many names
> mydata[, ascent:=strsplit(gsub('^\\[|\\]$','',ascent),',')]
> mydata$ascent[10]
[[1]]
[1] "-999"  " -999"
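Step 2, which the question leaves out, could look roughly like the sketch below. It uses base R only; `parse_bracketed` is a hypothetical helper name, and the sketch assumes every cell is a flat, comma-separated list of numbers inside one pair of square brackets.

```r
# Hypothetical helper: turn strings like "[24, 18, 24, 18]" into numeric vectors.
parse_bracketed <- function(x) {
  stripped <- gsub('^\\[|\\]$', '', x)   # drop the leading "[" and trailing "]"
  pieces   <- strsplit(stripped, ',')    # one character vector per input string
  lapply(pieces, as.numeric)             # as.numeric ignores the leading spaces
}

parsed <- parse_bracketed(c("[24, 18, 24, 18]", "[-999, 140.0]"))
```

The result is a list with one numeric vector per input string, which is the shape a data.table list column holds.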

So the command I generate to make the modification is good, but I have many names I want to do this for, so I don't want to copy and paste by hand, as is necessary above. I tried using the eval trick discussed in the question "dynamic column names in data.table, R".

But once I introduce eval the code doesn't work:

> name = "ascent"
> mydata[, eval(paste0(name, ":=strsplit(gsub('^\\[|\\]$','',", name, "),',')"))]
[1] "ascent:=strsplit(gsub('^\\[|\\]$','',ascent),',')"

So how can I implement this to work for an arbitrary name without having to create a command by hand for each desired name via paste0? I have an entire vector of names where I would like to do this modification.

Here's the data table right after fread and before making any modifications:

> mydata[1:10, .(sport, ascent)]
                             sport                                                       ascent
 1:               [24, 18, 24, 18]                                   [-999, 140.0, -999, 140.0]
 2:                  [2, 2, 2, 22]                                     [-999, -999, -999, -999]
 3:       [-999, -999, -999, -999]                                     [-999, -999, -999, -999]
 4:                   [-999, -999]                                               [173.0, 173.0]
 5:                       [18, 18]                                                 [-999, -999]
 6:                         [-999]                                                       [-999]
 7:                         [-999]                                                       [-999]
 8:                         [-999]                                                       [-999]
 9:                   [-999, -999]                                                 [-999, -999]
10:                   [-999, -999]                                                 [-999, -999]
sunny
  • You don't have to code that way to remove brackets for all columns. You can just `lapply(mydata, function(x) gsub('^\\[|\\]$','', x))` – Pierre L Jul 11 '15 at 00:53

1 Answer

Don't use the names at all...

for(j in which(names(mydata) %in% names)) set(mydata,i=NULL,j=j,value=strsplit(gsub('^\\[|\\]$','',mydata[[j]]),','))
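As a rough illustration, here is the same loop on a made-up table, with the numeric conversion folded in. `toy` and `cols` are names invented for this sketch (`cols` rather than `names`, to avoid shadowing the base function), and it assumes the data.table package is installed.

```r
library(data.table)

# Made-up data shaped like the question's columns, plus one column to leave alone.
toy <- data.table(
  id     = c("a", "b", "c"),
  sport  = c("[24, 18]", "[2, 22]", "[-999]"),
  ascent = c("[-999, 140.0]", "[173.0, 173.0]", "[-999]")
)

cols <- c("sport", "ascent")  # only these columns are parsed

for (j in cols) {
  pieces <- strsplit(gsub('^\\[|\\]$', '', toy[[j]]), ',')
  # set() modifies toy by reference; j may be a column name or a number
  set(toy, i = NULL, j = j, value = lapply(pieces, as.numeric))
}
```

After the loop, `sport` and `ascent` are list columns of numeric vectors and `id` is untouched.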

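As an aside, eval needs parse to work the way you were trying to use it. Calling eval on a character string just returns the string, so the text has to be parsed into an expression first, for example eval(parse(text=paste0(name,":=1+1")))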
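The difference is easy to see in base R, outside data.table. In this small sketch, `cmd`, `as_string` and `as_code` are illustrative names only.

```r
cmd <- "1 + 1"

as_string <- eval(cmd)               # a character string evaluates to itself
as_code   <- eval(parse(text = cmd)) # parse() turns the text into an expression first

print(as_string)
print(as_code)
```

This matches what the question saw: eval on the pasted string simply echoed the string instead of running the assignment.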

Dean MacGregor
  • thanks for the suggestion, but since I don't want to perform this action on all the columns and it would be awkward to have to track which column numbers rather than which column names I wanted to refer to, I don't think this solution is workable. – sunny Jul 15 '15 at 20:35
  • see edit. This should work for vector called `names` only. – Dean MacGregor Jul 15 '15 at 20:41
  • @sunny does the edit solve your problem? If so would you mind accepting the answer? – Dean MacGregor Jul 22 '15 at 20:18
  • yes the edit did solve the problem. I will accept your answer, though I am still hoping to find a way to avoid for loops. – sunny Jul 22 '15 at 20:19
  • actually if you look at the comments to the answer of this question http://stackoverflow.com/questions/16846380/how-to-apply-same-function-to-every-specified-column-in-a-data-table you'll find the creators of data.table prefer a for loop for this type of thing. – Dean MacGregor Jul 22 '15 at 20:26