I have a CSV file with 1.2M rows and need to clean up several columns in it. Basically, I need to remove everything after an underscore in the 7 columns that need cleaning. The following code works but takes 3 days to complete. I have a Perl script that does the same in 30 seconds, but I am trying to keep everything within R if possible. Any suggestions?

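v <- c(11:14, 16:18)  # indices of the columns to clean

systime <- Sys.time()
for (m in seq_along(v)) {
  # nrow() must be called on the data frame; a single extracted column has no rows
  for (i in seq_len(nrow(shots))) {
    # keep only the text before the first underscore
    shots[i, v[m]] <- unlist(strsplit(shots[i, v[m]], split = '_', fixed = TRUE))[1]
  }
}
Sys.time() - systime  # elapsed time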

I need to remove the data after the underscore.

Below is a sample row (the column headings are not included) showing what needs to be kept and what needs to be removed in the affected columns.
1 42.30000 586.39276 Ground Name Name1 KEEP_remove KEEP_remove_remove No Mount KEEP_remove_remove 1 1

  • `shots[,v]<- sub('_.*', '', shots[,v])` ? – Sotos Jun 09 '16 at 13:57
  • Possibly `shots[v] <- lapply(shots[v], function(x) sub("^([^_]+).*", "\\1", x))` – talat Jun 09 '16 at 13:58
  • Would there be another suggestion in cleaning up the data instead of using the method already coded? – user5292535 Jun 09 '16 at 14:50
  • @user5292535, it would be best for you and for those who want to answer your question if you provide a minimal reproducible example and the expected output for the example. See [mcve] and [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – talat Jun 09 '16 at 14:57
  • The following are the column names showing some of the data needed and what needs to be kept and removed in particular columns. Does this help? They have column headings which are not included here. 1 42.30000 586.39276 Ground Name Name1 KEEP_remove KEEP_remove_remove No Mount KEEP_remove_remove 1 1 – user5292535 Jun 09 '16 at 15:48
  • You have to put such information into the body of your question instead of comments. Use the edit button to do so. – talat Jun 09 '16 at 17:04
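
The column-wise approach suggested in the comments can be sketched on a small reproducible example. The data frame below is a made-up stand-in for `shots` (its column names and values are assumptions, loosely modelled on the sample row), and `v` is shortened to match it. This is only a sketch of the idea, not a tested solution for the real file:

```r
# Hypothetical stand-in for `shots`: three made-up columns, two of which need cleaning.
shots <- data.frame(
  id    = 1:3,
  mount = c("KEEP_remove", "KEEP_remove_remove", "No Mount"),
  name  = c("A_x", "B", "C_y_z"),
  stringsAsFactors = FALSE
)

v <- c(2, 3)  # positions of the columns to clean (11:14 and 16:18 in the question)

# sub() is vectorised, so each column is handled in a single call
# instead of one strsplit() call and one assignment per cell.
# as.character() guards against columns that were read in as factors.
shots[v] <- lapply(shots[v], function(x) sub("_.*", "", as.character(x)))

shots
```

Because the work is done once per column rather than once per cell, this should avoid most of the cost of the 1.2M × 7 individual data-frame assignments in the original loop. Values without an underscore, such as `No Mount`, are left unchanged.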
