
My raw data has the unique identifier for each unit mixed into the same column as the timings. To summarise the data I need to attach the uniqueID to every row in its group. My loop first trims off the blurb above the data, then runs an if/else check for text on each row: when it finds text it uses 'strsplit' to obtain the uniqueID, pastes that ID down the column until it encounters the next text string, and repeats.

It works, but it is incredibly slow and I need to repeat it over a lot of raw data. (I don't have access to the originating software to change the shape of the output file.)

Reading through the forums, I have found solutions for replacing values with a single variable, but I need a method that extracts the replacement from a line in the df.

Example df:

       time          dist      v3           v4
1:    2              10.2     ...         ....
2:    3              10.2     ...         ....
3:    Veh: 123     
4:    1              10.2     ...         .... 
5:    2              10.2     ...         ....
6:    3              10.2     ...         ....
7:    Veh: 456   
8:    1              10.2     ...         ....
9:    2              10.2     ...         ....


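v <- "0001"  # quoted so the leading zeros survive
for (m in seq_along(k2$time)) {
  if (grepl('Veh', k2$time[m])) {
    v <- trimws(strsplit(k2$time[m], split = ":")[[1]][2])  # new ID, without the leading space
  } else {
    k2$time[m] <- v  # data row: paste the current ID
  }
}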

Because it runs as a loop, I know it works down the column, pasting the ID until it encounters another text string. The desired result looks like this:

       time          dist      v3           v4
1:    0001           10.2     ...         ....
2:    0001           10.2     ...         ....
3:    Veh: 123     
4:    123            10.2     ...         .... 
5:    123            10.2     ...         ....
6:    123            10.2     ...         ....
7:    Veh: 456   
8:    456            10.2     ...         ....
9:    456            10.2     ...         ....

I then have another line that runs through the whole data.frame and removes the rows containing text so that I can summarise.

Is anyone aware of a faster solution, perhaps using dplyr or data.table? I gave it 15 minutes before aborting a run over 922,000 rows, and I need it to run over several million.

I'm running out of search combinations on Stack Overflow.

Using data.table-1.9.7 and dplyr-0.5.0 on R-3.3.1


EDIT: Apologies, reproducible example:

time <- c(1,2,"Veh: 123", 1:3,"Veh: 456", 1:3)
dist <- c(1:2,"",4:6,"",8:10)
v3 <- c(1:2,"",4:6,"",8:10)
k <-data.frame(time,dist,v3)
k$time <- as.character(k$time)

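v <- "0001"  # quoted so the leading zeros survive
for (m in seq_along(k$time)) {
  if (grepl('Veh', k$time[m])) {
    v <- trimws(strsplit(k$time[m], split = ":")[[1]][2])  # ID after the colon, space trimmed
  } else { k$time[m] <- v }  # data row: paste the current ID
}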
  • Hint: `grepl()` and `strsplit()` are both vectorized. That's all I can do without a reproducible example. – Rich Scriven Jul 25 '16 at 20:51
  • [make this a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – shayaa Jul 25 '16 at 20:56
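
To illustrate the hint about vectorization, here is a minimal sketch in base R. The `time` vector below simply mirrors the question's reproducible example, and the names `is_header` and `ids` are made up for illustration. It only shows that detection and extraction can each be done in a single call; it does not yet carry the IDs down the column.

```r
# Toy column mirroring the question's reproducible example
time <- c("1", "2", "Veh: 123", "1", "2", "3", "Veh: 456", "1", "2", "3")

# grepl() tests every element in one call, so no loop is needed
is_header <- grepl("^Veh", time)

# sub() is vectorised too: strip the "Veh:" prefix and any spaces after it
ids <- sub("^Veh:\\s*", "", time[is_header])

print(which(is_header))  # positions of the header rows: 3 and 7
print(ids)               # "123" "456"
```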

1 Answer

library(data.table)
library(stringr)
time <- c(1,2,"Veh: 123", 1:3,"Veh: 456", 1:3)
dist <- c(1:2,"",4:6,"",8:10)
v3 <- c(1:2,"",4:6,"",8:10)
k <- data.table(time,dist,v3)

v <- "0001"  # quoted so the leading zeros are kept
k[,time := ifelse(grepl('Veh: \\d+', time), str_match(time, 'Veh: (\\d+)')[,2], v)]
  • Your 'if' statement is tidier than mine; however, see the desired example output above. It does not carry the vehID down into the cells beneath, updating each time it encounters a new 'Veh: xxx' statement. Is this kind of operation efficiently possible in R? Or is my thinking too Excel-ish? – Rodger the Dodger Jul 26 '16 at 02:25
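
The carry-down asked about in the comment above can be done without a loop. What follows is one possible sketch in base R, not the answerer's method. It assumes that every header row starts with 'Veh', that the placeholder "0001" is wanted for rows before the first header, and that the ID may go into a new column (called `veh` here, a made-up name) so the original timings stay available for summarising. The idea is that `cumsum()` over the header flags counts how many headers have been seen so far, and that count indexes directly into the vector of IDs.

```r
# Reproducible example from the question
time <- c(1, 2, "Veh: 123", 1:3, "Veh: 456", 1:3)
dist <- c(1:2, "", 4:6, "", 8:10)
v3   <- c(1:2, "", 4:6, "", 8:10)
k <- data.frame(time, dist, v3, stringsAsFactors = FALSE)

# Flag the header rows in one vectorised call
is_header <- grepl("^Veh", k$time)

# All IDs in order, preceded by the placeholder for rows above the first header
ids <- c("0001", sub("^Veh:\\s*", "", k$time[is_header]))

# cumsum() gives the number of headers seen so far (0, 0, 1, 1, ...);
# adding 1 turns that into an index into ids
k$veh <- ids[cumsum(is_header) + 1L]

# Drop the header rows so only data rows remain for summarising
res <- k[!is_header, ]
print(res)
```

On a data.table the same index vector could be assigned by reference with `:=`. Because every step is a single vectorised call, the run time should grow roughly linearly with the number of rows, though this has not been timed on the full data here.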