
For research purposes I need to process data from a CSV table. The table looks like the following:

    Frame Nr. 0      frame_type  I_frame
    Frame Nr. 1      frame_type  P_frame
    Frame Nr. 2      frame_type  P_frame
    Frame Nr. 3      frame_type  B_frame
    Frame Nr. 4      frame_type  P_frame
    Frame Nr. 5      frame_type  P_frame
    Frame Nr. 6      frame_type  B_frame
    Frame Nr. 7      frame_type  P_frame
    Frame Nr. 8      frame_type  P_frame
    Frame Nr. 9      frame_type  I_frame
    Frame Nr. 10     frame_type  P_frame
    Frame Nr. 11     frame_type  P_frame
    Frame Nr. 12     frame_type  P_frame
    Frame Nr. 13     frame_type  I_frame
    Frame Nr. 14     frame_type  P_frame
    Frame Nr. 15     frame_type  P_frame
    Frame Nr. 16     frame_type  B_frame
    Frame Nr. 17     frame_type  P_frame
    Frame Nr. 18     frame_type  P_frame
    Frame Nr. 19     frame_type  P_frame
    Frame Nr. 20     frame_type  P_frame
    Frame Nr. 21     frame_type  I_frame
    Frame Nr. 22     frame_type  P_frame
    Frame Nr. 23     frame_type  P_frame
    Frame Nr. 24     frame_type  P_frame
    Frame Nr. 25     frame_type  I_frame
    ...

I want R to first split the frames into groups, each starting at an I_frame and ending just before the next I_frame, and then count the runs of consecutive P-frames and B-frames within each group. For this example, my R program should deliver a result like the following:

I2PB2PB2P I3P I2PB4P I3P ...

Is there a way in R to do that?

Kindermann
  • There is most probably a way to achieve this (@akrun already showed you one way), but at the moment it is not very clear where the desired result comes from. For some guidance on how to improve your question, see the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Sep 26 '16 at 15:01
  • Shouldn't the first set of string be `"I2PB2PB2P"` ? – akrun Sep 26 '16 at 15:10
  • I guess you may do `grp <- cumsum(df1[,3]=="I_frame");unname(tapply(df1[,3], grp, FUN= function(x) {rl <- rle(sub("_.*", "", x)); paste(ifelse(rl$lengths>1, rl$lengths, ""), rl$values, collapse="", sep="") }))` – akrun Sep 26 '16 at 16:06
  • @akrun You're right. I made a mistake ;-) – Kindermann Sep 27 '16 at 09:09
  • @ProcrastinatusMaximus yes, i understand what you meant. I just wanted a direction to begin with :-) – Kindermann Sep 28 '16 at 11:01

1 Answer


Edited to replace my previous, wrong answer, and borrowing the use of `rle` from @akrun. Assuming that your data is in a data.frame named "df" and your "frame classes" are in a column named "frame_class", as in the code below, this should work:

    df = data.frame(n_frame = 1:13, frame_type = "frame_type",
                    frame_class = c("I_frame", "P_frame", "P_frame", "B_frame", "P_frame", "P_frame",
                                    "B_frame", "I_frame", "B_frame", "P_frame", "I_frame", "P_frame", "I_frame"))
    df$frame_letter = substring(df$frame_class, 1, 1) # keep only the first letter (I, P or B)

    # Find the location of I_frames
    where_i = which(df$frame_class == "I_frame")
    num_i = length(where_i)
    out_codes = list()

    for (ind_i in seq_len(num_i - 1)) { # cycle on "sandwiches" between consecutive I_frames
      start = where_i[ind_i]
      end = where_i[ind_i + 1]
      out_code = "I"
      if (end - start > 1) { # nothing to encode when two I_frames are adjacent
        sub_data = df$frame_letter[(start + 1):(end - 1)]  # frames strictly inside the sandwich
        count_reps = rle(sub_data)  # find the repetition pattern

        # build the code; a run of length 1 gets no "1" in the string
        out_code = paste0(out_code,
                          paste0(ifelse(count_reps$lengths == 1, "", count_reps$lengths),
                                 count_reps$values, collapse = ""))
      }
      out_codes[[ind_i]] = out_code # put in list
    }
    out_codes

which gives:

    > out_codes
    [[1]]
    [1] "I2PB2PB"

    [[2]]
    [1] "IBP"

    [[3]]
    [1] "IP"
Note that it's really quick and dirty: you should at least implement some checks to be sure that the series always starts and ends with an "I_frame", but this could point you in the right direction...
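As a rough idea of what such checks could look like, here is a minimal sketch. The function name `check_frame_series` and the exact set of checks are my own assumptions, not part of the code above; adapt them to your data.

```r
# Hypothetical helper (not part of the original answer): stop with an
# informative message if the series is not usable by the loop above.
check_frame_series <- function(frame_class) {
  frame_class <- as.character(frame_class)  # also handles factor columns
  n <- length(frame_class)
  if (n == 0) stop("the series is empty")
  if (anyNA(frame_class)) stop("the series contains missing values")
  if (frame_class[1] != "I_frame") stop("the series does not start with an I_frame")
  if (frame_class[n] != "I_frame") stop("the series does not end with an I_frame")
  # Assumption: only these three classes are expected in the table
  unknown <- setdiff(frame_class, c("I_frame", "P_frame", "B_frame"))
  if (length(unknown) > 0) {
    stop("unexpected frame classes: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

check_frame_series(c("I_frame", "P_frame", "B_frame", "I_frame"))
```

Calling `check_frame_series(df$frame_class)` before the loop would stop early on a malformed series instead of silently producing odd codes.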

Also note that the explicit loop could be slow for large datasets.
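If speed becomes a concern, a loop-free variant along the lines of @akrun's comment may help. The sketch below is only an illustration: the function name `encode_frame_groups` is made up, and it assumes every class name begins with its frame letter (I, P or B). It numbers the groups with `cumsum()` and run-length encodes each one with `rle()`.

```r
# Hypothetical helper: one code string per group of frames, where a new
# group starts at every I_frame.
encode_frame_groups <- function(frame_class) {
  frame_letter <- substr(as.character(frame_class), 1, 1)
  grp <- cumsum(frame_letter == "I")  # group id increases at each I_frame
  encode_one <- function(x) {
    runs <- rle(x)
    # a run of length 1 gets no count in front of its letter
    paste0(ifelse(runs$lengths > 1, runs$lengths, ""), runs$values, collapse = "")
  }
  vapply(split(frame_letter, grp), encode_one, character(1), USE.NAMES = FALSE)
}

# The 26 frames listed in the question (Frame Nr. 0 to 25)
frame_class <- paste0(strsplit("IPPBPPBPPIPPPIPPBPPPPIPPPI", "")[[1]], "_frame")
encode_frame_groups(frame_class)
# [1] "I2PB2PB2P" "I3P"       "I2PB4P"    "I3P"       "I"
```

Unlike the loop, this version also returns the group opened by the last I_frame (here the trailing `"I"`); drop the last element if you only want groups that are closed by another I_frame.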

Lorenzo

lbusett