1

I have a table like this

  Id   A    B     
  1  11   event1  
  2  12   event2   
  3  00   event1
  4  00   event2
  5  11   event1
  6  00   event2
  7  00   event3
  8  13   event1

I want to change 00 from the previous values. The output will look like this

  Id  A   B 
  1  11   event1
  2  12   event2
  3  12   event1
  4  12   event2
  5  11   event1
  6  11   event2
  7  11   event3
  8  13   event1

Is there a way to do this in R or MySQL. Thanks

jogo
  • 12,469
  • 11
  • 37
  • 42
No_body
  • 832
  • 6
  • 21

3 Answers3

1

Here is a one-liner to do this in R. Basically, we replace 0s with NA and use zoo::na.locf() to fill with the last non-NA value, i.e.

transform(d1, A = zoo::na.locf(replace(A, A == 0, NA)))

which gives,

  Id  A      B
1  1 11 event1
2  2 12 event2
3  3 12 event1
4  4 12 event2
5  5 11 event1
6  6 11 event2
7  7 11 event3
8  8 13 event1

As @G. Grothendieck notes, your A variable seems to be a character. If that's the case then,

transform(d1, A = na.locf0(replace(A, A == "00", NA)))
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • It's not jiust that `A` is character but also that `na.locf0` applied to a vector will ensure that the result is the same length as the input. Also we only need to apply it to `A` and not the entire data.frame. – G. Grothendieck Nov 20 '18 at 15:14
  • Ahh...right. I did not even notice that I apply it on the whole data frame. Also about `na.locf0` ,..., when does `na.locf` produce a result with different length than the input? – Sotos Nov 20 '18 at 15:19
  • If there are leading NAs then `na.locf` will drop them unless `na.rm = FALSE` is used. `na.locf0` hard codes to `na.rm = FALSE` but only works on vectors. `na.locf` was really developed for zoo objects where you usually don't need filling but if you want to apply it to a data.frame where filling is normally needed then you have to worry about such things. – G. Grothendieck Nov 20 '18 at 15:21
  • Oh, I get it. Great. Thank you for the info @G.Grothendieck – Sotos Nov 20 '18 at 15:28
1

You can use correlated subquery in MySQL :

SELECT t.id, 
       (CASE WHEN A <> '00' 
             THEN A
             ELSE (SELECT t1.A
                   FROM table t1
                   WHERE t1.id < t.id AND t1.A <> '00'
                   ORDER BY t1.id DESC
                   LIMIT 1
                  )
        END) AS A, B
FROM table t;
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
  • Is there a more optimized way to do it. I am running it on 80 M rows and it's taking forever to run. Thanks – No_body Nov 27 '18 at 16:28
0

In R you can do this with the tidyr function fill. First change all the 0 to NA with mutate, then the fill function will replace it with the last value.

library(dplyr)
library(tidyr)
df %>% mutate(A = ifelse(A==0,NA,A)) %>% fill(A)
#  Id  A      B
#1  1 11 event1
#2  2 12 event2
#3  3 12 event1
#4  4 12 event2
#5  5 11 event1
#6  6 11 event2
#7  7 11 event3
#8  8 13 event1
jasbner
  • 2,253
  • 12
  • 24