
I have a dataset that looks like this:

library(purrr)
library(dplyr)
temp <- data.frame(col_A = c(1, 2, NA, 3, 4, 5, 6),
                   col_B = c(NA, 1, 2, NA, 1, NA, NA))

col_A      col_B
 1         NA
 2         1
 NA        2
 3         NA
 4         1
 5         NA
 6         NA

I want to create a new data frame that contains, for each column, a running count of consecutive non-NA values; the count resets to 0 whenever an NA occurs. Like the following example:

count_A      count_B
 1           0           
 2           1           
 0           2
 1           0
 2           1
 3           0
 4           0

I am struggling to get the count of items. My closest attempt is this:

count_days<-function(prev,new){
ifelse(!is.na(new),prev+1,0)
}

temp[,"col_A"] %>% 
mutate(count_a=accumulate(count_a,count_days))

But I get the following error:

Error in UseMethod("mutate_") : 
   no applicable method for 'mutate_' applied to an object of class "c('double', 'numeric')"

Can anyone help me with this code, or suggest another approach?

I know this piece of code only tries to compute the count and does not create the new data frame, but I think that part will be easier once I get the correct result.

arodrisa

3 Answers

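Using rle in a (somewhat nested) lapply approach. We first flag, for each column, which elements are NA using is.na. Then rle encodes that logical vector into runs (values and lengths), and seq_len numbers the positions within each run. The runs that correspond to NA are set to 0 by multiplication, and finally we unlist the result.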

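res <- as.data.frame(lapply(lapply(temp, is.na), function(x) {
  r <- rle(x)                                  # runs of NA (TRUE) / non-NA (FALSE)
  s <- lapply(r$lengths, seq_len)              # 1..n within each run; lapply always returns a list
  s[r$values] <- lapply(s[r$values], `*`, 0)   # zero out the runs that are NA
  unlist(s)
}))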
res
#   col_A col_B
# 1     1     0
# 2     2     1
# 3     0     2
# 4     1     0
# 5     2     1
# 6     3     0
# 7     4     0
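
To see what each step contributes, the body of the function can be run by hand on a single column. This is only a sketch for col_A, using the example data from the question; the variable name counts_A is introduced here for illustration.

```r
temp <- data.frame(col_A = c(1, 2, NA, 3, 4, 5, 6),
                   col_B = c(NA, 1, 2, NA, 1, NA, NA))

x <- is.na(temp$col_A)   # FALSE FALSE TRUE FALSE FALSE FALSE FALSE
r <- rle(x)
r$lengths                # 2 1 4      (length of each run)
r$values                 # FALSE TRUE FALSE  (TRUE marks the NA run)

# Number the positions inside each run; lapply keeps s a list.
s <- lapply(r$lengths, seq_len)              # list(1:2, 1L, 1:4)

# Multiply only the NA runs by 0.
s[r$values] <- lapply(s[r$values], `*`, 0)   # list(1:2, 0, 1:4)

counts_A <- unlist(s)
counts_A                 # 1 2 0 1 2 3 4
```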
jay.sf
  • Thanks for the answer. I'm trying to understand the code, and I can't understand how this line "s[r$values] <- lapply(s[r$values], `*`, 0)" gives the count. – arodrisa May 16 '20 at 19:23
  • The `s` is a list which we want to replace with the same class (list), but only where `r$values == TRUE` (i.e. where there are `NA` in `temp`). These values we want to set to (multiply by) `0`. `lapply(s[r$values], `*`, 0)` does exactly this by looping over `s` subsetted by `r$values == TRUE`. We just write `r$values`, since the ` == TRUE` is redundant. Does this make sense to you? You could actually examine the code inside the curly brackets line by line by doing `x <- lapply(temp, is.na)[[1]]`. – jay.sf May 16 '20 at 20:33
  • OK, now it makes sense, I was debugging it incorrectly. Thanks for the answer and the clarification! – arodrisa May 16 '20 at 20:46

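We can use rleid from data.table. rleid numbers the runs of NA / non-NA values, rowid counts the position within each run, and multiplying by !is.na(x) sets the NA positions to 0.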

library(data.table)
setDT(temp)[, lapply(.SD, function(x) rowid(rleid(!is.na(x))) * !is.na(x))]
#    col_A col_B
#1:     1     0
#2:     2     1
#3:     0     2
#4:     1     0
#5:     2     1
#6:     3     0
#7:     4     0
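
The same run-based idea can also be sketched in base R, where sequence(rle(...)$lengths) plays the role of rowid(rleid(...)). The helper name run_count is hypothetical and not part of the answer above; the data are the example from the question.

```r
# Hypothetical helper: running count of consecutive non-NA values.
run_count <- function(x) {
  keep <- !is.na(x)
  # Position within each run of TRUE/FALSE, then zero out the NA positions.
  sequence(rle(keep)$lengths) * keep
}

temp <- data.frame(col_A = c(1, 2, NA, 3, 4, 5, 6),
                   col_B = c(NA, 1, 2, NA, 1, NA, NA))

res <- as.data.frame(lapply(temp, run_count))
res
```

Applied to the example data, col_A becomes 1 2 0 1 2 3 4 and col_B becomes 0 1 2 0 1 0 0.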
akrun
You can use sequence and rle from base R. First set all non-NA values to 1, then use rle to get the run lengths and sequence to number the positions within each run. Finally, replace the remaining NA values with 0.

library(tidyverse)

temp %>% 
  replace(.,!is.na(.),1) %>% 
  mutate(col_A=case_when(!is.na(col_A)~sequence(rle(col_A)$lengths))) %>% 
  mutate(col_B=case_when(!is.na(col_B)~sequence(rle(col_B)$lengths))) %>% 
  replace(.,is.na(.),0)
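
For completeness, the accumulate attempt from the question can probably be repaired as well. The error there arises because temp[,"col_A"] returns a plain numeric vector, for which mutate has no method, and because count_a does not exist yet when accumulate is called. A minimal sketch, assuming purrr is installed; running_count is a hypothetical helper name, and count_days is the function from the question:

```r
library(purrr)

# Function from the question: extend the count, or reset it at an NA.
count_days <- function(prev, new) {
  ifelse(!is.na(new), prev + 1, 0)
}

# Hypothetical helper: start the accumulation at 0, then drop that seed value.
running_count <- function(x) accumulate(x, count_days, .init = 0)[-1]

temp <- data.frame(col_A = c(1, 2, NA, 3, 4, 5, 6),
                   col_B = c(NA, 1, 2, NA, 1, NA, NA))

res <- as.data.frame(lapply(temp, running_count))
names(res) <- c("count_A", "count_B")
res
```

Passing .init = 0 means the first element is handled by count_days like every other element, so a leading NA (as in col_B) correctly yields 0.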