3

Suppose I have dataset which looks like

ID Name 
1  JAY
1  
1  JAY
2  LAY
2  LAY
2  
3  NA
3  KAY
3  

I want to fill the rows with missing values(empty or with NA) based on the observation already available in the group. So the resultant data frame will look like

ID Name 
1  JAY
1  JAY
1  JAY
2  LAY
2  LAY
2  LAY
3  KAY
3  KAY
3  KAY

I tried using na.locf but it didnt work for non numeric value.

DF1 = setDT(DF)[,  N := na.locf(na.locf(Name(NA_real_^!Name),na.rm=FALSE), fromLast=TRUE, na.rm=FALSE), ID][is.na(N), N := 0]
Jay khan
  • 745
  • 2
  • 9
  • 22

4 Answers4

4

One option is after grouping by 'ID', subset the 'Name' that are not NA and not a blank (nzchar(Name)), get the last observation (tail(...)) and assign (:=) it to 'Name'.

setDT(DF)[, Name := tail(Name[!is.na(Name) & nzchar(Name)], 1), by = ID]
DF
#   ID Name
#1:  1  JAY
#2:  1  JAY
#3:  1  JAY
#4:  2  LAY
#5:  2  LAY
#6:  2  LAY
#7:  3  KAY
#8:  3  KAY
#9:  3  KAY

If the 'Name' column is factor change the nzchar(Name) to nzchar(as.character(Name))


Or specify the logical vector in the 'i' and assign (:=) the last observation (Name[.N]) to 'Name' after grouping by 'ID'

setDT(DF)[!is.na(Name) & nzchar(Name), Name := Name[.N], ID]

Note: For the second solution to work, the 'Name' should be character class.

akrun
  • 874,273
  • 37
  • 540
  • 662
3

Solution in base R (using split and do.call(bind, ...). Assume d contains your dataframe:

tmp <- lapply(split(d, d$ID), function(x) { 
    # Explanation:
    # decreasing = TRUE so that empty strings are at the end
    # na.last = NA so that NA's are omitted
    x$Name <- sort(x$Name, decreasing = TRUE, na.last = NA)[1];
    return(x);
})

d.new <- do.call(rbind, tmp);

print(d.new);
ID Name
1.1  1  JAY
1.2  1  JAY
1.3  1  JAY
2.4  2  LAY
2.5  2  LAY
2.6  2  LAY
3.7  3  KAY
3.8  3  KAY
3.9  3  KAY
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
1

Here is an another solution for the dplyr enthusiasts, use fill() from the tidyrpackage. Specify the direction in which you want to fill the missing names c("down", "up", "downup", "updown").

df %>% group_by(ID) %>% fill(Name, .direction = "updown")
Amleto
  • 584
  • 1
  • 7
  • 25
0

na.locf works on text vectors, I think your data.table syntax might be messing with it. I haven't used that package much so can't say how.

This works fine:

df <- data.frame('ID' = c(1,1,1,2,2,2,3,3,3),
                 'Name' = c('JAY', '', 'JAY', 'LAY', 'LAY', '', NA, 'KAY', ''),
                 stringsAsFactors = FALSE)

df$Name <- na.locf(df$Name, fromLast = TRUE) # takes care of 'KAY'    

df[df==''] <- NA

df$Name <- na.locf(df$Name) # takes care of the rest

The base R solution already posted is also very good, splitting by ID gives you more control over how replacements happen. I've done something similar with by() before.

obrl_soil
  • 1,104
  • 12
  • 24