Is there a way to fill in missing data in a column in R

Question

I want to fill in missing data in a column.

Here is an example of the data that I have. The 1st column shows an employee name and the second column correlates to the region that they are working in. Each employee only works in 1 of the locations and does not work in more than one. So in the below example since the non missing values for Bill show East, then Bill works in the East location. My question is how do I replace all the NAs for Bill to the correct region. And how can I do that for each of the other employees as well.

    Name     Region
 1  Bill     East
 2  Bill     East
 2  Bill     NA
 4  Bill     NA
 5  Karen    NA
 6  Karen    South
 7  Karen    NA
 8  Richard  NA 
 9  Richard  West 
10  Richard  NA
11  Richard  West

Another quick comment. I have updated data showing the region that each employee is assigned to. How can I update the df that I have to include the missing information? — ZachW, May 09 '20 at 20:14

score 0 · Answer 1 · answered May 09 '20 at 02:46

0

library(tidyr) 
library(dplyr)

group_by(df, Name) %>% fill(Region, .direction="downup")

# A tibble: 11 x 2
# Groups:   Name [3]
   Name    Region
   <fct>   <fct> 
 1 Bill    East  
 2 Bill    East  
 3 Bill    East  
 4 Bill    East  
 5 Karen   South 
 6 Karen   South 
 7 Karen   South 
 8 Richard West  
 9 Richard West  
10 Richard West  
11 Richard West

answered May 09 '20 at 02:46

Edward

10,360
2
11
26

This unfortunately does not work. If it just copies the previous entry then it has the potential to mislabel things. The data at hand is somewhat messy in that each time someone's name changes it will not necessarily indicate the region. There is not really a specific way in which the region data is filled in. – ZachW May 09 '20 at 18:46
It works for your sample data. And it doesn't "just copy the previous entry". Read the help page of `fill`. ;) – Edward May 10 '20 at 01:24

score 0 · Answer 2 · answered May 09 '20 at 20:03

Using base R, step-by-step:

data=data.frame(name=c(rep("bill",4),rep("karen",3),rep("richard",4)),region=c("E","E",NA,NA,NA,"S",NA,NA,"W",NA,"W"))

Who are the people:

people=as.character(unique(data[,1]))
people
# [1] "bill"    "karen"   "richard"

Where does each person work? get the first non-NA region:

region=sapply(people,function(p) (data[data[,1]==p & !is.na(data[,2]),2])[1])
region
#    bill   karen richard 
#       E       S       W 
# Levels: E S W

Fill-in the data:

data[,2]=region[data[,1]]
data

Output:

      name region
1     bill      E
2     bill      E
3     bill      E
4     bill      E
5    karen      S
6    karen      S
7    karen      S
8  richard      W
9  richard      W
10 richard      W
11 richard      W

Is there a way to fill in missing data in a column in R

2 Answers2