0

I want to fill in missing data in a column.

Here is an example of the data that I have. The 1st column shows an employee name and the second column correlates to the region that they are working in. Each employee only works in 1 of the locations and does not work in more than one. So in the below example since the non missing values for Bill show East, then Bill works in the East location. My question is how do I replace all the NAs for Bill to the correct region. And how can I do that for each of the other employees as well.

    Name     Region
 1  Bill     East
 2  Bill     East
 2  Bill     NA
 4  Bill     NA
 5  Karen    NA
 6  Karen    South
 7  Karen    NA
 8  Richard  NA 
 9  Richard  West 
10  Richard  NA
11  Richard  West
ZachW
  • 19
  • 1
  • 6
  • Use the `fill` function from __tidyr__. – Edward May 09 '20 at 02:44
  • Another quick comment. I have updated data showing the region that each employee is assigned to. How can I update the df that I have to include the missing information? – ZachW May 09 '20 at 20:14

2 Answers2

0
library(tidyr) 
library(dplyr)

group_by(df, Name) %>% fill(Region, .direction="downup")

# A tibble: 11 x 2
# Groups:   Name [3]
   Name    Region
   <fct>   <fct> 
 1 Bill    East  
 2 Bill    East  
 3 Bill    East  
 4 Bill    East  
 5 Karen   South 
 6 Karen   South 
 7 Karen   South 
 8 Richard West  
 9 Richard West  
10 Richard West  
11 Richard West  
Edward
  • 10,360
  • 2
  • 11
  • 26
  • This unfortunately does not work. If it just copies the previous entry then it has the potential to mislabel things. The data at hand is somewhat messy in that each time someone's name changes it will not necessarily indicate the region. There is not really a specific way in which the region data is filled in. – ZachW May 09 '20 at 18:46
  • It works for your sample data. And it doesn't "just copy the previous entry". Read the help page of `fill`. ;) – Edward May 10 '20 at 01:24
0

Using base R, step-by-step:

data=data.frame(name=c(rep("bill",4),rep("karen",3),rep("richard",4)),region=c("E","E",NA,NA,NA,"S",NA,NA,"W",NA,"W"))
  1. Who are the people:
people=as.character(unique(data[,1]))
people
# [1] "bill"    "karen"   "richard"
  1. Where does each person work? get the first non-NA region:
region=sapply(people,function(p) (data[data[,1]==p & !is.na(data[,2]),2])[1])
region
#    bill   karen richard 
#       E       S       W 
# Levels: E S W
  1. Fill-in the data:
data[,2]=region[data[,1]]
data

Output:

      name region
1     bill      E
2     bill      E
3     bill      E
4     bill      E
5    karen      S
6    karen      S
7    karen      S
8  richard      W
9  richard      W
10 richard      W
11 richard      W