My problem is: I want to keep the observation column from data frame y when I join the two, so that I can reference rows back to the original data frame. Right now, when I perform a left_join(), I get NA values for the observations. The column in data frame y is named "Obs".

I have already tried different types of join and swapping the x and y data frames.

Simple Example of what I am trying to do:

library(dplyr)  # provides %>% and left_join()

x = data.frame(fun = c("cool", "neat", "awesome", "neat1", "amazing", "sweet"), address = c("100", "1100", "99", "900", "55", "200"), state = c("IL", "CO", "MO", "CA", "MA", "TX"), date = c(12, 3, 4, 6, 8, 9))
y = data.frame(fun = c("cool", "neat", "awesome", "super"), address = c("100", "1100", "99", "55"), state = c("IL", "CO", "MO", "MA"), status = c(TRUE, FALSE, TRUE, TRUE))

# Row number in y, used to refer back to the original data frame
y$Obs = 1:nrow(y)

x %>% left_join(y, by = c("address", "state"))

For some reason, the sample code above works and shows the observations. However, when I run it on my actual data sets, where data frame x has about 18000 records and data frame y has 2100 records, I get all NA values for the observations, even though the rows match on state and address.

Expected result: a new joined data frame with an observation column whose values refer back to (are the same as) those in data frame y. When I run it on my real data, I get all NA values for Obs.
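To make the symptom measurable, here is a minimal sketch built on the sample frames from the question. The names `joined` and `unmatched` are introduced here for illustration only. It counts how many rows of x received a non-NA Obs and lists the rows of x that found no partner in y on the join keys. Running the same two checks on the real data should show whether truly every row fails to match or only some.

```r
library(dplyr)

x <- data.frame(fun = c("cool", "neat", "awesome", "neat1", "amazing", "sweet"),
                address = c("100", "1100", "99", "900", "55", "200"),
                state = c("IL", "CO", "MO", "CA", "MA", "TX"),
                date = c(12, 3, 4, 6, 8, 9),
                stringsAsFactors = FALSE)
y <- data.frame(fun = c("cool", "neat", "awesome", "super"),
                address = c("100", "1100", "99", "55"),
                state = c("IL", "CO", "MO", "MA"),
                status = c(TRUE, FALSE, TRUE, TRUE),
                stringsAsFactors = FALSE)
y$Obs <- seq_len(nrow(y))

joined <- x %>% left_join(y, by = c("address", "state"))

# How many rows of x picked up an Obs value from y?
table(matched = !is.na(joined$Obs))

# Rows of x that have no partner in y on (address, state)
unmatched <- x %>% anti_join(y, by = c("address", "state"))
unmatched
```

If `unmatched` contains rows that look like they should match, those rows are good candidates for a closer look at the key values.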

Jace
  • Can you provide an example data set and the output you are hoping to achieve? – N. Williams Jun 14 '19 at 16:36
  • Hi, I have updated the post with images regarding data frame y and my output with the joined data frames. – Jace Jun 14 '19 at 16:58
  • @John: please do not post picture or screenshot of your data as it can't be used by others. Follow this guide to create a reproducible example https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Tung Jun 14 '19 at 17:16
  • @tung my apologies, I have updated my work. – Jace Jun 14 '19 at 17:58
  • @John: unfortunately that's not really helpful as we can't see the error. – Tung Jun 14 '19 at 18:18
  • If you can't reproduce the problem here then first reduce your true data to a few rows with the problem. You say you only get NAs so it shouldn't be hard--find the first row of your true y that you think has a matched state-address pair but gives NA. Suspect type mismatches. – philipxy Jun 14 '19 at 18:31
  • @John did you get any warning messages when you joined your data? – Bryan Adams Jun 15 '19 at 19:28
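Following the suggestion in the comments to suspect mismatches in the key columns, here is a rough diagnostic sketch. The frames `x_demo` and `y_demo` and the helper `clean_keys()` are hypothetical and not taken from the real data. They only assume one possible cause: keys that look identical but differ by stray whitespace. The actual cause in the real data is unknown.

```r
library(dplyr)

# Hypothetical frames: the keys in y_demo carry stray whitespace
x_demo <- data.frame(address = c("100", "1100"),
                     state = c("IL", "CO"),
                     stringsAsFactors = FALSE)
y_demo <- data.frame(address = c("100 ", " 1100"),
                     state = c("IL", "CO"),
                     stringsAsFactors = FALSE)
y_demo$Obs <- seq_len(nrow(y_demo))

keys <- c("address", "state")

# 1. Compare the classes of the key columns in both frames
sapply(x_demo[keys], class)
sapply(y_demo[keys], class)

# 2. Join as-is: no key pair matches exactly, so Obs is NA in every row
before <- left_join(x_demo, y_demo, by = keys)

# 3. Normalise the keys to trimmed character vectors and join again
clean_keys <- function(df) {
  df[keys] <- lapply(df[keys], function(col) trimws(as.character(col)))
  df
}
after <- left_join(clean_keys(x_demo), clean_keys(y_demo), by = keys)
```

Here `before$Obs` is all NA while `after$Obs` is filled in. If the real data behaves the same way after cleaning, the keys were the problem. If not, reducing the real data to one row that should match, as suggested above, is the next step.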

0 Answers