I'm new to R. So I'm pretty sure I have a simple problem, but I've unfortunately been unable to reach a solution on my own.

I have two real estate data frames that have zip code-specific observations for house listing/pricing data. These data frames represent different years of observation and different locations of observations.

The first data frame has a column for postal codes as well as a column for the town name corresponding to each postal code. Like so:

w <- c(33029, 12778, 12309, 74074, 21128, 33066)
x <- c("hollywood, fl","smallwood, ny", "schenectady, ny", "stillwater, ok", "perry hall, md", "pompano beach, fl")
m <- matrix(c(w,x), ncol = 2)
df1 <- as.data.frame(m)

But the second data frame just has a postal code column (no town name column). Like so:

y <- c(12309, 33066, 74074, 11475, 12778, 12309)
df2 <- as.data.frame(y)

Please note that because these data frames represent different times and locations of observations there may actually be zip codes in the second data frame that are not in the first data frame (which is why I included some unique zip codes in the above example of the second data frame). So, I don't believe it's a simple merge function.

So, I'm trying to add a column to the second data frame that returns the corresponding town name from df1 based on the observed postal code in df2. In other words, when the df2 postal code = the df1 postal code, the new column in df2 prints the corresponding town name from df1.

Please let me know if you need any additional info to crack the case. I'm new here, so I appreciate your help and patience!

kthomas
  • Lacking sample data, I'm going to go out on a limb and say that this is a classic "merge/join" operation, please see https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272. Your question is unreproducible, though, in that we have no data, no code attempted, and StackOverflow is generally about concrete programming issues. Please read https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. If the first two links do not resolve your issue, come back and [edit] your question to add the missing parts. Thanks! – r2evans Jan 18 '22 at 00:10
  • Take a look at the “merge” function or the “left_join” function. – Dave2e Jan 18 '22 at 00:27
  • @r2evans edited as per your feedback. Thanks! – kthomas Jan 18 '22 at 00:53
  • `merge(df1, df2, by.x = "V1", by.y = "y", all.x = TRUE)` – r2evans Jan 18 '22 at 00:55
  • Your data is technically incompatible, though: while `base::merge` is ignoring the fact that your zipcodes are numbers in one frame and strings in the other, most other joining functions will complain loudly (fail). If your real data is like that, I suggest you do two things: (1) convert the number-column to strings; and (2) look for zip codes where you have lost any leading 0s (perhaps using `nchar(.)`). – r2evans Jan 18 '22 at 00:57
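
The suggestions in these comments could be put together roughly as follows. This is only a sketch under a few assumptions: the column names `zip` and `town` are made up for illustration (the question's frames would have `V1`, `V2` and `y`), zip codes are stored as 5-character strings in both frames, and `match()` is used as one possible stand-in for a join because it keeps every row of `df2` in its original order.

```r
# Sample data from the question; `zip` and `town` are hypothetical column names.
df1 <- data.frame(
  zip  = c("33029", "12778", "12309", "74074", "21128", "33066"),
  town = c("hollywood, fl", "smallwood, ny", "schenectady, ny",
           "stillwater, ok", "perry hall, md", "pompano beach, fl"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(zip = c(12309, 33066, 74074, 11475, 12778, 12309))

# Convert the numeric zip codes to strings, restoring any leading zeros
# that were dropped when they were stored as numbers (e.g. 2134 -> "02134").
df2$zip <- sprintf("%05d", as.integer(df2$zip))

# match() returns, for each df2 zip, the row of df1 holding that zip,
# or NA when df1 has no such zip; indexing df1$town by it does the lookup.
df2$town <- df1$town[match(df2$zip, df1$zip)]

print(df2)
```

Zip codes that appear only in `df2` (here `11475`) end up with `NA` in the new column. `merge(df2, df1, by = "zip", all.x = TRUE)` should pair the same zips and towns, but by default it sorts the result by `zip` rather than keeping the original row order.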

0 Answers