0

I am working with a housing data set and I am trying to see if houses that overlap both counties that are next to each other were recorded in each other's sale when the house(s) were sold.

Here is a sample of my data:

Alameda County

             date         county          city   zip   price
1  2003-04-27 Alameda County    Pleasanton 94588  565000
2  2003-04-27 Alameda County       Oakland 94618  387500
3  2003-04-27 Alameda County        Dublin 94568  450000
4  2003-04-27 Alameda County        Newark 94560  470000
5  2003-04-27 Alameda County     Livermore 94550 1120000
6  2003-04-27 Alameda County       Alameda 94501  526000
7  2003-04-27 Alameda County       Fremont 94538  325000
8  2003-04-27 Alameda County     Livermore 94550  930500
9  2003-04-27 Alameda County       Hayward 94542  525000
10 2003-04-27 Alameda County Castro Valley 94546  610000

Contra Costa County

         date              county         city   zip  price
1  2003-04-27 Contra Costa County  El Sobrante 94803 325000
2  2003-04-27 Contra Costa County      Concord 94519 347000
3  2003-04-27 Contra Costa County      Concord 94521 366000
4  2003-04-27 Contra Costa County Walnut Creek 94598 495000
5  2003-04-27 Contra Costa county      Concord 94519 370000
6  2003-04-27 Contra Costa County      Concord 94520 219000
7  2003-04-27 Contra Costa County      Antioch 94531 387000
8  2003-04-27 Contra Costa county      Clayton 94517 522000
9  2003-04-27 Contra Costa County      Antioch 94531 406500
10 2003-04-27 Contra Costa County      Antioch 94509 345000

I was thinking of using dplyr and the filter verb but I think that would require a large logical expression. How can I check if the two data frames have the same city or zip code?

  • 2
    `inner_join` on zip code will return only the records that are present in both tables with the same zip code . – phiver May 05 '18 at 09:04
  • Wow! Thanks! Did not know this was apart of the dplyr package, that's so cool! I tried `check_df <- mutate(inner_join(alameda_county_df$zip, contra_costa_county_df$zip)) %>% head(5)` but I get this as an error: `Error in UseMethod("inner_join") : no applicable method for 'inner_join' applied to an object of class "c('integer', 'numeric')"` Any hints? –  May 05 '18 at 09:40
  • `inner_join(alameda_county, contra_costa_county_df, by = zip) %>% head(5)` might work better – phiver May 05 '18 at 09:45
  • Do you know what it means by this? `Error: by must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not closure` EDIT: I think I needed to use == instead of =! but now it says `comparison (1) is possible only for atomic and list types` I'm guessing the data frame must only have one column in it which is what I'll try. –  May 05 '18 at 10:52
  • Hi, welcome to StackOverflow! I suggest you to include a `dput(yourdata)` so that everyone can quickly reproduce your data, with some small snippets of code. – Scipione Sarlo May 05 '18 at 11:06
  • Here you can find some nice options: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right You should always search for questions that already have been answered before asking a new one, this way you may have your answer way faster! :) – Giovana Stein May 05 '18 at 11:53

1 Answers1

0

Another easy solution would be merge(). For example:

df1 <- data.frame(
    city = LETTERS[1:5],
    some_random_var1 <- sample(1:500, 5)
    )
df2 <- data.frame(
    city = sample(LETTERS, 5)
    some_random_var2 <- sample(50:1000, 5)
    )
merge(df1, df2)
  city some_random_var1 some_random_var2
  1    B              151               82
  2    D              321              668

The dataframes are by default merged on the columns with the same column name ('city' in this case). If the columns have different names, these columns can be specified by the arguments by.x and by.y.

Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34