4

I have two data frames:

geodata<-read.csv(text='postalcode;latitude;longitude
7000;47.8415;16.5041
7000;47.8921;16.4011
7000;47.9400;16.5032
7071;47.8012;16.671
2443;47.9156;16.5179', sep=';', header=TRUE)

addresses<-read.csv(text='postalcode;address
2400;Lorem ipsum
7000;Dolor sit amet
2443;Consetetur sadipscing elitr
7000;Sed diam nonumy', sep=';', header=TRUE)

What I would need is two additional columns in the addresses dataframe with the corresponding (first match when multiple postalcodes rows exist) geodata$latitude and geodata$longitude values.

How to join (merge) data frames (inner, outer, left, right)? shows nearly what I'm looking for except that I don't want duplicated lines in the merged dataframe.

I tried something like

# code not working... #
addresses$latitude<-geodata[addresses$postalcode==geodata$postalcode]

which I know is really ugly :-(

Community
  • 1
  • 1
Johann Horvat
  • 1,285
  • 1
  • 14
  • 18

3 Answers3

6

NO NEED to use extra packages here . Simply merge:

merge(geodata,addresses,all.x=TRUE)

#   postalcode latitude longitude                     address
# 1       2443  47.9156   16.5179 Consetetur sadipscing elitr
# 2       7000  47.8415   16.5041              Dolor sit amet
# 3       7000  47.8415   16.5041             Sed diam nonumy
# 4       7071  47.8012   16.6710                        <NA>

EDIT:

to add geodata to address we should invert the order here :

merge(addresses,geodata,all.x=TRUE)
  postalcode                     address latitude longitude
1       2400                 Lorem ipsum       NA        NA
2       2443 Consetetur sadipscing elitr  47.9156   16.5179
3       7000              Dolor sit amet  47.8415   16.5041
4       7000             Sed diam nonumy  47.8415   16.5041
agstudy
  • 119,832
  • 17
  • 199
  • 261
3

Perhaps with a left_join from dplyr:

library(dplyr)
left_join(addresses,geodata, fill=NA)
#Joining by: "postalcode"
#  postalcode                     address latitude longitude
#1       2400                 Lorem ipsum       NA        NA
#2       7000              Dolor sit amet  47.8415   16.5041
#3       2443 Consetetur sadipscing elitr  47.9156   16.5179
#4       7000             Sed diam nonumy  47.8415   16.5041

Edit, if there is more than one result per postal code, you could try subsetting geodata:

#left_join(addresses, subset(geodata, !duplicated(postalcode)), fill=NA)
#Joining by: "postalcode"
#  postalcode                     address latitude longitude
#1       2400                 Lorem ipsum       NA        NA
#2       7000              Dolor sit amet  47.8415   16.5041
#3       2443 Consetetur sadipscing elitr  47.9156   16.5179
#4       7000             Sed diam nonumy  47.8415   16.5041
erasmortg
  • 3,246
  • 1
  • 17
  • 34
  • Thanks, this works nearly as expected. I edited my post: the problem is, that there may be multiple lat/loc values per postal code. What I would need is the first matching line within the geodata datafame. – Johann Horvat Aug 27 '15 at 09:54
  • @JohannHorvat edited the answer, let me know if it works – erasmortg Aug 27 '15 at 10:06
2

You could join them (in plyr)

library(plyr)
addresses <- join(addresses,geodata)
Wannes Rosiers
  • 1,680
  • 1
  • 12
  • 18