I have several data frames that I'm using to create maps. The data are updated daily, but only some of the rows change, so each new data frame contains only a subset of the rows of the original. The first df contains the spatial info:

head(boxes)
  Name       lon      lat
1   B1 -1.308810 51.76481
2  B10 -1.309306 51.76457
3 B100 -1.308591 51.76488
4 B101 -1.308725 51.76464
5 B102 -1.308454 51.76439
6 B103 -1.308270 51.76412

The next df contains some extra info:

head(boxes0604)
  Name Section       lon      lat State.code Clutch.size
1   B1       b -1.308810 51.76481          0           0    
2  B10       b -1.309306 51.76457          1           0    
3 B100       b -1.308591 51.76488          0           0    
4 B101       b -1.308725 51.76464          0           0    
5 B102       b -1.308454 51.76439          0           0    
6 B103       b -1.308270 51.76412          0           0    

I can combine these easily enough using dplyr's left_join and plot my first map.
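For reference, the join described here might look like the minimal sketch below. It uses the first three rows of `boxes` shown in the question; `info0604` is a hypothetical table holding only the non-spatial columns, introduced purely for illustration.

```r
library(dplyr)

# Spatial info: first three rows of `boxes` from the question
boxes <- data.frame(
  Name = c("B1", "B10", "B100"),
  lon  = c(-1.308810, -1.309306, -1.308591),
  lat  = c(51.76481, 51.76457, 51.76488)
)

# Hypothetical table with only the non-spatial columns for the same boxes
info0604 <- data.frame(
  Name        = c("B1", "B10", "B100"),
  Section     = c("b", "b", "b"),
  State.code  = c(0, 1, 0),
  Clutch.size = c(0, 0, 0)
)

# Keep every row of `boxes` and attach the matching extra columns by Name
boxes_joined <- left_join(boxes, info0604, by = "Name")
```

Because the only shared column is `Name`, no `.x`/`.y` suffixes appear in the result.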

Now let's say that my next data frame, boxes0804, contains new info for only a subset of the boxes:

head(boxes0804)
  Name Section State.code Clutch.size
1 B108       b          0           4
2 B211       b          1           6
3 B219       b          4          12
4 B237       b          4           8
5 B287       b          4           7
6 B291       b          4          11

You can see that this df does not contain the spatial info and has only some of the rows contained in boxes0604.

What I would like to do, in order to plot this new info on top of the old info from boxes0604, is combine these data sets so that only the new information is imported and all of the old information is kept. When I try to join them, I lose all the information from the new df. Perhaps I'm using the join functions incorrectly.

Thanks in advance.

McMahok
  • Have you tried other join methods, like `full_join` from `dplyr`? – Samet Sökel Apr 20 '21 at 13:53
  • Tried them all, just can't seem to get the result I want. It usually ends up just keeping all the old info without adding the new – McMahok Apr 20 '21 at 13:55
  • I suggest using `merged <- cbind(old, new)` if you are sure about the order of the data frames (and the row numbers, of course); if you are not, set an order first and then `cbind` them all – Samet Sökel Apr 20 '21 at 14:19

1 Answer

If I understand the question correctly, you'd like to merge boxes0804 into your existing data, but only for rows where the Name appears both in the existing data and boxes0804. My suggestion would be to merge only such rows, as follows, using data.table:

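library(data.table)
# setDT() converts to data.table by reference, so no reassignment is needed
setDT(boxes)
setDT(boxes0804)
# First keep the rows without a match
boxes_nomatch <- boxes[!Name %in% boxes0804$Name]
# Now merge on the new rows, where there is a match
boxes_match <- merge(boxes[Name %in% boxes0804$Name], boxes0804, by = "Name", all.x = TRUE)
# fill = TRUE pads columns that exist only in boxes_match with NA for the unmatched rows
boxes_updated <- rbind(boxes_nomatch, boxes_match, fill = TRUE)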
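As an alternative sketch, dplyr (1.0.0 or later) provides `rows_update()`, which overwrites values in rows matched by a key and leaves every other row and column untouched. The example assumes that the existing data is the already-joined boxes0604 table and that every Name in the update is present in it. The old rows are the first three from the question, while the update values are invented for illustration, because the rows shown in the question do not overlap.

```r
library(dplyr)

# Existing data: first three rows of boxes0604 from the question
boxes0604 <- data.frame(
  Name        = c("B1", "B10", "B100"),
  Section     = c("b", "b", "b"),
  lon         = c(-1.308810, -1.309306, -1.308591),
  lat         = c(51.76481, 51.76457, 51.76488),
  State.code  = c(0, 1, 0),
  Clutch.size = c(0, 0, 0)
)

# Hypothetical update for two of the boxes (values invented for illustration);
# note that it carries no lon/lat columns, like boxes0804 in the question
boxes0804 <- data.frame(
  Name        = c("B10", "B100"),
  Section     = c("b", "b"),
  State.code  = c(4, 1),
  Clutch.size = c(6, 2)
)

# Overwrite Section, State.code and Clutch.size for matching Names;
# lon/lat and the unmatched row (B1) are kept as they were
boxes_updated <- rows_update(boxes0604, boxes0804, by = "Name")
```

`rows_update()` expects the key values in the update table to be unique and, by default, to exist in the original table, so a Name that is missing from boxes0604 raises an error rather than being silently added.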

It is a little hard to do this without a reproducible example (see "How to make a great R reproducible example"), but I think this is what you're going for.
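One common way to share a reproducible sample is base R's `dput()`, which prints code that recreates an object. A minimal sketch, using the first three rows of boxes0804 as shown in the question:

```r
# First three rows of boxes0804, copied from the question
boxes0804 <- data.frame(
  Name        = c("B108", "B211", "B219"),
  Section     = c("b", "b", "b"),
  State.code  = c(0, 1, 4),
  Clutch.size = c(4, 6, 12)
)

# Prints R code that rebuilds these rows; the printed text can be pasted into a question
dput(head(boxes0804))
```

Pasting the printed `structure(...)` call lets others rebuild the same data frame, column types included.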

Gabe Solomon