
I have a buffer layer and a point layer:

 buffer_gdf
     ID
0    1A
1    1B
2    1C

and

 point_gdf
      ID
0     1A
1     1A
2     1A
3     1A
4     1A
5     1B
6     1B
7     1B
8     1B
9     1B
10    1B
11    1B
12    1B
13    1B
14    1C    
15    1C
16    1C
17    1C    
18    1C
19    1C
20    1C    
21    1C
22    1C

Is there a way to count how many points with ID=1A are within buffer ID=1A, how many points with ID=1B are within buffer ID=1B, how many points with ID=1C are within buffer ID=1C, and so on? I have more than 20,000 buffers and more than 300,000 points.

I'm using pandas but I can also use R.

Sorry, I didn't mention that some points are outside the buffers. I just need those within the buffers.

ZairaRosas
  • is `point_gdf['ID'].value_counts()` what you are looking for? – rhug123 Jul 21 '21 at 19:22
  • @rhug123 no, because there are some points that are outside the buffers :( – ZairaRosas Jul 21 '21 at 19:27
  • `buffer_gdf.join(point_gdf['ID'].value_counts(), on='ID', rsuffix='_count')` should work, but I haven't tested it (a rough sketch of this idea follows the comments). – rhug123 Jul 21 '21 at 20:04
  • Are you open to `data.table` in R? If so, I have a powerful one-liner [here](https://stackoverflow.com/a/68476119): `buffer_gdf[point_gdf, on = .(ID), nomatch = 0][, .N, by = ID]`, where the datasets are `data.table`s. – Greg Jul 21 '21 at 20:43
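A minimal sketch of the join idea from the comment above, using plain DataFrames that mirror the question's ID columns (my own illustration, not tested on real GeoDataFrames); renaming the counts Series sidesteps the column-name clash that `rsuffix` was meant to handle:

import pandas as pd

# Sample frames mirroring the question's ID columns
buffer_gdf = pd.DataFrame({'ID': ['1A', '1B', '1C']})
point_gdf = pd.DataFrame({'ID': ['1A'] * 5 + ['1B'] * 9 + ['1C'] * 9})

# value_counts() returns a Series indexed by ID; renaming it gives the joined
# column an unambiguous name regardless of pandas version
counts = point_gdf['ID'].value_counts().rename('point_count')
print(buffer_gdf.join(counts, on='ID'))
#    ID  point_count
# 0  1A            5
# 1  1B            9
# 2  1C            9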

5 Answers


Here is one way in R:

sapply(buffer_gdf$ID, function(x) sum(point_gdf$ID == x))
1A 1B 1C 
 5  9  9 

Or with `outer`:

rowSums(outer(buffer_gdf$ID, point_gdf$ID, `==`))
[1] 5 9 9

If this doesn't need to take `buffer_gdf` into account, `table` alone would be enough:

table(point_gdf$ID)

Or do a subset and then get the table

with(point_gdf, table(ID[ID %in% buffer_gdf$ID]))

data


buffer_gdf <- structure(list(ID = c("1A", "1B", "1C")), class = "data.frame", row.names = c("0", 
"1", "2"))

point_gdf <- structure(list(ID = c("1A", "1A", "1A", "1A", "1A", "1B", "1B", 
"1B", "1B", "1B", "1B", "1B", "1B", "1B", "1C", "1C", "1C", "1C", 
"1C", "1C", "1C", "1C", "1C")), class = "data.frame", row.names = c("0", 
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", 
"13", "14", "15", "16", "17", "18", "19", "20", "21", "22"))
akrun

Get your value counts on the ID column (selecting it as a one-column frame keeps the index named ID, so the merge below works and a geometry column, if present, doesn't enter the count):

counts = point_gdf[['ID']].value_counts()

Then `reset_index` and merge your `buffer_gdf` with the counts to attach your sums: `buffer_gdf.merge(counts.reset_index(), on='ID', how='left', validate='1:1')`.¹

You could write this as a one-liner:

buffer_gdf.merge(point_gdf[['ID']].value_counts().reset_index(), on='ID',
                 how='left', validate='1:1')

¹ You don't need to specify `validate='1:1'`. I just almost always provide a `validate` keyword when writing merges to ensure that the data are formatted how I expect them. I think it's a best practice.
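To illustrate the footnote, here is a minimal sketch (my own, not part of the answer) of what `validate='1:1'` buys you: with a hypothetical duplicated buffer ID, the merge raises a `MergeError` instead of silently multiplying rows.

import pandas as pd

# Hypothetical data: '1A' accidentally appears twice in the buffer table
buffer_dup = pd.DataFrame({'ID': ['1A', '1A', '1B']})
counts = pd.DataFrame({'ID': ['1A', '1B'], 'n_points': [5, 9]})

# Without validate, the duplicate key silently produces two '1A' result rows;
# with validate='1:1', pandas checks that both keys are unique and raises
try:
    buffer_dup.merge(counts, on='ID', how='left', validate='1:1')
except pd.errors.MergeError as err:
    print(err)  # message along the lines of "Merge keys are not unique in left dataset"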

ifly6

Here's a solution with the performance of data.table and the elegance of INNER JOIN.

Solution

Given your sample data, reproduced here as data.tables

buffer_gdf <- structure(list(ID = c("1A", "1B", "1C")),
                        row.names = c(NA, -3L), class = c("data.table", "data.frame"))


point_gdf <- structure(list(ID = c("1A", "1A", "1A", "1A", "1A", "1B", "1B", "1B", "1B", "1B", "1B", "1B", "1B", "1B", "1C", "1C", "1C", "1C", "1C", "1C", "1C", "1C", "1C")),
                       row.names = c(NA, -23L), class = c("data.table", "data.frame"))

the following approach

library(data.table)


# ...
# Code to generate 'buffer_gdf' and 'point_gdf' as data.tables.
# ...


#         |----------- INNER JOIN -----------||--- Count ---|
buffer_gdf[point_gdf, on = .(ID), nomatch = 0][, .N, by = ID]

should yield output like this:

   ID N
1: 1A 5
2: 1B 9
3: 1C 9

Note

If your datasets are not already data.tables, use as.data.table() to convert them prior to the operations.

If you want, you can customize the name of the count column: `[, .(Count_Name = .N), by = ID]`.

Greg

Why not join the tables and get the count by group? Below is the code:

library(dplyr)
point_gdf %>% inner_join(buffer_gdf, by = "ID") %>% group_by(ID) %>%
  summarise(count = n())
Jason Mathews
  • I like this elegant concept, but will `dplyr` stand up to the sheer volume of data? Perhaps `data.table` would provide better performance... – Greg Jul 21 '21 at 20:15
  • @Greg well frankly, if performance is what you're looking for, I wouldn't rely on R – ifly6 Jul 21 '21 at 20:18
  • 1
    @ifly6 Do you have any benchmarks where pandas vs collapse vs data.table? – akrun Jul 21 '21 at 20:20

Thank you all so much for your help. However, the most useful answer for me is this post.

ZairaRosas