
I have a buffer layer and a point layer:

 buffer_gdf
     ID
0    1A
1    1B
2    1C

and

 point_gdf
      ID
0     1A
1     1A
2     1A
3     1A
4     1A
5     1B
6     1B
7     1B
8     1B
9     1B
10    1B
11    1B
12    1B
13    1B
14    1C    
15    1C
16    1C
17    1C    
18    1C
19    1C
20    1C    
21    1C
22    1C

Is there a way to count how many points with ID=1A are within buffer ID=1A, how many points with ID=1B are within buffer ID=1B, how many points with ID=1C are within buffer ID=1C, and so on? I have more than 20,000 buffers and more than 300,000 points.

I'm using pandas but I can also use R.

Sorry, I didn't mention that some points are outside the buffers. I just need those within the buffers.

ZairaRosas
  • is `point_gdf['ID'].value_counts()` what you are looking for? – rhug123 Jul 21 '21 at 19:22
  • @rhug123 no, because there are some points that are outside the buffers :( – ZairaRosas Jul 21 '21 at 19:27
  • `buffer_gdf.join(point_gdf['ID'].value_counts(), on='ID', rsuffix='_count')` should work, but I haven't tested it (a rough sketch of this idea follows the comments). – rhug123 Jul 21 '21 at 20:04
  • Are you open to `data.table` in R? If so, I have a powerful one-liner [here](https://stackoverflow.com/a/68476119): `buffer_gdf[point_gdf, on = .(ID), nomatch = 0][, .N, by = ID]`, where the datasets are `data.table`s. – Greg Jul 21 '21 at 20:43
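A minimal sketch of the join idea from the comment above, using plain DataFrames that mirror the question's ID columns (my own illustration, not tested on real GeoDataFrames); renaming the counts Series sidesteps the column-name clash that `rsuffix` was meant to handle:

import pandas as pd

# Sample frames mirroring the question's ID columns
buffer_gdf = pd.DataFrame({'ID': ['1A', '1B', '1C']})
point_gdf = pd.DataFrame({'ID': ['1A'] * 5 + ['1B'] * 9 + ['1C'] * 9})

# value_counts() returns a Series indexed by ID; renaming it gives the joined
# column an unambiguous name regardless of pandas version
counts = point_gdf['ID'].value_counts().rename('point_count')
print(buffer_gdf.join(counts, on='ID'))
#    ID  point_count
# 0  1A            5
# 1  1B            9
# 2  1C            9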

5 Answers


Here is one way in R:

sapply(buffer_gdf$ID, function(x) sum(point_gdf$ID == x))
1A 1B 1C 
 5  9  9 

Or with `outer`:

rowSums(outer(buffer_gdf$ID, point_gdf$ID, `==`))
[1] 5 9 9

If this doesn't need to take `buffer_gdf` into account, `table` alone would be enough:

table(point_gdf$ID)

Or do a subset and then get the table

with(point_gdf, table(ID[ID %in% buffer_gdf$ID]))

data


buffer_gdf <- structure(list(ID = c("1A", "1B", "1C")), class = "data.frame", row.names = c("0", 
"1", "2"))

point_gdf <- structure(list(ID = c("1A", "1A", "1A", "1A", "1A", "1B", "1B", 
"1B", "1B", "1B", "1B", "1B", "1B", "1B", "1C", "1C", "1C", "1C", 
"1C", "1C", "1C", "1C", "1C")), class = "data.frame", row.names = c("0", 
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", 
"13", "14", "15", "16", "17", "18", "19", "20", "21", "22"))
akrun

Get your value counts on the ID column (selecting it as a one-column frame keeps the index named ID, so the merge below works and a geometry column, if present, doesn't enter the count):

counts = point_gdf[['ID']].value_counts()

Then `reset_index` and merge your `buffer_gdf` with the counts to attach your sums: `buffer_gdf.merge(counts.reset_index(), on='ID', how='left', validate='1:1')`.¹

You could write this as a one-liner:

buffer_gdf.merge(point_gdf[['ID']].value_counts().reset_index(), on='ID',
                 how='left', validate='1:1')

¹ You don't need to specify `validate='1:1'`. I just almost always provide a `validate` keyword when writing merges to ensure that the data are formatted how I expect them. I think it's a best practice.
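To illustrate the footnote, here is a minimal sketch (my own, not part of the answer) of what `validate='1:1'` buys you: with a hypothetical duplicated buffer ID, the merge raises a `MergeError` instead of silently multiplying rows.

import pandas as pd

# Hypothetical data: '1A' accidentally appears twice in the buffer table
buffer_dup = pd.DataFrame({'ID': ['1A', '1A', '1B']})
counts = pd.DataFrame({'ID': ['1A', '1B'], 'n_points': [5, 9]})

# Without validate, the duplicate key silently produces two '1A' result rows;
# with validate='1:1', pandas checks that both keys are unique and raises
try:
    buffer_dup.merge(counts, on='ID', how='left', validate='1:1')
except pd.errors.MergeError as err:
    print(err)  # message along the lines of "Merge keys are not unique in left dataset"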

ifly6

Here's a solution with the performance of data.table and the elegance of INNER JOIN.

Solution

Given your sample data, reproduced here as data.tables

buffer_gdf <- structure(list(ID = c("1A", "1B", "1C")),
                        row.names = c(NA, -3L), class = c("data.table", "data.frame"))


point_gdf <- structure(list(ID = c("1A", "1A", "1A", "1A", "1A", "1B", "1B", "1B", "1B", "1B", "1B", "1B", "1B", "1B", "1C", "1C", "1C", "1C", "1C", "1C", "1C", "1C", "1C")),
                       row.names = c(NA, -23L), class = c("data.table", "data.frame"))

the following approach

library(data.table)


# ...
# Code to generate 'buffer_gdf' and 'point_gdf' as data.tables.
# ...


#         |----------- INNER JOIN -----------||--- Count ---|
buffer_gdf[point_gdf, on = .(ID), nomatch = 0][, .N, by = ID]

should yield output like this:

   ID N
1: 1A 5
2: 1B 9
3: 1C 9

Note

If your datasets are not already data.tables, use as.data.table() to convert them prior to the operations.

If you want, you can customize the name of the count column: `[, .(Count_Name = .N), by = ID]`.

Greg

Why not join the tables and get the count by group? Below is the code:

library(dplyr)
point_gdf %>% inner_join(buffer_gdf, by = "ID") %>% group_by(ID) %>%
  summarise(count = n())
Jason Mathews
  • I like this elegant concept, but will `dplyr` stand up to the sheer volume of data? Perhaps `data.table` would provide better performance... – Greg Jul 21 '21 at 20:15
  • @Greg well frankly, if performance is what you're looking for, I wouldn't rely on R – ifly6 Jul 21 '21 at 20:18
  • 1
    @ifly6 Do you have any benchmarks where pandas vs collapse vs data.table? – akrun Jul 21 '21 at 20:20

Thank you all so much for your help. However, the most useful answer for me is this post.

ZairaRosas