0

Postgres 9.3.5, PostGIS 2.1.4.

I have tow tables (polygons and points) in a database.

I want to find out how many points are in each polygon. There will be 0 points per polygon or more than 200000. The little hick up is the following.

My point table looks the following:

x    y    lan
10  11    en
10  11    fr
10  11    en
10  11    es
10  11    en
- #just for demonstration/clarification purposes
13  14    fr
13  14    fr
13  14    es
-
15  16    ar
15  16    ar
15  16    ps

I do not simply want to count the number of points per polygon. I want to know what is the most often occuring lan in each polygon. So, assuming each - indicates that the points are falling into a new polygon, my results would look the following:

Polygon table:

polygon    Count   lan
1          3       en
2          2       fr
3          2       ar

This is what I got so far.

SELECT count(*), count.language AS language, hexagons.gid AS hexagonsWhere 
  FROM hexagonslan AS hexagons, 
       points_count AS france 
 WHERE ST_Within(count.geom, hexagons.geom) 
 GROUP BY language, hexagonsWhere 
 ORDER BY hexagons DESC;

It gives me the following:

Polygon    Count     language
1          3         en
1          1         fr
1          1         es
2          2         fr
2          1         es
3          2         ar
3          1         ps

Two things remain unclear.

  1. How to get only the max value?
  2. How will cases be treated where there are by any chance the max values identical?
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
four-eyes
  • 10,740
  • 29
  • 111
  • 220
  • The second question should be one answered by you, how do you want those to be treated? – Jakub Kania May 29 '15 at 14:49
  • possible duplicate of [How to select id with max date group by category in PostgreSQL?](http://stackoverflow.com/questions/16914098/how-to-select-id-with-max-date-group-by-category-in-postgresql) – Jakub Kania May 29 '15 at 14:49
  • @JakubKania I wonder wether it raises an error or gives me note that that happened or something. Thats all. – four-eyes May 29 '15 at 14:52
  • It's a classic case of groupwise max (once you throw away the performance of that `ST_Within` and the insert) so you should decide if you want an error or not. – Jakub Kania May 29 '15 at 15:01
  • @JakubKania what do you mean with "once you throw away the performance of that ST_Within and the insert"? – four-eyes May 29 '15 at 17:39
  • Do you need the total count of points or just the count of the most common type of points or both? – Erwin Brandstetter May 29 '15 at 20:13
  • @ErwinBrandstetter The count of the most common. I got the total count already. I think I create a new table with the statement I wrote above and use a `DISTINCT` than. That might be the easiest. – four-eyes May 29 '15 at 23:00
  • BTW, your query and displayed results don't add up. – Erwin Brandstetter May 30 '15 at 00:03
  • @ErwinBrandstetter when I run it it adds up the results http://www.directupload.net/file/d/4003/je7ax32a_png.htm Or what do you mean? – four-eyes May 30 '15 at 09:24
  • @Stoffer: "does not add up" is a phrase meaning it does not make sense together. You write: `It gives me the following:`, but that's not so. – Erwin Brandstetter May 30 '15 at 15:51
  • Ah ok. I understand. Have a look at the screenshot I posted. Thats what my querry prompts me. – four-eyes May 31 '15 at 09:58

1 Answers1

1

Answer to 1.

To get the most common language and its count per Polygon, you could use a simple DISTINCT ON query:

SELECT DISTINCT ON (h.gid)
       h.gid AS polygon, count(c.geom) AS ct, c.language
FROM   hexagonslan h
LEFT   JOIN points_count c ON ST_Within(c.geom, h.geom)
GROUP  BY h.gid, c.language
ORDER  BY h.gid, count(c.geom) DESC, c.language;  -- language name is tiebreaker

But for the data distribution you describe (up to 200.000 points per polygon), this should be substantially faster (hoping to make better use of an index on c.geom):

SELECT h.gid AS polygon, c.ct, c.language
FROM   hexagonslan h
LEFT   JOIN LATERAL (
   SELECT c.language, count(*) AS ct
   FROM   points_count c
   WHERE  ST_Within(c.geom, h.geom) 
   GROUP  BY 1
   ORDER  BY 2 DESC, 1  -- again, language name is tiebreaker
   LIMIT  1
   ) c ON true
ORDER  BY 1;

LEFT JOIN LATERAL .. ON true preserves polygons not containing any points.

In cases where there are by any chance the max values identical, the alphabetically first language is picked in the example, by way of the added ORDER BY item. If you want all languages that happen to share the maximum count, you have to do more:

Answer to 2.

SELECT h.gid AS polygon, c.ct, c.language
FROM   hexagonslan h
LEFT   JOIN LATERAL (
   SELECT c.language, count(*) AS ct
        , rank() OVER (ORDER BY count(*) DESC) AS rnk
   FROM   points_count c
   WHERE  ST_Within(c.geom, h.geom) 
   GROUP  BY 1
   ) c ON c.rnk = 1
ORDER  BY 1, 3  -- language only as additional sort critieria

Using the window function rank() here, (not row_number()!). We can get the count or points and the ranking of the count in a single SELECT. Consider the sequence of events:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228