0

Subway data Columns:

  1. Station_id (1,2,3,4,5,6,7,8,1,2,3,4,5,1,2,3,4,5,6,7,1,2,3)
  2. Number of people boarded
  3. Number of people deboarded
  4. Occupancy

Occupancy at current station = Number of people already in the train at previous station + Number of people boarded - Number of people deboarded

I am trying to fill the occupancy column. The issue is that the dataset is for multiple subway trains so station_id changes back to 1 and on that station number of people deboarded is always 0 since it is the station from which train journey begins. I have got no clue on how to do this in postgresql. The occupancy column in the sample image below is empty and needs to be filled

The train journeys are sorted and grouped. enter image description here

Aman
  • 353
  • 1
  • 3
  • 13
  • 2
    sample data and the expected output would really be helpful. – Vamsi Prabhala Nov 30 '17 at 21:15
  • 3
    Please **[EDIT]** your question and add some [sample data](http://plaintexttools.github.io/plain-text-table/) in tabular format and the expected output based on that data. [**Formatted text**](http://stackoverflow.com/help/formatting) please, [**no screen shots**](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557). **[edit]** your question - do **not** post code or additional information in comments. –  Nov 30 '17 at 21:16
  • 2
    Do you have `train_id` you can add to your data set? Do you have timestamps? – PM 77-1 Nov 30 '17 at 21:22
  • Thanks for the suggestions. I have uploaded a sample data. There is no timestamp column but I created an id column which is a serial – Aman Nov 30 '17 at 21:40
  • SQL tables and their IDs aren't really meant to be used like this. But you should be able to do this using a "running total". Maybe this thread would help https://stackoverflow.com/questions/22841206/calculating-cumulative-sum-in-postgresql – Jacob H Nov 30 '17 at 21:43

1 Answers1

3

You can do this with the difference of the cumulative sums. The trick is identifying the groups, which I'll do by counting the number of times that station_id has been 1 up to that record.

select s.*,
       (sum(boarded) over (partition by grp order by id) -
        sum(deboarded) over (partition by grp order by id)
       ) as occupants
from (select s.*,
             count(*) filter (where station_id = 1) over (order by id) as grp
      from subwaydata s
     ) s;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786