I have a PostgreSQL 9.4.1 database (Retrosheet data) with a table events
containing one row per baseball play. I want to a compute a running batting average for a given player: the formula is (total number of hits so far)/(total number of valid at-bats so far).
I can use window functions to get a running total of hits for David Ortiz, whose player code is ortid001
, using the following query:
SELECT count(*) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM events WHERE bat_id='ortid001' AND (event_cd='20' OR event_cd='21'
OR event_cd='22' OR event_cd='23');
(The clause involving event_cd
just identifies which rows are considered hits.)
Using the same technique, I can get a running total of at-bats (the event_cd
clause rejects every row that doesn't count as an at-bat. note that the hits selected above are a subset of at-bats):
SELECT count(*) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM events WHERE bat_id='ortid001' AND (event_cd != '11' AND
event_cd!='14' AND event_cd!='15' AND event_cd!='16' AND
event_cd!='17');
How can I combine these? Ideally, for each row describing a play with bat_id='some_player_id'
, I would compute two functions: the count of all preceding rows describing an at-bat, and the count of all preceding rows describing hits. Dividing these gives the running batting average at that row.