0

I have the following "mlb_pitchers" table:

+------+-----------+-----------+-----+------+
| year | playerID  | nameLast  | IP  | WAR  |
+------+-----------+-----------+-----+------+
| 1903 | mathech01 | Mathewson | 366 | 9.6  |
| 1903 | youngcy01 | Young     | 341 | 10.3 |
| 1904 | mathech01 | Mathewson | 367 | 8.4  |
| 1904 | walshed01 | Walsh     | 110 | 1.1  |
| 1904 | youngcy01 | Young     | 380 | 8.8  |
| 1905 | mathech01 | Mathewson | 338 | 14.9 |
| 1905 | walshed01 | Walsh     | 136 | 3.0  |
| 1905 | youngcy01 | Young     | 320 | 9.9  |
| 1906 | mathech01 | Mathewson | 266 | 1.5  |
| 1906 | walshed01 | Walsh     | 278 | 7.2  |
| 1906 | youngcy01 | Young     | 287 | 0.6  |
+------+-----------+-----------+-----+------+

When doing this:

SELECT playerID, nameLast, SUM(IP), SUM(WAR)
FROM mlb_pitchers
WHERE year >= '1903' AND year <= '1906'
GROUP BY playerID, nameLast

I get this output:

+-----------+-----------+---------+----------+
| playerID  | nameLast  | SUM(IP) | SUM(WAR) |
+-----------+-----------+---------+----------+
| mathech01 | Mathewson | 1337    | 34.4     |
| walshed01 | Walsh     | 524     | 11.3     |
| youngcy01 | Young     | 1328    | 29.6     |
+-----------+-----------+---------+----------+

This works great if a pitcher pitched in any of those years (1903-1906). But I want to exclude a pitcher who did NOT have a record in first search year (1903). So the output would look like this (exclude Walsh):

+-----------+-----------+---------+----------+
| playerID  | nameLast  | SUM(IP) | SUM(WAR) |
+-----------+-----------+---------+----------+
| mathech01 | Mathewson | 1337    | 34.4     |
| youngcy01 | Young     | 1328    | 29.6     |
+-----------+-----------+---------+----------+

I wish to include only those pitchers who have a 1903 record. Or the opposite - exclude those that do not have a 1903 record. I haven't been able to figure out what to add to the SELECT query to accomplish this. Is this even possible?

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
user2140857
  • 129
  • 12

2 Answers2

2

Presumably, "year" is an integer, so don't use quotes.

If you want players that have 1903 in their records, you can use having:

SELECT playerID, nameLast, SUM(IP), SUM(WAR)
FROM mlb_pitchers
WHERE year >= 1903 AND year <= 1906
GROUP BY playerID, nameLast
HAVING SUM(CASE WHEN year = 1903 THEN 1 ELSE 0 END) > 0;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

If you want to include only a range of years you can add to the where clause but if I understand you dont want to filter years but players.

SELECT playerID, nameLast, SUM(IP), SUM(WAR)
FROM mlb_pitchers
WHERE playerID In (Select PlayerId from mlb_pitchers where year = '1903')
GROUP BY playerID, nameLast

This gives stats on all years but only includes players with a 1903 record.

Joe C
  • 3,925
  • 2
  • 11
  • 31
  • There are 47K records and this solution scanned all records, not just 1903 - 1906. So it ran forever. Sorry. – user2140857 Dec 18 '17 at 18:35
  • You can easily add the year back in. Doing it in the where clause rather than the having clause will eliminate the records before aggregation so on a larger table would have a significant performance impact. https://stackoverflow.com/questions/328636/which-sql-statement-is-faster-having-vs-where – Joe C Dec 19 '17 at 12:41
  • Per your suggestion, I added the years in the WHERE and it still takes a long time. What exactly is your SQL code suggestion. – user2140857 Dec 19 '17 at 20:30