-1

Below is my attempt to see pitchers in the top 98th percentile of innings pitched. I am trying to filter the NTILE() window function but it's throwing an error.

 query = """ 

WITH pitching_cte AS 
(
SELECT
player,
player_id,
season,
CAST(strftime('%Y', birth_date) AS TEXT) AS birth_year,
COUNT(*) AS num_seasons, 
printf("%i.%i", outs/3, outs % 3) AS innings_pitched
FROM pitching
INNER JOIN player
USING(player_id)
GROUP BY player
)

SELECT innings_pitched, 
   birth_year, 
   player, 
   NTILE(100) OVER(PARTITION BY innings_pitched) pctl,
   num_seasons
FROM pitching_cte
WHERE season = 2021 AND pctl >= 98;

"""
df = pd.read_sql(query, cnxn)
print(df.head(10))

When I execute this query I get the following error:

DatabaseError: Execution failed on sql ' 

WITH pitching_cte AS 
(
 SELECT
player,
player_id,
season,
CAST(strftime('%Y', birth_date) AS TEXT) AS birth_year,
COUNT(*) AS num_seasons, 
printf("%i.%i", outs/3, outs % 3) AS innings_pitched
FROM pitching
INNER JOIN player
USING(player_id)
GROUP BY player)

 SELECT innings_pitched, 
   birth_year, 
   player, 
   NTILE(100) OVER(PARTITION BY innings_pitched) pctl,
   num_seasons
FROM pitching_cte
WHERE season = 2021 AND pctl >= 98;

': misuse of aliased window function pctl

How do I filter the resulting percentile column?

Thanks

1 Answers1

0

In general it is not possible to use alias defined in SELECT list at WHERE clause.

Another limitation is that windowed function cannot be used at WHERE clause so the below would fail as well:

SELECT innings_pitched, 
   birth_year, 
   player, 
   NTILE(100) OVER(PARTITION BY innings_pitched) pctl,
   num_seasons
FROM pitching_cte
WHERE season = 2021 AND (NTILE(100) OVER(PARTITION BY innings_pitched)) >= 98;

The solution is to use another level of cte:

WITH pitching_cte AS (
  SELECT
    player,
    player_id,
    season,
    CAST(strftime('%Y', birth_date) AS TEXT) AS birth_year,
    COUNT(*) AS num_seasons, 
    printf("%i.%i", outs/3, outs % 3) AS innings_pitched
  FROM pitching
  INNER JOIN player
  USING(player_id)
  GROUP BY player
), cte AS (
  SELECT innings_pitched, 
    birth_year, 
    player, 
    NTILE(100) OVER(PARTITION BY innings_pitched) pctl,
    num_seasons
  FROM pitching_cte
  WHERE season = 2021
)
SELECT *
FROM cte
WHERE pctl >= 98;

Related: Why no windowed functions in where clauses?

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275