I am using the European Soccer SQLite database for data analysis in Google Colab (a Jupyter notebook).
Aim of the analysis: for a specific team (e.g. Chelsea), label every match as a win or a loss (done with a CASE statement), and then partition the match count by season and win/loss result. All of this is done inside a single pd.read_sql() call in Google Colab.
The statement runs fine until the window function is introduced, yet the same query runs without problems in DB Browser for SQLite (image attached). The main error I get is OperationalError: near "(": syntax error
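One possible explanation, offered as an assumption rather than a confirmed diagnosis: window functions were only added in SQLite 3.25.0, and Python's sqlite3 module uses whatever SQLite library it is linked against, which can be older than the one bundled with DB Browser. A minimal sketch for checking this from the notebook (supports_window_functions is a hypothetical helper name, not part of sqlite3):

```python
import sqlite3


def supports_window_functions(version: str) -> bool:
    """Return True if the given SQLite version string is 3.25.0 or newer."""
    parts = [int(p) for p in version.split(".")[:3]]
    # Pad short version strings such as "3.25" to three components.
    parts += [0] * (3 - len(parts))
    return tuple(parts) >= (3, 25, 0)


# Version of the SQLite library that this Python process actually uses.
print("SQLite library version:", sqlite3.sqlite_version)
print("Window functions available:", supports_window_functions(sqlite3.sqlite_version))
```

If the reported version is below 3.25.0, an OVER(...) clause would be expected to fail with a syntax error near "(", which would match the message above.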
Here is the code:
Home_Perf = pd.read_sql(""" --- CTE to get the wins and loss as a home team
WITH Homes AS (
    SELECT season, team_long_name AS HomeTeam,
           home_team_goal, away_team_goal,
           CASE
               WHEN home_team_goal > away_team_goal THEN 'win'
               WHEN home_team_goal < away_team_goal THEN 'loss'
               ELSE 'Tie' END AS Win_Loss
    FROM match
    ---Inner JOIN for getting the team name
    INNER JOIN team
        ON team_api_id = home_team_api_id
    WHERE home_team_api_id = 8455)
SELECT season, HomeTeam,
       COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
FROM homes""", conn)
Home_Perf
Here is the error:
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 38))
---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
1585 try:
-> 1586 cur.execute(*args, **kwargs)
1587 return cur
OperationalError: near "(": syntax error
The above exception was the direct cause of the following exception:
DatabaseError Traceback (most recent call last)
3 frames
<ipython-input-17-9b1c924dbbdd> in <module>()
15 SELECT season, HomeTeam,
16 COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
---> 17 FROM homes""", conn)
18 Home_Perf
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
410 coerce_float=coerce_float,
411 parse_dates=parse_dates,
--> 412 chunksize=chunksize,
413 )
414
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
1631
1632 args = _convert_params(sql, params)
-> 1633 cursor = self.execute(*args)
1634 columns = [col_desc[0] for col_desc in cursor.description]
1635
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
1596
1597 ex = DatabaseError(f"Execution failed on sql '{args[0]}': {exc}")
-> 1598 raise ex from exc
1599
1600 @staticmethod
DatabaseError: Execution failed on sql ' --- CTE to get the wins and loss as a home team
WITH Homes AS (
SELECT season, team_long_name AS HomeTeam,
home_team_goal, away_team_goal,
CASE
WHEN home_team_goal > away_team_goal THEN 'win'
WHEN home_team_goal < away_team_goal THEN 'loss'
ELSE 'Tie' END AS Win_Loss
FROM match
---Inner JOIN for getting the team name
INNER JOIN team
ON team_api_id = home_team_api_id
WHERE home_team_api_id = 8455)
SELECT season, HomeTeam,
COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
FROM homes': near "(": syntax error
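As a point of comparison, the count per season and win/loss result can also be written with a plain GROUP BY, which does not rely on window function support. The sketch below is only illustrative: it runs against a tiny in-memory database with made-up rows (the sample data is hypothetical and contains just the columns the query touches), not against the real soccer database.

```python
import sqlite3

# Hypothetical miniature stand-in for the real database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (team_api_id INTEGER, team_long_name TEXT);
CREATE TABLE "match" (season TEXT, home_team_api_id INTEGER,
                      home_team_goal INTEGER, away_team_goal INTEGER);
INSERT INTO team VALUES (8455, 'Chelsea');
INSERT INTO "match" VALUES
    ('2008/2009', 8455, 2, 0),
    ('2008/2009', 8455, 1, 1),
    ('2008/2009', 8455, 3, 1),
    ('2009/2010', 8455, 0, 1);
""")

# Same CTE as in the question; the window function is replaced by GROUP BY.
query = """
WITH Homes AS (
    SELECT season, team_long_name AS HomeTeam,
           CASE
               WHEN home_team_goal > away_team_goal THEN 'win'
               WHEN home_team_goal < away_team_goal THEN 'loss'
               ELSE 'Tie' END AS Win_Loss
    FROM match
    INNER JOIN team
        ON team_api_id = home_team_api_id
    WHERE home_team_api_id = 8455)
SELECT season, HomeTeam, Win_Loss, COUNT(*) AS counts
FROM Homes
GROUP BY season, HomeTeam, Win_Loss
ORDER BY season, Win_Loss
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same query string could be passed to pd.read_sql(query, conn) to get a DataFrame. Note that this returns one row per season and result, whereas the window version keeps one row per match, so the two are not interchangeable in every case.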