I am trying to separate text using a semi-colon delimiter. I have a source text file that contains a lot of SQL queries. For example:
SELECT
employee.id,
employee.first_name,
employee.last_name,
SUM(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_sum
FROM call
INNER JOIN employee ON call.employee_id = employee.id
GROUP BY
employee.id,
employee.first_name,
employee.last_name
ORDER BY
employee.id ASC;
------------------------
SELECT
single_employee.id,
single_employee.first_name,
single_employee.last_name,
single_employee.call_duration_avg,
single_employee.call_duration_avg - avg_all.call_duration_avg AS avg_difference
FROM
(
SELECT
1 AS join_id,
employee.id,
employee.first_name,
employee.last_name,
AVG(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_avg
FROM call
INNER JOIN employee ON call.employee_id = employee.id
GROUP BY
employee.id,
employee.first_name,
employee.last_name
) single_employee
INNER JOIN
(
SELECT
1 AS join_id,
AVG(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_avg
FROM call
) avg_all ON avg_all.join_id = single_employee.join_id;
-----------------------------
SELECT *
FROM call
ORDER BY
call.employee_id ASC,
call.start_time ASC;
These 3 sample queries must be split into 3 rows of a dataframe - each containing a query.
As you can see there are complex queries in this text and I see that the problem with regex is the \n character.
I see that there is no direct way to do this and the original file contains around 2000 SQL queries.
I would like to extract each query as a single entity and place it in a separate row.