I'm new to Hive and trying to optimize a query that is slow to run. The same regexp_extract and get_json calls appear in both my SELECT and WHERE clauses. Is there a way to compute the result once in one clause and reuse it in the other, or is Hive already doing something like this in the background?
Example query:
SELECT
    regexp_extract(get_json(json, 'url'), '.*[&?]q=([^&]*)') AS query
FROM
    api_request_logs
WHERE
    LENGTH(regexp_extract(get_json(json, 'url'), '.*[&?]q=([^&]*)')) > 0
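To make the idea concrete, here is a rough sketch of what I imagine "storing the result" could look like: the expression is written once in a subquery and the outer query filters on the alias. The subquery alias t is just a placeholder I made up, and I don't know whether Hive would actually evaluate the expression only once in this form or simply push the filter back down and compute it twice anyway.

```sql
-- Sketch only: write the extraction once in a subquery (alias "t" is arbitrary),
-- then filter on the aliased column in the outer query.
SELECT
    t.query
FROM (
    SELECT
        regexp_extract(get_json(json, 'url'), '.*[&?]q=([^&]*)') AS query
    FROM
        api_request_logs
) t
WHERE
    LENGTH(t.query) > 0;
```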
Thanks!