
I have a table, Sub_Table1_Clus with the following schema:

Name    Type    Mode    
acc_id  INTEGER NULLABLE    
imp_id  STRING  NULLABLE    
con_id  STRING  NULLABLE    
in_id   STRING  NULLABLE    
date    TIMESTAMP   NULLABLE    
tr_nm   STRING  NULLABLE    
tr_ve   STRING  NULLABLE    
d_cd    TIMESTAMP   NULLABLE

The table has 140,501,802 rows, and I am trying to get results for this query:

    SELECT imp.*,
           COALESCE(LAG(imp.date) OVER (PARTITION BY con_id, in_id, tr_nm
                                        ORDER BY d_cd DESC), imp.date) AS imp_date
    FROM `Nikhil_Practice.Sub_Table1_Clus` AS imp

I am getting this error:

Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 118% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%

Can anyone please help me find another way to work around or mitigate this error? Thanks.
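One direction I have considered (just a sketch; the bucket count of 10 is an assumption) is to run the window over one hash bucket of con_id at a time and append the results. Since con_id is one of the PARTITION BY keys, no analytic partition is split across buckets, so the per-bucket results should be safe to combine:

```sql
-- Sketch: process one hash bucket of con_id per run (10 buckets assumed).
-- Because con_id is a PARTITION BY key, each analytic partition stays intact
-- within a single bucket, so the per-bucket outputs can simply be appended.
SELECT imp.*,
       COALESCE(LAG(imp.date) OVER (PARTITION BY con_id, in_id, tr_nm
                                    ORDER BY d_cd DESC), imp.date) AS imp_date
FROM `Nikhil_Practice.Sub_Table1_Clus` AS imp
WHERE MOD(ABS(FARM_FINGERPRINT(con_id)), 10) = 0;  -- repeat for buckets 1 through 9
```

This trades one big query for several smaller ones, each of which sorts roughly a tenth of the rows, but I am not sure it helps if a single con_id dominates the table.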

-- Adding more details for a better understanding --

My ultimate goal is to partition the entire table by con_id, in_id, and tr_nm, and order each partition by d_cd, so that for each row I can get the date column of the previous row in its partition.

For example, consider the following data:

    acc_id imp_id con_id in_id date tr_nm tr_ve d_cd 
    1   a   a1  b1  2018-04-26 2:28:21  c1  d1  2018-04-26 2:28:35
    1   a   a1  b1  2018-04-26 2:28:21  c1  d1  2018-04-26 2:29:04
    1   b   a1  b1  2018-04-26 2:28:21  c1  d1  2018-04-26 2:29:04
    1   b   a1  b2  2018-04-26 2:28:21  c1  d1  2018-04-26 2:28:35
    2   c   a1  b2  2018-04-26 2:30:56  c1  d1  2018-04-26 2:30:58
    2   c   a1  b2  2018-04-26 2:30:56  c1  d1  2018-04-26 2:30:58
    2   d   a1  b3  2018-04-26 10:08:38 c1  d1  2018-04-26 10:08:39
    2   d   a1  b3  2018-04-26 10:08:38 c1  d1  2018-04-26 10:08:39
    2   d   a2  b3  2018-04-26 10:52:07 c1  d1  2018-04-26 10:52:08
    3   d   a2  b3  2018-04-26 10:52:07 c2  d2  2018-04-26 10:52:08
    3   e   a2  b4  2018-04-26 21:33:29 c2  d2  2018-04-26 21:33:29
    3   e   a2  b4  2018-04-26 21:33:29 c2  d2  2018-04-26 21:33:29
    3   e   a2  b4  2018-04-26 10:14:01 c2  d2  2018-04-26 10:14:06
    3   e   a2  b5  2018-04-26 3:06:29  c2  d2  2018-04-26 3:06:29
    3   e   a2  b5  2018-04-26 3:06:29  c2  d2  2018-04-26 3:06:29
    4   e   a2  b5  2018-04-26 3:06:29  c2  d2  2018-04-26 3:06:29
    4   f   a2  b5  2018-04-26 4:46:52  c2  d2  2018-04-26 4:46:53
    4   f   a2  b5  2018-04-26 4:46:52  c2  d2  2018-04-26 4:46:53
    4   f   a2  b5  2018-04-26 4:46:52  c2  d2  2018-04-26 4:46:53
    4   f   a2  b5  2018-04-26 5:48:32  c2  d2  2018-04-26 5:49:28

Desired output:

acc_id imp_id con_id in_id date tr_nm tr_ve d_cd imp_date

1   a   a1  b1  2018-04-26 2:28:21  c1  d1  2018-04-26 2:29:04  2018-04-26 2:28:21
1   b   a1  b1  2018-04-26 2:28:21  c1  d1  2018-04-26 2:29:04  2018-04-26 2:28:21
1   a   a1  b1  2018-04-26 2:28:21  c1  d1  2018-04-26 2:28:35  2018-04-26 2:28:21
2   c   a1  b2  2018-04-26 2:30:56  c1  d1  2018-04-26 2:30:58  2018-04-26 2:30:56
2   c   a1  b2  2018-04-26 2:30:56  c1  d1  2018-04-26 2:30:58  2018-04-26 2:30:56
1   b   a1  b2  2018-04-26 2:28:21  c1  d1  2018-04-26 2:28:35  2018-04-26 2:30:56
2   d   a1  b3  2018-04-26 10:08:38 c1  d1  2018-04-26 10:08:39 2018-04-26 10:08:38
2   d   a1  b3  2018-04-26 10:08:38 c1  d1  2018-04-26 10:08:39 2018-04-26 10:08:38
2   d   a2  b3  2018-04-26 10:52:07 c1  d1  2018-04-26 10:52:08 2018-04-26 10:52:07
3   d   a2  b3  2018-04-26 10:52:07 c2  d2  2018-04-26 10:52:08 2018-04-26 10:52:07
3   e   a2  b4  2018-04-26 21:33:29 c2  d2  2018-04-26 21:33:29 2018-04-26 21:33:29
3   e   a2  b4  2018-04-26 21:33:29 c2  d2  2018-04-26 21:33:29 2018-04-26 21:33:29
3   e   a2  b4  2018-04-26 10:14:01 c2  d2  2018-04-26 10:14:06 2018-04-26 21:33:29
4   f   a2  b5  2018-04-26 5:48:32  c2  d2  2018-04-26 5:49:28  2018-04-26 5:48:32
4   f   a2  b5  2018-04-26 4:46:52  c2  d2  2018-04-26 4:46:53  2018-04-26 5:48:32
4   f   a2  b5  2018-04-26 4:46:52  c2  d2  2018-04-26 4:46:53  2018-04-26 4:46:52
4   f   a2  b5  2018-04-26 4:46:52  c2  d2  2018-04-26 4:46:53  2018-04-26 4:46:52
3   e   a2  b5  2018-04-26 3:06:29  c2  d2  2018-04-26 3:06:29  2018-04-26 4:46:52
3   e   a2  b5  2018-04-26 3:06:29  c2  d2  2018-04-26 3:06:29  2018-04-26 3:06:29
4   e   a2  b5  2018-04-26 3:06:29  c2  d2  2018-04-26 3:06:29  2018-04-26 3:06:29

I hope this makes the goal clearer.

Nikhil Gurram
  • Possible dupe or related to https://stackoverflow.com/questions/46005418/query-failed-error-resources-exceeded-during-query-execution-the-query-could-n. In this question they recommended removing the `order by` since it's apparently an expensive operation. – Dan Jan 29 '19 at 21:07
  • Hello, I have an ORDER BY and a PARTITION BY in a window function. The question you linked won't solve this problem, because the partition I have is too large. – Nikhil Gurram Jan 29 '19 at 21:14
  • @NikhilGurram - usually the type of questions like simply "how to fix my query" does not fly well on SO. You should rather present your use-case / logic of what and why you do - this way you will have better chances to get proper answers – Mikhail Berlyant Jan 30 '19 at 06:12
  • We would like to help you. Can you tell us what the goal of this query is? It would be awesome if you could provide a sample dataset where we could reproduce the error, once we know the ultimate goal. – Felipe Hoffa Jan 30 '19 at 11:09
  • @MikhailBerlyant I think you can get a clearer idea now, since I added the use case and sample data. Please let me know if you have any potential solution. – Nikhil Gurram Jan 30 '19 at 16:23
  • @FelipeHoffa I think you can get a clearer idea now, since I added the use case and sample data. Please let me know if you have any potential solution. – Nikhil Gurram Jan 30 '19 at 16:23

1 Answer


Usually PARTITION BY helps with resource issues. How large is your biggest grouping?

    select con_id, in_id, tr_nm, count(*)
    from `Nikhil_Practice.Sub_Table1_Clus` imp
    group by con_id, in_id, tr_nm
    order by count(*) desc;

If you have millions and millions of rows in the largest partition, then this might be contributing to the problem.
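If the rows turn out to be spread across many con_id values rather than concentrated in one, a possible mitigation (a sketch; the bucket count of 10 is an assumption) is to run the analytic query one hash bucket of con_id at a time, since con_id is a partition key and no window partition would be split. A quick check of how evenly such buckets would divide the table:

```sql
-- Sketch: how evenly would 10 hash buckets of con_id split the table?
SELECT MOD(ABS(FARM_FINGERPRINT(con_id)), 10) AS bucket,
       COUNT(*) AS cnt
FROM `Nikhil_Practice.Sub_Table1_Clus`
GROUP BY bucket
ORDER BY cnt DESC;
```

If the largest bucket is still dominated by a single huge partition, sharding won't shrink that one sort, and you would need to rethink the partitioning keys themselves.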

Gordon Linoff
  • Thank you for reaching out, @Gordon. I have 49,723,460 rows in the largest partition; do you have any alternate solution for my problem? Thanks – Nikhil Gurram Jan 29 '19 at 21:04