0

I have a HIVE table in which data is loaded for tripid as tripid, gps_location_1, gps_location_2 locations, it could be 10 gps locations for one trip and for other trip there could be 500 gps locations, when I query data my results are as follows

select tripid, size(gps) as count from trip;

tripid gps_location_1 |gps_location_2 | count |

1451f2b3d |44.1145 | 44.1148| 9 |

select tripid, gps.gps_location_1, gps.gps_location_1 as count from trip;

+---------+-----------------------------------------------------------+----------------------------------------------------------+

tripid | gps_location_1 | gps_location_2 | count |

+---------+-------------------------------------------------------------+----------------------------------------------------------+ 1451f2b3d |[44.1145,44.1146,44.1147,44.1148,44.1148,44.1129,44.1127,44.1121] | [44.1148,44.1146,44.1146,44.1141,44.1138,44.1129,44.1127]| 9 | +---------+------------------------------------------------------------------+----------------------------------------------------------+

I can see first value from trip array table.

select tripid, gps[0].gps_location_1, gps[0].gps_location_1 from trip;

tripid gps_location_1 gps_location_2

1451f2b3d 44.1145 44.1148

2nd row from trip array table

select tripid, gps[1].gps_location_1, gps[1].gps_location_1 from trip;

tripid gps_location_1 gps_location_2

1451f2b3d 44.1146 44.1146

Last row from trip array table

select tripid, gps[size(gps)].gps_location_1, gps[size(gps)].gps_location_1 from trip;

1451f2b3d 44.1121 44.1127

I need to store each row in my new target_trip table like this, loop through all rows for one tripid in trip table and inserted into target_table showing below.

How can I achieve it?

tripid gps_location_1 gps_location_2

1451f2b3d 44.1145 44.1148

1451f2b3d 44.1146 44.1146

1451f2b3d 44.1147 44.1146

1451f2b3d 44.1148 44.1141

1451f2b3d 44.1129 44.1138

1451f2b3d 44.1127 44.1129

1451f2b3d 44.1121 44.1127

S M
  • 101
  • 3
  • 16

1 Answers1

0

Use lateral view explode:

elect tripid, coordinates.gps_location_1, coordinates.gps_location_1 
  from trip
       lateral view outer explode(gps) s as coordinates

explode() UDTF generates rows for each array element. lateral view applies UDTF to each row of base table and then joins resulting rows to the input rows to form a virtual table having the specified table alias.

See also this answer for more info about lateral view.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • thanks, yes I am testing with lateral view explode, now I am getting GC error ..Error: java.lang.OutOfMemoryError: GC overhead limit exceeded (state=,code=0) – S M Oct 04 '18 at 14:30
  • @SM Try to increase parallelism on reducers. Read this: https://stackoverflow.com/a/42842117/2700344 Read this https://stackoverflow.com/a/38475807/2700344 and this: https://stackoverflow.com/a/48296562/2700344 – leftjoin Oct 04 '18 at 15:34