2

I have two tables one is schools and one is students.I want to find all the students of a particular school. The schema of schools is: id, name, location and of students is :id, name, schoolId. I wrote the following script:

schoolId=$(hive -e "set hive.cli.print.header=false;select id from school;")
 hive -hiveconf "schoolId"="$schoolId" 

hive>select id,name from student where schoolId like  '${hiveconf:schoolId}%'

I dont get any result as schoolId stores all the id together.For example there are 3 schools with id: 123, 256,346 schoolId variable stores as 123 256 346 and the result is null.

1 Answers1

1

Use collect_set() with concat_ws to get comma delimited string, IDs should be cast to string:

schoolId=$(hive -e "set hive.cli.print.header=false;select concat_ws('\\',\\'',collect_set(cast(id as string))) from school;");

hive -hiveconf "schoolId"="$schoolId" 

Then use IN operator:

select id,name from student where schoolId in ('${hiveconf:schoolId}');
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • I got error like : cannot recognize input near ''33992,3d1e7,68a3e,1ed4b,8aaa7,c2eb2'' '' '' in expression specification – Vijaya Seetharaman Nov 13 '18 at 12:22
  • @VijayaSeetharaman Fixed. IDs are strings, need to single-quote them in concat_ws and also quote variable in the Hive query. Also no need to cast to string – leftjoin Nov 13 '18 at 12:40
  • Thank you. The above procedure worked. But when I do the select query `select concat_ws('\\',\\'',collect_set(cast(id as string))) from school;` individually in hive i get an error saying `FAILED: ParseException line 1:22 character '\' not supported here`. How do I get it right? – Vijaya Seetharaman Nov 14 '18 at 08:26
  • @VijayaSeetharaman In what tool are you running it? It is something with shielding `,` - maybe you need one more slash, or single slash to shield comma – leftjoin Nov 14 '18 at 08:34
  • `set hiveconf:schoolId=select concat_ws('\\',\\'',collect_set(cast(id as string))) from school; select id,name from patient where schoolId in ('${hiveconf:schoolId}');` I tried this command within hive but again its giving me the same parse exception. – Vijaya Seetharaman Nov 14 '18 at 08:36
  • No, You cannot do it like this in Hive. Hive does not calculate variables, you can assign some string value to a variable in Hive, not query result, that is why you heed shell – leftjoin Nov 14 '18 at 08:40
  • @VijayaSeetharaman In Hive you cannot put query result in the variable. – leftjoin Nov 14 '18 at 08:41
  • @VijayaSeetharaman Check also this answer: https://stackoverflow.com/a/37821218/2700344 – leftjoin Nov 14 '18 at 08:44
  • Actually I have to automate my work flow. I need to set the schoolId having values resulting from the query. I was thinking to put these into a file.hql and run the script.Can u suggest me how can i do it? – Vijaya Seetharaman Nov 14 '18 at 08:47
  • @VijayaSeetharaman Like in this answer: https://stackoverflow.com/a/53282735/2700344 Or you can use IN (select ID from school) or join w dataset – leftjoin Nov 14 '18 at 08:52
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/183612/discussion-between-vijaya-seetharaman-and-leftjoin). – Vijaya Seetharaman Nov 14 '18 at 08:55