0

Earlier I asked about manipulating a data structure in Hive or Pig. I was able to get an answer in SQL, and from there figured out the answer for Hive. I'm still looking for a solution in Pig.

I want to change myTable: enter image description here

Into myTable2: enter image description here

I tried:

myTable2 = FOREACH myTable GENERATE item, year, 
'jan' AS month, jan AS value, 
'feb' AS month, feb AS value,  
'mar' AS month, mar AS value;

Which more or less is what worked in Hive, but Pig gives me:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
<line 2, column 35> Duplicate schema alias: month
wugology
  • 193
  • 1
  • 4
  • 13
  • 2
    possible duplicate of [Pivot table with Apache Pig](http://stackoverflow.com/questions/11213567/pivot-table-with-apache-pig) – o-90 Aug 10 '15 at 22:48
  • have you tried renaming the 3 instances of month to say month1, month2, month3? – savx2 Jun 09 '16 at 16:30

2 Answers2

0

I figured it out, although I would love to see a more concise version:

JAN = FOREACH myTable GENERATE item, year, 'jan' AS month, jan AS value;
FEB = FOREACH myTable GENERATE item, year, 'feb' AS month, feb AS value;
MAR = FOREACH myTable GENERATE item, year, 'mar' AS month, mar AS value;
myTable2 = union JAN, FEB, MAR;
wugology
  • 193
  • 1
  • 4
  • 13
  • Unions are too expensive to use. It takes a lot of resources. Also, this approach don't guarantee the order in which you are expecting. For example, you might get the row with 'feb' month details ahead of the 'jan' month details. It totally depends on which mapper writes the data first. – Sai Neelakantam Aug 11 '15 at 05:00
  • 1
    You can do something like this to maintain the order. `JAN = FOREACH myTable GENERATE item, year, 'jan' AS month, jan AS value, 1 as index;` `FEB = FOREACH myTable GENERATE item, year, 'feb' AS month, feb AS value, 2 as index;`. After this you can do the union operation. After this you need to group it and order them based on the group. This i am not sure how we can achieve it. – Sai Neelakantam Aug 11 '15 at 05:04
0

Pig Script:

data = LOAD '/pigsamples/sampledata'  USING PigStorage(',') 
       AS (item:CHARARRAY, year:INT, jan:DOUBLE, feb:DOUBLE, mar:DOUBLE);

--concatenating month name to its value so that they won't get separated when i perform a flatten on the tuple.
concat_data =  FOREACH data GENERATE item, year, CONCAT('jan:', (CHARARRAY)jan) AS jan, 
               CONCAT('feb:', (CHARARRAY)feb) AS feb, CONCAT('mar:', (CHARARRAY)mar) AS mar;

--convert the month (name,value) pairs to a bag and flatten them
flatten_values = FOREACH concat_data GENERATE item, year, FLATTEN (TOBAG (jan, feb, mar)) AS month_values;

--split the string based on the delimiter that we used above to concat
split_flatten_values = FOREACH flatten_values GENERATE item, year, FLATTEN (STRSPLIT (month_values, ':')) AS (month:CHARARRAY, value:CHARARRAY);
Sai Neelakantam
  • 919
  • 8
  • 15