
I'm using Pig to analyze data loaded from Cassandra. One of the columns I get is a string containing product ids and product information in JSON format:

row | ... |                              items                                       | ...  
 1  | ... | "[{"id":"1", "useless_info":"blah"}, {"id":"2", "useless_info":"bleh"}]" | ...          
 2  | ... | "[{"id":"3"}]"                                                           | ...  
 .  |  .  |                                   .                                      |  .    

Note that some of the rows will have additional stuff within the string, while others will only have id.

Anyway, what I need to do is parse each "items" string and generate one output row per id:

row | id | ... |  
 1  | 1  | ... |  
 1  | 2  | ... |  
 2  | 3  | ... |  
etc

From what I understand, there are no JSON parsers for Pig out there, only load and store functions (like elephantbird). Is it possible to do what I want with something like REGEX_EXTRACT or will I have to write my own UDF (or is there a better, prettier, and more clever way)?

Thanks in advance for all your help!

PS I'm using Pig 0.93

outis
hriundel
  • Possible dup of [How do I parse JSON in Pig?](http://stackoverflow.com/q/5013003/90527) – outis Mar 29 '12 at 03:10
  • Yeah, I looked at that blog post - they are using [elephantbird](http://eric.lubow.org/2011/hadoop/pig-queries-parsing-json-on-amazons-elastic-map-reduce-using-s3-data/) to _load_ the JSON-formatted data. My data is actually all strings (and already loaded into Pig), and only one part has JSON-like format (that needs to be parsed) – hriundel Mar 29 '12 at 03:33

1 Answer


Elephant Bird has JsonStringToMap, which parses a JSON String and outputs a Map in Pig. This is distinct from their JsonLoader, which parses JSON while loading a file.
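As a rough illustration of the per-row logic a UDF for this question would need, here is a minimal sketch in plain Python. It is not Pig or Elephant Bird code: `extract_ids` is a hypothetical helper name, and the sample rows are copied from the question. It assumes each "items" value is a JSON array of objects and that rows with malformed JSON should simply produce no output.

```python
import json


def extract_ids(row, items):
    """Return one (row, id) pair per object in a JSON array string.

    Hypothetical helper for illustration; not part of Pig or Elephant Bird.
    """
    if not items:
        return []
    try:
        parsed = json.loads(items)
    except ValueError:
        # Malformed JSON: emit nothing for this row (an assumption).
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only the "id" field and ignore any extra keys such as "useless_info".
    return [
        (row, obj["id"])
        for obj in parsed
        if isinstance(obj, dict) and "id" in obj
    ]


# Sample rows taken from the question (outer quotes removed).
rows = [
    (1, '[{"id":"1", "useless_info":"blah"}, {"id":"2", "useless_info":"bleh"}]'),
    (2, '[{"id":"3"}]'),
]

# Flatten: every id becomes its own (row, id) record.
flattened = [pair for row, items in rows for pair in extract_ids(row, items)]
print(flattened)
```

Note that the top-level value here is a JSON array rather than a single object, which may be why a map-returning function could not be used unchanged for this data.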

msponer
  • Thanks for pointing this out. Although we needed to do something different, it was relatively straightforward to write a UDF based on JsonStringToMap, despite having only basic knowledge of Java. – hriundel Apr 05 '12 at 18:05