Questions tagged [elephantbird]

Elephant-bird is Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

It includes, among other things, an implementation of JSON import for Pig, along with many other implementations written by Twitter developers for their services.

More information on this project can be found at: https://github.com/twitter/elephant-bird

53 questions
7
votes
2 answers

JSON object spans multiple lines, How to split input in Hadoop

I need to ingest large JSON files whose records may span multiple lines (not files) (depends entirely on how the data provider is writing it). Elephant-Bird assumes LZO compression, which I know the data provider will not be doing. The Dzone article…
Maz
  • 91
  • 1
  • 5
5
votes
0 answers

Write data that can be read by ProtobufPigLoader from Elephant Bird

For a project of mine, I want to analyse around 2 TB of Protobuf objects. I want to consume these objects in a Pig script via the "elephant bird" library. However, it is not totally clear to me how to write a file to HDFS so that it can be consumed…
dmeister
  • 34,704
  • 19
  • 73
  • 95
5
votes
1 answer

Elephant-bird mvn package error

I have installed Hadoop 2.2 on my system. I want to use the Elephant-Bird jar. I am getting the following error while running "mvn package". Error: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile)…
Suraj Nayak
  • 907
  • 1
  • 8
  • 24
3
votes
1 answer

Json parse with elephantbird in Pig

I can't get the following data to parse in Pig. It's what the twitter API returns after getting all tweets from a certain user. source data: (I removed some numbers to not invade on anyone's privacy by accident) [{"created_at":"Sat Nov 01 23:15:45…
Havnar
  • 2,558
  • 7
  • 33
  • 62
3
votes
1 answer

how to use rcfilepigstorage in pig

I want to load a text file into Pig and then store it as an RCFile. For this, I found that Twitter has provided a storage UDF at this link…
user1208943
2
votes
0 answers

How to use Hive 2.4.6 to read protobuf encoded sequence file

I have a sequence file whose value is a proto3-encoded byte array. I looked into elephant-bird, which is very old and only supports proto 2.x versions. https://github.com/kevinweil/elephant-bird Also, it has stopped releasing new packages, and the latest one is…
2
votes
1 answer

Apache pig / Twitter elephant bird Json parser ClassCastException

I'm trying to parse a rather simple JSON file using Pig and Twitter's elephant-bird library, but it is turning into a very painful debugging process. The JSON has the following structure: oid_id: (oid:chararray), bookmarks: {( …
dams
  • 309
  • 1
  • 4
  • 13
2
votes
0 answers

elephantbird registered still showing error 2998

grunt> register '/home/piyush/Desktop/pro/json-simple-1.1.1.jar' grunt> register '/home/piyush/Desktop/pro/elephant-bird-pig-4.1.jar' grunt> register '/home/piyush/Desktop/pro/elephant-bird-hadoop-compat-4.1.jar' grunt> register…
piyush-balwani
  • 524
  • 3
  • 15
2
votes
0 answers

Pig: parse bytearray as a string/json

I have some JSON data saved to S3 in SequenceFile format by secor. I want to analyze it using Pig. Using elephant-bird I managed to get it from S3 in bytearray format, but I wasn't able to convert it to chararray, which is apparently needed…
Valentin Golev
  • 9,965
  • 10
  • 60
  • 84
2
votes
1 answer

Use elephant-bird with hive to read protobuf data

I have a problem similar to this one. The following is what I used: CDH4.4 (hive 0.10) protobuf-java-.2.4.1.jar elephant-bird-hive-4.6-SNAPSHOT.jar elephant-bird-core-4.6-SNAPSHOT.jar elephant-bird-hadoop-compat-4.6-SNAPSHOT.jar The jar file…
Arthur
  • 146
  • 2
  • 8
2
votes
1 answer

How do I split in Pig a tuple of many maps into different rows

I have a relation in Pig that looks like this: ([account_id#100, timestamp#1434, id#900], [account_id#100, timestamp#1434, id#901], [account_id#100, timestamp#1434, id#902]) As you can see, I have three map objects within a tuple. All of…
Elias H
  • 251
  • 1
  • 2
  • 6
2
votes
2 answers

How do I query a nested json after loading it with elephant bird

I'm pretty new to Hadoop and Pig. I have single-line JSON files, all with the same schema: {"name":"someName","pkg":[{"F1":"abc","F2":"44","F3":"xyz","F4":1024,"info": [{"timestamp":1372631550000,"value":"122","id":"nnn","name":"ppp"},…
Rotem Slootzky
  • 688
  • 10
  • 21
1
vote
1 answer

elephant bird does not exist error while loading json data in pig 0.16

Can anyone help me figure out why I am getting an error while using REGISTER to register the 'elephant bird' jar file to load JSON data? I work in the local mode of Pig 0.16 and get the…
1
vote
1 answer

Using protobuf 3 with Hive and Elephant-Bird

I have a data pipeline that writes protobufs into HDFS, and now I need a way to query that data. I stumbled upon elephant-bird and Hive and have been trying to get this solution up and running for a day now. Here are the steps that I took: 1.)…
Dennis Jansky
  • 107
  • 12
1
vote
1 answer

Parsing complex nested JSON in Pig

I want to parse a Billionaires JSON dataset into Pig. The JSON file can be found here. Here is what each entry has: { "wealth": { "worth in billions": 1.2, "how": { "category": "Resource Related", "from…
Karup
  • 2,024
  • 3
  • 22
  • 48