
I created a table in Hive and loaded the data from an external CSV file. When I try to print the data from Python, I get output like "['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']". When I query via the Hive GUI, the results are correct. How can I get the same results via a Python program?

My Python code:

import pyhs2

with pyhs2.connect(host='192.168.56.101',
               port=10000,
               authMechanism='PLAIN',
               user='hiveuser',
               password='password',
               database='anuvrat') as conn:
    with conn.cursor() as cur:
        cur.execute('SELECT message FROM ABC_NEWS LIMIT 5')

        print cur.fetchone()

Output is:

/usr/bin/python2.7 /home/anuvrattiku/SPRING_2017/CMPE239/Facebook_Fake_news_detection/code_fake_news/code.py
['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']

Process finished with exit code 0

When I query the same table in Hive, I get the following output:

(screenshot of the Hive GUI showing the query results as readable text)

This is how I created the table:

CREATE TABLE ABC_NEWS(
ID STRING, 
PAGE_ID INT, 
NAME STRING, 
MESSAGE STRING, 
DESCRIPTION STRING, 
CAPTION STRING, 
POST_TYPE STRING, 
STATUS_TYPE STRING, 
LIKES_COUNT SMALLINT, 
COMMENTS SMALLINT, 
SHARES_COUNT SMALLINT, 
LOVE_COUNT SMALLINT, 
WOW_COUNT SMALLINT, 
HAHA_COUNT SMALLINT, 
SAD_COUNT SMALLINT, 
THANKFUL_COUNT SMALLINT, 
ANGRY_COUNT SMALLINT, 
LINK STRING, 
IMAGE_LINK STRING, 
POSTED_AT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';

The csv file for loading the table is in the path below : https://www.dropbox.com/s/fiwygyqt8u9eo5s/abc-news-86680728811.csv?dl=0


1 Answer

  1. Since the text is quoted (") and the quoted fields contain occurrences of the delimiter character (,), you should use the CSV SerDe.
  2. You were printing cur.fetchone(), which returns a list, not a string, so you got the list's repr with its raw byte escapes. Print the first element of the list instead: cur.fetchone()[0]
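Point 2 can be seen without Hive at all; the row value below is a hypothetical stand-in for what cur.fetchone() might return:

```python
# Hypothetical one-element row, shaped like what a DB-API cursor's
# fetchone() returns.
row = ['"message"']

# Printing the list shows its repr, where non-printable bytes appear as
# escape sequences such as \x00; printing the element shows the plain string.
print(row)     # the list's repr, e.g. ['"message"']
print(row[0])  # the string value itself: "message"
```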

create external table abc_news
(
    id              string 
   ,page_id         int 
   ,name            string 
   ,message         string 
   ,description     string 
   ,caption         string 
   ,post_type       string 
   ,status_type     string 
   ,likes_count     smallint 
   ,comments        smallint 
   ,shares_count    smallint 
   ,love_count      smallint 
   ,wow_count       smallint 
   ,haha_count      smallint 
   ,sad_count       smallint 
   ,thankful_count  smallint 
   ,angry_count     smallint 
   ,link            string 
   ,image_link      string 
   ,posted_at       string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties 
(
   'separatorChar' = ','
  ,'quoteChar'     = '"'
)  
stored as textfile
;
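Why the quote-aware SerDe matters can be sketched with Python's own csv module (the sample line below is made up, not a row from the actual file): a naive split on the delimiter breaks apart a quoted field that contains a comma, while a quote-aware parser keeps the field whole.

```python
import csv
from io import StringIO

# Made-up line in the same shape as the news CSV: the message field
# contains a comma inside the quotes.
line = 'abc123,86680728811,ABC News,"Hello, world",...\n'

naive = line.strip().split(',')            # splits inside the quoted field
parsed = next(csv.reader(StringIO(line)))  # quote-aware parsing

print(len(naive))   # 6 -- one column too many
print(len(parsed))  # 5 -- the quoted field stays intact
print(parsed[3])    # Hello, world
```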

>>> import pyhs2
>>> 
>>> with pyhs2.connect(host='localhost',port=10000,authMechanism='PLAIN',user='cloudera',password='cloudera',database='local_db') as conn:
...     with conn.cursor() as cur:
...         cur.execute('SELECT message FROM ABC_NEWS LIMIT 10')
...         for i in cur.fetch():
...             print i[0]
...             
...             
... 
"message"
"Roberts took the unusual step of devoting the majority of  his annual  report to the issue of judicial ethics."
"Do you agree with the new law?"
"Some pretty cool confetti will rain down on New York City celebrators."
NULL
"The pharmacy was held up by a man seeking prescription medication. "
NULL
"There were no immediate reports of damage or injuries."
"Were you an LCD screen early adopter? A settlement may be headed your way."
"As Americans get bigger, passenger limits are becoming more restrictive."
>>> 