Accessing already present table in Hive

Question

I created a book_crossing_dataset database in Hive with three tables in it:

1) bx_books 2) bx_books_ratings 3) bx_user

The database and the bx_user table were created like this:

create database book_crossing_dataset;
use book_crossing_dataset;
add jar /home/cloudera/Downloads/ccsv-serde-0.9.1.jar;

create external table stage_bx_user(
  User_ID int,
  Location string,
  Age int
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\;",
"quoteChar" = "\"")
stored as textfile
tblproperties ("skip.header.line.count"="1");

load data local inpath "/home/cloudera/workspace/BX-CSV-Dump/BX-Users.csv" into table stage_bx_user;

create external table bx_user(
  User_ID int,
  Location string,
  Age int
)
stored as parquet;

insert into table bx_user select * from stage_bx_user;

Now I want to query this table from Spark, but when I run the code below

from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import HiveContext


conf = SparkConf().setAppName("Book Crossing")

sc = SparkContext(conf=conf)

hc = HiveContext(sc)

books = hc.sql("show databases")

books.show()  # show() prints the result itself and returns None

only the default database is listed.

I am using the following question as a reference: Query HIVE table in pyspark

You've mentioned that you need to create the `book_crossing_dataset` database, but I can't see it anywhere in your code, only in your problem description. If it doesn't exist, why do you expect to see it in the `show databases` output? – Richard Nemeth Aug 11 '19 at 08:09
@RichardNemeth I added that line to the code too. I ran the database and table creation commands through the Hive shell. – Ayush Goyal Aug 11 '19 at 08:13

score 1 · Accepted Answer · answered Aug 11 '19 at 08:19

Your script creates the database, but the create table statements never use it. I'd suggest changing the first three lines of the script to:

create database if not exists book_crossing_dataset;
use book_crossing_dataset;
add jar /home/cloudera/Downloads/ccsv-serde-0.9.1.jar;

If this doesn't help, the issue lies in the Spark configuration. I'd suggest trying a SparkSession with Hive support enabled:

from pyspark.sql import SparkSession

# enableHiveSupport() makes the session use the Hive metastore for its catalog
spark = (
    SparkSession.builder
    .appName("Book Crossing")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("show databases").show()

Richard Nemeth

I tried that too and still get the same result. How do I fix the Spark configuration? – Ayush Goyal Aug 11 '19 at 08:25
Try adding the Hive metastore config to the builder: `builder.config("hive.metastore.uris", "YOUR_METASTORE_URL")`. – Richard Nemeth Aug 11 '19 at 08:35
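
The last comment can be sketched as follows. A common reason for seeing only the default database is that Spark cannot find Hive's hive-site.xml and falls back to a local embedded metastore, so pointing the session at the real metastore is worth trying. This is a minimal sketch, not a confirmed fix for this setup: the helper names `read_metastore_uri` and `build_session` are hypothetical, and the file path and URI in the usage comments are assumptions that must be replaced with your cluster's values.

```python
import xml.etree.ElementTree as ET


def read_metastore_uri(hive_site_path):
    """Return the hive.metastore.uris value from a hive-site.xml, or None."""
    root = ET.parse(hive_site_path).getroot()
    # hive-site.xml is a flat list of <property><name/><value/></property>
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == "hive.metastore.uris":
            value = prop.findtext("value", default="").strip()
            return value or None
    return None


def build_session(metastore_uri, app_name="Book Crossing"):
    """Create a Hive-enabled SparkSession pointed at the given metastore."""
    # Imported lazily so read_metastore_uri works without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("hive.metastore.uris", metastore_uri)
        .enableHiveSupport()
        .getOrCreate()
    )


# Usage (assumed path; adjust to where your Hive config actually lives):
#   uri = read_metastore_uri("/etc/hive/conf/hive-site.xml")
#   spark = build_session(uri)  # uri looks like "thrift://<host>:<port>"
#   spark.sql("show databases").show()
```

If `read_metastore_uri` returns None, the file does not set `hive.metastore.uris`, and the metastore address has to come from wherever the Hive service was configured.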
