I have created a book_crossing_dataset database in Hive and created three tables in it:
1) bx_books 2) bx_books_ratings 3) bx_user
The tables were created as shown below for bx_user:
create database book_crossing_dataset;
use book_crossing_dataset;
add jar /home/cloudera/Downloads/ccsv-serde-0.9.1.jar;
create external table stage_bx_user(
User_ID int,
Location string,
Age int
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\;",
"quoteChar" = "\"")
stored as textfile
tblproperties ("skip.header.line.count"="1");
load data local inpath "/home/cloudera/workspace/BX-CSV-Dump/BX-Users.csv" into table stage_bx_user;
create external table bx_user(
User_ID int,
Location string,
Age int
)
stored as parquet;
insert into table bx_user select * from stage_bx_user;
Now I want to query this table from Spark, but when I run the code below:
from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import HiveContext
conf = SparkConf().setAppName("Book Crossing")
sc = SparkContext(conf=conf)
hc = HiveContext(sc)
books = hc.sql("show databases")
books.show()  # show() prints the result itself and returns None
only the default database is shown.
I am using the following question as a reference: Query HIVE table in pyspark
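One thing that may be worth checking, offered as a hedged sketch rather than a confirmed diagnosis: when Spark cannot find Hive's hive-site.xml, HiveContext creates its own local metastore, and that metastore contains only the default database. The helper below looks for hive-site.xml in the directories Spark usually reads configuration from. The function names find_hive_site and default_candidates are hypothetical, and the assumption that SPARK_CONF_DIR and $SPARK_HOME/conf are the relevant directories may not hold for every install.

```python
import os


def find_hive_site(candidate_dirs):
    """Return the candidate directories that contain a hive-site.xml file."""
    return [
        d for d in candidate_dirs
        if os.path.isfile(os.path.join(d, "hive-site.xml"))
    ]


def default_candidates(environ=os.environ):
    """Build the list of directories to check (assumed layout, see lead-in).

    SPARK_CONF_DIR is checked first, then $SPARK_HOME/conf.
    Variables that are unset or empty are skipped.
    """
    dirs = []
    if environ.get("SPARK_CONF_DIR"):
        dirs.append(environ["SPARK_CONF_DIR"])
    if environ.get("SPARK_HOME"):
        dirs.append(os.path.join(environ["SPARK_HOME"], "conf"))
    return dirs


# Example use (prints an empty list when no hive-site.xml is visible):
# print(find_hive_site(default_candidates()))
```

If the list comes back empty, Spark is probably not reading the same metastore configuration as the Hive CLI, which would be consistent with seeing only the default database.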