How can I use PySpark in Zeppelin with Hive data?

Asked Oct 14 '17 at 00:23

Active Oct 14 '17 at 00:23

Viewed 1,196 times

I am learning the Hadoop environment, so apologies if these are silly questions!

I stored data (the Kaggle Outbrain click prediction dataset) in Hive, and I used RDDs. Now I want to use Zeppelin's spark2.pyspark interpreter so that I can use Python functions.

When I query the data with %jdbc(hive), I can see it.

My questions are:

How can I create a DataFrame to work with in Zeppelin, and do I have to create one at all?

How can I start the Python analysis part? If I make any changes, will they affect the data in Hive?

Axis

1

Hi, you can connect to Hive through Spark / PySpark. Please refer to https://stackoverflow.com/questions/36051091/query-hive-table-in-pyspark Then you can create a temp table or an RDD for Spark. – 1ambda Oct 14 '17 at 04:45
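
The approach in the comment above could look roughly like the sketch below. It is only a sketch under some assumptions: the table name default.clicks_train and the view name clicks_tmp are hypothetical placeholders, the helper names load_hive_table and get_hive_session are made up for illustration, and the Spark interpreter is assumed to be configured so that it can reach the Hive metastore. Reading a table this way gives a DataFrame; transformations on it return new DataFrames and should not change the Hive table unless something is explicitly written back (for example with saveAsTable or an INSERT statement).

```python
# Sketch only: names below are illustrative, not taken from the question.

def get_hive_session(app_name="hive-from-pyspark"):
    """Build a SparkSession with Hive support (for use outside Zeppelin).

    Inside a Zeppelin Spark 2 paragraph a session is normally already
    available as `spark`, so this helper would not be needed there.
    """
    # Imported here so the rest of the sketch can be read without pyspark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # lets spark.table / spark.sql see Hive tables
        .getOrCreate()
    )


def load_hive_table(spark, table_name, view_name="clicks_tmp"):
    """Return a DataFrame over a Hive table and register a temp view on it.

    `table_name` and `view_name` are hypothetical; use your own names.
    """
    # Reading creates a DataFrame; nothing is written to Hive here.
    df = spark.table(table_name)

    # The temp view exists only in this Spark session, not in Hive.
    df.createOrReplaceTempView(view_name)
    return df
```

In a Zeppelin paragraph that starts with %spark2.pyspark, this might be used as df = load_hive_table(spark, "default.clicks_train"), followed by ordinary DataFrame calls such as df.printSchema() or df.limit(1000).toPandas() to pull a small sample into pandas for Python analysis.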

0 Answers