I am learning the Hadoop environment, so sorry if these are silly questions!

I stored the data (Kaggle Outbrain click prediction) in Hive, and I used an RDD. Now I want to use the Zeppelin spark2.pyspark interpreter so that I can use Python functions.

When I query the data with %jdbc(hive), I can see it.

My questions are:

How can I create a DataFrame to work with in Zeppelin? Do I even have to create a DataFrame?

How can I start the Python analysis part? If I make any changes, will they affect the Hive data?

Axis
Hi, you can connect to Hive through Spark / PySpark. Please refer to https://stackoverflow.com/questions/36051091/query-hive-table-in-pyspark. Then you can create a temp table or an RDD for Spark. – 1ambda Oct 14 '17 at 04:45
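
To make the suggestion in the comment concrete, here is a minimal sketch rather than a definitive recipe. It assumes the Zeppelin spark2.pyspark interpreter was built with Hive support and already exposes a SparkSession named `spark`. The table name `default.clicks_train`, the view name `clicks_sample`, the column `clicked`, and the helpers `load_hive_table` and `register_view` are all hypothetical names chosen for illustration.

```python
# Zeppelin paragraph; start it with the interpreter line: %spark2.pyspark
# Assumption: the interpreter exposes a Hive-enabled SparkSession as `spark`.

def load_hive_table(session, table_name, limit=None):
    """Return a Spark DataFrame that reads from an existing Hive table."""
    df = session.table(table_name)   # lazy: nothing is read yet
    if limit is not None:
        df = df.limit(limit)         # keep samples small while exploring
    return df


def register_view(df, view_name):
    """Expose the DataFrame to SQL queries under a temporary name."""
    df.createOrReplaceTempView(view_name)  # session-scoped, not written to Hive
    return view_name


# Runs only where `spark` already exists (for example, inside Zeppelin).
if "spark" in globals():
    # "default.clicks_train" is a hypothetical table name; use your own.
    clicks = load_hive_table(spark, "default.clicks_train", limit=100000)
    register_view(clicks, "clicks_sample")
    clicks.printSchema()
    # Transformations return new DataFrames; the Hive table is left untouched.
    clicks.groupBy("clicked").count().show()  # "clicked" is an assumed column
    # For pandas-style work, pull a small sample to the driver (needs pandas).
    pdf = clicks.limit(1000).toPandas()
```

On the second question: DataFrame transformations such as `filter`, `withColumn`, or `groupBy` produce new DataFrames and do not modify the underlying Hive table. The stored data should only change if you explicitly write back, for example with `df.write.saveAsTable(...)`, `df.write.insertInto(...)`, or an `INSERT` / `DROP` statement run through `spark.sql`.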

0 Answers