I am learning the Hadoop environment, so apologies if these are silly questions!
I stored some data (the Kaggle Outbrain click prediction dataset) in Hive,
and so far I have worked with it as an RDD.
Now I want to use the Zeppelin spark2.pyspark interpreter
so that I can use Python functions.
When I query the data with %jdbc(hive),
I can see it.
My questions are:
How can I create a DataFrame to work with in Zeppelin, or do I even have to create one?
How do I start the Python analysis part? If I make any changes, will they affect the data in Hive?
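
To make the question concrete, here is a minimal sketch of what the PySpark side might look like. It assumes the spark2.pyspark interpreter predefines a SparkSession named `spark` with Hive support enabled; the helper `load_hive_table` and the table name `clicks_train` are hypothetical and not taken from my actual setup.

```python
def load_hive_table(spark, table_name, limit=None):
    """Return a Spark DataFrame backed by an existing Hive table.

    spark      -- a SparkSession (assumed to be predefined by the interpreter)
    table_name -- name of the Hive table; hypothetical in the usage below
    limit      -- optional row cap, useful before converting to pandas
    """
    # spark.table() only builds a lazy reference to the table; nothing is copied yet.
    df = spark.table(table_name)
    if limit is not None:
        # Keep at most `limit` rows so a later toPandas() stays small.
        df = df.limit(limit)
    return df


# Possible usage inside a %spark2.pyspark paragraph (table name is hypothetical):
#   df = load_hive_table(spark, "clicks_train", limit=1000)
#   df.printSchema()
#   pdf = df.toPandas()   # pandas DataFrame for ordinary Python analysis
```

As far as I understand, transformations on such a DataFrame (or on the pandas copy) stay in Spark or Python and should not modify the Hive table; the table would only change on an explicit write, for example `df.write.insertInto(...)` or `df.write.saveAsTable(...)`.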