
I want to convert a Scala DataFrame into a pandas DataFrame:

    val collection = spark.read.sqlDB(config)
    collection.show()

    // Should be something like: df = collection
Gaurav Gangwar
  • It would be easier if you use pyspark: https://stackoverflow.com/questions/50958721/convert-a-spark-dataframe-to-pandas-df – Shaido Aug 05 '19 at 08:03

2 Answers


You are asking for a way to use a Python library from Scala, which seems a bit odd to me. Are you sure you have to do that? Maybe you know this already, but Scala DataFrames have a good API that will probably give you the functionality you need from pandas.

If you still need to use pandas, I would suggest writing the data you need to a file (a CSV, for example). Then, using a Python application, you can load that file into a pandas DataFrame and work from there.
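For example, the Scala side could write the data out with `collection.write.option("header", "true").csv("/tmp/collection_csv")`, and the Python side could then load it along these lines (a minimal sketch; the path is hypothetical, and note that Spark writes a directory of part files rather than a single CSV):

    import glob
    import pandas as pd

    # Spark's CSV writer produces a directory of part-*.csv files,
    # so gather them all and concatenate into one pandas DataFrame
    parts = sorted(glob.glob("/tmp/collection_csv/part-*.csv"))
    df = pd.concat((pd.read_csv(p) for p in parts), ignore_index=True)
    print(df.head())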

Trying to create a pandas object from Scala is probably overcomplicating things (and I am not sure it is currently possible).

Selnay

If you want to use a pandas-style API in Spark code, you can install the Koalas Python library. You can then call the pandas-style functions you need directly in your Spark code.

To install Koalas:

    pip install koalas
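A minimal sketch of how this can look (assuming PySpark, since Koalas is a Python library; the `collection` DataFrame here is hypothetical, mirroring the question's variable):

    import databricks.koalas as ks

    # Importing koalas adds a to_koalas() method to PySpark DataFrames,
    # exposing a pandas-like API while the data stays distributed
    kdf = collection.to_koalas()
    print(kdf.head())

    # If the result fits in driver memory, collect it into a real pandas DataFrame
    pdf = kdf.to_pandas()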
Ravi
  • I think the `collection` variable here is a DataFrame, so a `toPandas()` method should be available; applying `toPandas()` will return a pandas-based DataFrame. This link gives more information about installing Koalas and how to use it: https://medium.com/future-vision/databricks-koalas-python-pandas-for-spark-ce20fc8a7d08 – Ravi Aug 05 '19 at 07:27
  • Not related to my question. – Gaurav Gangwar Aug 05 '19 at 09:50