
I am new to Python and I am having a problem creating a DataFrame using pandas:

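import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(66, "a", "4"),
                            (67, "a", "0"),
                            (70, "b", "4"),
                            (71, "d", "4")],
                           ("id", "code", "amt"))

# this line raises the ValueError below
dfa = pd.DataFrame(data=df)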

This is the error I am getting: ValueError: DataFrame constructor not properly called!

Edited by smci; asked by Alastair.
  • Do you have to start with Spark? You can pass that list to the DataFrame constructor directly. – Paul H Feb 24 '21 at 03:39
  • You also need to be more specific than "I am facing a problem". What kind of problem? Are the data in the wrong order? Is an error raised? – Paul H Feb 24 '21 at 03:41
  • I am getting the error ValueError: DataFrame constructor not properly called! – Alastair Feb 24 '21 at 03:41
  • OK, so the DataFrame constructor was not properly called. Did you read the docs on how to call it? What wasn't clear about those docs? – Paul H Feb 24 '21 at 03:41
  • I tried this https://stackoverflow.com/questions/25604115/dataframe-constructor-not-properly-called-error, but it didn't work. – Alastair Feb 24 '21 at 03:43
  • But did you look at the official documentation? It has *many* examples of creating DataFrames. – Paul H Feb 24 '21 at 03:46
  • That's not a pandas DataFrame, it's a PySpark one. And please add the missing 'import' for spark. – smci Feb 24 '21 at 03:49
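As the last comment says, `df` here is a PySpark DataFrame, not a pandas one. A minimal sketch of why the constructor rejects it, runnable without Spark: pandas' constructor accepts dicts, arrays, Series and list-like inputs, and, as far as I can tell from its logic, for any other object passed without both `index` and `columns` it raises exactly this error. The class `NotTabular` below is a hypothetical stand-in for the Spark object, not part of any library.

```python
import pandas as pd


class NotTabular:
    """Hypothetical stand-in for a non-pandas object such as a PySpark DataFrame."""
    pass


message = None
try:
    # Not a dict, array, Series or list-like, and no index/columns given
    pd.DataFrame(data=NotTabular())
except ValueError as exc:
    message = str(exc)

print(message)  # DataFrame constructor not properly called!
```

So the fix is either to convert the Spark DataFrame with its own method, or to build the pandas DataFrame directly from the Python list, as the answers do.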

2 Answers

# toPandas() collects every row to the driver and returns a pandas DataFrame
dfa = df.select("*").toPandas()

For details on converting to and from pandas (including Arrow-based conversion), see https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html#enabling-for-conversion-tofrom-pandas

Edited by Paul H; answered by Apollon16.

What is the error message? Have you imported Spark, or just pandas? Why are you using Spark instead of pandas to create the DataFrame? Just pass the data to pandas directly:

import pandas as pd

df = pd.DataFrame([(66, "a", "4"),
                   (67, "a", "0"),
                   (70, "b", "4"),
                   (71, "d", "4")],
                  columns=["id", "code", "amt"])

This gives you the DataFrame:

    id      code    amt
0   66      a       4
1   67      a       0
2   70      b       4
3   71      d       4
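Note that the `amt` values in the question's data are quoted ("4", "0"), so that column holds strings, not numbers. If numeric values are intended (an assumption about the asker's data), a minimal sketch converting the column with `pd.to_numeric`:

```python
import pandas as pd

# Same rows as in the question; the amt values are strings
df = pd.DataFrame([(66, "a", "4"),
                   (67, "a", "0"),
                   (70, "b", "4"),
                   (71, "d", "4")],
                  columns=["id", "code", "amt"])

print(df["amt"].tolist())  # ['4', '0', '4', '4']

# Convert the string column to integers (assumes amt is meant to be numeric)
df["amt"] = pd.to_numeric(df["amt"])

print(df["amt"].tolist())  # [4, 0, 4, 4]
print(df.dtypes)
```

If the strings are intentional, simply skip the conversion.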
Answered by delimiter.