
How can I create a DataFrame of empty structs, please? Thank you.

```python
from pyspark.sql.types import StructType, StructField, IntegerType

dataxx = []
schema = StructType([
    StructField('Info1',
        StructType([
            StructField('fld', IntegerType(), True),
            StructField('fld1', IntegerType(), True),
            StructField('fld2', IntegerType(), True),
            StructField('fld3', IntegerType(), True),
            StructField('fld4', IntegerType(), True),
        ])
    ),
])
# sqlCtx is an existing SQLContext
df = sqlCtx.createDataFrame(dataxx, schema)
```

Thank you for your help

ceo
  • Not related to pandas, so I removed the tag. – anky Dec 22 '19 at 16:04
  • Have you tried `spark.createDataFrame([], schema)` ? – blackbishop Dec 22 '19 at 20:26
  • Does this answer your question? [How to create an empty DataFrame? Why "ValueError: RDD is empty"?](https://stackoverflow.com/questions/34624681/how-to-create-an-empty-dataframe-why-valueerror-rdd-is-empty) – blackbishop Dec 22 '19 at 20:26
  • @blackbishop Thank you, but it's not really what I mean. I want to create a DataFrame with a struct schema like this one. I have added a picture to make it clearer. – ceo Dec 22 '19 at 20:33

1 Answer


If you want to create a DataFrame that has a specific schema but contains no data, you can simply pass an empty list to the `createDataFrame` function:

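```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()  # reuse or start a SparkSession

schema = StructType([
    StructField('Info1',
        StructType([
            StructField('fld', IntegerType(), True),
            StructField('fld1', IntegerType(), True),
            StructField('fld2', IntegerType(), True),
            StructField('fld3', IntegerType(), True),
            StructField('fld4', IntegerType(), True),
        ])
    ),
])

# An empty list plus an explicit schema: the DataFrame has the columns but no rows
df = spark.createDataFrame([], schema)

df.printSchema()
```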

```
root
 |-- Info1: struct (nullable = true)
 |    |-- fld: integer (nullable = true)
 |    |-- fld1: integer (nullable = true)
 |    |-- fld2: integer (nullable = true)
 |    |-- fld3: integer (nullable = true)
 |    |-- fld4: integer (nullable = true)
```

Here, `spark` is a `SparkSession`.

David Vrba
  • Thank you, David. To add a value to `fld2`, for example, can I do this: `df.Info1.fld2 = 22`? – ceo Dec 22 '19 at 20:46
  • @ceo No, I am afraid it is not going to work like this. If you want to add a value to `Info1.fld2` (and have a single row in the DataFrame), you can call the `withColumn` transformation (or just `select`), redefine the struct, and use `lit(22)` for `fld2`. – David Vrba Dec 23 '19 at 13:59