
Hi, I have a DataFrame that has only columns and no rows. When I try to save it to a file, the header is not written and the file is completely blank.

Example:

df.show()

+-----+----------------------+-------+---------------------+------------------------+----------------------------+--------------------------+----------------------+---------------+------------------------+-------------+-----------------+-----------------------+--------------+---------------+-----------+-----------------+-----------+------+--------+----------------+----------------------+--------------+-----+-------+---------+------+--------+
|owner|account_priority_score|account|call_objective_clm_id|call_objective_from_date|call_objective_on_by_default|call_objective_record_type|call_objective_to_date|display_dismiss|display_mark_as_complete|display_score|email_template_id|email_template_vault_id|email_template|expiration_date|no_homepage|planned_call_date|posted_date|reason|priority|record_type_name|suggestion_external_id|supress_reason|title|product|survey_id|groups|insrt_dt|
+-----+----------------------+-------+---------------------+------------------------+----------------------------+--------------------------+----------------------+---------------+------------------------+-------------+-----------------+-----------------------+--------------+---------------+-----------+-----------------+-----------+------+--------+----------------+----------------------+--------------+-----+-------+---------+------+--------+
+-----+----------------------+-------+---------------------+------------------------+----------------------------+--------------------------+----------------------+---------------+------------------------+-------------+-----------------+-----------------------+--------------+---------------+-----------+-----------------+-----------+------+--------+----------------+----------------------+--------------+-----+-------+---------+------+--------+

But when I save it to a file, the headers are not written. I am using the code below:

df.coalesce(1).write.mode('overwrite').csv(output_path, sep=output_delimiter,quote='',escape='\"', header='True', nullValue=None)
Shivika

2 Answers

To save an empty PySpark DataFrame with a header to a CSV file, you can follow the steps below:

  1. Create an empty PySpark DataFrame with the desired schema (which supplies the header names) using the createDataFrame method:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([StructField("name", StringType(), True), StructField("age", IntegerType(), True)])

df = spark.createDataFrame([], schema)
  2. Write the DataFrame to a CSV file using the write method with the header option set to true:
df.write.option("header", "true").csv("path/to/save/csv/file")

This will create a CSV file containing only the header row, with the column names taken from the StructType schema, and no data rows.
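
If your Spark version still produces a blank file for an empty DataFrame, as described in the question, one possible fallback is to write the header line yourself. This is only a minimal sketch: it assumes the output goes to a local path on the driver, and `write_header_only` is a hypothetical helper, not part of PySpark. The column list here stands in for `df.columns`.

```python
import csv
import os
import tempfile


def write_header_only(columns, path, sep=","):
    """Write one header row to `path` (hypothetical helper, not a Spark API)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=sep, lineterminator="\n")
        writer.writerow(columns)


# Stand-in for df.columns; with a real DataFrame you would pass df.columns
# after confirming that the DataFrame has no rows.
columns = ["owner", "account_priority_score", "account"]
out_path = os.path.join(tempfile.mkdtemp(), "header_only.csv")
write_header_only(columns, out_path, sep="|")

# Read the file back to confirm that it holds exactly one header line.
with open(out_path, encoding="utf-8") as f:
    header_line = f.read()
print(header_line)
```

With a real DataFrame you would call the helper only when the DataFrame is empty and use the normal `df.write.csv(...)` path otherwise.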

DaveL17

To do what you are asking you will have to define a schema.

So for example:

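from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Line continuations are not needed inside brackets.
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

# Assumes an existing SparkSession named `spark`.
df = spark.createDataFrame([], schema=schema)
# coalesce(1) reduces the output to a single part file in the target directory.
df.coalesce(1).write.csv("/tmp/csv_data/", header=True)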

This will write a single CSV part file under /tmp/csv_data/ containing just the headers.
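
Spark writes a directory of `part-*` files rather than one named file, so it can help to check what actually landed there. The sketch below is a rough way to do that, assuming the output directory is on the local filesystem. `read_spark_csv_header` is a hypothetical helper, and the demo fakes a Spark output directory so it runs without a cluster.

```python
import glob
import os
import tempfile


def read_spark_csv_header(output_dir):
    """Return the first line of the first part file, or None if it is missing or blank.

    Hypothetical helper; assumes `output_dir` is a local Spark CSV output directory.
    """
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    if not part_files:
        return None
    with open(part_files[0], encoding="utf-8") as f:
        first_line = f.readline().rstrip("\n")
    return first_line or None


# Simulate a Spark output directory with one header-only part file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "part-00000-demo.csv"), "w", encoding="utf-8") as f:
    f.write("firstname,middlename,lastname,id,gender,salary\n")

header = read_spark_csv_header(demo_dir)
# An empty directory (no part files) is reported as None.
missing = read_spark_csv_header(tempfile.mkdtemp())
print(header, missing)
```

A result of `None` means the header was not written, which matches the blank-file symptom in the question.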

Benny Elgazar