How to show my existing column names instead of '_c0', '_c1', '_c2', '_c3', '_c4' in the first row?

Question

My DataFrame shows _c0, _c1, ... as column names instead of my original column names.
I want it to use the column names that are in the first row of my CSV.

    dff = spark.read.csv("abfss://dir@acname.dfs.core.windows.net/diabetes.csv")
    dff:pyspark.sql.dataframe.DataFrame
    _c0:string
    _c1:string
    _c2:string
    _c3:string
    _c4:string
    _c5:string
    _c6:string
    _c7:string
    _c8:string

Possible duplicate of [Load CSV file with Spark](https://stackoverflow.com/questions/28782940/load-csv-file-with-spark) (the most upvoted answer, not the accepted one) — pault, Aug 01 '19 at 12:48

score 9 · Accepted Answer · answered Aug 02 '19 at 00:52

A very simple solution is to pass header=True when you read the file:

    dff = spark.read.csv("abfss://dir@acname.dfs.core.windows.net/diabetes.csv", header=True)

Kishan Vyas

If the headers are actually blank, I want them to remain blank instead of being replaced with _c0 etc. Is there a fix for this? – Scope Jan 05 '22 at 13:45

score 2 · Answer 2 · answered Mar 13 '20 at 23:49

Set the header option to true when loading the CSV file:

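    # Parentheses let the chained calls span several lines in Python
    df = (
        spark.read.format("csv")
        .option("delimiter", ",")       # field separator
        .option("header", "true")       # use the first row as column names
        .option("inferSchema", "true")  # detect column types from the data
        .load("file.csv")
    )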

Aman Sehgal

score -1 · Answer 3 · edited Aug 02 '19 at 03:18

I solved it with the code below:

    from pyspark.sql.functions import col

    # Rename the generated columns one by one
    dff = dff.select(
        col("_c0").alias("A"),
        col("_c1").alias("B"),
        col("_c2").alias("C"),
        col("_c3").alias("D"),
        col("_c4").alias("E"),
    )

edited Aug 02 '19 at 03:18

zmag

answered Aug 01 '19 at 13:21

Gaurav Gangwar

I wouldn't recommend doing it this way because then the first row in your dataframe would contain the header – pault Aug 01 '19 at 13:59
Yeah, you are right. I just set "header=True" and that worked for me. Thanks for your help. – Gaurav Gangwar Aug 02 '19 at 07:28
You should delete this answer and accept the duplicate target – pault Aug 02 '19 at 10:45
