I have to read a file that is stored in HDFS and convert it to a DataFrame. I am following the steps below but am unable to proceed. I need some help.

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the running session if one already exists
stock1 = spark.read.csv("/FileStore/tables/stockdata/companylist_noheader.csv")

When I do so, I get the output below:

[image: The output]

But the actual CSV file looks like this:

[image: The input]

Please suggest a fix. I know the file is | delimited, but when I use a map function I get the error below:

AttributeError: 'DataFrame' object has no attribute 'map'

Raghunandan Sk

1 Answer

Once you have your DataFrame, convert it to an RDD and then use the map transformation.

In PySpark you can't map a DataFrame directly, but you can convert the DataFrame to an RDD and map that by doing yourdf.rdd.map(....)
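
As a minimal sketch of that conversion, assuming each line of the file landed in a single string column (that is, the data contains no commas), something like the following may work. The helper names split_pipe_row and map_over_dataframe are hypothetical and only for illustration.

```python
def split_pipe_row(row):
    """Split the first column of a row on '|' and trim each field.

    Works on a pyspark Row or a plain tuple, since both support row[0].
    Assumes row[0] is a string holding the whole pipe-delimited line.
    """
    return [field.strip() for field in row[0].split("|")]


def map_over_dataframe(df):
    """Apply split_pipe_row to every row of a PySpark DataFrame.

    df.rdd exposes the underlying RDD of Row objects, which does have map.
    """
    return df.rdd.map(split_pipe_row)
```

For the question's DataFrame this would be called as map_over_dataframe(stock1), and an action such as take(5) on the result would show the first few split rows. Calling map on the DataFrame itself is not possible in PySpark; only its rdd attribute exposes map.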

That's the reason you are encountering

AttributeError: 'DataFrame' object has no attribute 'map'
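
Since the file is | delimited, it may also be possible to skip map entirely by telling the CSV reader which separator to use. The sketch below assumes the file has no header row, as the file name in the question suggests. The function name read_pipe_delimited is hypothetical, while sep, header and inferSchema are existing options of spark.read.csv.

```python
def read_pipe_delimited(spark, path):
    """Read a headerless, pipe-delimited text file into a DataFrame.

    spark is an existing SparkSession; path is the file location.
    """
    return spark.read.csv(
        path,
        sep="|",           # field delimiter used in the file
        header=False,      # assumption: the first line is data, not column names
        inferSchema=True,  # let Spark guess column types instead of all strings
    )


# Example call, assuming a SparkSession named spark already exists:
# stock1 = read_pipe_delimited(spark, "/FileStore/tables/stockdata/companylist_noheader.csv")
# stock1.show(5)
```

With no header, Spark names the columns _c0, _c1 and so on; they can be renamed afterwards with toDF if needed.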
Ram Ghadiyaram