Loading a CSV into a DataFrame using PySpark

Question

I have to read a file stored in HDFS and convert it to a DataFrame. I am following the steps below but cannot get any further. I need some help.

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the active session if one already exists
stock1 = spark.read.csv("/FileStore/tables/stockdata/companylist_noheader.csv")

When I do so, I get the output below:

[image: The output]

But the actual CSV file looks like this:

[image: The input]

Please suggest. I know the file is | delimited, but when I use a map function I get the error below: AttributeError: 'DataFrame' object has no attribute 'map'

you need to specify the delimiter while reading. `sep='|'`. Please read the docs. — philantrovert, Jan 31 '18 at 11:41
Okay thanks a lot for your replies . I was able to achieve this by doing the below — Raghunandan Sk, Jan 31 '18 at 11:56
stock2 = spark.read.option("header", "true").option("delimiter", "|").csv("/FileStore/tables/stockdata/companylist_noheader.csv") — Raghunandan Sk, Jan 31 '18 at 11:56

score 0 · Accepted Answer · answered Jan 31 '18 at 11:54

Once you have your DataFrame, convert it to an RDD and then use the map transformation.

You can't map a DataFrame directly, but you can convert the DataFrame to an RDD and map that: yourdf.rdd.map(....)

That is why you are seeing

AttributeError: 'DataFrame' object has no attribute 'map'
