I want to read a CSV file into an RDD and skip its header line. I found a similar question here, but the answer isn't working for me: How do I skip a header from CSV files in Spark?

rdd.mapPartitionsWithIndex { (idx, iter) => if (idx == 0) iter.drop(1) else iter }

That answer is in Scala, so I tried to translate it to PySpark:

def f(idx, iter): 
    if idx==0:
        iter.drop(1)
    else:
        yield list(iterator)
rdd2 = rdd.mapPartitionsWithIndex(f)

But it fails with AttributeError: 'generator' object has no attribute 'drop'

Any help?

Yong Hyun Kwon

1 Answer

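Python iterators have no drop method (that is a Scala Iterator method), so collect each partition into a list and slice off the first element of partition 0. Try something like this:

def f(idx, iterator):
    # Materialize the partition's rows so the header can be sliced off.
    output = list(iterator)
    # The header is the first row of partition 0; other partitions pass through.
    if idx > 0:
        return output
    else:
        return output[1:]

rdd2 = rdd.mapPartitionsWithIndex(f)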
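
If holding a whole partition in a list is a concern, the same idea can probably be written lazily with itertools.islice, which plays the role of Scala's iter.drop(1). This is only a sketch: drop_header and the sample rows are made-up names and data for illustration, and the two plain Python iterators stand in for RDD partitions so the snippet runs without Spark.

```python
from itertools import islice


def drop_header(idx, iterator):
    """Skip the first record of partition 0; pass other partitions through."""
    # islice(iterator, 1, None) lazily skips one element, like Scala's drop(1).
    if idx == 0:
        return islice(iterator, 1, None)
    return iterator


# Hypothetical sample data: two "partitions", the first one starting with the header.
partitions = [iter(["name,age", "alice,30"]), iter(["bob,25", "carol,41"])]

rows = [row
        for idx, part in enumerate(partitions)
        for row in drop_header(idx, part)]
print(rows)  # ['alice,30', 'bob,25', 'carol,41']
```

With a real RDD this would be used as rdd2 = rdd.mapPartitionsWithIndex(drop_header). Both versions assume the header is the first record of partition 0, which should hold for a single file read with textFile; if several files are read at once, headers that land in other partitions would not be removed.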
ags29