I am processing 10,000 files and am having trouble getting a predefined function to be called. Here is my code:

def process_labs(labs):
    lab1 = labs.map(lambda x: x.split('labIDs:'))
    return lab1


files = sc.wholeTextFiles('file:///data/lab-records/*/*/*')
labs = files.map(lambda x: x[1])
lab_records = labs.map(lambda x: process_labs(x))

Note that I am working only with the contents of the files; the second line drops the file name.

The code below calls the function with no problems, and the lab data is passed to it just fine, so there is data in the files. The problem is that this code does not include the map, so there is only one call to process_labs(), with one file processed.

lab_records = process_labs(labs)

Can you help me with the syntax so that the function is called through map() and processes all 10,000 files?

Thanks for the link. That post explains a lot about map(), but it does not explain how to call a function the way I am doing. Is there another way I can use map to call a predefined function to process 10,000 files?
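For what it's worth, here is a minimal sketch of one way a predefined function can be handed to map. It assumes the function receives a single file's contents as a plain string, which is what each element of `labs` would be after `x[1]` is taken from the `wholeTextFiles` pairs. The name `process_lab` and the `sample_contents` list are hypothetical stand-ins, and the runnable part uses Python's built-in `map` rather than Spark so that it works without a cluster.

```python
def process_lab(text):
    """Split one file's contents on the 'labIDs:' marker.

    `text` is a plain string (one file), so it has no .map method;
    the per-record work uses str.split directly.
    """
    return text.split('labIDs:')


# Hypothetical stand-in for the file contents (not real lab data).
sample_contents = ['header labIDs:101 labIDs:102', 'labIDs:201']

# Built-in map calls process_lab once per element, as RDD.map would.
lab_records = list(map(process_lab, sample_contents))

# With Spark, assuming `labs` is the RDD of file contents from the question,
# the equivalent would be the two lines below. RDD transformations are lazy,
# so an action such as take() or count() is needed before any call runs:
#   lab_records = labs.map(process_lab)
#   lab_records.take(1)
```

The function is passed by name, so no lambda wrapper is needed. Because nothing executes until an action is called, a mistake inside the mapped function would not raise an error at the point where `map` is written, which may be why no exception showed up.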

Patrick Haugh
Satish
  • Sorry, copying error. I made the correction in the question. – Satish Feb 25 '18 at 23:01
  • Does labs have a `map` method? What is `labs`? Are you getting any errors/exceptions? There doesn't seem to be enough information. Please read [mcve]. – wwii Feb 25 '18 at 23:03
  • There are no errors/exceptions. – Satish Feb 25 '18 at 23:05
  • Can you tell me why it's a duplicate? That post does not show any examples of what I am doing. – Satish Feb 25 '18 at 23:40
  • @Satish No, it explains how the `map` function works. We're not just here to tell you *exactly* what you're doing wrong and how to fix it. Learning to solve your own problems is a key part of programming, and pointing you to the right resources should make it easier to learn where to look in future. – Nick is tired Feb 25 '18 at 23:44

0 Answers