I am processing 10,000 files and am having trouble getting a predefined function to be called. Here is my code:
def process_labs(labs):
    lab1 = labs.map(lambda x: x.split('labIDs:'))
    return lab1

files = sc.wholeTextFiles('file:///data/lab-records/*/*/*')
labs = files.map(lambda x: x[1])
lab_records = labs.map(lambda x: process_labs(x))
Note that I only work with the contents of the files; the file name is dropped in the second line after the function (labs = files.map(lambda x: x[1])).
The code below calls the function without problems, and the lab data is passed to it just fine, so there is data in the files. The problem is that this code does not use map(), so there is only one call to process_labs(), with one file processed.
lab_records = process_labs(labs)
Can you help me with the syntax so that the function is called through map() and processes all 10,000 files?
Thanks, that post explains a lot about map(), but it does not explain how to call a function the way I am doing. Is there another way I can use map() to call a predefined function to process 10,000 files?
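For reference, here is a minimal sketch of the pattern I think I am after, not a confirmed solution. It assumes the function passed to map() receives the contents of one file as a plain string, so the function itself should not call .map() again. The sample_contents list is a hypothetical stand-in for the RDD so the snippet runs without Spark, and the parameter name contents is my own choice.

```python
def process_labs(contents):
    """Split the text of ONE file on the 'labIDs:' marker."""
    # contents is a plain string here, so str.split is used directly.
    return contents.split('labIDs:')


# Hypothetical stand-in for the RDD of file contents (made-up sample text).
sample_contents = [
    "header labIDs:101 labIDs:102",
    "header labIDs:201",
]

# Python's built-in map calls process_labs once per element.
lab_records = list(map(process_labs, sample_contents))

# With the RDD from the question, the equivalent call would presumably be:
#     lab_records = labs.map(process_labs)
```

If this is right, the original error would come from calling .map() on a string inside process_labs, since each element handed to the function by labs.map() is already the text of a single file.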