I am using pandas' map function in Python to process a big CSV file (~50 GB), like this (foo returns two values per row):
import pandas as pd
df = pd.read_csv("huge_file.csv")
df["results1"], df["results2"] = df.map(foo)
df.to_csv("output.csv")
Is there a way I can parallelize this? Perhaps using multiprocessing's map function?
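Something like this is what I have in mind: read the file in chunks (so the whole 50 GB never has to fit in memory at once) and hand each chunk to a multiprocessing Pool. I'm not sure this is the right approach; here "col" stands in for my real input column and foo for my real function:

import multiprocessing as mp

import pandas as pd

def foo(value):
    # stand-in for my real function; returns two results per value
    return value * 2, value * 3

def process_chunk(chunk):
    # apply foo to one chunk and unpack its two results into new columns
    chunk["results1"], chunk["results2"] = zip(*chunk["col"].map(foo))
    return chunk

if __name__ == "__main__":
    with mp.Pool() as pool:
        # stream the CSV in manageable chunks instead of loading it all at once
        chunks = pd.read_csv("huge_file.csv", chunksize=100_000)
        for i, done in enumerate(pool.imap(process_chunk, chunks)):
            # append each processed chunk; write the header only for the first one
            done.to_csv("output.csv", mode="a", header=(i == 0), index=False)

(I used imap rather than map so the processed chunks come back in order and can be written out as they finish, instead of collecting every result in memory first.)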
Thanks, Jose