I have 30 CSV files. How can I read them as input in a for loop with pandas?

The files have names such as fileaa, fileab, fileac, filead, and so on.

I have multiple input files and would like to produce a single output.

I usually use read_csv, but it fails with a memory error.

import pandas as pd

f = "./file.csv"
df = pd.read_csv(f, sep="/", header=0, dtype=str)

So I would like to try parallel processing in Python 2.7.
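For reference, the for loop over the input files can be sketched without parallel processing by reading each file in chunks and appending to one output, which keeps only one chunk in memory at a time. This is a minimal sketch, not a definitive implementation: the helper name `combine_csv_files`, the sample columns `id` and `label`, and the output name `combined.csv` are hypothetical, and it assumes every file carries the same header row. If the files were produced by a tool such as `split`, only the first file will have a header and the others would need `header=None` with explicit `names`.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csv_files(pattern, out_path, sep="/", chunksize=100000):
    """Append every file matching `pattern` to `out_path`, chunk by chunk.

    Hypothetical helper: assumes each input file has the same header row.
    """
    paths = sorted(glob.glob(pattern))
    first = True
    for path in paths:
        # With chunksize set, read_csv returns an iterator of DataFrames
        # instead of loading the whole file at once.
        reader = pd.read_csv(path, sep=sep, header=0, dtype=str,
                             chunksize=chunksize)
        for chunk in reader:
            # Write the header only once, then append.
            chunk.to_csv(out_path, sep=sep, index=False,
                         mode="w" if first else "a", header=first)
            first = False
    return paths


# Tiny demo with made-up data standing in for fileaa, fileab, ...
tmp_dir = tempfile.mkdtemp()
for name, rows in [("fileaa", ["1/x", "2/y"]), ("fileab", ["3/z"])]:
    with open(os.path.join(tmp_dir, name), "w") as fh:
        fh.write("id/label\n" + "\n".join(rows) + "\n")

out_file = os.path.join(tmp_dir, "combined.csv")
used = combine_csv_files(os.path.join(tmp_dir, "file*"), out_file, chunksize=1)
combined = pd.read_csv(out_file, sep="/", dtype=str)
```

A smaller `chunksize` lowers peak memory at the cost of more write calls; the value that works depends on the row width and the memory available.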

Bonifacio2

1 Answer

You might want to have a look at dask.

The dask docs show how to read many CSV files into a single dask dataframe:

import dask.dataframe as dd
df = dd.read_csv('*.csv')

Many (but not all) of the pandas methods are then available, e.g.:

df.head()

It is worth reading more about dask dataframes to understand how they differ from pandas dataframes.

Primer
  • I get this warning: RuntimeWarning: Couldn't detect a suitable IP address for reaching '8.8.8.8', defaulting to '127.0.0.1': [Errno 101] Network is unreachable % (host, default, e), RuntimeWarning) –  Jul 09 '17 at 13:48
  • I get an error for something that was working in pandas: ValueError: Metadata inference failed in `apply`. Original error is below: ------------------------ ValueError('could not convert string to float: foo', u'occurred at index a') –  Jul 09 '17 at 13:52
  • If you find dask a suitable tool for the task at hand, I would suggest asking a separate question addressing the specific problem you are having with dask. Also note that including an [MCVE](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples#30424537) in your pandas/dask question will help you get useful feedback faster. – Primer Jul 10 '17 at 07:19