Different file name when saving in loop python/pandas

Question

I use the following lines of code to loop over different files in a folder:

import os

files_in_folder_1 = [os.path.join(path1, f) for f in os.listdir(path1) if os.path.isfile(os.path.join(path1, f))]

files_in_folder_2 = [os.path.join(path2, f) for f in os.listdir(path2) if os.path.isfile(os.path.join(path2, f))]

for file1, file2 in zip(files_in_folder_1, files_in_folder_2):
    with open(file1) as f1, open(file2) as f2:

        dftask = pd.read_csv(file2)
        dfresource = pd.read_csv(file1)

At the end of all operations I want to save the files in another directory under the same file names. How should I do that? Currently I use this:

dftask.to_csv(r'path\file1.csv')
dfresource.to_csv(r'path\file2.csv')

However, with these lines the CSV file is overwritten on every iteration of the loop over all files.

What is the solution?

Padraic Cunningham · Accepted Answer · 2015-08-03T10:52:13.323

2

os.path.basename will give you the file name; just join it to the new path and save it however you want:

new_dir = "path/to/dir/"
for file1, file2 in zip(files_in_folder_1, files_in_folder_2):
    dftask = pd.read_csv(file2)
    dfresource = pd.read_csv(file1)
    # work on df's  .......

    # save to new dir   
    dftask.to_csv(os.path.join(new_dir,os.path.basename(file2)))
    dfresource.to_csv(os.path.join(new_dir,os.path.basename(file1)))
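
To see why this avoids the overwriting, here is a minimal sketch that uses only the standard library. The helper name target_path, the folder names and the sample file names are made up for illustration; in the real loop the resulting path would be passed to to_csv.

```python
import os

def target_path(new_dir, source_path):
    """Return new_dir joined with the file name of source_path (hypothetical helper)."""
    return os.path.join(new_dir, os.path.basename(source_path))

# Hypothetical input paths, built the same way as files_in_folder_1 in the question.
sources = [os.path.join("folder_1", name) for name in ("a.csv", "b.csv")]

# Each input maps to its own output path, so nothing is overwritten inside the loop.
targets = [target_path("out", src) for src in sources]
print(targets)
```

Because the output name is derived from each input path rather than hard-coded, every iteration writes to a different file.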

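If you open the files with open() first, you can get the name from the file object's .name attribute. Note that .name holds whatever path was passed to open(), so take its basename before joining:

new_dir = "path/to/dir/"
for file1, file2 in zip(files_in_folder_1, files_in_folder_2):
    with open(file1) as f1, open(file2) as f2:
        dftask = pd.read_csv(f2)
        dfresource = pd.read_csv(f1)

    # f1 and f2 are closed here, but their .name attribute is still available
    dftask.to_csv(os.path.join(new_dir, os.path.basename(f2.name)))
    dfresource.to_csv(os.path.join(new_dir, os.path.basename(f1.name)))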
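
A small check of what .name actually contains may help here. This is a sketch under stated assumptions: the temporary directory and the file name tasks.csv are placeholders, and only the standard library is used.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as src_dir:
    # Placeholder input file standing in for one of the CSV files.
    source = os.path.abspath(os.path.join(src_dir, "tasks.csv"))
    with open(source, "w") as fh:
        fh.write("a,b\n1,2\n")

    with open(source) as f1:
        opened_name = f1.name                      # the path that was passed to open()
        file_name = os.path.basename(f1.name)      # just the file name

# Joining a directory with an absolute path discards the directory,
# which is why the basename is needed before joining.
joined_without_basename = os.path.join("out", opened_name)
joined_with_basename = os.path.join("out", file_name)
```

Since the paths in files_in_folder_1 already include their folder, joining new_dir with .name directly would not place the file under new_dir/filename as intended.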

edited Aug 03 '15 at 10:52

answered Aug 03 '15 at 10:29

Padraic Cunningham

When I try this I get the following error: IOError: [Errno 22] invalid mode ('w') or filename – F1990 Aug 03 '15 at 10:33
This line: dftask.to_csv(os.path.join(new_dir,os.path.basename(file2))) – F1990 Aug 03 '15 at 10:38
filename: "r'path\\Tasks-Job--16kvZ-Feike_15min_data-201728.csv" – F1990 Aug 03 '15 at 10:39
why is there an `r` there? Anyway what does `print os.path.join(new_dir,os.path.basename(file2))` output? – Padraic Cunningham Aug 03 '15 at 10:40
I used this same 'r before when pointing to a directory, so I thought it was necessary. But I tried it without the 'r and it returns the same error – F1990 Aug 03 '15 at 10:42
http://stackoverflow.com/questions/19034822/unknown-python-expression-filename-r-path-to-file – dermen Aug 03 '15 at 10:42
You don't need `r`, os.path.join takes care of that – Padraic Cunningham Aug 03 '15 at 10:43
Print gives: D:-path\Tasks-Job--16kvxZ-Feike_15min_data-20150728.csv – F1990 Aug 03 '15 at 10:43
you should try putting the full path in, just to be sure. You are probably trying to create a file in a directory that does not exist, at least in reference to where the code is executing. Try `new_dir = 'C:\Users\whoever\whatever\path'` – dermen Aug 03 '15 at 10:44
and what is in `D:-path`? – Padraic Cunningham Aug 03 '15 at 10:48
@PadraicCunningham that is weird indeed, however in the code I type D:\ path so I do not understand that it prints this D:-path – F1990 Aug 03 '15 at 11:28
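
One way to rule out the missing-directory cause suggested in the comments is to create the output directory before writing. A minimal sketch, assuming Python 3 (the exist_ok argument is not available on Python 2); the directory name processed and the file name Tasks.csv are placeholders, and a temporary directory stands in for the real drive. It does not address an invalid file name, such as one that contains a stray r' inside the string.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    new_dir = os.path.join(base, "processed")    # placeholder output directory
    os.makedirs(new_dir, exist_ok=True)           # no error if it already exists
    out_file = os.path.join(new_dir, "Tasks.csv")

    # Stand-in for the pandas write step.
    with open(out_file, "w") as fh:
        fh.write("a,b\n1,2\n")

    created = os.path.isfile(out_file)
```

With pandas, the open(...) block would be replaced by a call such as dftask.to_csv(out_file).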

score 0 · Answer 2 · answered Aug 03 '15 at 10:10

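As far as I understand your question, you can use 'path\\' + file1.split("/")[-1] to save all files under their original names. Here file1 is a path string, so it has no .name attribute, and the backslash has to be escaped; the split assumes the path uses forward slashes.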

answered Aug 03 '15 at 10:10

zuku

score 0 · Answer 3 · answered Aug 03 '15 at 10:11

With the above code, when the loop exits dftask and dfresource will each hold only the DataFrame from the last iteration of the loop.

If you want to consolidate all the DataFrames, you should create a list containing each of the individual DataFrames and concat them.

import os
import pandas as pd

files_in_folder_1 = [os.path.join(path1, f) for f in os.listdir(path1) if os.path.isfile(os.path.join(path1, f))]

files_in_folder_2 = [os.path.join(path2, f) for f in os.listdir(path2) if os.path.isfile(os.path.join(path2, f))]

dftask_list = []
dfresource_list = []
for file1, file2 in zip(files_in_folder_1, files_in_folder_2):
    with open(file1) as f1, open(file2) as f2:

        dftask_list.append(pd.read_csv(file2))
        dfresource_list.append(pd.read_csv(file1))

dftask = pd.concat(dftask_list)
dfresource = pd.concat(dfresource_list)

Note: You may need to reset index after this.

dftask = dftask.reset_index(drop=True)
dfresource = dfresource.reset_index(drop=True)

score 0 · Answer 4 · answered Aug 03 '15 at 10:14

You can use os.path.split(); it returns a tuple whose second element is the file name. Example -

f1name = os.path.split(file1)[1]
f2name = os.path.split(file2)[1]

Then you can use os.path.join() to join it with the other directory and get the resultant path. Example -

file1newpath = os.path.join(otherdir, os.path.split(file1)[1])
file2newpath = os.path.join(otherdir, os.path.split(file2)[1])

Then you can use the above names to save the file -

dftask.to_csv(file1newpath)
dfresource.to_csv(file2newpath)

Demo for os.path.split() -

>>> import os.path
>>> os.path.split(r'C:\Users\temp\somedir\somefile.csv')[1]
'somefile.csv'
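
The output shown in the demo is what os.path.split() returns on Windows, where the backslash is a path separator. A hedged sketch of two standard-library ways to split a Windows-style path regardless of the platform the code runs on (the path is the one from the demo):

```python
import ntpath
from pathlib import PureWindowsPath

win_path = r'C:\Users\temp\somedir\somefile.csv'

# ntpath is the Windows implementation behind os.path; it can be used on any OS.
name_via_ntpath = ntpath.split(win_path)[1]

# PureWindowsPath parses Windows paths without touching the file system.
name_via_pathlib = PureWindowsPath(win_path).name

print(name_via_ntpath, name_via_pathlib)
```

For files that live on the machine running the script, plain os.path.split() or os.path.basename() is enough, since os.path already follows the local conventions.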
