Inserting and updating python Dataframe in an iteration

Question

I have written a script that checks the validity of some files and writes the results to a DataFrame, listing the names of the valid and invalid files with comments.

But when I run it, not all the file names are inserted into the DataFrame; only the first file name is written.

My code is as follows:

file_path = 'C:\file'
f =next(os.walk(file_path))[2]
df = pd.DataFrame(index=range(1,len(f)) ,columns = ['Valid Files','In Valid Files','Comments'])

for file in f:
    filename = file
    file_name = '%s'%file_path+'\\'+'%s'%file
    try:
        parsefile('%s'%file_name)
        df["Valid Files"]= '%s'%filename
        df["Comments"] = '--'
    except Exception, e:
        df["In Valid Files"]= '%s'%filename
        df["Comments"]= e
df

My output is

    Valid Files   In Valid Files   Comments
1   Testa.xml     Test_f.xml       error1
2   Testa.xml     Test_f.xml       error1
3   Testa.xml     Test_f.xml       error1

But I expect something like this:

    Valid Files   Comments   In Valid Files   Comments
1   Testa.xml     --         Test_f.xml       error1
2   Testb.xml     --         Test_h.xml       error2
3   Testc.xml     --         Test_k.xml       error3

I would appreciate any improvements and suggestions. Thanks in advance.

Something like this https://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas? — Nabin, Nov 21 '17 at 11:46
This seems to be just adding a new column in df with a list of values... In here the thought is to dynamically update the rows in df with new populated values. — , Nov 21 '17 at 11:53

Space Impact · Accepted Answer · 2017-11-21T12:33:06.130

You are overwriting the comments for the Valid Files column with the comments for the In Valid Files column, because both branches write to the same Comments column.

Try this:

for file in f:
    filename = file
    file_name = os.path.join(file_path, file)
    try:
        parsefile(file_name)
        # Separate comment columns, so one branch no longer overwrites the other.
        df["Valid Files"] = filename
        df["Valid Files Comments"] = '--'
    except Exception as e:  # "except Exception, e" is Python 2 only
        df["In Valid Files"] = filename
        df["In Valid Files Comments"] = str(e)

I don't have permission to add a comment. If this doesn't work, leave a comment and I will delete the answer.

If you assign a value directly to a column inside a for loop, the value is written to every row, so each iteration overwrites the previous one and only the last value remains. I suggest appending the values to lists instead, then building a dictionary from those lists, keyed by your column names, and creating the DataFrame from it. The process is lengthy; I don't know of a simpler solution.

Like this:

import os
import pandas as pd

Valid_Files = []
Valid_Files_Comments = []
In_Valid_Files = []
In_Valid_Files_Comments = []
for file in f:
    file_name = os.path.join(file_path, file)
    try:
        parsefile(file_name)
        Valid_Files.append(file)
        Valid_Files_Comments.append('--')
    except Exception as e:
        In_Valid_Files.append(file)
        # Store the error message itself, not the literal string 'e'.
        In_Valid_Files_Comments.append(str(e))

# The lists can differ in length; wrapping each one in a Series pads the shorter ones with NaN.
df = pd.DataFrame({
    'Valid Files': pd.Series(Valid_Files, dtype=object),
    'Valid Files Comments': pd.Series(Valid_Files_Comments, dtype=object),
    'In Valid Files': pd.Series(In_Valid_Files, dtype=object),
    'In Valid Files Comments': pd.Series(In_Valid_Files_Comments, dtype=object),
})

This will give you the desired output.
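
As a possible alternative, here is a minimal sketch that keeps one row per file and records the status in its own column, which avoids columns of unequal length. It is not the layout asked for in the question, only a variation on the same list-based idea. The parsefile function below is a hypothetical stand-in for the real parser, and the names check_files, File and Valid are assumptions made for illustration. The small demo frame shows why the original loop filled every row with the same name: assigning a scalar to a column broadcasts it to all rows.

```python
import pandas as pd


def parsefile(path):
    # Hypothetical stand-in for the asker's parser:
    # it rejects any name that contains "bad".
    if "bad" in path:
        raise ValueError("cannot parse " + path)


def check_files(file_names, parser):
    """Return a DataFrame with one row per file: name, status, comment."""
    rows = []
    for name in file_names:
        try:
            parser(name)
            rows.append({"File": name, "Valid": True, "Comments": "--"})
        except Exception as e:
            rows.append({"File": name, "Valid": False, "Comments": str(e)})
    return pd.DataFrame(rows, columns=["File", "Valid", "Comments"])


# Scalar assignment broadcasts: every row receives the same value.
demo = pd.DataFrame(index=range(3), columns=["Valid Files"])
demo["Valid Files"] = "Testa.xml"

report = check_files(["Testa.xml", "bad_f.xml", "Testb.xml"], parsefile)
print(report)
```

If the side-by-side layout is still needed, the valid and invalid rows can be selected afterwards with report[report["Valid"]] and report[~report["Valid"]].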

It actually solved the overwriting issue, but it is still only writing the first file name and not the others (it only takes the first occurrence and is not able to iterate). — , Nov 21 '17 at 12:01
Thanks a lot! I already tried it, but my functionality lags because of that long process. — , Nov 21 '17 at 12:34
