
I have written a script that checks the validity of some files and writes the results to a DataFrame, listing the names of the valid and invalid files with comments.

But when I run it, not all the file names are inserted into the DataFrame; only the first file name is written.

My code is as follows:

file_path = 'C:\file'
f =next(os.walk(file_path))[2]
df = pd.DataFrame(index=range(1,len(f)) ,columns = ['Valid Files','In Valid Files','Comments'])

for file in f:
    filename = file
    file_name = '%s'%file_path+'\\'+'%s'%file
    try:
        parsefile('%s'%file_name)
        df["Valid Files"]= '%s'%filename
        df["Comments"] = '--'
    except Exception, e:
        df["In Valid Files"]= '%s'%filename
        df["Comments"]= e
df  

My output is:

   Valid Files  In Valid Files  Comments
1  Testa.xml    Test_f.xml      error1
2  Testa.xml    Test_f.xml      error1
3  Testa.xml    Test_f.xml      error1

But I expected something like this:

   Valid Files  Comments  In Valid Files  Comments
1  Testa.xml    --        Test_f.xml      error1
2  Testb.xml    --        Test_h.xml      error2
3  Testc.xml    --        Test_k.xml      error3

Any improvements or suggestions are welcome. Thanks in advance.

  • Something like this https://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas? – Nabin Nov 21 '17 at 11:46
  • This seems to be just adding a new column in df with a list of values... In here the thought is to dynamically update the rows in df with new populated values. –  Nov 21 '17 at 11:53

1 Answer


Both branches write to the same Comments column, so the comments for the valid files are overwritten by the comments for the invalid files.

Try this:

for file in f:
    filename = file
    file_name = '%s'%file_path+'\\'+'%s'%file
    try:
        parsefile('%s'%file_name)
        df["Valid Files"]= '%s'%filename
        df["Valid Files Comments"] = '--'
    except Exception, e:
        df["In Valid Files"]= '%s'%filename
        df["In Valid Files Comments"]= e

I don't have enough reputation to leave a comment yet. If this doesn't work, leave a comment and I will delete the answer.

Assigning a single value to a column inside the for loop sets every row of that column, so each iteration overwrites the previous one and only one value survives. I suggest collecting the values in lists instead, then building a dictionary with your column names as keys and creating the DataFrame from it once the loop is done. It is a bit lengthy, and I don't know of a simpler solution.

For example:

Valid_Files = []
Valid_Files_Comments = []
In_Valid_Files = []
In_Valid_Files_Comments = []
for file in f:
    file_name = file_path + '\\' + file
    try:
        parsefile(file_name)
        Valid_Files.append(file)
        Valid_Files_Comments.append('--')
    except Exception as e:
        In_Valid_Files.append(file)
        # store the error message itself, not the literal string 'e'
        In_Valid_Files_Comments.append(str(e))
# pd.Series pads the shorter lists with NaN, so the valid and invalid
# lists do not have to be the same length
df = pd.DataFrame({'Valid Files': pd.Series(Valid_Files),
                   'Valid Files Comments': pd.Series(Valid_Files_Comments),
                   'In Valid Files': pd.Series(In_Valid_Files),
                   'In Valid Files Comments': pd.Series(In_Valid_Files_Comments)})

This will give you the desired output.
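
To see why the original loop kept only one value per column, here is a minimal sketch. The file names are hypothetical, and the variable names `broadcast` and `per_row` are mine. It compares assigning a scalar to a whole column with writing a single cell through `.loc`:

```python
import pandas as pd

names = ['Testa.xml', 'Testb.xml', 'Testc.xml']  # hypothetical file names

# Assigning a scalar to a column sets every row of that column,
# so each iteration overwrites what the previous one wrote.
broadcast = pd.DataFrame(index=range(1, len(names) + 1), columns=['Valid Files'])
for name in names:
    broadcast['Valid Files'] = name

# Writing one cell per iteration with .loc keeps each value in its own row.
per_row = pd.DataFrame(index=range(1, len(names) + 1), columns=['Valid Files'])
for row, name in enumerate(names, start=1):
    per_row.loc[row, 'Valid Files'] = name

print(broadcast['Valid Files'].tolist())
print(per_row['Valid Files'].tolist())
```

If you prefer to fill a preallocated DataFrame row by row, `.loc[row, column]` is the cell-level assignment to use, although building the lists first and creating the DataFrame once is usually faster.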

Space Impact
  • It actually solved the overwriting issue, but it is still only writing the first file name and not the others (it only takes the first value and does not iterate). –  Nov 21 '17 at 12:01
  • Thanks a lot! I already tried that, but it makes my script lag because the process is so long. –  Nov 21 '17 at 12:34