
I am trying to store the result of an if condition in a DataFrame. This is what I have so far:

import os

filename = "Desktop/sales/10-05-2018"
# check whether the path exists
if os.path.exists(filename):
  print("Files received")
else:
  print("No files received")

Instead of printing the result, I would like to store it in a DataFrame. Could anyone advise on this? Thanks.

jpp
Kevin Nash
  • Can you describe this dataframe a little more? Is it just a single column of those two strings? – tdelaney May 11 '18 at 17:02
  • You only have 1 filename and one true/false condition... wouldn't a single python boolean object suffice? How is the dataframe helpful? – tdelaney May 11 '18 at 17:03
  • @tdelaney, I am trying to run this script across multiple folders and check, for each day, whether I have received a file. If so, I store a true value for that day along with the department the file came from (in this case, Sales); days on which we received nothing are labelled false. The Dataframe would consist of 3 columns: Date, Department and a Boolean of 'Yes' or 'No'. – Kevin Nash May 11 '18 at 17:06
  • So, for `"Desktop/sales/10-05-2018"`, `"10-05-2018"` is the name of a file or a directory containing files? – tdelaney May 11 '18 at 17:29
  • @tdelaney, sales is the name of the directory and "10-05-2018" is the name of the sub-directory. I am trying to have each of these in a separate column in the Dataframe. – Kevin Nash May 11 '18 at 17:30

1 Answer


This is one way you can store such a mapping in a dataframe.

import os
import pandas as pd

# Build the filename column first, then derive the Boolean column from it.
df = pd.DataFrame({'filename': ['file1.csv', 'file2.csv', 'file3.csv']})
df['exists'] = df['filename'].map(os.path.exists)

This will create a dataframe of filenames in one column and a Boolean series in another indicating whether or not the file exists.

If the filenames are retrieved from an iterable, you should aggregate to a list of lists first before constructing a dataframe. Appending continually to an existing dataframe is inefficient in this situation.

lst = ( ... some iterable ... )
lst_of_lst = [[f, os.path.exists(f)] for f in lst]

df = pd.DataFrame(lst_of_lst, columns=['filename', 'exists'])
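For the layout described in the comments (a department directory containing one sub-directory per date), the same list-of-lists idea might be sketched as follows. The function name `build_receipt_table`, the column names, and the example inputs are illustrative assumptions, not part of the original answer.

```python
import os
import pandas as pd

def build_receipt_table(base_dir, departments, dates):
    """Return one row per (date, department) with a Boolean 'Received' flag."""
    rows = []
    for dept in departments:
        for date in dates:
            # Layout assumed from the question: <base_dir>/<department>/<date>
            path = os.path.join(base_dir, dept, date)
            rows.append([date, dept, os.path.exists(path)])
    # Construct the dataframe once, after all rows have been collected.
    return pd.DataFrame(rows, columns=['Date', 'Department', 'Received'])

# Hypothetical inputs; replace with your own base folder, departments and dates.
df = build_receipt_table('Desktop', ['sales'], ['10-05-2018', '11-05-2018'])
print(df)
```

The `Received` column holds Booleans; if 'Yes'/'No' labels are preferred, they can be mapped afterwards, for example with `df['Received'].map({True: 'Yes', False: 'No'})`.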
jpp
  • Hi jpp, thanks for your input. A novice question: when you say to aggregate to a list of lists, do you mean specifying the path where the files are located? Could you please help with this? Apologies for the basic question, and thanks again. – Kevin Nash May 11 '18 at 17:13
  • @KevinNash, In my example `lst` is a list of file paths (including filename) which can be passed directly to `os.path.exists`. The idea is that you don't "add them to dataframe one by one" but make a list of `[filename, os.path.exists(filename)]` elements and feed this into the `pd.DataFrame` constructor. Hope that helps. – jpp May 11 '18 at 18:34
  • Thanks for your input. Another question: how can I extract part of the file name as a new column? I tried, but it failed with an "AttributeError". Could you please advise? Thanks. – Kevin Nash May 11 '18 at 18:38
  • Thanks @jpp. What I meant is: given the file name file1.csv, how could I store, say, the value "file" as a new column in the existing dataframe? If this is beyond the scope of this question, thanks for your help anyway. – Kevin Nash May 11 '18 at 18:41
  • @KevinNash, See https://stackoverflow.com/questions/8384737/extract-file-name-from-path-no-matter-what-the-os-path-format – jpp May 11 '18 at 18:42
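As a rough illustration of the follow-up in the comments, one way to derive a new column from part of each file name is to apply a small helper element-wise with `map`. The helper `stem`, the column name `name`, and the sample paths are assumptions made for this sketch. It yields the name without directory or extension (`file1` for `file1.csv`), so getting only `file` would need further string handling.

```python
import os
import pandas as pd

# Hypothetical sample data, one entry with a directory prefix and one without.
df = pd.DataFrame({'filename': ['data/file1.csv', 'file2.csv']})

def stem(path):
    """Return the file name without its directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]

# os.path functions work on single strings, so apply them element-wise.
df['name'] = df['filename'].map(stem)
print(df)
```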