
I'd like to assign a unique variable name to each file in a directory, but I have no idea how this can be done. I'm new to Python, so I'm sorry if the code is scruffy.

def DataFinder(path, extension):
    import os
    count = 0
    extensions = ['.txt','.csv','.xls','.xlsm','.xlsx']
    allfiles = []

    if not extension in extensions:
        print('Can\'t read data from this file type.\n','Allowed file types are\n',str(extensions))
    else:
        #loop through the files
        for root, dirs, files in os.walk(path):
            for file in files:
                #check if the file ends with the extension
                if file.endswith(extension):
                    count+=1
                    print(str(count)+': '+file)
                    allfiles.append(file)

        if count==0:
            print('There are no files with',extension,'extension in this folder.')
    return allfiles

How can this code be modified to assign a variable name like df_number.of.file (as a string) on each iteration?

Thanks

My ultimate goal is to have a set of DataFrame objects, one per file, each under a unique variable name, without needing to create those variables manually.

The suggested duplicate did not answer my question, nor did it work for me.

allfiles = {}
#filter through required data extensions
if not extension in extensions:
    print('Can\'t read data from this file type.\n','Allowed file types are\n',str(extensions))
else:
    #loop through the files
    for root, dirs, files in os.walk(path):
        for file in files:
            #check if the file ends with the extension
            if file.endswith(extension):
                #raise counter
                count+=1
                print(str(count)+': '+file)
                allfiles.update({'df'+str(count) : path+file})
After adjusting the code as suggested, my output was a dictionary:

{'df1': 'C:/Users/Bartek/Downloads/First.csv', 'df2': 'C:/Users/Bartek/Downloads/Second.csv', 'df3': 'C:/Users/Bartek/Downloads/Third.csv'}

I achieved a similar thing previously using a list:

['df_1First.csv', 'df_2Second.csv', 'df_3Third.csv']

But my exact question is how to achieve this:

for each object in the dict, create a variable with a consecutive object number

so that these variables can be passed as the data argument to pandas.DataFrame()

I know this is a very bad idea (http://stupidpythonideas.blogspot.co.uk/2013/05/why-you-dont-want-to-dynamically-create.html), so can you please show me the proper way using a dict?

Many thanks

Bartek Malysz
  • Don't do it--setting the name of a value based on the value is bad programming practice. Instead, use a dictionary, where the desired name is the key. – Rory Daulton Aug 05 '17 at 16:14
  • @RoryDaulton thank you. How can I ask the right question to find an answer about using dictionary for this purpose please? – Bartek Malysz Aug 05 '17 at 16:30

2 Answers


You should be able to modify this section of the code to accomplish what you want. Instead of only printing the running count, use count to build a new unique name for each file.

if file.endswith(extension):
  count+=1
  newfile = ('df_' + str(count) + file)
  allfiles.append(newfile)

count is different for every file that matches the extension, so each new name is unique. You should be able to find the newly created names in allfiles.

EDIT to use a dictionary (thanks Rory): I would suggest an alternative route. Create a dictionary and use the file name as the key.

allfilesdict = {}
...
if file.endswith(extension):
  count+=1
  newfile = ('df_' + str(count) + file)
  allfilesdict[file] = newfile

Then remember to return allfilesdict if you are going to use it somewhere outside of your function.
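
If the end goal is one DataFrame per file, a rough sketch of the dictionary route could look like the following. The names find_data_files and load_frames are made up for illustration, the demo files are created in a temporary folder, and load_frames assumes pandas is installed and that the files are CSVs, so treat it as a starting point rather than a finished solution.

```python
import os
import tempfile


def find_data_files(path, extension):
    """Map 'df_1', 'df_2', ... to the full path of each matching file."""
    found = {}
    for root, dirs, files in os.walk(path):
        for name in sorted(files):
            if name.endswith(extension):
                key = 'df_' + str(len(found) + 1)
                found[key] = os.path.join(root, name)
    return found


def load_frames(paths):
    """Read every CSV path into a DataFrame, keeping the same keys."""
    import pandas as pd  # imported here so the file search works without pandas
    return {key: pd.read_csv(file_path) for key, file_path in paths.items()}


# Small demonstration on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as folder:
    for name in ('First.csv', 'Second.csv', 'notes.txt'):
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('a,b\n1,2\n')
    paths = find_data_files(folder, '.csv')
    print(sorted(paths))  # ['df_1', 'df_2']
```

Each frame is then reached by its key, for example frames = load_frames(paths) followed by frames['df_1'], so no variables have to be created dynamically.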

OLIVER.KOO
  • Thank you OLIVER.KOO. I'm sorry if I confused things a bit here. I'm calling the function from this code: path = input('What is the folder path?\n') ext = input('What is the file extension? (i.e. .xls)\n') import DataProject as dp p = dp.FileHandler.DataFinder(path,ext) print(p) Ideally I wanted to return unique variables with strings for each file found that can be passed to pd.read_excel, read_csv etc. – Bartek Malysz Aug 05 '17 at 16:17
  • are you trying to create `df_numberOfFiles` string for each file, does the extension matter? are you concatenating `df_numberOfFiles` with the file name? can you give a simple example? – OLIVER.KOO Aug 05 '17 at 16:22
  • I think I wanted to approach it the bad way as explained by Rory. – Bartek Malysz Aug 05 '17 at 16:29
  • Rory suggested using a `dictionary` which I included in the edit. modify your `DataFinder` function to do so. – OLIVER.KOO Aug 05 '17 at 16:31

You can modify the first script like this.

import os
from time import gmtime, strftime

def DataFinder(path, extension):
    count = 0
    extensions = ['.txt','.csv','.xls','.xlsm','.xlsx']
    allfiles = []

    if extension not in extensions:
        print('Can\'t read data from this file type.\n','Allowed file types are\n',str(extensions))
    else:
        #loop through the files
        for root, dirs, files in os.walk(path):
            for file in files:
                #check if the file ends with the extension
                if file.endswith(extension):
                    count+=1
                    #take the current date and time (UTC)
                    date_time=strftime("%Y-%m-%d %H:%M:%S", gmtime())
                    #split the name on the dot: file_name[0] is the base name and file_name[1] is the extension (assumes the name contains exactly one dot)
                    file_name=file.split('.')
                    allfiles.append(file_name[0]+date_time+'.'+file_name[1])

        if count==0:
            print('There are no files with',extension,'extension in this folder.')
    return allfiles

print(DataFinder('/home/user/tmp/test','.csv'))
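
One caveat: file.split('.') only gives the base name and extension when the name has a single dot. A possible variant, sketched here with a hypothetical helper called stamped_name, uses os.path.splitext instead. The timestamp format without colons or spaces is also my own choice, since colons are not allowed in Windows file names.

```python
import os
from time import gmtime, strftime


def stamped_name(file_name, stamp):
    """Insert stamp between the base name and the extension."""
    # splitext splits on the last dot only, so 'sales.2017.csv' keeps its full stem
    stem, ext = os.path.splitext(file_name)
    return stem + '_' + stamp + ext


# Timestamp format is an assumption: no colons or spaces.
stamp = strftime('%Y-%m-%d_%H-%M-%S', gmtime())
print(stamped_name('sales.2017.csv', stamp))
```

Inside the loop, the call would replace the two lines that split the name and rebuild it.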