
I am trying to create a Python function that does the following tasks:

  • Read .csv files from a folder
  • Create a separate DataFrame for each file (the DataFrame name should be the same as the file name)
  • Create a list of all created DataFrames and assign it to a variable (the variable name is the name of the folder)
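A minimal sketch of those three tasks, assuming every `*.csv` file in the folder should be read (not only numbered ones) and using a dictionary keyed by file name instead of one variable per file, as the comments below suggest. The function name `read_csv_folder` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def read_csv_folder(path):
    """Read every .csv file in `path` into a dict {file_stem: DataFrame}.

    Returns (folder_name, frames) so the caller can store the dict under
    the folder's name instead of creating variables dynamically.
    """
    folder = Path(path)  # Path drops a trailing separator, so .name is the folder
    frames = {}
    # sorted() gives a stable, predictable order of files
    for csv_file in sorted(folder.glob('*.csv')):
        # csv_file.stem is the file name without the .csv extension
        frames[csv_file.stem] = pd.read_csv(csv_file)
    return folder.name, frames
```

Usage might look like `folder_name, frames = read_csv_folder('data/sales')` followed by `all_folders = {folder_name: frames}`; `frames['january']` would then hold the contents of `january.csv` (the paths and file names here are hypothetical).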

Below is the code I am trying:

import pandas as pd
import os

def read_folder():
    path = input('Please provide path name to read:')
    for file in range(1000):
        if os.path.exists(path + '/' + str(file) + '.csv'):
            file = pd.read_csv(path + '/' + str(file) + '.csv')
            folderpath = (os.path.split(path)[1])
            temp = []
            temp.append(file)
            print(temp)
        else:
            print('No file at given location')

I have also tried various answers on this site, but most of them have a different goal. When I run the above code, it doesn't work for me.

Did I miss something in the above code?

Tailor
  • " last commands exit doesn't work for me" -- what does this mean? – Scott Hunter Jul 25 '20 at 12:17
  • the whole `variable name should be like ...` is futile. Use a dictionary to store pandas dataframes under a name if you really need to get at them by name. Generally this is a sign of a not-well-thought-out approach, because how would your code get a name from a file to be used in your code? Your code's `temp` is never used outside, and it is deleted/reset to an empty list inside the loop as well; you might want to go over some basic usages. Besides that, you use `file` both as an integer and as the result of pd.read_csv; bad karma comes from reusing variable names like that. – Patrick Artner Jul 25 '20 at 12:21
  • @PatrickArtner, I can't use `temp` outside the function, but I still get a list of all created dataframes, right? – Tailor Jul 25 '20 at 12:26
  • `temp` will hold one dataframe at most ... if you do not get why, back to the basics. – Patrick Artner Jul 25 '20 at 12:27
  • @PatrickArtner, I didn't mean to hold multiple dataframes in the `temp` variable; I just want a list of the names of all dataframes in it. – Tailor Jul 25 '20 at 12:32
  • @ScottHunter, I have corrected it, Sorry for creating confusion! – Tailor Jul 25 '20 at 13:04
  • if you want to keep all files in `temp` then you should create `temp = []` only once - before `for`-loop. If you create it inside `for`-loop then you remove previous content and finally you have only last file in `temp` – furas Jul 25 '20 at 13:17
  • if you want to use filenames for dataframes then better use dictionary `temp = dict()` (before `for`-loop) and add items `temp[filename] = pd.read_csv(filename)` – furas Jul 25 '20 at 13:19
  • to create a useful function `read_folder()` you should rather use `input()` outside `read_folder` and run it as `read_folder(path)` - this way you can use or test it with a path that is hardcoded, read from a file, or taken from `sys.argv` – furas Jul 25 '20 at 13:21
  • Try this one, might work [reference](https://stackoverflow.com/questions/46950173/python-looping-through-directory-and-saving-each-file-using-filename-as-data-fr) – RakeshV Jul 25 '20 at 13:46
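The point made in the comments about `temp` holding at most one item can be seen without pandas at all. In this sketch plain strings stand in for DataFrames, and both function names are hypothetical.

```python
def collect_inside_loop(items):
    for item in items:
        temp = []          # re-created on every iteration, old content is lost
        temp.append(item)
    return temp            # only the last item survives (fails on an empty input)


def collect_before_loop(items):
    temp = []              # created once, before the loop
    for item in items:
        temp.append(item)  # keeps growing across iterations
    return temp


print(collect_inside_loop(['a.csv', 'b.csv', 'c.csv']))   # ['c.csv']
print(collect_before_loop(['a.csv', 'b.csv', 'c.csv']))   # ['a.csv', 'b.csv', 'c.csv']
```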

1 Answer


If you want to keep all dataframes with their names, then first you should create a dictionary instead of a list, and second you should create it before the for-loop. If you create `temp` inside the for-loop, then you create it again and again and remove the previous content, so in the end you have only the last dataframe in `temp`.

Once you have the dictionary, you can get its keys to have all the filenames.
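A small sketch of the dictionary approach, using in-memory CSV text (hypothetical contents) via `io.StringIO` instead of real files so it runs anywhere:

```python
import io

import pandas as pd

# In-memory stand-ins for CSV files on disk (hypothetical contents).
csv_sources = {
    '0.csv': 'x,y\n1,2\n',
    '1.csv': 'x,y\n3,4\n',
}

all_dfs = {}  # created once, before the loop
for filename, text in csv_sources.items():
    # the file name becomes the key for its DataFrame
    all_dfs[filename] = pd.read_csv(io.StringIO(text))

all_filenames = list(all_dfs.keys())  # ['0.csv', '1.csv'] (insertion order)
first_df = all_dfs['0.csv']           # look up a DataFrame by its file name
```

Since Python 3.7 dictionaries keep insertion order, so the list of keys follows the order in which the files were read.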

BTW: It is good to use input() outside the function and send the path as an argument; this way you can also test it with a path from a file, from sys.argv, or with a hardcoded name.
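A sketch of taking the path from `sys.argv` with `input()` only as a fallback. The helper name `get_path` is hypothetical; the point is that the reading function itself never prompts.

```python
import sys


def get_path(argv=None):
    """Return the folder path from argv[1] if given, otherwise prompt for it."""
    argv = sys.argv if argv is None else argv
    if len(argv) > 1:
        return argv[1]
    return input('Please provide path name to read: ')


# The same function can be driven without any typing, e.g. in a test:
path = get_path(['read_folder.py', 'data/sales'])
print(path)  # data/sales
```

Run from a shell as `python read_folder.py data/sales` (hypothetical script name), `get_path()` would pick the path up from `sys.argv`; with no argument it falls back to `input()`. Either way the result can be passed on as `read_folder(get_path())`.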


import pandas as pd
import os

# --- functions ---

def read_folder(path, min_number=0, max_number=1000):
    
    all_dfs = dict()  # created once, before the loop
    
    for number in range(min_number, max_number):
        
        filename = f'{number}.csv'
        fullpath = os.path.join(path, filename)
        
        if os.path.exists(fullpath):
            # the file name becomes the key for its DataFrame
            all_dfs[filename] = pd.read_csv(fullpath)
    
    # report once, instead of once for every missing number
    if not all_dfs:
        print('No file at given location')
    
    return all_dfs
            
# --- main ---

all_folders = dict()  # dictionary for all folders and filenames

path = input('Please provide path name to read:')

all_dfs = read_folder(path)
all_filenames = list(all_dfs.keys())  # keys are the filenames

folder = os.path.split(path)[-1]  # last part of the path = folder name
all_folders[folder] = all_filenames
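One caveat when taking the folder name with `os.path.split(path)[-1]`: if the typed path ends with a separator, the last component is an empty string, so the folder would be stored under `''`. A sketch of a safer variant using `os.path.normpath` first (the helper name `folder_name` is hypothetical):

```python
import os


def folder_name(path):
    # normpath drops a trailing separator, so basename returns the last folder
    return os.path.basename(os.path.normpath(path))


print(os.path.split('data/sales/')[-1])  # prints an empty line
print(folder_name('data/sales/'))         # sales
print(folder_name('data/sales'))          # sales
```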
furas