I would like to know how to give a different name to each of the data frames that I create using the code below.

import os
import glob

import pandas as pd

os.chdir("/Users/path")

dataframes = []

paths = glob.glob("*.csv")

for path in paths:
    dataset = pd.read_csv(path)
    dataframes.append(dataset)

I would like to have something like this:

dataset_1
dataset_2
.... 

so that I can use each of them for a different analysis. Could you please tell me how to do this, or point me to a related post (and close mine if it is a duplicate)?

Thank you

  • You can use a dictionary if the name of each dataset is unique – DavidDr90 May 07 '20 at 18:50
  • **Don't** dynamically create variables, use a *container* like a list or a dict. – juanpa.arrivillaga May 07 '20 at 19:07
  • Thank you so much for all your comments and answers. `paths` is a list that includes very long names, so I would prefer to use a numeric suffix for each dataframe, like 1, 2, 3... (there should be as many numbers as there are entries in `paths`). An example of a path name is `example_20_05_24_test.csv` (but each dataset has a different name). This is why I am looking for something that names my datasets dataset_1 for path 1, dataset_2 for path 2, and so on, until there are no datasets left –  May 07 '20 at 19:43
  • dataset is a dataframe in my code. I think I would need an inner loop that can iterate through the datasets and assign to the dataframe a name like dataset_1, dataset_2... (or just df1, df2,df3 ... would be the same) –  May 07 '20 at 19:48

3 Answers

Elaborating on @DavidDr90's answer, a Python dictionary lets you use a unique identifier (for example, the filename) to identify each dataset:

import os
import pandas as pd
import glob

os.chdir("/Users/path")

paths =  glob.glob("*.csv")

datasets = {}  # Initialise the dictionary

for path in paths:
    filename = os.path.splitext(os.path.basename(path))[0]
    dataset = pd.read_csv(path)
    datasets[filename] = dataset

This creates a dictionary called datasets and uses the filenames (without the .csv extension) as unique keys.
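
Once the dictionary is built, each DataFrame can be looked up by its key or processed in a loop. The following is only a minimal sketch: the temporary directory, the second file name (`other_file.csv`), and the column `a` are hypothetical and exist just so the example runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample files, created only so this sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({"a": [1, 2]}).to_csv(
    os.path.join(tmp_dir, "example_20_05_24_test.csv"), index=False
)
pd.DataFrame({"a": [3, 4, 5]}).to_csv(
    os.path.join(tmp_dir, "other_file.csv"), index=False
)

datasets = {}  # maps filename (without extension) -> DataFrame
for path in glob.glob(os.path.join(tmp_dir, "*.csv")):
    filename = os.path.splitext(os.path.basename(path))[0]
    datasets[filename] = pd.read_csv(path)

# Look up a single DataFrame by its key.
first = datasets["example_20_05_24_test"]

# Or run the same step over every DataFrame.
shapes = {name: df.shape for name, df in datasets.items()}
```

Keeping the frames in one dictionary means the analysis code never needs to know in advance how many CSV files there are.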

kwsp

If all your datasets' names are unique, you can use a dictionary, for example:

dataframes = dict()  # init new dict object
for path in paths:
    dataset = pd.read_csv(path)
    dataframes[<your unique name>] = dataset  # this will create new key-value pair in the dictionary

If you're using non-unique names, you can use a list of tuples, for example:

dataframes = []
for path in paths:
    dataset = pd.read_csv(path)
    dataframes.append((<your dataset name>, dataset))  # please note the comma
DavidDr90
  • Hi @DavidDr90, may I ask what I should put in the `<your unique name>` field? What I would like is to create a dataframe for each csv file/dataset, named by number (like `df1, df2`, ...). There should be as many numbers as there are datasets (csv files), i.e. `len(dataframes)` –  May 07 '20 at 20:04
  • Hi @LucaDiMauro, you can simply do this: `csv_length = iter(range(len(paths)))` to get an iterator over the indices of the `csv` files. Then you can create a unique name like this: `"df{}".format(next(csv_length))` – DavidDr90 May 08 '20 at 12:19
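
The numbered names discussed in the comments above can also be generated with `enumerate`, which avoids managing an iterator by hand. This is a sketch under assumptions, not the answerer's exact code: the temporary directory, the two file names, and the `value` column are hypothetical, and the paths are sorted so that the numbering is stable between runs.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample files so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
for name, rows in [("b_long_name.csv", [1]), ("a_long_name.csv", [2, 3])]:
    pd.DataFrame({"value": rows}).to_csv(os.path.join(tmp_dir, name), index=False)

# glob does not guarantee an order, so sort to keep the numbering stable.
paths = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

# df1, df2, ... keyed DataFrames, one per CSV file.
dataframes = {
    "df{}".format(i): pd.read_csv(path)
    for i, path in enumerate(paths, start=1)
}

# Remember which file each short name refers to.
sources = {
    "df{}".format(i): os.path.basename(path)
    for i, path in enumerate(paths, start=1)
}
```

`dataframes["df1"]` then plays the role of a variable called `df1`, and `sources["df1"]` recovers the long file name when it is needed.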

Setting a name attribute on each dataframe will do the job.

import pandas as pd
import glob
import os

os.chdir("/Users/path")

dataframes=[]

paths =  glob.glob("*.csv")

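for path in paths:
    dataset = pd.read_csv(path)
    dataset.name = path  # store the file path as an attribute on the DataFrame
    dataframes.append(dataset)  # append the DataFrame itself, not the path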
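
One caveat: a plain attribute such as `dataset.name` is generally not carried over to new DataFrames derived from the original. As a possible alternative, pandas provides the `DataFrame.attrs` dictionary for metadata (the pandas documentation marks it as experimental). The sketch below assumes hypothetical in-memory data in place of the CSV files.

```python
import pandas as pd

# Hypothetical in-memory data standing in for the CSV files.
raw = {"first.csv": {"x": [1, 2]}, "second.csv": {"x": [3]}}

dataframes = []
for path, data in raw.items():
    dataset = pd.DataFrame(data)
    dataset.attrs["name"] = path  # metadata stored on the DataFrame
    dataframes.append(dataset)

# Read the names back, or build a lookup table from them.
names = [df.attrs["name"] for df in dataframes]
by_name = {df.attrs["name"]: df for df in dataframes}
```

Either way, the DataFrames still live in a list, so a dictionary keyed by name is usually the more direct way to retrieve a particular one.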
Manoj Khatua