I have a bunch of csv files in a folder that I have saved into a dictionary so that each entry is its own dataframe. Each dataframe is a few columns of time series data.


import pandas as pd

dfs = {}
for filename in csv_files:
  # key each dataframe by the path it was read from
  dfs[filename] = pd.read_csv(filename, sep=',', skiprows=13)
  dfs[filename].loc[0] = 0 # set first line to zeros

Right now every dictionary key is the full path of its data file, so to look at just one dataframe I have to type in the whole path.

dfs['/content/drive/My Drive/LAICPMS 72123/ANP/ANP 072123_7.csv']

But this isn't very convenient. I can change each key manually, as follows:

dfs['7'] = dfs.pop('/content/drive/My Drive/LAICPMS 72123/ANP/ANP 072123_7.csv')

but I'd really like to use a loop to rename each key as a string integer.

I tried to make a list of string integers by counting the number of files in the folder

from os import listdir
from os.path import isfile, join

#how many files are in folder
count = 0
for path in listdir("/content/drive/My Drive/LAICPMS 72123/ANP"):
  if isfile(join("/content/drive/My Drive/LAICPMS 72123/ANP", path)):
    count+=1
#print(count)

#make a list of integers as long as the number of files
num_files = list(range(count))


#convert integers to strings (list() so the result can be reused; map alone is a one-shot iterator)
list_st = list(map(str, num_files))
#print(list_st)

for x in list_st:
  print(x)

which successfully counts them and makes a new list, but then I'm lost on how to set up a loop to rename the dictionary keys with this list. Clearly, I'm not understanding something about indexing lists and dictionaries.

  • "but then I'm lost on how to set up a loop to rename the dictionary keys with this list." - what should happen each time through the loop? (Perhaps, storing a key/value pair into a new dict?) What value do you want to process each time through the loop? (Perhaps, a dict key?) What is the **rule that tells you** how the integers you want to use as keys correspond to the filenames? If you want to process the filename to get a number out of it, fine; first write that logic and make a function for it. If you just want to use ascending numbers as keys - why not just use a list? – Karl Knechtel Jul 24 '23 at 20:20
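One way to act on the comment above is sketched here, assuming every filename ends in an underscore followed by digits (as in ANP 072123_7.csv). The helper name key_from_path is hypothetical, the second path is invented for illustration, and plain strings stand in for the dataframes:

```python
import re
from pathlib import PurePosixPath


def key_from_path(path):
    """Return the digits after the last underscore in the file stem, e.g. '7'."""
    stem = PurePosixPath(path).stem  # 'ANP 072123_7.csv' -> 'ANP 072123_7'
    match = re.search(r"_(\d+)$", stem)
    if match is None:
        raise ValueError(f"no trailing number in {path!r}")
    return match.group(1)


# Strings stand in for the dataframes; the second path is a made-up example.
dfs = {
    "/content/drive/My Drive/LAICPMS 72123/ANP/ANP 072123_7.csv": "df7",
    "/content/drive/My Drive/LAICPMS 72123/ANP/ANP 072123_12.csv": "df12",
}

# Build a new dict keyed by the number taken from each filename.
renamed = {key_from_path(path): df for path, df in dfs.items()}
print(renamed)  # {'7': 'df7', '12': 'df12'}
```

Keys built this way stay tied to the file they came from, which ascending counters do not guarantee if the files are listed in a different order.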

2 Answers

If you already have the dictionary, you already have the number of files (len(dfs)), so there is no need to count.

You could create another dictionary like so:

dfs_new = dict(enumerate(dfs.values()))

But in the above case, you probably just want a list, in which case, all you need is:

dfs_new = list(dfs.values())

Not sure why you want the keys to be strings, but if they must be, you could do:

dfs_new = {str(i):v for i,v in enumerate(dfs.values())}

In general, if you have some set of new_keys that correspond (in order) to the keys that must be replaced, you can do:

dict_new = dict(zip(new_keys, some_dict.values()))
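A small worked example of the zip approach, with strings standing in for the dataframes and new_keys chosen arbitrarily for illustration. It relies on dicts preserving insertion order (guaranteed since Python 3.7), and it adds a length check because zip silently stops at the shorter input:

```python
some_dict = {"path1": "df1", "path2": "df2", "path3": "df3"}
new_keys = ["7", "8", "9"]  # hypothetical replacement keys, in the same order

# zip() would silently drop entries if the lengths differed, so check first.
if len(new_keys) != len(some_dict):
    raise ValueError("need exactly one new key per existing entry")

# Pair each new key with the value at the same position in the original dict.
dict_new = dict(zip(new_keys, some_dict.values()))
print(dict_new)  # {'7': 'df1', '8': 'df2', '9': 'df3'}
```

The original dictionary is left untouched, so the old paths remain available if they are needed later.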

You can always re-assign to the same variable if you don't need the original dict anymore. This is practically always the best approach. If you must modify the dictionary in-place (you probably shouldn't have to in a well-designed program, but it happens...), you could do the following, assuming data is the dictionary:

for (original_key, value), new_key in zip(list(data.items()), new_keys):
    data[new_key] = value
    del data[original_key]
juanpa.arrivillaga

You can iterate through the dictionary keys and number them at the same time using enumerate. However, because the loop modifies the dictionary, take a snapshot of the keys first with list(dfs.keys()); otherwise the loop iterates over a dictionary that is changing underneath it, and the result is unpredictable.

You may also want to save the mapping from each new key back to its original path for debugging. The index_2_path dictionary below does that:

dfs = {"path1": "df1", "path2": "df2", "path3": "df3"}
index_2_path = {}

for i, path in enumerate(list(dfs.keys())):
    dfs[str(i)] = dfs.pop(path)
    index_2_path[str(i)] = path

print("dfs:", dfs)
print("index_2_path:", index_2_path)

Output:

dfs: {'0': 'df1', '1': 'df2', '2': 'df3'}
index_2_path: {'0': 'path1', '1': 'path2', '2': 'path3'}

You can also use any predefined or generated list for renaming, not only numbers:

dfs = {"path1": "df1", "path2": "df2", "path3": "df3"}
new_keys = ["a", "b", "c"]
index_2_path = {}

for new_name, path in zip(new_keys, list(dfs.keys())):
    dfs[new_name] = dfs.pop(path)
    index_2_path[new_name] = path

print("dfs:", dfs)
print("index_2_path:", index_2_path)

Output:

dfs: {'a': 'df1', 'b': 'df2', 'c': 'df3'}
index_2_path: {'a': 'path1', 'b': 'path2', 'c': 'path3'}
Maria K