
I have the following code for creating a dictionary of data frames using csv files:

import pandas as pd

l = ['employees','positions']
d = {}
for x in l:
    d[x] = pd.read_csv("P:\\python_work\\data_sets\\" + x + ".csv")

How would I do the same using a list of data frames that already exist in memory?

This doesn't work but maybe it helps clarify what I'm trying to do:

l = ['df1','df2']
d = {}
for x in l:
    d[x] = x

I would then be able to access individual data frames like so:

d['df1']

I provided the CSV example because it works and produces the same end result (a dictionary of data frames).

Here's an example of the desired contents of the dictionary:

{'employees':    id   name      date
 0   1    bob  1/1/2018
 1   2  sally  1/2/2018, 'positions':      pos      desc status
 0  12454  director      a
 1  65444   manager      i}

I want to use a list of existing data frames rather than CSV files. I tried using a list without quotes:

l = [employees, positions]
d = {}
for x in l:
    d[x] = x

...but I get this error:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
Dread

3 Answers


The problem is you're defining a list of strings and building a dictionary mapping each string to itself. Much simpler is to use enumerate with an iterable of dataframes. Assuming df1 and df2 are dataframes:

d = dict(enumerate((df1, df2), 1))

Then access your dataframes via d[1] and d[2]. If you really want your keys to be strings "df1" and "df2", you can use a dictionary comprehension:

d = {'df'+str(i): j for i, j in enumerate((df1, df2), 1)}

A better naming convention, in my opinion, is to use your filenames as keys:

files = ['employees', 'positions']
d = {f: pd.read_csv(f'P:\\python_work\\data_sets\\{f}.csv') for f in files}
jpp
  • I want to use the option similar to where you use filenames as keys, but I want to use the data frame names as keys (i.e. employees and positions are existing data frames rather than csv files). – Dread Jul 03 '18 at 15:05
  • @Dread, So the example in your question isn't really accurate? Here are the things you should **not** do: use `eval`, use `globals`, use `locals`. What you *can* do is read dataframes straight into your dictionary, e.g. `d = {}; d['employees'] = pd.read_csv(...)`. – jpp Jul 03 '18 at 15:06
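Building on the comment above, a minimal sketch that maps the existing DataFrames to string keys directly (assuming employees and positions are DataFrames already defined in memory):

# Map each name explicitly to the existing DataFrame object
d = {'employees': employees, 'positions': positions}

d['employees']  # returns the employees DataFrame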

You are almost there. I added k to show how you can use enumerate in this case:

l = ['employees','positions']
k = [1,2]
d = {}
for index,x in enumerate(l):
    d[x] = k[index]

This returns the following for d:

{'employees': 1, 'positions': 2}

Then access your dataframe with:

df_1 = d.get('employees')

(Of course, you would replace k[index] with the creation of your dataframe, as sketched below.)
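For example, a sketch of the same pattern with real DataFrames in place of the placeholder integers (assuming employees and positions already exist in memory):

names = ['employees', 'positions']
frames = [employees, positions]  # existing DataFrames instead of the placeholder list k
d = {}
for index, name in enumerate(names):
    d[name] = frames[index]

d.get('employees')  # returns the employees DataFrame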

XanderMJ

There is already a dictionary of all declared variables in memory, available via the locals() or globals() built-in functions, depending on whether the dataframes are defined as local or global variables. You should be able to access your DataFrame like this:

locals()['df1']
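A minimal sketch of this approach, assuming employees and positions are defined as global (module-level) variables:

# Look up each existing DataFrame by name in globals()
names = ['employees', 'positions']
d = {name: globals()[name] for name in names}

d['employees']  # returns the employees DataFrame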
Rob
  • In my opinion, using `globals()` for this purpose is not recommended, see https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables – jpp Jul 03 '18 at 14:55