
I have a directory containing many .dta (Stata format) files which I have loaded into a dictionary of dataframes.
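For reference, a dictionary like this can be built along the following lines. This is a sketch that assumes the files sit directly in one directory and that each file name (e.g. `dat10.dta`) should become its key (`'dat10'`). `load_dta_dir` is a hypothetical helper name; `pd.read_stata` is the standard pandas reader for Stata files.

```python
from pathlib import Path

import pandas as pd


def load_dta_dir(directory):
    """Read every .dta file in `directory` into a dict keyed by file stem.

    Hypothetical helper: 'dat10.dta' becomes the key 'dat10'.
    """
    return {
        path.stem: pd.read_stata(path)
        for path in sorted(Path(directory).glob("*.dta"))
    }
```

Usage would be something like `df_dict = load_dta_dir("path/to/stata_files")`.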

Given that I may now have many dataframes in the unified dictionary object, how can I select a few and "extract" them out of the dictionary?

For example, how would I extract just two of the three dataframes in the dictionary, doing the equivalent of

dat10 = df_dict['dat10']
dat11 = df_dict['dat11']

but without writing out each assignment by hand? Maybe with some sort of loop over the dictionary.
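A minimal sketch of a few ways to pull selected frames out without a separate line per frame. The toy `df_dict` and the `wanted` list are made up for illustration, assuming keys like `'dat10'`:

```python
import pandas as pd

# Hypothetical stand-in for the dictionary loaded from the .dta files.
df_dict = {
    "dat9": pd.DataFrame({"x": [1, 2]}),
    "dat10": pd.DataFrame({"x": [3, 4]}),
    "dat11": pd.DataFrame({"x": [5, 6]}),
}

wanted = ["dat10", "dat11"]

# Option 1: a smaller dictionary holding only the selected frames.
subset = {key: df_dict[key] for key in wanted}

# Option 2: unpack the selected frames into ordinary names in one statement.
dat10, dat11 = (df_dict[key] for key in wanted)

# Option 3: loop over the selection and work on each frame through its key.
for key in wanted:
    print(key, df_dict[key].shape)
```

None of these copy the data: `dat10` and `df_dict['dat10']` are the same object, so in-place changes made through one are visible through the other. Call `.copy()` on a frame if you need an independent version.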

idg23
  • Welcome to Stack Overflow. It's not entirely clear what your expected output is. Are you just asking how to iterate over a dictionary? – G. Anderson Sep 13 '21 at 20:12
  • Does this answer your question? [Iterate through dictionary values?](https://stackoverflow.com/questions/30446449/iterate-through-dictionary-values) – G. Anderson Sep 13 '21 at 20:12
  • Hi @G.Anderson, thank you for responding. My expected output is standalone dataframes, i.e., dat10, dat11, ..., so that I can use them later. I already have these dataframes in the dictionary, and I was wondering whether I can extract them from it. For context, these were all .dta files that I imported into Python as a dictionary. Let me know if that helps. Thanks again! – idg23 Sep 13 '21 at 20:23
  • I think I understand a little better now. As you can see in [How can I create variable variables](https://stackoverflow.com/questions/1373164/how-do-i-create-variable-variables), the generally accepted best practice is, in fact, to store them in a dictionary, since Python has no clean, recommended way to create variables dynamically. What you have now is a perfectly good way of storing them, and each dataframe can be accessed and operated on through its key. What have you tried that you found you _can't_ do with the structure as it is now? – G. Anderson Sep 13 '21 at 20:45
  • Thanks again! If I have access to the individual dataframes, I can do further data cleaning with a for loop. Say (for simplicity) I want to create a new variable in all of them and drop some other variables. With standalone dataframes I can simply run a loop like: `for df in (df9, df10, df11): df['new_var'] = 1 / df['old_var'] + 1; df.drop(['var1', 'var2'], axis=1, inplace=True)` – idg23 Sep 13 '21 at 20:51
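The same kind of loop can run on the dictionary directly, without extracting anything, because iterating over `df_dict.values()` yields the stored DataFrame objects themselves, so in-place changes stick. A sketch with made-up data, assuming columns named `old_var`, `var1`, and `var2` and keys `'dat9'`, `'dat10'`, `'dat11'`:

```python
import pandas as pd

# Hypothetical data; the column names follow the comment above.
df_dict = {
    key: pd.DataFrame({"old_var": [1.0, 3.0], "var1": [0, 0], "var2": [0, 0]})
    for key in ("dat9", "dat10", "dat11")
}

for df in df_dict.values():
    # Each value is the DataFrame stored in the dict, not a copy.
    df["new_var"] = 1 / df["old_var"] + 1
    # inplace=True returns None, so the result is not reassigned.
    df.drop(columns=["var1", "var2"], inplace=True)
```

Note that `1 / df['old_var'] + 1` means `(1 / old_var) + 1`; if `1 / (old_var + 1)` was intended, it needs parentheses.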
  • I would counter that by saying you could just as easily say `for df in (df9, df10, df11): df_dict[df['new_var']] = 1 / df_dict[df['old_var']] + 1 df_dict[df] = df_dict[df].drop(['var1', 'var2'], axis=1, inplace=True)` – G. Anderson Sep 13 '21 at 21:01
  • Thanks again! Unfortunately, if I do as you suggested, I get an error: `TypeError: unhashable type: 'Series'` – idg23 Sep 13 '21 at 21:28
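The `TypeError` comes from `df_dict[df['new_var']]`: that expression uses a Series as a dictionary key, and dict keys must be hashable, which a Series is not. The loop needs to run over the key *names* (strings), indexing the dictionary first and the column second. A sketch, assuming keys `'dat9'`, `'dat10'`, `'dat11'` and columns `old_var`, `var1`, `var2` (all made up for illustration):

```python
import pandas as pd

df_dict = {
    key: pd.DataFrame({"old_var": [1.0, 3.0], "var1": [0, 0], "var2": [0, 0]})
    for key in ("dat9", "dat10", "dat11")
}

# Using a Series as a dict key fails because a Series is unhashable.
caught = None
try:
    df_dict[df_dict["dat9"]["old_var"]]
except TypeError as err:
    caught = err

# Loop over key names instead: dict lookup first, then column access.
for key in ("dat9", "dat10", "dat11"):
    df = df_dict[key]
    df_dict[key] = (
        df.assign(new_var=1 / df["old_var"] + 1)
          .drop(columns=["var1", "var2"])
    )
```

Here `assign` and `drop` return new DataFrames, and each result is stored back under its original key, so nothing is mutated in place.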

0 Answers