I have run a Python script that created multiple variables. Now I want to iterate over a few of the DataFrames it created, namely those whose names match a specific pattern, and perform simple operations on them. Initially I want to get the number of rows (via the shape attribute) of each of the DataFrames whose names are in list_dfs, shown below:

['FAILEDRuns_0112',
 'FAILEDRuns_0121',
 'FAILEDRuns_0126',
 'FAILEDRuns_0129',
 'FAILEDRuns_0131',
 'FAILEDRuns_0134',
 'FAILEDRuns_0135',
 'FAILEDRuns_0137',
 'FAILEDRuns_0142',
 'FAILEDRuns_0153',
 'FAILEDRuns_0165',
 'FAILEDRuns_0171',
 'FAILEDRuns_0175']

In fact if I do:

for i in list(filter(failed_runs_finder.findall, dir())):
    print(locals()[i].shape[0])

I get the number of rows printed onto the screen:

1
0
0
0
1
0
0
0
0
0
0
0
0

This contains the information I need, though not in the format I want. Ultimately I need to know the number of 1s and the number of 0s, so I thought of using a list comprehension and then comparing its sum (i.e. the number of 1s) with its length (i.e. the total number of elements).

However, if I do:

[locals()[i].shape[0] for i in list_dfs]

I get the following error:

KeyError: 'FAILEDRuns_0112'

I don't quite understand where the error is coming from. As far as I can see, the syntax of the list comprehension itself is correct.

Does it have anything to do with using locals() within a list comprehension?

My second option would be to build a DataFrame iteratively and take the sum, though I think a list comprehension is simpler, and I would still like to understand where the error comes from.

BCArg
  • wouldn't it be easier to use `sum` for the number of ones and `len-sum` for the number of zeros? – Nullman Feb 21 '19 at 15:17
  • In the for loop you did `print(locals()[i].shape)`, why did it become `shape[0]` in the list comprehension. – Rocky Li Feb 21 '19 at 15:18
  • In Python 3, [list comprehensions have their own scope](https://stackoverflow.com/q/13905741/190597). – unutbu Feb 21 '19 at 15:18
  • @Rocky Li: true, the code does not match the output, though the whole point is that I find it odd that I can do this within a `for` loop but not within a list comprehension. – BCArg Feb 21 '19 at 15:20
  • Why do you have so many similarly named local variables instead of a single `dict`? – chepner Feb 21 '19 at 15:30
  • What @chepner said: you should consider using a `dict` to hold all the names and `DataFrame` objects instead of relying on `locals()` to refer back to the frames. If you could update the script so that it stores the frames in a single `dict` rather than generating these local names, it would make your life *much* easier. – r.ook Feb 21 '19 at 15:54
  • @Idlehands: yes I have noticed that after I was halfway through my script, so I decided to stick to the locals until the end – BCArg Feb 21 '19 at 15:57

1 Answer

Try this instead if you really must rely on locals():

[v.shape[0] for k, v in locals().items() if k in list_dfs]
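
A minimal sketch of why this version behaves differently, as far as I understand the scoping rules: the outermost iterable of a comprehension (here locals().items()) is evaluated once in the enclosing scope, whereas the element expression runs inside the comprehension, where, before Python 3.12, locals() sees essentially only the loop variable. Capturing the namespace in a variable first sidesteps the issue regardless of version. The function name and the `namespace` variable below are hypothetical, and `types.SimpleNamespace` objects stand in for DataFrames so that the snippet runs without pandas.

```python
from types import SimpleNamespace


def row_counts_from_namespace():
    # Stand-ins for DataFrames: only the .shape attribute matters here.
    FAILEDRuns_0112 = SimpleNamespace(shape=(1, 3))
    FAILEDRuns_0121 = SimpleNamespace(shape=(0, 3))
    list_dfs = ["FAILEDRuns_0112", "FAILEDRuns_0121"]

    # Take the namespace once, in the enclosing scope ...
    namespace = locals()
    # ... and do only plain dict lookups inside the comprehension.
    return [namespace[name].shape[0] for name in list_dfs]


print(row_counts_from_namespace())  # prints [1, 0]
```

The same idea applies at module level, where `globals()` can be used instead of a captured `locals()`.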

However, as suggested in the comments, a better approach would probably be to use a single dict that maps each name to its DataFrame object.
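
A rough sketch of what the dict-based approach could look like. Everything here is an assumption made for illustration: `parse_failed_runs` is a hypothetical placeholder for whatever the real script does to build each frame, the run identifiers are a subset of those in the question, and `types.SimpleNamespace` objects stand in for DataFrames so the snippet runs without pandas.

```python
from types import SimpleNamespace


def parse_failed_runs(run_id):
    """Hypothetical parser; returns a stand-in with a DataFrame-like .shape."""
    n_rows = 1 if run_id in {"0112", "0131"} else 0
    return SimpleNamespace(shape=(n_rows, 3))


run_ids = ["0112", "0121", "0126", "0129", "0131"]

# Build the dict iteratively: one entry per run identifier.
failed_runs = {}
for run_id in run_ids:
    failed_runs[f"FAILEDRuns_{run_id}"] = parse_failed_runs(run_id)

# No locals() needed: the dict is an ordinary object in any scope.
row_counts = [frame.shape[0] for frame in failed_runs.values()]
n_nonempty = sum(row_counts)  # valid here because every count is 0 or 1
n_empty = len(row_counts) - n_nonempty
print(n_nonempty, n_empty)  # prints: 2 3
```

With real DataFrames the comprehension stays the same, since `DataFrame.shape[0]` is also the row count.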

If you want to count how many DataFrames have each number of rows:

from collections import Counter

cnt = Counter(v.shape[0] for k, v in locals().items() if k in list_dfs)

cnt[1]
# 2

cnt[0]
# 11
r.ook
  • It appears to be common sense to use a `dict` instead of `locals()` or `globals()`. My script is cd-ing into multiple directories and parsing information from different files into dataframes. When I started writing the script I found that using `globals()` combined with f-strings was a way to obtain multiple (similar) dataframes with a common name (FAILEDRuns_) but with unique identifiers (0112, 0121 etc.). This also has the advantage that I can always visually inspect the dfs in Spyder. Will definitely have a look at how to build a `dict` iteratively. – BCArg Feb 22 '19 at 08:19
  • Building the `dict` of `DataFrames` will definitely be easier to reference down the road. If you need to rely on `globals()` or `locals()` to dynamically refer back to a variable you'll quickly run into scoping problems like this, whereas with a `dict` you always have a direct reference point across different scopes. – r.ook Feb 22 '19 at 14:34