Python, loops with changeable parts of filenames

Question

I have a bunch of very similar commands which all look like this (df means pandas dataframe):

df1_part1=...
df1_part2=...
...
df1_part5=...
df2_part1=...

I would like to make a loop for it, as follows:

for i in range(1,6):
    for j in range(1,6):
        df%i_part%j=...

Of course, this doesn't work with %. But there must be some easy way to do it, I suppose. Could you help me, please?

Assign the dataframes to a dict instead of assigning them to individual variables. – sushanth, May 29 '20 at 12:27

Gabio · Accepted Answer · 2020-05-29T12:31:26.513

You can try one of the following options:

Create a dictionary that maps names to your dataframes, and access each dataframe by its name:

mapping = {"df1_part1": df1_part1, "df1_part2": df1_part2}
for i in range(1,6):
    for j in range(1,6):
        mapping[f"df{i}_part{j}"] = ...
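A minimal runnable sketch of the dictionary option, assuming a hypothetical helper make_part(i, j) that stands in for the "..." expression producing each dataframe (it returns a string here so the example runs without pandas). range(1, 6) is used so that part5 is included, because range stops before its end value:

```python
# Minimal sketch: keep every "dfI_partJ" object in one dict instead of
# separate variables. make_part() is a HYPOTHETICAL stand-in for the "..."
# expression that builds each dataframe in the real code.
def make_part(i, j):
    # Placeholder value; in real code this would return a pandas DataFrame.
    return f"data for file {i}, part {j}"

frames = {}
for i in range(1, 6):        # files 1..5 (range stops before 6)
    for j in range(1, 6):    # parts 1..5
        frames[f"df{i}_part{j}"] = make_part(i, j)

# Look an entry up by the name you would otherwise have used as a variable.
print(frames["df2_part3"])   # → data for file 2, part 3
print(len(frames))           # → 25
```

The same f-string that builds the key when writing can be reused when reading, so the name you would have typed as a variable simply becomes a string key.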

Use globals() to access your variables dynamically:

df1_part1=...
df1_part2=...
...
df1_part5=...
df2_part1=...

for i in range(1,6):
    for j in range(1,6):
        globals()[f"df{i}_part{j}"] = ...
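A small sketch of the globals() route for both writing and reading, again with a hypothetical make_part(i, j) placeholder. It assumes module-level code, because globals() returns the namespace dict of the current module:

```python
# Sketch of the globals() approach. globals() is the current module's
# namespace dict, so assigning into it creates module-level variables.
# make_part() is a HYPOTHETICAL stand-in for the real "..." expression.
def make_part(i, j):
    return [i, j]

for i in range(1, 3):
    for j in range(1, 4):
        globals()[f"df{i}_part{j}"] = make_part(i, j)

# Reading a dynamically named variable back uses the same lookup.
i, j = 2, 3
print(globals()[f"df{i}_part{j}"])  # → [2, 3]
```

Names created this way are invisible to linters and IDEs, which is one reason the comments below argue for a dict instead.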

edited May 29 '20 at 12:31

answered May 29 '20 at 12:28

Avoid using global variables when unnecessary: https://stackoverflow.com/questions/17874946/is-there-any-disadvantage-to-declare-a-variable-global – sushanth May 29 '20 at 12:32
What is an alternative to global variables in my case? I cannot define a list of files; it would be too long. Numbered names don't only appear on the left-hand side; there are many more on the right-hand side. – Logic_Problem_42 May 29 '20 at 12:40
I still think it is strange that there is no simple solution. In SAS it is enough to write df%i_part%j=... and everything works. Macro variables make it possible. – Logic_Problem_42 May 29 '20 at 12:46
There is a simple solution: use an appropriate data structure like a dictionary rather than digging in the innards of variable names with `globals`. – Matthias May 29 '20 at 12:49
I have commands like verb_kd_top3_z =top3[top3.kd.isin(top3_zl_kd)==True] and I don't want to manually create lists or dictionaries of all of the files. – Logic_Problem_42 May 29 '20 at 12:52
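If the numbered variables already exist and you don't want to type out a dict by hand, one option is to build the dict once from the variable names themselves. This is a sketch assuming the names follow the exact pattern dfI_partJ; the regex and the (I, J) tuple keys are illustrative choices, and the string values are placeholders for dataframes:

```python
import re

# HYPOTHETICAL variables created the existing way (strings stand in for DataFrames).
df1_part1, df1_part2, df2_part1 = "a", "b", "c"

# Gather every module-level name of the form dfI_partJ into a dict keyed by
# (I, J), once, so later code can loop over it instead of copy-pasting.
pattern = re.compile(r"df(\d+)_part(\d+)")
parts = {}
for name, value in list(globals().items()):  # snapshot: the loop itself adds globals
    match = pattern.fullmatch(name)
    if match:
        parts[int(match.group(1)), int(match.group(2))] = value

print(sorted(parts))   # → [(1, 1), (1, 2), (2, 1)]
print(parts[2, 1])     # → c
```

The list() snapshot matters: iterating globals() directly would fail, because the loop variables are themselves new globals.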

score 0 · Answer 2 · answered May 29 '20 at 12:30

One way would be to collect your pandas dataframes in a list of lists and iterate over that, instead of trying to parse your Python code dynamically.

df1_part1=...
df1_part2=...
...
df1_part5=...
df2_part1=...

dflist = [[df1_part1, df1_part2, df1_part3, df1_part4, df1_part5],
          [df2_part1, df2_part2, df2_part3, df2_part4, df2_part5]]
for parts in dflist:
    for df_part in parts:
        # do something with df_part, for example:
        print(df_part.head())
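If you also need the file and part numbers while looping over the list of lists (for example, to build labels or output names), enumerate() with start=1 recovers them. A sketch with placeholder strings in place of dataframes:

```python
# Placeholder strings stand in for the dataframes df1_part1, df1_part2, ...
dflist = [
    ["f1p1", "f1p2", "f1p3"],
    ["f2p1", "f2p2", "f2p3"],
]

labels = []
for i, parts in enumerate(dflist, start=1):        # i = file number
    for j, df_part in enumerate(parts, start=1):    # j = part number
        labels.append(f"df{i}_part{j}: {df_part}")

print(labels[3])  # → df2_part1: f2p1
```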

NewPythonUser

This solution would not help much. Each of my commands has many places where the same number is part of a file name. The list of lists would be very long. – Logic_Problem_42 May 29 '20 at 12:38
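When the same number appears in many places of one command, one way to avoid copy-pasting is to keep each kind of input in a dict keyed by that number and turn the command into a function of the number. This is a hypothetical sketch loosely modeled on the verb_kd_top3_z command quoted above: `top`, `allowed` and `select` are made-up names, lists stand in for dataframes, and a plain membership test stands in for pandas' Series.isin(). With real dataframes the body would be closer to `top[n][top[n].kd.isin(allowed[n])]`:

```python
# HYPOTHETICAL inputs, one entry per number (lists stand in for DataFrames,
# sets for the lookup lists that would be passed to .isin()).
top = {1: [10, 11, 12, 13], 3: [20, 21, 22]}
allowed = {1: {11, 13}, 3: {20}}

def select(n):
    """Every place the number appeared in the copy-pasted command becomes n."""
    return [row for row in top[n] if row in allowed[n]]

selected = {n: select(n) for n in top}
print(selected)  # → {1: [11, 13], 3: [20]}
```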

Commissar Vasili Karlovic · Answer 3 · 2020-05-29T13:29:53.510

Assuming that this process is part of data preparation, I would like to mention that you should try to work with "data preparation pipelines" whenever possible. Otherwise, the code will be very hard to read after a couple of months.
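As a rough illustration of the pipeline idea, the sketch below applies an ordered list of step functions to the data. The step functions are hypothetical and operate on plain lists; with real dataframes, pandas' DataFrame.pipe() offers a similar way to chain such functions:

```python
# HYPOTHETICAL cleaning steps; each takes the data and returns new data.
def drop_negatives(rows):
    return [r for r in rows if r >= 0]

def double(rows):
    return [2 * r for r in rows]

PIPELINE = [drop_negatives, double]

def run_pipeline(data, steps=PIPELINE):
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        data = step(data)
    return data

print(run_pipeline([3, -1, 4]))  # → [6, 8]
```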

There are several ways to deal with this problem.

A dictionary is the most straightforward way to deal with this.

df_parts = {
            'df1' : {'part1': df1_part1, 'part2': df1_part2,...,'partN': df1_partN},
            'df2' : {'part1': df2_part1, 'part2': df2_part2,...,'partN': df2_partN},
            '...' : {'part1': ..._part1, 'part2': ..._part2,...,'partN': ..._partN},
            'dfN' : {'part1': dfN_part1, 'part2': dfN_part2,...,'partN': dfN_partN},
           }

# print parts from `dfN`
for val in df_parts['dfN'].values():
    print(val)

# print part1 for all dfs
for df in df_parts.values():
    print(df['part1'])

# print everything
for df in df_parts:
    for val in df_parts[df].values():
        print(val)
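The nested dictionary does not have to be typed out by hand; a dict comprehension can build it from the numbers. A sketch, assuming a hypothetical make_part(i, j) that produces each part and illustrative counts of 3 files and 5 parts:

```python
# make_part() is a HYPOTHETICAL stand-in for whatever builds each dataframe.
def make_part(i, j):
    return f"file {i} part {j}"

n_files, n_parts = 3, 5
df_parts = {
    f"df{i}": {f"part{j}": make_part(i, j) for j in range(1, n_parts + 1)}
    for i in range(1, n_files + 1)
}

print(df_parts["df3"]["part5"])   # → file 3 part 5
print(list(df_parts["df1"]))      # → ['part1', 'part2', 'part3', 'part4', 'part5']
```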

The advantage of this approach is that you can iterate through the whole dictionary without using range(), which may be confusing later. Also, it is better to assign every df_part directly to a dict instead of creating N*N variables that may be used only once or twice. In this case you can work with a single variable and reassign it as you progress:

# code using df1_partN
df1 = df_parts['df1']['partN']
# stuff to do
# happy? checkpoint
df_parts['df1']['partN'] = df1
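To apply the same "work on one variable, then checkpoint" step to every part, you can loop over the nested dict and write each result back under its key. A sketch with lists standing in for dataframes and sorted() standing in for the real processing:

```python
# HYPOTHETICAL nested dict (lists stand in for DataFrames).
df_parts = {"df1": {"part1": [3, 1], "part2": [2, 5]},
            "df2": {"part1": [9, 4]}}

for parts in df_parts.values():
    for part_name, work in parts.items():
        work = sorted(work)          # "stuff to do" on one working variable
        parts[part_name] = work      # checkpoint: store the result back

print(df_parts["df2"]["part1"])   # → [4, 9]
```

Reassigning the value of an existing key while iterating is fine; adding or removing keys during iteration would raise an error.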

It's a lot of code, so I don't save any time in the end. The problem is that I have 5 different types of file names with numbers in them, not only df... So I would need 5 dictionaries, and then I would have to rewrite all the commands. It is easier to simply copy and paste and change the numbers manually; I can do that faster. – Logic_Problem_42, May 29 '20 at 13:30
I agree that the problem is the "bad" data preparation, but it was not my idea to do it this way. I still have to work with it; I am not authorized to change the naming conventions. – Logic_Problem_42, May 29 '20 at 13:32
