0

Python "Exec" command is not passing local values in exec shell. I thought this should be a simple question but all seem stumped. Here is a repeatable working version of the problem ... it took me a bit to recreate a working problem (my files are much larger than examples shown here, there are up to 10-dfs per loop, often 1800 items per df )

EXEC was only passing "PRODUCT" (as opposed to "PRODUCT.AREA" before I added "["{ind_id}"]" and then also it also shows an error "<string> in <module>".

datum_0 = {'Products':  ['Stocks', 'Bonds', 'Notes'],'PRODUCT.AREA': ['10200', '50291','50988']}
df_0 = pd.DataFrame (datum_0, columns = ['Products','PRODUCT.AREA'])
datum_1 = {'Products':  ['Stocks', 'Bonds', 'Notes'],'PRODUCT.CODE': ['66', '55','22']}
df_1 = pd.DataFrame (datum_1, columns = ['Products','PRODUCT.CODE'])
df_0

enter image description here

enter image description here

summary = {'Prodinfo':  ['PRODUCT.AREA', 'PRODUCT.CODE']}
df_list= pd.DataFrame (summary, columns = ['Prodinfo'])
df_list

enter image description here

# Create a rankings column for the Prodinfo tables
for rows in df_list.itertuples():
    row = rows.Index
    ind_id = df_list.loc[row]['Prodinfo']
    print(row, ind_id)
    exec(f'df_{row}["rank"] = df_{row}["{ind_id}"].rank(ascending=True) ')

Of course its this last line that is throwing exec errors. Any ideas? Have you got a working global or local variable assignment that fixes it? etc... thanks!

Edward
  • 179
  • 2
  • 12
  • 1
    maybe first use `print()` to see what string you created - I think you create wrong string - you may need `"` in `["{ind_id}"]` – furas Jun 11 '20 at 23:43
  • 4
    why do you use `exec` for this ? Can't you do this normally ? – furas Jun 11 '20 at 23:45
  • are you sure you need `data{row_next}` which creates `data1`, `data2` ? Maybe you means `data.iloc(row_next)` - and this doesn't need `exec` ? OR maybe you should keep `data1`, `data2` as list `data[1]`, `data[2]` and then you can do `data[row_next]` - and this doesn't need `exec` – furas Jun 11 '20 at 23:55
  • 2
    I have to agree that `exec` is the wrong tool for this job. Provide a standalone [mre] (meaning the shortest possible example complete enough to fail with your real error, not one about undefined variables or such, when copied and pasted) and we'll be able to test and provide a correct one. – Charles Duffy Jun 12 '20 at 01:18
  • @furas I don't show it, but I use data{row_next} as well at the start of this line of code - inside a loop that creates a variable-sized index from several indicators. – Edward Jun 12 '20 at 01:58
  • 4
    I don't understand why you use `data{row_next} ` but probably you could do the same using `list` or `dict` - `data[row_next]` - without `exec()` – furas Jun 12 '20 at 02:27
  • 2
    Using `exec` here makes no sense. It is almost never the right solution to begin with – juanpa.arrivillaga Jun 12 '20 at 02:45
  • 1
    In any case "straight ng variables" (no such thing) are passed like *any other object*. Clearly, there is some assumption you are making which is wrong.without a [mcve] it's impossible to help – juanpa.arrivillaga Jun 12 '20 at 02:47
  • @juanpa.arrivillaga EXEC Problem is restated with a working example ...thanks for taking a look – Edward Jun 12 '20 at 03:34
  • @CharlesDuffy Exec problem is restated with a working example ... – Edward Jun 12 '20 at 03:35
  • @furas I saw that dict and list was an option as well, this looked like the easiest, but it didn't want to work and I didn't realize there should be a bias against this approach. – Edward Jun 12 '20 at 03:57
  • 1
    To explain that bias -- usually, the syntax of a piece of code puts boundaries on its possible meanings and behaviors. As soon as you transform data into additional syntax, that's no longer true; one can't look just at the syntax but needs to analyze in the context of all possible data values. This also defeats static checking tools (which generally can't predict which values are possible ahead-of-time) and introduces potential for security bugs, if a user can influence that data in unexpected ways. – Charles Duffy Jun 12 '20 at 13:03
  • 1
    Thank you for the edits -- this is a much better question now! – Charles Duffy Jun 12 '20 at 17:04

2 Answers2

2

I would use list to keep all DataFrames

all_df = [] # list
all_df.append(df_1)
all_df.append(df_2)

and then I would no need exec

for rows in df_list.itertuples():
    row = rows.Index
    ind_id = df_list.loc[row]['Prodinfo']
    print(row, ind_id)
    all_df[row]["rank"] = all_df[row][ind_id].rank(ascending=True)

Eventually I would use dictionary

all_df = {} # dict
all_df['PRODUCT.AREA'] = df_1
all_df['PRODUCT.CODE'] = df_2

and then I don't need exec and df_list

for key, df in all_df.items():
    df["rank"] = df[key].rank(ascending=True)

Minimal working code with list

import pandas as pd

all_df = [] # list

datum = {
    'Products': ['Stocks', 'Bonds', 'Notes'],
    'PRODUCT.AREA': ['10200', '50291', '50988']
}
all_df.append( pd.DataFrame(datum) )

datum = {
    'Products': ['Stocks', 'Bonds', 'Notes'],
    'PRODUCT.CODE': ['66', '55', '22']
}
all_df.append( pd.DataFrame(datum) )

#print( all_df[0] )
#print( all_df[1] )

print('--- before ---')
for df in all_df:
    print(df)

summary = {'Prodinfo': ['PRODUCT.AREA', 'PRODUCT.CODE']}
df_list = pd.DataFrame(summary, columns=['Prodinfo'])
#print(df_list)

for rows in df_list.itertuples():
    row = rows.Index
    ind_id = df_list.loc[row]['Prodinfo']
    #print(row, ind_id)
    all_df[row]["rank"] = all_df[row][ind_id].rank(ascending=True)

print('--- after ---')
for df in all_df:
    print(df)

Minimal working code with dict

import pandas as pd

all_df = {} # dict

datum = {
    'Products': ['Stocks', 'Bonds', 'Notes'],
    'PRODUCT.AREA': ['10200', '50291', '50988']
}
all_df['PRODUCT.AREA'] = pd.DataFrame(datum)


datum = {
    'Products': ['Stocks', 'Bonds', 'Notes'],
    'PRODUCT.CODE': ['66', '55', '22']
}
all_df['PRODUCT.CODE'] = pd.DataFrame (datum)

print('--- before ---')
for df in all_df.values():
    print(df)

for key, df in all_df.items():
    df["rank"] = df[key].rank(ascending=True)

print('--- after ---')
for df in all_df.values():
    print(df)

Frankly, for two dataframes I wouldn't waste time for df_list and for-loop

import pandas as pd

datum = {
    'Products': ['Stocks', 'Bonds', 'Notes'],
    'PRODUCT.AREA': ['10200', '50291', '50988']
}
df_0 = pd.DataFrame(datum)

datum = {
    'Products': ['Stocks', 'Bonds', 'Notes'],
    'PRODUCT.CODE': ['66', '55', '22']
}
df_1 = pd.DataFrame(datum)

print('--- before ---')
print( df_0 )
print( df_1 )

df_0["rank"] = df_0['PRODUCT.AREA'].rank(ascending=True)    
df_1["rank"] = df_1['PRODUCT.CODE'].rank(ascending=True)    

print('--- after ---')
print( df_0 )
print( df_1 )

And probably I would even put all in one dataframe

import pandas as pd

df = pd.DataFrame({
    'Products': ['Stocks', 'Bonds', 'Notes'],
    'PRODUCT.AREA': ['10200', '50291', '50988'],
    'PRODUCT.CODE': ['66', '55', '22'],
})

print('--- before ---')
print( df )

#df["rank PRODUCT.AREA"] = df['PRODUCT.AREA'].rank(ascending=True)    
#df["rank PRODUCT.CODE"] = df['PRODUCT.CODE'].rank(ascending=True)    

for name in ['PRODUCT.AREA', 'PRODUCT.CODE']:
    df[f"rank {name}"] = df[name].rank(ascending=True)    

print('--- after ---')
print( df )

Result:

--- before ---
  Products PRODUCT.AREA PRODUCT.CODE
0   Stocks        10200           66
1    Bonds        50291           55
2    Notes        50988           22
--- after ---
  Products PRODUCT.AREA PRODUCT.CODE  rank PRODUCT.AREA  rank PRODUCT.CODE
0   Stocks        10200           66                1.0                3.0
1    Bonds        50291           55                2.0                2.0
2    Notes        50988           22                3.0                1.0
furas
  • 134,197
  • 12
  • 106
  • 148
  • For reasons not obvious in my example, grouping the dataframes into one is problematic. With itertuples and vectoring, the solution of keeping dfs separate is lightning-quick as well. In the data, there are many nulls, many external querying conditions, normalizing fields in addition to rank, in short - separate is a necessity initially just so I can keep it straight in my head. – Edward Jun 12 '20 at 05:17
1

As expected, this was an easy fix. Thanks to answerers who gave much to think about ...

Kudos to @holdenweb and his answer at ... Create multiple dataframes in loop

dfnew = {}  # CREATE A DICTIONARY!!! - THIS WAS THE TRICK I WAS MISSING
df_ = {}
for rows in df_list.itertuples():
    row = rows.Index
    ind_id = df_list.loc[row]['Prodinfo']
    dfnew[row] = df_[row] # or pd.read_csv(csv_file) or database_query or ...
    dfnew[row].dropna(inplace=True)
    dfnew[row]["rank"] = dfnew[row][ind_id].rank(ascending=True)

Works well and very simple...

Edward
  • 179
  • 2
  • 12