I have many dataframes, each with about 100 columns of chemical compounds plus a categorical column giving the type of material. For example, a smaller version of one of my datasets would look something like this:

Decane    Octanal    Material
1         20         Water
2         1          Glass
10        5          Glass
9         4          Water

I am using a linear regression model to regress the chemicals onto the material type. I want to be able to dynamically rename the results dataframe based on which dataset I am using. My code looks like this (where 'feature_cols' are the names of the chemicals):

import pandas as pd
import statsmodels.formula.api as smf

count = 0
dataframe = []

# loop through the three datasets (in reality I have many more than three)
for dataset in [first, second, third]:
    count += 1

    for feature in feature_cols:

        # define the model and fit it
        mod = smf.ols(formula='Q(feature)' + '~material', data=dataset)
        res = mod.fit()

        # create a dataframe of the pvalues
        # I would like to be able to dynamically name pvalues so that when looping through
        # the chemicals of the first dataframe it is called 'pvalues_first' and so on.
        pvalues = pd.DataFrame(res.pvalues)
    
Niam45
  • Does this answer your question? [How do I create variable variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-variable-variables) – matszwecja Aug 10 '22 at 08:51

2 Answers

You can use a dictionary (here with dummy values):

names = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
pvalues = {}
# enumerate from 1 so each key gets a distinct dummy value
for i, name in enumerate(names, start=1):
    pvalues["pvalues_" + name] = i

print(pvalues)

Output:

{'pvalues_first': 1, 'pvalues_second': 2, 'pvalues_third': 3, 'pvalues_fourth': 4, 'pvalues_fifth': 5, 'pvalues_sixth': 6}

To update pvalues_third, for example:

pvalues["pvalues_third"] = 20
print(pvalues)

Output:

{'pvalues_first': 1, 'pvalues_second': 2, 'pvalues_third': 20, 'pvalues_fourth': 4, 'pvalues_fifth': 5, 'pvalues_sixth': 6}
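
The same idea can be sketched with real DataFrames in place of the dummy values. This is only an illustration under assumptions: the toy data, the `datasets` dictionary, and the `groupby` placeholder (standing in for the regression p-values) are not from the question. The point is that the label and the DataFrame are paired explicitly, so the key is always a string.

```python
import pandas as pd

# Toy stand-ins for the question's DataFrames (values are made up).
first = pd.DataFrame({"Decane": [1, 2, 10, 9],
                      "Octanal": [20, 1, 5, 4],
                      "Material": ["Water", "Glass", "Glass", "Water"]})
second = first.assign(Decane=[3, 4, 8, 7])

# Pair each label with its DataFrame; the label becomes part of the key.
datasets = {"first": first, "second": second}

pvalues = {}
for name, dataset in datasets.items():
    # Placeholder computation standing in for pd.DataFrame(res.pvalues).
    pvalues["pvalues_" + name] = dataset.groupby("Material").mean()

print(sorted(pvalues))  # ['pvalues_first', 'pvalues_second']

# Indexing a DataFrame with an integer looks for a column labelled 1,
# so an expression like dataset[count] raises KeyError here.
try:
    first[1]
except KeyError as err:
    print("KeyError:", err)
```

Looking up `pvalues["pvalues_first"]` then returns the object stored for the first dataset, with no need for dynamically named variables.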
rochard4u
  • If pvalues is a dictionary, then running this line of code: pvalues["pvalues_" + dataset[count]] = pd.DataFrame(res.pvalues) raises KeyError: 1 – Niam45 Aug 10 '22 at 09:26
  • Your names variable is a list of strings but my equivalent dataset variable is a list of dataframes – Niam45 Aug 10 '22 at 09:37
  • In my code above, 'dataset' is what I am looping through to go through my dataframes. – Niam45 Aug 10 '22 at 09:48
  • If I run your first code snippet with names being a list of dataframes instead of a list of strings, it gives me an error. Your code works fine when you're looping through strings but not when you're looping through dataframes – Niam45 Aug 10 '22 at 09:50
  • first=pd.read_excel('file_path') – Niam45 Aug 10 '22 at 09:54
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247174/discussion-between-rochard4u-and-niam45). – rochard4u Aug 10 '22 at 09:59
names = ["first", "second", "third"]
results = {}

#loop through the three datasets (In reality I have many more than three),
#pairing each dataframe with its name
for name, dataset in zip(names, [first, second, third]):
    for feature in feature_cols:
        #define the model and fit it
        mod = smf.ols(formula='Q(feature)'+'~material', data=dataset)
        res = mod.fit()

        #create a dataframe of the pvalues and store it under a key built
        #from the dataset name ('pvalues_first' and so on), one entry per chemical
        name_str = "pvalues_" + name
        pvalues = {'Intercept': [res.pvalues.iloc[0]], 'cap_type': [res.pvalues.iloc[1]]}
        results.setdefault(name_str, {})[feature] = pd.DataFrame(pvalues)
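
One possible way to combine the pieces in this thread, offered as a sketch rather than a definitive solution: collect the p-values of every chemical into one DataFrame per dataset, keyed by a name such as 'pvalues_first'. The toy data, the `results` dictionary, and the capitalised "Material" column (taken from the question's example table) are assumptions for illustration. The formula is built with an f-string so the column name is passed to Q explicitly.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy data shaped like the question's example (values are made up).
first = pd.DataFrame({"Decane": [1, 2, 10, 9],
                      "Octanal": [20, 1, 5, 4],
                      "Material": ["Water", "Glass", "Glass", "Water"]})
second = first.assign(Decane=[3, 4, 8, 7], Octanal=[12, 2, 6, 9])
feature_cols = ["Decane", "Octanal"]

results = {}
for name, dataset in {"first": first, "second": second}.items():
    per_feature = {}
    for feature in feature_cols:
        # Q("...") quotes the column name, so names with spaces also work.
        res = smf.ols(formula=f'Q("{feature}") ~ Material', data=dataset).fit()
        per_feature[feature] = res.pvalues
    # Rows are the model terms, columns are the chemicals.
    results["pvalues_" + name] = pd.DataFrame(per_feature)

print(results["pvalues_first"])
```

Each entry of `results` is then an ordinary DataFrame, so `results["pvalues_second"]["Octanal"]` gives the p-values for that chemical in the second dataset.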
comk