I have a pandas dataframe from which I wish to construct some matrices using numpy arrays. These matrices will be constructed based on variables in the dataframe, and I would like to create these via a loop over a list of the dataframe variables. I would also like the numpy arrays to be named based on the variable, so that I can easily reference them.
Below is code that illustrates my problem. I create a dataframe with two categorical variables and an identifier, then a list 'vars' with the names of the variables I'd like to loop over. The code runs outside the loop (although the object created is a pandas Series, not a numpy array). The commented-out piece at the end does not work, but shows my attempt at using the variable name string inside the loop.
import pandas as pd
import numpy as np
import random
mult_cat = [] # multiple categories
bin_cat = [] # binary categories
id = []
for i in range(0,10):
    x = random.randint(0,4)
    y = random.randint(0,1)
    z = i+1
    mult_cat.append(x)
    bin_cat.append(y)
    id.append(z)
data_2 = {'ID': id,
          'mult_cat': mult_cat,
          'bin_cat': bin_cat}
df = pd.DataFrame(data_2,
                  columns = ['ID', 'mult_cat', 'bin_cat'])
vars = ['mult_cat', 'bin_cat']
twice_mult_cat=2*df.mult_cat
print(mult_cat)
print(twice_mult_cat)
"""
for var in vars:
    twice_var=2*df.var
    print(twice_var)
"""
I believe there are at least two issues here.
1) I am simply multiplying a pandas Series, so the resulting object is also a pandas Series rather than a numpy array.
2) Naming: I don't know how to use the string held in 'var' to select the column and to name the resulting array. I think this is the more important issue.
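To make the two issues concrete, here is a minimal sketch of one possible direction, not necessarily the best one. It assumes that a dictionary keyed by a name built from the column string is an acceptable substitute for separately named variables. The small fixed dataframe and the names `columns` and `arrays` are hypothetical stand-ins for illustration. It selects each column with bracket indexing, `df[col]`, and converts with `Series.to_numpy()` (available in pandas 0.24 and later).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataframe above, with fixed values
# instead of random ones so the result is reproducible.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "mult_cat": [0, 3, 4, 1],
    "bin_cat": [1, 0, 0, 1],
})
columns = ["mult_cat", "bin_cat"]  # plays the role of 'vars'

# One dictionary entry per column; the key is built from the column name.
arrays = {}
for col in columns:
    # df[col] looks the column up by the string stored in col.
    # Attribute access (df.var) ignores the loop variable and refers to
    # the DataFrame.var method instead.
    arrays[f"twice_{col}"] = 2 * df[col].to_numpy()

print(type(arrays["twice_mult_cat"]))
print(arrays["twice_mult_cat"])
```

Each array is then referenced by its key, for example `arrays["twice_bin_cat"]`, which keeps the naming tied to the column string without creating variables dynamically.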