I have a pandas dataframe from which I wish to construct some matrices using numpy arrays. These matrices will be constructed based on variables in the dataframe, and I would like to create these via a loop over a list of the dataframe variables. I would also like the numpy arrays to be named based on the variable, so that I can easily reference them.

Below is code that illustrates my problem. I create a dataframe with two categorical variables and an identifier. I then create a list 'vars' with the variable names I'd like to loop over. I show that my code runs outside the loop (although the object created is a pandas Series, not a numpy array). The commented piece at the end does not work, but shows my attempt at including the variable string in the loop.

import pandas as pd
import numpy as np
import random

mult_cat = []   # multiple categories
bin_cat = []    # binary categories
id = []
for i in range(0,10):
    x = random.randint(0,4)
    y = random.randint(0,1)
    z = i+1
    mult_cat.append(x)
    bin_cat.append(y)
    id.append(z)

data_2 = {'ID': id,
          'mult_cat': mult_cat,
          'bin_cat': bin_cat}
df = pd.DataFrame(data_2,
                   columns = ['ID', 'mult_cat', 'bin_cat'])

vars = ['mult_cat', 'bin_cat']

twice_mult_cat=2*df.mult_cat
print(mult_cat)
print(twice_mult_cat)

"""
for var in vars:
    twice_var=2*df.var
    print(twice_var)
"""

I believe there are at least two issues here.

1) I am simply multiplying the pandas Series, so the resulting object is another pandas Series, not a numpy array.

2) The issue of naming, which is, I think, the more important issue here.
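Both issues can be seen in a small sketch. It assumes a hand-written three-row dataframe in place of the random one above, and pandas 0.24 or later for `Series.to_numpy()`. The names `col` and `twice` are only for illustration. `df.var` fails in the loop because attribute access looks up the name `var` itself, which is the built-in `DataFrame.var` (variance) method, rather than the string stored in the loop variable. Bracket indexing uses the string.

```python
import numpy as np
import pandas as pd

# Small fixed dataframe standing in for the random one in the question
df = pd.DataFrame({"ID": [1, 2, 3],
                   "mult_cat": [0, 4, 2],
                   "bin_cat": [1, 0, 1]})

# df.var is the DataFrame.var method, not a column lookup
print(callable(df.var))

# Bracket indexing takes the column name as a string
col = "mult_cat"
twice = 2 * df[col].to_numpy()   # to_numpy() returns a numpy array

print(type(twice))
print(twice)
```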

amquack
  • There is a lot of confusion expressed in the wording of the question. Your quoted code has a syntax error too which would stop it working – roganjosh Jan 10 '19 at 21:13
  • `data_2 = {'ID': np.array(id), 'mult_cat': np.array(mult_cat), 'bin_cat': np.array(bin_cat)}`. No need for the dataframe, and no need to have variables floating around in the namespace. That dictionary is all you need as far as I can tell. – roganjosh Jan 10 '19 at 21:16
  • @roganjosh Yes, you are correct. I believe I fixed the syntax error you are referring to. I would like to provide clarity, but am unsure what part is unclear (but I'll try!). I wish to reference a list of strings and use those strings to name the objects created in a loop, so that each object is kept separate (not overwritten in the next iteration) and all of them remain accessible for later operations after the loop has run. – amquack Jan 10 '19 at 21:21
  • But this is not a good idea. Keep them in a dictionary and refer to them by the name of the key. Otherwise this is a duplicate of "a variable number of variable" https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables – roganjosh Jan 10 '19 at 21:22

0 Answers