
I want to split a DataFrame based on the distinct categorical values of a column (Q14) and give each resulting DataFrame its own variable name. data_int.Q14 has 4 unique values (2, 3, 4, 5). How can I build separate variable names for the DataFrames from strings using a for loop? Here is the image of the main DataFrame (data_int).

fleet_type = data_int.Q14.unique()
for i in data_int.Q14:
  for uni in fleet_type:
    if i == uni:
      data_'{}'.format{uni} = data_int #I tried to assign the unique values to identify the DataFrames uniquely. 

File "<ipython-input-25-2200e7c4c3b7>", line 5
    data_'{}'.format{uni} = data_int
            ^
SyntaxError: invalid syntax

Ideally, I would like to use a list comprehension for this case, like below,

[data_int for i in data_int.Q14 if i == 2]

but I am not able to define the name of the DataFrame variables.

Ultimately, the new DataFrames should be named as follows,

fleet_data_list = ['fleet_type_{}'.format(i) for i in data_int.Q14.unique()]
fleet_data_list
  • fleet_type_2 = (new_dataframe)
  • fleet_type_3 = (new_dataframe)
  • fleet_type_4 = (new_dataframe)
  • fleet_type_5 = (new_dataframe)

I couldn't find a way to use fleet_data_list to define the variables. Any idea how I can do this?

Karl Knechtel
sargupta

1 Answer


I think the best approach here is to create a dictionary of DataFrames by converting the groupby object to a tuple of (key, group) pairs and then to a dict:

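import pandas as pd

# sample data; Q14 is the categorical column to split on
df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'Q14': [4, 3, 2, 2, 4, 5],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})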

dfs = dict(tuple(df.groupby('Q14')))
print (dfs)
{2:    A  B  C  Q14  E  F
2  c  4  9    2  6  a
3  d  5  4    2  9  b, 3:    A  B  C  Q14  E  F
1  b  5  8    3  3  a, 4:    A  B  C  Q14  E  F
0  a  4  7    4  5  a
4  e  5  2    4  2  b, 5:    A  B  C  Q14  E  F
5  f  4  3    5  4  b}

And select by keys:

print (dfs[2])
   A  B  C  Q14  E  F
2  c  4  9    2  6  a
3  d  5  4    2  9  b
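
If the fleet_type_N names from the question matter, they can be kept as dictionary keys instead of variable names. A minimal sketch, assuming the same Q14 column as above; the name fleet_frames is made up for illustration, and the sample frame is trimmed to two columns:

```python
import pandas as pd

# trimmed-down sample data (only the columns needed for the illustration)
df = pd.DataFrame({
    'A': list('abcdef'),
    'Q14': [4, 3, 2, 2, 4, 5],
})

# one DataFrame per Q14 value, stored under a descriptive string key
# (fleet_frames is a hypothetical name, not something pandas provides)
fleet_frames = {
    'fleet_type_{}'.format(key): group
    for key, group in df.groupby('Q14')
}

print(sorted(fleet_frames))
print(fleet_frames['fleet_type_2'])
```

The string keys play the role of the variable names, so the groups can still be looked up by name or looped over with fleet_frames.items().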

What you are asking for, separate variables named from strings, is possible with globals(), but not recommended:

for i, g in df.groupby('Q14'):
    globals()['fleet_type_{}'.format(i)] = g

print (fleet_type_2 )
   A  B  C  Q14  E  F
2  c  4  9    2  6  a
3  d  5  4    2  9  b
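
The comprehension idea from the question can also be written as a dict comprehension over the unique values, with a boolean mask selecting each subset. This is only a sketch on a trimmed-down sample frame; the name subsets is hypothetical. It should give the same groups as the groupby approach, but it scans the column once per unique value:

```python
import pandas as pd

# trimmed-down sample data (only the columns needed for the illustration)
df = pd.DataFrame({
    'A': list('abcdef'),
    'Q14': [4, 3, 2, 2, 4, 5],
})

# for each unique Q14 value, keep only the rows where Q14 equals that value
# (subsets is a hypothetical name chosen for this example)
subsets = {
    value: df[df['Q14'] == value]
    for value in df['Q14'].unique()
}

print(subsets[4])
```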
jezrael