1

considering the following dataset:

df = pd.DataFrame(data=np.array([['a',1, 2, 3,'T'], ['b',4, 5, 6,'T'],
                                 ['b',9, 9, 39,'T'],
                                 ['c',16, 17 , 18,'N']])
                  , columns=['id','A', 'B', 'C','Active'])



 id   A   B   C Active
  a   1   2   3      T
  b   4   5   6      T
  b   9   9  39      T
  c  16  17  18      N

I need to augment each rows of each groups(id) by rows that the active = T , which means that

    a   1   2   3    a   1   2   3
    b   4   5   6    a   1   2   3
    b   9   9  39    a   1   2   3
    a   1   2   3    b   4   5   6
    b   4   5   6    b   4   5   6
    b   9   9  39    b   4   5   6
    a   1   2   3    b   9   9  39
    b   4   5   6    b   9   9  39
    b   9   9  39    b   9   9  39
    a   1   2   3    c  16  17   18  
    b   9   9   39   c  16  17   18 
    b   4   5   6    c  16  17   18

I have an idea which I could not implement it. first, make a new dataset by filtering data. take all rows that active column is equal to T and save it in a new df.

df_t = df [df['Active']=='T']

then for each rows of df add a new vector form df_t dataset. which means that :

for sample in df:
  for t in df_t:
    df_new = sample + t  ( vectors of df and df_t join together)
    Df_new = concat(df_new,Df_new)

I really appreciate your comments and suggestion to implement my own idea!

Elham
  • 827
  • 2
  • 13
  • 25

1 Answers1

1

You want the catersian cross product of df and df_t. You can do it with a bit of a hack like this:

df['cross'] = 1
df_t['cross'] = 1
df_new = pd.merge(df,df_t.drop('Active',axis=1),on='cross').drop('cross',axis=1)

Putting it all together:

import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.array([['a',1, 2, 3,'T'], ['b',4, 5, 6,'T'],
                                 ['b',9, 9, 39,'T'],
                                 ['c',16, 17 , 18,'N']])
                  , columns=['id','A', 'B', 'C','Active'])
df_t = df [df['Active']=='T']
df['cross'] = 1
df_t['cross'] = 1
df_new = pd.merge(df,df_t.drop('Active',axis=1),on='cross').drop('cross',axis=1)

results in:

>>> df_new
   id_x A_x B_x C_x Active id_y A_y B_y C_y
0     a   1   2   3      T    a   1   2   3
1     a   1   2   3      T    b   4   5   6
2     a   1   2   3      T    b   9   9  39
3     b   4   5   6      T    a   1   2   3
4     b   4   5   6      T    b   4   5   6
5     b   4   5   6      T    b   9   9  39
6     b   9   9  39      T    a   1   2   3
7     b   9   9  39      T    b   4   5   6
8     b   9   9  39      T    b   9   9  39
9     c  16  17  18      N    a   1   2   3
10    c  16  17  18      N    b   4   5   6
11    c  16  17  18      N    b   9   9  39
Dan
  • 45,079
  • 17
  • 88
  • 157
  • this one does not work with my desire output. I need the combination with 'c' 16 17 18 N as well – Elham Jan 31 '18 at 18:52
  • @AlterNative If you edit your question so that I can copy and paste your data as a `DataFrame` then I can test it on your actual data. It worked well on my dummy data. I don't see why this would leave off the `'c' 16 17 18 N` row though – Dan Jan 31 '18 at 18:54