-1

I am kind of new to data frames and been trying to find a solution to shift data in specific columns to another row if ID and CODE matches.

    df = pd.DataFrame({'ID': ['123', '123', '123', '154', '167', '167'], 
                   'NAME': ['Adam', 'Adam', 'Adam', 'Bob', 'Charlie', 'Charlie'],
                   'CODE': ['1001', '1001', '1011', '1002', 'A0101', 'A0101'],
                   'TAG1': ['A123', 'B123', 'K123', 'D123', 'E123', 'G123'],
                   'TAG2': [ np.NaN, 'C123', 'L123', np.NaN, 'F123', 'H123'],
                   'TAG3': [ np.NaN, 'M123', np.NaN, np.NaN, np.NaN, np.NaN]})
    ID    NAME      CODE    TAG1    TAG2    TAG3
0   123   Adam      1001    A123    NaN     NaN
1   123   Adam      1001    B123    C123    M123
2   123   Adam      1011    K123    L123    NaN
3   154   Bob       1002    D123    NaN     NaN
4   167   Charlie   A0101   E123    F123    NaN
5   167   Charlie   A0101   G123    H123    NaN

So above I have added the code which depicts the initial data frame, now we can see that ID='123' has two rows with the same codes and the values in 'TAG' columns vary, I would like to shift the 'TAG' data in the second row to the first row after 'TAG1' or in 'TAG1' if it is empty and delete the second row completely. It should be the same for ID='167'.

Below is another code with a sample data frame I have added manually which depicts the final result, any suggestions would be great. I tried one thing, it exactly did not work the way I wanted it to.

    df_result = pd.DataFrame({'ID': ['123', '123', '154', '167'], 
                   'NAME': ['Adam', 'Adam', 'Bob', 'Charlie',],
                   'CODE': ['1001', '1011', '1002', 'A0101'],
                   'TAG1': ['A123', 'K123', 'D123', 'E123'],
                   'TAG2': ['B123', 'L123', np.NaN, 'F123'],
                   'TAG3': ['C123', np.NaN, np.NaN, 'G123'],
                   'TAG4': ['M123', np.NaN, np.NaN, 'H123']})
     ID     NAME    CODE    TAG1    TAG2    TAG3    TAG4
0    123    Adam    1001    A123    B123    C123    M123
1    123    Adam    1011    K123    L123    NaN     NaN
2    154    Bob     1002    D123    NaN     NaN     NaN
3    167    Charlie A0101   E123    F123    G123    H123

The code that I have tried is below to kind of get the result. But it did not get exactly the output I wanted

df2=pd.pivot_table(test_shift, index=['ID', 'NAME', 'CODE'], 
columns=test_shift.groupby(['ID', 'CODE']).cumcount().add(1),
values=['TAG1'], aggfunc='sum')

Output Image

NOTE: Sorry for the bad posting the first time, I tried adding the dataframe as code to help you visually. But I 'FAILED'. I will try to learn that over the coming days and be a better member of 'Stack Overflow.

Thank you for the help.

  • 1
    https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples This would help greatly if you created your input dataframes like this instead of using pictures. – Scott Boston Aug 19 '20 at 15:48
  • 1
    Please don't post images of code, data, or Tracebacks. Copy and paste it as text then format it as code (select it and type `ctrl-k`) ... [Discourage screenshots of code and/or errors](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors) – wwii Aug 19 '20 at 15:50
  • @ScottBoston and wwii. Thank you for the links, I did check them and they do help me in posting my questions perfectly. But sorry that I couldn't do that yet, I will keep trying. Thanks. – stubbledweb1995 Aug 20 '20 at 00:37
  • @wwii Thank you for the suggestion, I tried and posted the code. But I was unable to add the data frame onto the post. I will try that once again over the next few days. – stubbledweb1995 Aug 20 '20 at 00:38

2 Answers2

3

Here is one-way:

def f(x):
    x = x.stack().to_frame(name='val')
    x = x.assign(tags='Tag'+x["val"].notna().cumsum().astype(str))
    x = (x.reset_index(level=3, drop=True)
          .set_index('tags', append=True)['val'].unstack())
    return x

df_out = (df.set_index(['ID', 'NAME','CODE'])
           .groupby(level=[0,1,2], as_index=False).apply(f)
           .reset_index().drop('level_0', axis=1))

print(df_out)

Output:

    ID     NAME   CODE  Tag1  Tag2  Tag3  Tag4
0  123     Adam   1001  A123  B123  C123  M123
1  123     Adam   1011  K123  L123   NaN   NaN
2  154      Bob   1002  D123   NaN   NaN   NaN
3  167  Charlie  A0101  E123  F123  G123  H123
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • @Scott_Boston, I was able to run the code and it worked as expected for the test data, I have run it against my orginal dataset and there was a small error which I wasn't able to fix and didn't exactly get back to it. I tried it again and it gave me a Key Error, so mostly it is respective to changing the "Column" declaration in the code. I will get it fixed. – stubbledweb1995 Oct 06 '20 at 20:25
  • @stubbledweb1995 Key Error is either column name doesn't exists or row index label doesn't exists. – Scott Boston Oct 06 '20 at 20:28
  • 1
    I was able to get it to work. The total number of new TAG columns that were created totaled out to be 961. I would like to know if there is a way to limit the TAG columns to 9. Like Tag1, Tag 2, ... Tag 9. Thank you for the help. – stubbledweb1995 Oct 07 '20 at 16:11
3

An approach with pd.wide_to_long for flattening the TAGS and then mapping the index as per the group of ID and CODE , then unstacking and joining the de-duplicated rows from ID,NAME and CODE

u = pd.wide_to_long(df.filter(like='TAG').reset_index(),'TAG','index','j').dropna()

idx = u.index.get_level_values(0).map(df.groupby(['ID','CODE']).ngroup()) 

u = u.set_index(pd.MultiIndex.from_arrays((idx,u.groupby(idx).cumcount()+1))).unstack()
u.columns = u.columns.map('{0[0]}{0[1]}'.format)

out = df[['ID','NAME','CODE']].drop_duplicates().reset_index(drop=True).join(u)

print(out)

    ID     NAME   CODE  TAG1  TAG2  TAG3  TAG4
0  123     Adam   1001  A123  B123  C123  M123
1  123     Adam   1011  K123  L123   NaN   NaN
2  154      Bob   1002  D123   NaN   NaN   NaN
3  167  Charlie  A0101  E123  F123  G123  H123
anky
  • 74,114
  • 11
  • 41
  • 70
  • I did run the code and it went without any hitch. I was able to modify all the required fields accordingly. I got a IndexError at the 3rd line and I am trying to figure out the root cause. – stubbledweb1995 Oct 06 '20 at 20:27