0
Row_Number ColumnA ColumnB
1 Data 1 A
2 Data A
3 Data B
4 Data 2 B
5 Data C
6 Data D
7 Data E
8 Data 3 C

this is how my data updated on a dataframe from another table, In this "Data 1" having a continuation as "Data A" and "Data B" and having only one data as on column B as "A", but in data frame all the "Data 1", "Data A", "Data B" are created in separate rows. Is there any operations are available to concatenate all the "Row_number" 1,2,3 to be appended on a single line and having only value as "A" in columnB?

Expected output is below:

Row_Number ColumnA ColumnB
1 Data1,Data A, Data B A
2 Data2,Data C, Data D,Data E B
3 Data3 C

Thanks in Advance

I'm new to python and have tried the below code

# variable that stores the values: table_values
import pandas as pd

df=pd.DataFrame(table_values,columns=['ColumnA','ColumnB'])

for index, row in df.iterrows():
  if df.loc[index,'ColumnB'] == '' & df.loc[index,'ColumnA'] != '':
    df.loc[index-1, 'ColumnA'] = df.loc[index-1, 'ColumnA'] + df.loc[index, 'ColumnA']  
print(df)'''
Shinilrk
  • 47
  • 9
  • 1
    Don't you want to concatenate `Data E`? Only every two rows after non-empty `ColumnB`? – Andrej Kesely Apr 15 '21 at 20:45
  • @AndrejKesely , No, not like that these are the data extracted using OCR and extracted data will be updating into an record variable. from record variable these are assigning into an dataframe, these Data A and Data B are the continuation of Data 1. but it's splited and created new rows – Shinilrk Apr 15 '21 at 22:28

1 Answers1

1

You could do a groupby here. The simplest is to just use Column B, since that already corresponds to the rows you want to group. So given this data:

import pandas as pd

df = pd.DataFrame({'ColumnA': ['Data1', 'DataA', 'DataB',
                               'Data2', 'DataC', 'DataD', 'DataE',
                               'Data3'],
                   'ColumnB': ['A', None, None,
                               'B', None, None, None,
                               'C']})

#  ColumnA ColumnB
# 0   Data1       A
# 1   DataA    None
# 2   DataB    None
# 3   Data2       B
# 4   DataC    None
# 5   DataD    None
# 6   DataE    None
# 7   Data3       C

Create something to group with, by filling the missing values:

grouper = df['ColumnB'].ffill()

# 0    A
# 1    A
# 2    A
# 3    B
# 4    B
# 5    B
# 6    B
# 7    C
# Name: ColumnB, dtype: object

Then aggregate to concat strings, following this post:

output = df['ColumnA'].groupby(grouper).apply(lambda x : ', '.join(x)).reset_index()

Final result:

  ColumnB                     ColumnA
0       A         Data1, DataA, DataB
1       B  Data2, DataC, DataD, DataE
2       C                       Data3
Tom
  • 8,310
  • 2
  • 16
  • 36
  • Hi @Tom, Thanks for the help, here the data's are dynamic and cannot add tthe "None" value manually, is there any option available to update the empty rows to update as "None" before doing the group by? – Shinilrk Apr 15 '21 at 22:19
  • Please suggest any other option without using group by – Shinilrk Apr 15 '21 at 22:31