2

I have a table that looks like this: TABLE

LOCATION   ANALYSIS TYPE
A          A,B,C,D
A          D,E,F
B          KA
B          A,B,C
B          C,D,E

I wish to generate a single line inventory for each location that lists the Analysis types available using pandas. So, for this column of lists I want to merge and display a single list of unique values by location.

IE:
A         A,B,C,D,E,F
B         A,B,C,D,E,KA

DESIRED TABLE_OUT

Sorry, no code. I have tried various splits in pandas, not getting close.

MyCarta
  • 808
  • 2
  • 12
  • 37
  • Could you post some of the splits you tried in pandas? – sawyermclane Mar 03 '20 at 19:51
  • Played with splitting on the "," in the list. It made a new column for each item....nowhere near what I was trying to accomplish. :D – life_on_python Mar 03 '20 at 19:54
  • 1
    You shouldn't have edited your question with images. Leave the tables as text. – Quang Hoang Mar 03 '20 at 19:58
  • Dang..."im helping" – life_on_python Mar 03 '20 at 19:59
  • 1
    Text is back...I need to sort out how to format a table in here. This is my first post...please dont hate me :D – life_on_python Mar 03 '20 at 20:02
  • G. Anderson...this is close. Only difference is in my case column B is already a list..I will give it a try. Thanks – life_on_python Mar 03 '20 at 20:22
  • This works, mostly. For some reason it keeps duplicates, each row is surrounded by single quotes. IE actual output:00/01-03-002-20W4/0 ['TSPM, TSPT, XRD', 'TSPM, TSPT, XRD', 'TSPM, TSPT, XRD', 'TSPM, TSPT, XRD', 'TSS', 'TSS', 'TSS', 'TSS', 'CHST', 'CHST', 'CHST', 'CHST', 'CHST', 'CHST', 'CHST', 'CHST', 'CHST', 'CHST', 'CHST', 'CHST'] 00/01-03-016-01W5/0 ['SEMP', 'SEMP', 'SEMS', 'SEMS'] – life_on_python Mar 03 '20 at 20:35

1 Answers1

3

Try this:

df1 = pd.DataFrame({'Location':['A']*2+['B']*3
                   ,'Analysis_Type':['A,B,C,D','D,E,F','KA','A,B,C','C,D,E']})
df1.set_index('Location')['Analysis_Type'].str.split(',')\
   .groupby(level=0)\
   .agg(lambda x: ','.join(sorted(list(set(x.sum())))))\
   .reset_index()

Output:

  Location Analysis_Type
0        A   A,B,C,D,E,F
1        B  A,B,C,D,E,KA
Scott Boston
  • 147,308
  • 15
  • 139
  • 187