1

I have dataframe stored in df. and headers stored in samp

samp = ["1C", "1P1", "1P2"]
for i in samp:
    c1 = df[df['SampleID']==i]
    print(i,"\n", c1['Ranked'].value_counts(),"\n" )

This gives an output as follows:

1C 
 missense_variant            630
frameshift_variant           39
inframe_insertion            24
splice_donor_variant         16
inframe_deletion             15
stop_gained                  14
splice_acceptor_variant       4
stop_lost                     2
start_lost                    2
protein_altering_variant      1
Name: Ranked, dtype: int64 

1P1 
 missense_variant            349
frameshift_variant           21
inframe_insertion            19
inframe_deletion             13
stop_gained                  12
splice_donor_variant         11
splice_acceptor_variant       3
stop_lost                     1
protein_altering_variant      1
Name: Ranked, dtype: int64 

1P2 
 missense_variant            323
frameshift_variant           30
inframe_insertion            21
splice_donor_variant         12
inframe_deletion              6
stop_gained                   4
stop_lost                     2
start_lost                    2
splice_acceptor_variant       2
protein_altering_variant      1
Name: Ranked, dtype: int64 

I want to make a new df from the count_value() by merging them as follows:

Type                     1C     1P1 1P2
missense_variant         630    349 323
frameshift_variant        39    21  30
inframe_insertion         24    19  21
splice_donor_variant      16    11  12
inframe_deletion          15    13  6
stop_gained               14    12  4
splice_acceptor_variant    4    3   2
stop_lost                  2    0   2
start_lost                 2    1   2
protein_altering_variant   1    1   1
  • Please provide [mcve] – Ch3steR May 11 '21 at 11:15
  • Check this link -- [https://stackoverflow.com/questions/23668427/pandas-three-way-joining-multiple-dataframes-on-columns/23671390#23671390](https://stackoverflow.com/questions/23668427/pandas-three-way-joining-multiple-dataframes-on-columns/23671390#23671390) – Kabilan Mohanraj May 11 '21 at 11:19

1 Answers1

0

First filter SampleID by Series.isin and then use SeriesGroupBy.value_counts:

samp = ["1C", "1P1", "1P2"]

df2 = df[df['SampleID'].isin(samp)].groupby('SampleID')['Ranked'].value_counts().unstack()

Or crosstab:

df1 = df[df['SampleID'].isin(samp)]
df2 = pd.crosstab(df1['SampleID'], df1['Ranked'])
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252