0

I have dataframe as follows: I wanted to get the count of D, T and N in new columns of dataframe as Dcount TCount Ncount

data = {'CHROM':['chr1', 'chr2', 'chr1', 'chr3', 'chr1','chr1', 'chr2', 'chr1'],
        'POS':[939570,3411794,1043223,22511093,24454031,3411794,22511093,1043223],
        'MI':['T', 'T', 'D', 'D', 'T', 'N', 'D', 'N'],
        'CSK':['D', 'D', 'N', 'T', 'N', 'D', 'T', 'T'],
        'DD':['N', 'D', 'D', 'D', 'T', 'N', 'D', 'N'],
        'RR':['D', 'T', 'N', 'T', 'D', 'D', 'T', 'N'],
        'RCB':['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'],
        'DC':['D', 'D', 'T', 'D', 'D', 'D', 'N', 'D']
       }
df1 = pd.DataFrame(data)

df1

    CHROM   POS      MI CSK DD  RR  RCB DC
0   chr1    939570   T  D   N   D   D   D
1   chr2    3411794  T  D   D   T   D   D
2   chr1    1043223  D  N   D   N   D   T
3   chr3    22511093 D  T   D   T   D   D
4   chr1    24454031 T  N   T   D   D   D
5   chr1    3411794  N  D   N   D   D   D
6   chr2    22511093 D  T   D   T   D   N
7   chr1    1043223  N  T   N   N   D   D

I want to get the count of T, D, N in a new dataframe.

Expected output:

    CHROM   POS      MI CSK DD  RR  RCB DC  Dcount  Tcount  Ncount
0   chr1    939570   T  D   N   D   D   D   4       1       1
1   chr2    3411794  T  D   D   T   D   D   4       2       0
2   chr1    1043223  D  N   D   N   D   T   3       1       2
3   chr3    22511093 D  T   D   T   D   D   4       2       0
4   chr1    24454031 T  N   T   D   D   D   3       2       1
5   chr1    3411794  N  D   N   D   D   D   4       0       2
6   chr2    22511093 D  T   D   T   D   N   3       2       1
7   chr1    1043223  N  T   N   N   D   D   2       1       3
petezurich
  • 9,280
  • 9
  • 43
  • 57
  • 3
    This question already has [`answers`](https://stackoverflow.com/questions/57242200/how-to-do-value-counts-on-each-row-and-make-some-columns-whose-values-are-the-co). – Mayank Porwal Apr 15 '21 at 06:27

1 Answers1

-1

Use DataFrame.iloc for select all columns from 2 to end of DataFrame with counts values by value_counts, repalce missing values to 0, then use DataFrame.add_suffix and append to original by DataFrame.join:

df1 = (df1.join(df1.iloc[:, 2:]
                   .apply(pd.value_counts, axis=1)
                   .fillna(0)
                   .astype(int)
                   .add_suffix('count')))
print (df1)
  CHROM       POS MI CSK DD RR RCB DC  Dcount  Ncount  Tcount
0  chr1    939570  T   D  N  D   D  D       4       1       1
1  chr2   3411794  T   D  D  T   D  D       4       0       2
2  chr1   1043223  D   N  D  N   D  T       3       2       1
3  chr3  22511093  D   T  D  T   D  D       4       0       2
4  chr1  24454031  T   N  T  D   D  D       3       1       2
5  chr1   3411794  N   D  N  D   D  D       4       2       0
6  chr2  22511093  D   T  D  T   D  N       3       1       2
7  chr1   1043223  N   T  N  N   D  D       2       3       1

Or use DataFrame.stack with SeriesGroupBy.value_counts and Series.unstack:

df1 = df1.join(df1.iloc[:, 2:]
                  .stack()
                  .groupby(level=0)
                  .value_counts()
                  .unstack(fill_value=0)
                  .add_suffix('count'))
print (df1)
  CHROM       POS MI CSK DD RR RCB DC  Dcount  Ncount  Tcount
0  chr1    939570  T   D  N  D   D  D       4       1       1
1  chr2   3411794  T   D  D  T   D  D       4       0       2
2  chr1   1043223  D   N  D  N   D  T       3       2       1
3  chr3  22511093  D   T  D  T   D  D       4       0       2
4  chr1  24454031  T   N  T  D   D  D       3       1       2
5  chr1   3411794  N   D  N  D   D  D       4       2       0
6  chr2  22511093  D   T  D  T   D  N       3       1       2
7  chr1   1043223  N   T  N  N   D  D       2       3       1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 2
    Unfortunately, this is a dupe. Please check the linked question. It has both your answers. – Mayank Porwal Apr 15 '21 at 06:12
  • 1
    @MayankPorwal - It is partly dupe, in linked Q/A now under question is misisng filter out first 2 columns by `iloc` and append by `join`. – jezrael Apr 15 '21 at 06:35