Given this pandas DataFrame:

Gorilla     A  T  C  C  A  G  C  T
Dog         G  G  G  C  A  A  C  T
Humano      A  T  G  G  A  T  C  T
Drosophila  A  A  G  C  A  A  C  C
Elefante    T  T  G  G  A  A  C  T
Mono        A  T  G  C  C  A  T  T
Unicornio   A  T  G  G  C  A  C  T

I would like to get a DataFrame like this:

    A   5 1 0 0 5 5 0 0
    C   0 0 1 4 2 0 6 1
    G   1 1 6 3 0 1 0 0
    T   1 5 0 0 0 1 1 6 

Basically, I want to count how often each letter occurs in each column and build the second DataFrame shown above.

Ultimately, I want to derive a consensus string from these counts. For this data it should be: A T G C A A C T

Could anyone help me or give me some advice?

2 Answers

You could apply Series.value_counts to each column (iloc[:, 1:] skips the first column):

print(df.iloc[:, 1:].apply(pd.Series.value_counts).fillna(0))

Output

     1    2    3    4    5    6    7    8
A  5.0  1.0  0.0  0.0  5.0  5.0  0.0  0.0
C  0.0  0.0  1.0  4.0  2.0  0.0  6.0  1.0
G  1.0  1.0  6.0  3.0  0.0  1.0  0.0  0.0
T  1.0  5.0  0.0  0.0  0.0  1.0  1.0  6.0
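
For the consensus string mentioned in the question, one possible follow-up is to take the most frequent letter in each column of the count table with idxmax. The sketch below is only an illustration: it rebuilds the example data with the species names as the index (an assumption, so no iloc slicing is needed), and the names rows, counts and consensus are hypothetical. Note that idxmax returns the first label when two letters tie, which may or may not be the tie-breaking rule you want.

```python
import pandas as pd

# Example sequences from the question; species names are assumed to be the index.
rows = {
    "Gorilla": "ATCCAGCT",
    "Dog": "GGGCAACT",
    "Humano": "ATGGATCT",
    "Drosophila": "AAGCAACC",
    "Elefante": "TTGGAACT",
    "Mono": "ATGCCATT",
    "Unicornio": "ATGGCACT",
}
df = pd.DataFrame.from_dict(
    {name: list(seq) for name, seq in rows.items()}, orient="index"
)
df.columns = range(1, 9)  # label the positions 1..8 as in the output above

# Per-column letter counts; letters absent from a column become 0.
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)

# Most frequent letter per column (first label wins on ties).
consensus = " ".join(counts.idxmax())
print(consensus)
```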
Dani Mesejo

Try:

result = df.apply(pd.Series.value_counts).fillna(0)

   col1  col2  col3  col4  col5  col6  col7  col8
A   5.0   1.0   0.0   0.0   5.0   5.0   0.0   0.0
C   0.0   0.0   1.0   4.0   2.0   0.0   6.0   1.0
G   1.0   1.0   6.0   3.0   0.0   1.0   0.0   0.0
T   1.0   5.0   0.0   0.0   0.0   1.0   1.0   6.0
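
The counts come back as floats because letters missing from a column are NaN until fillna replaces them. If integer counts like those in the question are preferred, a cast can be appended. This is a minimal sketch that assumes the species names are the index and the columns are named col1 to col8, as in the output above; the name seqs is hypothetical.

```python
import pandas as pd

# Example data from the question, with species names as the index (assumed).
seqs = {
    "Gorilla": "ATCCAGCT",
    "Dog": "GGGCAACT",
    "Humano": "ATGGATCT",
    "Drosophila": "AAGCAACC",
    "Elefante": "TTGGAACT",
    "Mono": "ATGCCATT",
    "Unicornio": "ATGGCACT",
}
df = pd.DataFrame(
    [list(s) for s in seqs.values()],
    index=list(seqs.keys()),
    columns=[f"col{i}" for i in range(1, 9)],
)

# Count letters per column, fill the gaps with 0, then cast to integers.
result = df.apply(pd.Series.value_counts).fillna(0).astype(int)
print(result)
```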
luigigi