1

I am trying to get the first element throughout multiple column and count how many times the value appear.

A       B
1,2,3 23,4,5
2     54 
2     2
result
1  1
2  3
54 1
32 1
Cheng
  • 196
  • 1
  • 10

1 Answers1

2

Use DataFrame.stack for Series, then Series.str.split with selecting first values by indexing, convert to integer if necessary and count by Series.value_counts, if necessary sorting last:

s = df.stack().str.split(',').str[0].astype(int).value_counts().sort_index()
print (s)
1     1
2     3
23    1
54    1
dtype: int64

If need 2 columns DataFrame:

df1 = (df.stack()
         .str.split(',')
         .str[0]
         .astype(int)
         .value_counts()
         .sort_index()
         .rename_axis('result')
         .reset_index(name='counts'))
print (df1)
   result  counts
0       1       1
1       2       3
2      23       1
3      54       1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • what if the df have more than one column and I only want to use two of the columns A,B and ignore the rest. – Cheng Apr 09 '19 at 13:46
  • 2
    @Cheng - then filter it by list - `s = df[['A','B']].stack().str.split(',').str[0].astype(int).value_counts().sort_index()` – jezrael Apr 09 '19 at 13:47
  • ok another question is it possible to retrieve the element for usage. For example i want to get the value '3' at the count columns. I have tried using print (s[0]) – Cheng Apr 09 '19 at 14:07
  • 1
    @Cheng if this address your problem and resolve it , please consider upvote and accept :-) – BENY Apr 09 '19 at 14:08
  • 1
    @Cheng - yes, use [boolean indexing](https://stackoverflow.com/q/17071871) - e.g. by `df = df[df['count'] == 3]` – jezrael Apr 09 '19 at 14:09
  • @jezrael glad I can help :-) – BENY Apr 09 '19 at 14:10
  • Thank you so much for your help! @jezrael – Cheng Apr 09 '19 at 14:35