Number of character occurrences in a list of strings

Question

I have a dataframe with a column of strings, and I would like to add columns with the number of occurrences of each character, sorted from maximum to minimum occurrences. The dataframe is very big, so I need an efficient way to calculate this.

Original df:

    Item
0   ABABCBF
1   ABABCGH
2   ABABEFR
3   ABABFBF
4   ABACTC3

Wanted df:

    Item    o1  o2  o3  o4  o5
0   ABABCBF 3   2   1   1   null
1   ABABCGH 2   2   1   1   1
2   ABABEFR 2   2   1   1   1
3   ABABFBF 3   2   2   null    null
4   ABACTC3 2   2   1   1   1

I have tried using collections.Counter, but I am not able to convert the result into columns of the dataframe: collections.Counter(df['item'])

Thanks

[*Please do not post text as images*](https://meta.stackoverflow.com/q/285551). Copy and paste the text into your question and use the code formatting tool (`{}` button) to format it correctly. Images are not searchable, cannot be interpreted by screen readers for those with visual impairments, and cannot be copied for testing and debugging purposes. Use the [edit] link to modify your question. Please also see [How to make good reproducible `pandas` examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — MattDMo, Jan 15 '23 at 19:48

score 2 · Accepted Answer · answered Jan 15 '23 at 20:37

You can use collections.Counter and the DataFrame constructor:

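from collections import Counter

import pandas as pd

# Count the characters of each string, sort the counts in descending order,
# and name the resulting columns o1, o2, ...
out = df.join(
    pd.DataFrame(
        sorted(Counter(x).values(), reverse=True)
        for x in df['Item']
    ).rename(columns=lambda x: f'o{x+1}')
)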

print(out)

Output:

      Item  o1  o2  o3   o4   o5
0  ABABCBF   3   2   1  1.0  NaN
1  ABABCGH   2   2   1  1.0  1.0
2  ABABEFR   2   2   1  1.0  1.0
3  ABABFBF   3   2   2  NaN  NaN
4  ABACTC3   2   2   1  1.0  1.0
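
For reference, here is a self-contained sketch of the same approach using the sample data from the question. The `Int64` conversion is an optional addition, not part of the answer above: it assumes you want whole-number counts with `<NA>` for the missing values rather than floats with `NaN`. The name `counts` is just illustrative.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"Item": ["ABABCBF", "ABABCGH", "ABABEFR", "ABABFBF", "ABACTC3"]})

# One row of descending character counts per string.
# Rows with fewer distinct characters are padded with NaN.
counts = pd.DataFrame(
    [sorted(Counter(s).values(), reverse=True) for s in df["Item"]],
    index=df.index,
)
counts.columns = [f"o{i + 1}" for i in counts.columns]

# Assumption: nullable integers are wanted, so missing counts show as <NA>
# and the remaining values stay whole numbers instead of becoming floats.
out = df.join(counts.astype("Int64"))
print(out)
```

The float columns in the output above appear because `NaN` is a float value, so any column with a missing count is upcast; the nullable `Int64` dtype avoids that.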

score 1 · Answer 2 · Zach Flanders · edited Jan 15 '23 at 20:55

Try:

import json
import pandas as pd
from collections import Counter
df = pd.DataFrame({'Item': ['ABACABDF', 'BACBDFHGAAAA']})
result = df.join(
    pd.DataFrame(
        json.loads(
            df['Item']
            .transform(lambda x: sorted(list(Counter(x).values()), reverse=True))
            .to_json(orient='records')
        )
    )
    .rename(columns=(lambda x: f'o{x+1}'))
)

result

           Item  o1  o2  o3  o4  o5   o6   o7
0      ABACABDF   3   2   1   1   1  NaN  NaN
1  BACBDFHGAAAA   5   2   1   1   1  1.0  1.0

edited Jan 15 '23 at 20:55

answered Jan 15 '23 at 20:08

Zach Flanders


Need to sort Counter values, otherwise, for larger dataframes (such as the one posted), the numbers are not in the right order. – DarrylG Jan 15 '23 at 20:46
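
To illustrate the point in this comment, here is a minimal sketch using one of the strings from the question. A `Counter` yields its values in the order the characters were first seen, not by count, so an explicit sort (or `most_common()`) is needed to get the descending order the question asks for.

```python
from collections import Counter

counts = Counter("ABACTC3")

# Values follow first-seen order of the characters: A, B, C, T, 3.
unsorted_counts = list(counts.values())
print(unsorted_counts)  # [2, 1, 2, 1, 1]

# Sorting explicitly gives the descending order wanted for o1, o2, ...
sorted_counts = sorted(counts.values(), reverse=True)
print(sorted_counts)  # [2, 2, 1, 1, 1]

# most_common() returns (character, count) pairs already ordered by count.
common_counts = [n for _, n in counts.most_common()]
print(common_counts)  # [2, 2, 1, 1, 1]
```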

score 0 · Answer 3 · answered Jan 16 '23 at 01:31

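Try this:

import pandas as pd

def count_chars(txt: str):
    # Split the string into characters; value_counts() sorts counts in descending order
    ser = pd.Series([*txt])
    result = ser.value_counts().tolist()
    return result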

result = df.join(
    pd.DataFrame([*df['Item'].apply(count_chars)]).rename(columns=lambda x: f'o{x+1}'))
print(result)
>>>
    Item    o1  o2  o3  o4  o5
0   ABABCBF 3   2   1   1.0 NaN
1   ABABCGH 2   2   1   1.0 1.0
2   ABABEFR 2   2   1   1.0 1.0
3   ABABFBF 3   2   2   NaN NaN
4   ABACTC3 2   2   1   1.0 1.0
