I have a DataFrame with a column of strings, and I would like to add columns holding the number of occurrences of each character in the string, sorted from maximum to minimum occurrences. The DataFrame is very big, so I need an efficient way to calculate it.

Original df:

    Item
0   ABABCBF
1   ABABCGH
2   ABABEFR
3   ABABFBF
4   ABACTC3

Wanted df:

    Item    o1  o2  o3  o4  o5
0   ABABCBF 3   2   1   1   null
1   ABABCGH 2   2   1   1   1
2   ABABEFR 2   2   1   1   1
3   ABABFBF 3   2   2   null    null
4   ABACTC3 2   2   1   1   1

I have tried using collections.Counter, but I am not able to convert the result into columns of the DataFrame: collections.Counter(df['item'])
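`Counter(df['Item'])` iterates over the column and counts how often each whole string occurs, not the characters inside each string (the column is also named `Item`, not `item`). A minimal sketch on the sample data showing the difference; calling `Counter` on each string separately gives the per-row character counts:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'Item': ['ABABCBF', 'ABABCGH', 'ABABEFR', 'ABABFBF', 'ABACTC3']})

# Counter over the column counts whole strings: each value occurs once here
whole = Counter(df['Item'])
print(whole['ABABCBF'])  # 1

# Counter over one string counts its characters
per_row = [Counter(s) for s in df['Item']]
print(sorted(per_row[0].values(), reverse=True))  # [3, 2, 1, 1]
```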

Thanks

SkyBest I
    [*Please do not post text as images*](https://meta.stackoverflow.com/q/285551). Copy and paste the text into your question and use the code formatting tool (`{}` button) to format it correctly. Images are not searchable, cannot be interpreted by screen readers for those with visual impairments, and cannot be copied for testing and debugging purposes. Use the [edit] link to modify your question. Please also see [How to make good reproducible `pandas` examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – MattDMo Jan 15 '23 at 19:48

3 Answers

You can use collections.Counter and the DataFrame constructor:

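import pandas as pd
from collections import Counter

df = pd.DataFrame({'Item': ['ABABCBF', 'ABABCGH', 'ABABEFR', 'ABABFBF', 'ABACTC3']})

# Count the characters of each string, sort the counts in descending order,
# and let the DataFrame constructor pad shorter rows with NaN
out = df.join(pd.DataFrame(
    sorted(Counter(x).values(), reverse=True)
    for x in df['Item'])
    .rename(columns=lambda x: f'o{x+1}')
)

print(out)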

Output:

      Item  o1  o2  o3   o4   o5
0  ABABCBF   3   2   1  1.0  NaN
1  ABABCGH   2   2   1  1.0  1.0
2  ABABEFR   2   2   1  1.0  1.0
3  ABABFBF   3   2   2  NaN  NaN
4  ABACTC3   2   2   1  1.0  1.0
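Since `join` aligns rows on the index, and the DataFrame built from the counts gets a fresh 0..n-1 index, the counts would land on the wrong rows (or come out as all NaN) if `df` had a non-default index, e.g. after filtering. A sketch, assuming such a filtered frame, that passes `index=df.index` to the constructor:

```python
from collections import Counter

import pandas as pd

# Assumed example: a filtered frame whose index is no longer 0..n-1
df = pd.DataFrame({'Item': ['ABABCBF', 'ABABCGH', 'ABABFBF']}, index=[10, 20, 30])

counts = pd.DataFrame(
    [sorted(Counter(x).values(), reverse=True) for x in df['Item']],
    index=df.index,  # reuse df's index so join() matches rows one to one
).rename(columns=lambda c: f'o{c + 1}')

out = df.join(counts)
print(out)
```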
mozway

Try:

import json
import pandas as pd
from collections import Counter
df = pd.DataFrame({'Item': ['ABACABDF', 'BACBDFHGAAAA']})
result = df.join(
    pd.DataFrame(
        json.loads(
            df['Item']
            .transform(lambda x: sorted(list(Counter(x).values()), reverse=True))
            .to_json(orient='records')
        )
    )
    .rename(columns=(lambda x: f'o{x+1}'))
)

result

           Item  o1  o2  o3  o4  o5   o6   o7
0      ABACABDF   3   2   1   1   1  NaN  NaN
1  BACBDFHGAAAA   5   2   1   1   1  1.0  1.0
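The `to_json`/`json.loads` round trip only turns the Series of lists into a plain list of lists, which `Series.tolist()` already does in memory. A sketch of the same computation without `json` (using `map` instead of `transform`), on the same two-row example:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'Item': ['ABACABDF', 'BACBDFHGAAAA']})

# Series holding one descending list of counts per string
sorted_counts = df['Item'].map(lambda x: sorted(Counter(x).values(), reverse=True))

# tolist() yields a plain list of lists; no JSON serialization needed
result = df.join(
    pd.DataFrame(sorted_counts.tolist(), index=df.index)
    .rename(columns=lambda c: f'o{c + 1}')
)
print(result)
```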
Zach Flanders
    Need to sort Counter values, otherwise, for larger dataframes (such as the one posted), the numbers are not in the right order. – DarrylG Jan 15 '23 at 20:46

Try this:

import pandas as pd

def count_chars(txt: str):
    # value_counts() returns the counts already sorted in descending order
    ser = pd.Series([*txt])
    result = ser.value_counts().tolist()
    return result

result = df.join(
    pd.DataFrame([*df['Item'].apply(count_chars)]).rename(columns=lambda x: f'o{x+1}'))
print(result)

Output:
    Item    o1  o2  o3  o4  o5
0   ABABCBF 3   2   1   1.0 NaN
1   ABABCGH 2   2   1   1.0 1.0
2   ABABEFR 2   2   1   1.0 1.0
3   ABABFBF 3   2   2   NaN NaN
4   ABACTC3 2   2   1   1.0 1.0
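`apply(count_chars)` builds a new Series for every string, which gets slow on a very big DataFrame. A vectorized NumPy sketch, under the assumptions that the strings are ASCII (one byte per character) and contain no NUL characters; `char_count_columns` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def char_count_columns(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: per-row character counts, sorted descending, as o1, o2, ...

    Assumes ASCII strings without NUL characters.
    """
    n = len(s)
    # Fixed-width bytes array; shorter strings are padded with NUL (0) bytes
    arr = s.to_numpy().astype('S')
    width = arr.dtype.itemsize
    codes = arr.view(np.uint8).reshape(n, width)

    # A single bincount over (row, byte value) pairs yields an n x 256 histogram
    rows = np.repeat(np.arange(n, dtype=np.int64), width)
    counts = np.bincount(rows * 256 + codes.ravel(), minlength=n * 256).reshape(n, 256)
    counts[:, 0] = 0  # ignore the NUL padding

    # Sort each row in descending order and keep as many columns as the
    # largest number of distinct characters in any row
    counts = -np.sort(-counts, axis=1)
    counts = counts[:, : (counts > 0).sum(axis=1).max()]

    # A zero means "no further character": turn it into NaN
    values = counts.astype(float)
    values[values == 0] = np.nan
    out = pd.DataFrame(values, index=s.index)
    return out.rename(columns=lambda c: f'o{c + 1}')


df = pd.DataFrame({'Item': ['ABABCBF', 'ABABCGH', 'ABABEFR', 'ABABFBF', 'ABACTC3']})
print(df.join(char_count_columns(df['Item'])))
```

The n x 256 count array takes about 2 KB per row as int64, so for tens of millions of rows it is worth processing the column in chunks. Unlike the outputs above, every count column comes out as float here, because the zeros are replaced with NaN in a float array.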
ziying35