Find the most common strings in all columns

Question

     Values          
0   99;3;;Sicherheitstür (0SS4.2) bei Anfang Boxen...
1   100;3;;Sicherheitstür (0SS4.2) bei Anfang Boxe...
3   145;3;;Sicherheitstür (0SS3b.5) bei Einspeisef...
4   95;3;;Sicherheitstür (0SS3b.5) vor Boxen unten...
5   96;3;;Sicherheitstür (0SS3b.5) vor Boxen unten...
6   30;3;;Anlage ausgeschaltet (Schlüsselschalter ...
7   37;3;;Sicherheitsbereich 5 (Paketierung) ausge...
12  1400;2;;Entladeförderer (Pos. 730) -Handbetrie...
13  1404;2;;Stauförderer 2 (Pos. 1130) -Handbetrie...
14  1401;2;;Bretterzerteiler (Pos. 1060) -Handbetr...
15  1431;2;;Stauförderer 2 (Pos. 1130) -Handbetrie...
17  1402;2;;Ausrichtrollgang (Pos. 1110) -Handbetr...
18  1403;2;;Stauförderer 1 (Pos. 1120) -Handbetrie...
19  1406;2;;Lagenklemmung (Pos. 1140) -Handbetrieb...
20  1402;2;;Ausrichtrollgang (Pos. 1110) -Handbetr..

The df has many different values per column. I want to group the df by the most common string per column and store each string and its frequency in a dictionary:


{Sicherheitstür: 5, Ausrichtrollgang: 2, ....

So far I have only been able to group the df in a simple form:

df_new = df.groupby(['a']).groups

score 0 · Answer 1 · answered Nov 25 '19 at 08:00


new_data = df["a"].value_counts().to_dict()
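Note that value_counts() counts whole cell values, so each full message is treated as one string. If the goal is the desired output from the question (counts per leading word such as Sicherheitstür), a rough sketch could look like the following. It assumes the column is named "a", that every cell follows the "id;level;;description" layout seen in the sample, and that the word of interest is the first word of the description; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows in the "id;level;;description" layout from the question.
df = pd.DataFrame({"a": [
    "99;3;;Sicherheitstür (0SS4.2) bei Anfang Boxen",
    "100;3;;Sicherheitstür (0SS4.2) bei Anfang Boxen",
    "1402;2;;Ausrichtrollgang (Pos. 1110) -Handbetrieb",
    "1402;2;;Ausrichtrollgang (Pos. 1110) -Handbetrieb",
    "1403;2;;Stauförderer 1 (Pos. 1120) -Handbetrieb",
]})

# Assumption: the description is the fourth semicolon-separated field (index 3).
description = df["a"].str.split(";").str[3]

# Keep only the first whitespace-separated word of each description.
first_word = description.str.split().str[0]

# Count how often each leading word occurs and store it as a dictionary.
word_counts = first_word.value_counts().to_dict()
print(word_counts)
```

If the real data has a different field layout, the index in str[3] would need to change accordingly.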

Natheer Alabsi


How can I select only the most common strings? – Nov 25 '19 at 08:17
The new dictionary new_data is sorted by value counts, so the most common strings come first. If you want to select the top 10 strings, you can use new_data = df["a"].value_counts()[0:10].to_dict() – Natheer Alabsi Nov 25 '19 at 08:35
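To illustrate selecting only the most frequent values, here is a small sketch with made-up data. It assumes the column is named "a" and uses head() for the top N entries and idxmax() for the single most common value; the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical toy column; real data would hold the full message strings.
df = pd.DataFrame({"a": ["x", "y", "x", "z", "x", "y"]})

# value_counts() returns counts sorted in descending order by default.
counts = df["a"].value_counts()

# Keep only the two most frequent values as a dictionary.
top_two = counts.head(2).to_dict()

# The single most common value is the index label with the highest count.
most_common_value = counts.idxmax()

print(top_two, most_common_value)
```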

score 0 · Answer 2 · answered Nov 25 '19 at 08:24

You can build your dictionary with:

mydict = df["a"].value_counts().to_dict()

This gives you each string and its count. You can then print the entries sorted by key with:

for key in sorted(mydict):
    print("%s: %s" % (key, mydict[key]))

Or:

from collections import OrderedDict
ordereddict = OrderedDict(sorted(mydict.items(), key=lambda t: t[0]))
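The two snippets above order the entries alphabetically by key. If the ordering should follow the frequency instead, one possible sketch uses sorted() on the counts or collections.Counter.most_common(); the mydict contents below are made up for illustration.

```python
from collections import Counter

# Hypothetical counts, standing in for the result of value_counts().to_dict().
mydict = {"Stauförderer": 1, "Sicherheitstür": 5, "Ausrichtrollgang": 2}

# Sort (key, count) pairs by count, highest first.
by_count = sorted(mydict.items(), key=lambda item: item[1], reverse=True)

# Counter.most_common(n) returns the n entries with the highest counts.
top_two = Counter(mydict).most_common(2)

print(by_count)
print(top_two)
```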

For more ways to sort it, you can have a look here.

For a similar question, you can have a look here.
