1

I split the dialogue into two dictionaries, each containing the words that one person says (there are two people). I have to print four columns (keyword, count from the first dictionary (how many times the first person used that word), count from the second dictionary, and the sum of the two), ordered by keyword. Can somebody help me? The output has to look like this:

african   1  0  1
air-speed 1  0  0
an        1  1  2
arthur    1  0  1
...

I have some text:

text = """Bridgekeeper: Hee hee heh. Stop. What... is your name?
King Arthur: It is 'Arthur', King of the Britons.
Bridgekeeper: What... is your quest?
King Arthur: To seek the Holy Grail.
Bridgekeeper: What... is the air-speed velocity of an unladen swallow?
King Arthur: What do you mean? An African or European swallow?"""

Output of bridgekeeper_w and arthur_w:

print (bridgekeeper_w) 

{'hee': 2, 'heh': 1, 'stop': 1, 'what': 3, 'is': 3, 'your': 2, 'name': 1, 'quest': 1, 'the': 1, 'air-speed': 1, 'velocity': 1, 'of': 1, 'an': 1, 'unladen': 1, 'swallow': 1}

print (arthur_w)
{'king': 4, 'it': 1, 'is': 1, 'arthur': 1, 'of': 1, 'the': 2, 'britons': 1, 'to': 1, 'seek': 1, 'holy': 1, 'grail': 1, 'what': 1, 'do': 1, 'you': 1, 'mean': 1, 'an': 1, 'african': 1, 'or': 1, 'european': 1, 'swallow': 1}

Now I need this (keyword, number from first dict, number from second dict, and count):

african   1  0  1
air-speed 1  0  0
an        1  1  2
arthur    1  0  1
...
Marquee

3 Answers

2

If you already have two dictionaries, the main problem is how to loop over the keys that appear in either dictionary. But that's not hard:

for key in sorted(set(list(bridgekeeper_w.keys()) + list(arthur_w.keys()))):
    b_count = 0 if key not in bridgekeeper_w else bridgekeeper_w[key]
    a_count = 0 if key not in arthur_w else arthur_w[key]
    print('%-20s %3i %3i %3i' % (key, b_count, a_count, b_count+a_count))

If the integrity of the dictionaries is not important, a more elegant solution might be to add the missing keys to one of the dictionaries, and then simply loop over all its keys.

for key in arthur_w.keys():
    if key not in bridgekeeper_w:
        bridgekeeper_w[key] = 0

for key, b_count in sorted(bridgekeeper_w.items()):
    a_count = 0 if key not in arthur_w else arthur_w[key]
    print('%-20s %3i %3i %3i' % (key, b_count, a_count, b_count+a_count))

This does away with the rather tedious and slightly complex set(list(keys()...)) of the first solution, at the cost of traversing one of the dictionaries twice.
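
A non-mutating variant of the first loop, as a minimal sketch: dictionary key views support set union with `|`, and `dict.get` supplies the default of 0. The helper name `merged_rows` is hypothetical, and the two small dictionaries are abbreviated stand-ins for `bridgekeeper_w` and `arthur_w`, not the full data from the question.

```python
def merged_rows(first, second):
    """Return (key, first_count, second_count, total) tuples sorted by key."""
    rows = []
    # Key views behave like sets, so | yields every key found in either dict.
    for key in sorted(first.keys() | second.keys()):
        f_count = first.get(key, 0)   # 0 when the first speaker never used the word
        s_count = second.get(key, 0)  # 0 when the second speaker never used the word
        rows.append((key, f_count, s_count, f_count + s_count))
    return rows


# Abbreviated sample data (assumption: a subset of the question's dictionaries).
bridgekeeper_w = {'what': 3, 'an': 1, 'air-speed': 1}
arthur_w = {'what': 1, 'an': 1, 'african': 1}

for row in merged_rows(bridgekeeper_w, arthur_w):
    print('%-20s %3i %3i %3i' % row)
```

Neither input dictionary is modified, and each dictionary is looked up only once per key.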

tripleee
  • Could you please explain how your formatting (`%-20s %3i %3i %3i`) works ? I don't get it … – TheEagle Apr 26 '21 at 12:32
  • There are several print formatting solutions in Python, all of which have mechanisms for aligning columns to a particular width. In brief, `%-20s` says make room for a string which is 20 characters long, and pad to the right if the contents are shorter. Similarly, `%3i` says print a number with up to three digits, and pad to the left. There are many questions about this; see e.g. https://stackoverflow.com/questions/8450472/how-to-print-a-string-at-a-fixed-width – tripleee Apr 26 '21 at 12:39
  • Well, as I said, your answer is clearly better, so it deserves the accept … – TheEagle Apr 26 '21 at 12:43
  • Thanks, you are much too humble (-: – tripleee Apr 26 '21 at 12:43
  • Why are you using the oh too old `%` formatting? Could be as an f-string: `f'{key:20} {b_count:3} {a_count:3} {b_count+a_count}'` – Tomerikoo Apr 26 '21 at 13:15
  • @Tomerikoo I sometimes do that too, because I want my end-users to be able to use my applications out-of-the-box, instead of requiring them to install Python 3 if they only have Python 2. In general, it's always better to use the way that is supported across the most versions, instead of the one only newer versions support (At least I think so) – TheEagle Apr 26 '21 at 13:42
  • @Tomerikoo Old does not mean bad. (Look who's talking.) I wanted to like `.format()` and f-strings but often find them more verbose and cumbersome. I actually started writing a `.format()` solution but it was just too clunky. I can't say I see anything to sway me to particularly prefer anything in your f-string variant either. I spent many years in Perl battling the opposite problem, where interpolating increasingly complex expressions into a string eventually hampers readability. – tripleee Apr 26 '21 at 13:47
  • @Programmer (@tripleee) yes of course I can understand both sides. I guess it comes down to a matter of preference, habit and taste. Personally I find the f-string a bit more clear and easy on the eyes... Same goes to `format()` - I don't like the repetitiveness of specifying each format, and then again the replacement for it. I like that you just write the variable inside the format itself with the f-strings – Tomerikoo Apr 26 '21 at 14:22
  • @Tomerikoo I really ___really___ like f-strings - but, very sadly, they are not supported in Python 2 :(( – TheEagle Apr 26 '21 at 15:11
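
To make the formatting discussion in the comments concrete, here is a small sketch that renders one row with each of the three styles mentioned (printf-style `%`, `str.format()`, and an f-string). The sample values are illustrative only; the f-string line requires Python 3.6 or later.

```python
key, b_count, a_count = 'an', 1, 1

# printf-style: %-20s left-aligns in 20 columns, %3i right-aligns in 3 columns.
old_style = '%-20s %3i %3i %3i' % (key, b_count, a_count, b_count + a_count)

# str.format: < means left-align, > means right-align.
format_style = '{:<20} {:>3} {:>3} {:>3}'.format(key, b_count, a_count, b_count + a_count)

# f-string: strings left-align and integers right-align by default,
# so only the widths need to be given.
f_style = f'{key:20} {b_count:3} {a_count:3} {b_count + a_count:3}'

print(old_style)
print(old_style == format_style == f_style)
```

All three produce the same 32-character line, so the choice comes down to the Python versions that must be supported and to personal taste.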
0

A few steps produce the dataframe below:

  1. Split the string on the newline character '\n'.
  2. Initialize the result as defaultdict(list), then split each row on ':', using the value at index 0 as the key and the value at index 1 as the value.
  3. Convert the value list for each key back to a string via join.
  4. Remove punctuation.
  5. Use Counter to count the occurrences of each word in the string.

Finally, we'll have a dictionary of Counter objects like this:

{'Bridgekeeper': Counter({'Hee': 1,
          'hee': 1,
          'heh': 1,
          'Stop': 1,
          'What': 3,
          'is': 3,
          'your': 2,
          'name': 1,
          'quest': 1,
          'the': 1,
          'airspeed': 1,
          'velocity': 1,
          'of': 1,
          'an': 1,
          'unladen': 1,
          'swallow': 1}),

This dictionary can be transformed into the required output very easily if we load it into a dataframe.

from collections import Counter, defaultdict
import string

import pandas as pd

# Group the lines by speaker: the part before the first ':' is the speaker,
# the rest is what was said.
result = defaultdict(list)
for row in text.split('\n'):
    speaker, speech = row.split(':', 1)
    result[speaker.strip()].append(speech.strip())

# Join each speaker's lines into one string and strip all punctuation.
result = {key: ' '.join(value).translate(str.maketrans('', '', string.punctuation)) for key, value in result.items()}
# Count the words per speaker.
result = {key: Counter(value.split()) for key, value in result.items()}
# One column per speaker; words a speaker never used become 0.
df = pd.DataFrame(result).fillna(0).astype(int)
df['sum'] = df['Bridgekeeper'] + df['King Arthur']
df.to_csv('out.csv', sep='\t')

Output dataframe:

          Bridgekeeper  King Arthur  sum
Hee                  1            0    1
hee                  1            0    1
heh                  1            0    1
Stop                 1            0    1
What                 3            1    4
is                   3            1    4
your                 2            0    2
name                 1            0    1
quest                1            0    1
the                  1            2    3
airspeed             1            0    1
velocity             1            0    1
of                   1            1    2
an                   1            0    1
unladen              1            0    1
swallow              1            1    2
It                   0            1    1
Arthur               0            1    1
King                 0            1    1
Britons              0            1    1
To                   0            1    1
seek                 0            1    1
Holy                 0            1    1
Grail                0            1    1
do                   0            1    1
you                  0            1    1
mean                 0            1    1
An                   0            1    1
Nk03
  • Can you please explain what your code does line by line ? Even for me as an advanced Python coder this is not obvious, and we do not know how experienced the OP is – TheEagle Apr 26 '21 at 12:17
  • If you did not notice, the OP only wants to know how to join and print 2 dicts … – TheEagle Apr 26 '21 at 12:23
  • Thank you, but I think that it's really "hard coding" for me as a starter. I don't want to edit the code, I just want help with printing it in the form keyword, number, number, number – Marquee Apr 26 '21 at 12:23
  • but its my second lesson at university about python and we have to use pandas – Marquee Apr 26 '21 at 14:15
0

Or a solution without third-party libraries:

bridgekeeper_d = {'hee': 2, 'heh': 1, 'stop': 1, 'what': 3, 'is': 3, 'your': 2, 'name': 1, 'quest': 1, 'the': 1, 'air-speed': 1, 'velocity': 1, 'of': 1, 'an': 1, 'unladen': 1, 'swallow': 1}
arthur_d = {'king': 4, 'it': 1, 'is': 1, 'arthur': 1, 'of': 1, 'the': 2, 'britons': 1, 'to': 1, 'seek': 1, 'holy': 1, 'grail': 1, 'what': 1, 'do': 1, 'you': 1, 'mean': 1, 'an': 1, 'african': 1, 'or': 1, 'european': 1, 'swallow': 1}
# Give every key its own fresh dict. dict.fromkeys(keys, {}) would share
# one single dict object between all keys.
joined = {key: {"bridgekeeper": 0, "arthur": 0} for key in list(bridgekeeper_d) + list(arthur_d)}

for key, value in bridgekeeper_d.items():
    joined[key]["bridgekeeper"] = value

for key, value in arthur_d.items():
    joined[key]["arthur"] = value
# At this point, joined looks like this:
# {
#     'hee': {'bridgekeeper': 2, 'arthur': 0},
#     'heh': {'bridgekeeper': 1, 'arthur': 0},
#     'stop': {'bridgekeeper': 1, 'arthur': 0},
#     'what': {'bridgekeeper': 3, 'arthur': 1}
#     ...
# }

for key, dic in joined.items():
    print("%-15s %d %d %d" % (key, dic["bridgekeeper"], dic["arthur"], dic["bridgekeeper"] + dic["arthur"]))

Output:

hee             2 0 2
heh             1 0 1
stop            1 0 1
what            3 1 4
is              3 1 4
your            2 0 2
name            1 0 1
quest           1 0 1
the             1 2 3
air-speed       1 0 1
velocity        1 0 1
of              1 1 2
an              1 1 2
unladen         1 0 1
swallow         1 1 2
king            0 4 4
it              0 1 1
arthur          0 1 1
britons         0 1 1
to              0 1 1
seek            0 1 1
holy            0 1 1
grail           0 1 1
do              0 1 1
you             0 1 1
mean            0 1 1
african         0 1 1
or              0 1 1
european        0 1 1
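
One pitfall worth checking when building a dictionary of dictionaries such as `joined`: `dict.fromkeys(keys, {})` evaluates the `{}` only once, so every key refers to the same inner dictionary and a write through one key is visible through all of them. A minimal sketch of the difference, using two illustrative keys (the names `shared` and `separate` are hypothetical):

```python
keys = ['hee', 'heh']

# dict.fromkeys evaluates {} once, so both keys point at the same dict object.
shared = dict.fromkeys(keys, {})
shared['hee']['bridgekeeper'] = 2
print(shared['heh'])    # the write made through 'hee' shows up here too

# A dict comprehension evaluates {} once per key, giving independent dicts.
separate = {key: {} for key in keys}
separate['hee']['bridgekeeper'] = 2
print(separate['heh'])  # still empty
```

When every row of the printed table shows identical counts, shared inner dictionaries are a likely cause.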
TheEagle