I have a dataset of names and activities stored as one long string. The string is divided into lines separated by a line break ("\n"), and each line holds a name and an activity separated by a colon. The last line has no trailing line break.

Example: "Jack:travel\nPeter:cycling\nJack:fishing\nPeter:running"

The goal is to create a dictionary from this string. If a name appears more than once, its activities should be collected into a list under that name.

For the example above, the output should be:

{"Jack": ["travel", "fishing"], "Peter": ["cycling", "running"]}

How can I do that?

Michael M.
  • Can you add your current code? – YJR Oct 02 '22 at 17:50
  • Welcome to Stack Overflow. Each line specifies a one-key dict, which can be merged using techniques from the linked duplicate (which includes merging them into the desired result, one at a time, as they are processed in a loop). – Karl Knechtel Oct 02 '22 at 19:02

1 Answer


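You can split the string on "\n" with str.split() and loop over the resulting lines. For each line, split on ":" to get the name and the activity. If the name is not in the dictionary yet, store a new one-element list under it. Otherwise, append the activity to the existing list, unless it is already there. Like this: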

data = 'Jack:travel\nPeter:cycling\nJack:fishing\nPeter:running\nJack:fishing'

dic = {}
for line in data.split('\n'):
    name, activity = line.split(':')
    if name not in dic:
        # First time this name appears: start a new list
        dic[name] = [activity]
    elif activity not in dic[name]:
        # Known name: append only activities not already recorded
        dic[name].append(activity)

print(dic) # => {'Jack': ['travel', 'fishing'], 'Peter': ['cycling', 'running']}
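
A slightly more compact variant of the same loop, as a sketch: dict.setdefault() returns the list already stored under a name, or stores and returns a new empty one, so the separate "is the name already a key?" branch goes away. The function name group_activities is made up for this illustration, and the input is assumed to contain exactly one colon per line.

```python
def group_activities(data):
    """Group 'name:activity' lines into {name: [activities]} without duplicates."""
    result = {}
    for line in data.split('\n'):
        name, activity = line.split(':')
        # setdefault returns the list already stored under name,
        # or inserts a new empty list and returns that
        activities = result.setdefault(name, [])
        if activity not in activities:
            activities.append(activity)
    return result


data = 'Jack:travel\nPeter:cycling\nJack:fishing\nPeter:running\nJack:fishing'
print(group_activities(data))
# => {'Jack': ['travel', 'fishing'], 'Peter': ['cycling', 'running']}
```

The lists keep the order in which each activity first appeared, which matches the output asked for in the question.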

As a comment pointed out, a set may be a better fit, because it drops duplicates automatically. The values are then sets rather than lists. Like this:

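data = 'Jack:travel\nPeter:cycling\nJack:fishing\nPeter:running\nJack:fishing'

dic = {}
for line in data.split('\n'):
    name, activity = line.split(':')
    if name not in dic:
        # First time this name appears: start a new set
        dic[name] = {activity}
    else:
        # set.add() ignores values already present, so no membership check is needed
        dic[name].add(activity)

print(dic)
# => {'Jack': {'travel', 'fishing'}, 'Peter': {'cycling', 'running'}}
# (the order of elements inside each set may vary)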
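
If the sets are only a way to drop duplicates and lists are still wanted at the end, one possible sketch uses collections.defaultdict(set) and converts each set to a sorted list afterwards. Sorting is an assumption made here to get a repeatable result: it gives alphabetical order, not the order in which the activities appeared. The function name group_unique is hypothetical.

```python
from collections import defaultdict


def group_unique(data):
    """Collect unique activities per name, returned as sorted lists."""
    groups = defaultdict(set)  # a missing name starts out as an empty set
    for line in data.split('\n'):
        name, activity = line.split(':')
        groups[name].add(activity)
    # Sets are unordered, so sort to get a deterministic list per name
    return {name: sorted(activities) for name, activities in groups.items()}


data = 'Jack:travel\nPeter:cycling\nJack:fishing\nPeter:running\nJack:fishing'
print(group_unique(data))
# => {'Jack': ['fishing', 'travel'], 'Peter': ['cycling', 'running']}
```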
Michael M.