Sorry in advance if this question has been answered before, but I can't seem to find it.

I have a pandas DataFrame like so:

id | value1 | value2 | ... | valueN
1  | 321    | 44     | ... | 7766
2  | 5678   | 7638   | ... | 987423
2  | 0971   | 7638   | ... | 1
and so on...

It loads correctly. What I want is an OrderedDict keyed by id, in which rows that share the same id are collapsed into a single entry. For the example above, the output dictionary should be:

{1: ['321', '44', ..., '7766'], 2: ['5678,0971', '7638', ..., '987423,1']}

Note that the dictionary's values are lists, and the items of each list are strings.

My code so far is:

import collections

od = collections.OrderedDict()
for k in df.id:
    if k in od:
        # This key already exists in the dictionary, so we have to append values.
        # What should I do here?
        pass
    else:
        # New key: insert it and proceed.
        od[k] = unordered_dict.get(k)

Any ideas?

Mixalis

1 Answer

I think this is what you need; at least it worked on my dummy data:

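all_data = {}
for column in df.columns.values[1:]:  # every column after the first, assumed to be 'id'
    # Join each id's values in this column into one comma-separated string.
    # astype(str) guards against columns that were loaded as numbers.
    data = df.groupby('id')[column].apply(lambda s: ','.join(s.astype(str))).to_dict()
    for key in data:
        if key in all_data:
            all_data[key].append(data[key])
        else:
            all_data[key] = [data[key]]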
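
The expected output in the question lists '7638' only once for id 2, which suggests that repeated values within an id should collapse, and it asks for an OrderedDict. Here is a minimal sketch under that reading. The dummy frame is made up to mirror the question, with `valueN` standing in for the elided columns, and it assumes the values were loaded as strings (for example with `dtype=str` in `read_csv`) so that '0971' keeps its leading zero:

```python
import collections
import pandas as pd

# Made-up data mirroring the question; values are strings so '0971' keeps its zero.
df = pd.DataFrame({
    "id": [1, 2, 2],
    "value1": ["321", "5678", "0971"],
    "value2": ["44", "7638", "7638"],
    "valueN": ["7766", "987423", "1"],
})

def join_unique(values):
    # dict.fromkeys drops repeats while keeping first-seen order (Python 3.7+).
    return ",".join(dict.fromkeys(values))

# Collapse every value column per id; sort=False keeps ids in order of appearance.
joined = df.groupby("id", sort=False).agg(join_unique)

# One list of joined strings per id, in column order.
result = collections.OrderedDict(
    (key, row.tolist()) for key, row in joined.iterrows()
)
print(result)
```

If you want to keep repeats (so id 2 gets '7638,7638'), pass `",".join` to `agg` instead of `join_unique`.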
zipa