
I have an array of strings (a1): ["a", "b", "c"]

And another (a2) that looks like this:

["1,20,300", "2,10,300", "3,40,300", "1, 20, 300, 4000"]

The wanted end result is:

{"a": [1,2,3,1], "b": [20, 10, 40, 20], "c": [300, 300, 300, 4000] }

It is safe to assume that a2[n].split(',') will always give me the items in the correct order, i.e. the order of ["a", "b", "c"], just as in the example.

With this in mind, is it possible to avoid looping twice and/or to avoid assuming that the order of keys in a dictionary is consistent?

My solution would be:

a1 = ["a", "b", "c"]
a2 = ["1,20,300", "2,10,300", "3,40,300"]

result = {}

for i in a1:
    result[i] = []

for e in a2:
    splitted = e.split(",")
    c = 0

    for key,array in result.items():
        result[key].append(splitted[c])
        c = c+1

This requires nested loops and assumes that result.items() will always return the keys in the same order, which is not a safe assumption.

Is there any way to avoid this? Maybe using Pandas?
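For comparison, here is a sketch of a variant of the loop above that drops the separate initialization pass and never reads the order of `result` itself. It swaps in `collections.defaultdict(list)` (not in the original code), which creates each list on first use, and takes the keys from `a1` via `zip()`. It still assumes, as stated above, that every row lists its fields in the order of `a1`.

```python
from collections import defaultdict

a1 = ["a", "b", "c"]
a2 = ["1,20,300", "2,10,300", "3,40,300"]

result = defaultdict(list)  # a missing key starts as an empty list
for e in a2:
    # keys come from a1, so the dict's own iteration order is never used
    for key, value in zip(a1, e.split(",")):
        result[key].append(value)

result = dict(result)  # convert to a plain dict for display
print(result)
# {'a': ['1', '2', '3'], 'b': ['20', '10', '40'], 'c': ['300', '300', '300']}
```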

Saturnix
  • Iterate over `a1` for your keys instead of `result.items()`. – Code-Apprentice Aug 26 '19 at 17:42
  • Minor comment: in Python 3.7 onward, dictionaries are in fact guaranteed to preserve insertion order, as mentioned in the SO thread to which you link, and in the official docs (https://docs.python.org/3.7/library/stdtypes.html#typesmapping). – Peter Leimbigler Aug 26 '19 at 17:44
  • Since you already figured out how to split the strings as you wish, your question here can be simplified by starting with `a2` as a 2D list instead of a list of strings. – Code-Apprentice Aug 26 '19 at 17:45
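To illustrate the second comment: on Python 3.7 and later a dict iterates over its keys in insertion order, so in the question's code `result.items()` yields `a`, `b`, `c` in the order they were inserted from `a1`. A quick check:

```python
import sys

a1 = ["a", "b", "c"]
result = {}
for i in a1:
    result[i] = []

# Insertion order is a language guarantee only from Python 3.7 onward
if sys.version_info >= (3, 7):
    assert list(result) == a1
print(list(result.keys()))  # ['a', 'b', 'c']
```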

5 Answers

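from numpy import transpose

a1 = ["a", "b", "c"]
a2 = ["1,20,300", "2,10,300", "3,40,300"]
# split each string, then swap rows and columns
a2t = transpose([e.split(",") for e in a2])

# .tolist() returns plain Python strings rather than NumPy scalars
result = {a1[i]: a2t[i].tolist() for i in range(len(a1))}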

=> {'a': ['1', '2', '3'], 'b': ['20', '10', '40'], 'c': ['300', '300', '300']}

Thanks to Code-Apprentice for the suggestion to use {x: y for ...}.

save_jeff
  • This is a very clean answer. Alternatively, you can use a dict comprehension instead of calling the dict() constructor. And use `enumerate()` instead of `range()`. – Code-Apprentice Aug 26 '19 at 17:53
  • I often struggle with creating dicts in one line, so dict() is my go-to solution. How would a dict comprehension work in a one-liner? – save_jeff Aug 26 '19 at 17:54
  • I'd do `{k: a2t[i] for i, k in enumerate(a1)}`. (Disclaimer: not tested) – Code-Apprentice Aug 26 '19 at 17:55
  • Take this a step further with `zip()`: `{k: v for k, v in zip(a1, a2t)}`. But now we are back to an even shorter version with `dict()`: `dict(zip(a1, a2t))`. Someone else might have just jumped to this last version directly. – Code-Apprentice Aug 26 '19 at 17:56
  • About the dict(zip(a1, a2t)) solution: the problem is that you get numpy.array objects, not lists. You would need another map over them. I think the current code is the easiest to understand. – save_jeff Aug 26 '19 at 17:59
  • I think that is a problem in all of my solutions. The two dict comprehensions can be modified easily by converting each value to a list. – Code-Apprentice Aug 26 '19 at 18:01
  • Generalizing an answer to your original question about a one-liner: `dict([(x, y) for ...])` is equivalent to `{x: y for ...}`. Not only is this less typing, it is also more efficient at run time. – Code-Apprentice Aug 26 '19 at 18:03
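The one-liners suggested in these comments all build the same dict. A quick check, assuming `a2t` is already transposed and written out as plain Python lists (so the NumPy array-versus-list issue raised above does not arise):

```python
a1 = ["a", "b", "c"]
# Assumed already-transposed columns, written as plain lists
a2t = [["1", "2", "3"], ["20", "10", "40"], ["300", "300", "300"]]

by_index = {k: a2t[i] for i, k in enumerate(a1)}   # enumerate() form
by_zip = {k: v for k, v in zip(a1, a2t)}           # zip() comprehension
by_dict = dict(zip(a1, a2t))                       # dict() constructor
by_list = dict([(k, v) for k, v in zip(a1, a2t)])  # list of (key, value) pairs

print(by_index == by_zip == by_dict == by_list)  # True
```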

Since you never use array, you don't need to call result.items() at all. Even result.keys() is problematic because, as you say, you can't rely on the order. So you need to iterate over a1 instead. But you also need the index. You can count this yourself as you do in your solution. Or you can use enumerate() to generate it for you:

for c, key in enumerate(a1):
    result[key].append(splitted[c])
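Put together, the question's loop rewritten this way might look like the sketch below. It keeps the original variable names. Both the keys and their positions come from `a1`, so nothing depends on the order of `result`:

```python
a1 = ["a", "b", "c"]
a2 = ["1,20,300", "2,10,300", "3,40,300"]

result = {key: [] for key in a1}  # one empty list per key

for e in a2:
    splitted = e.split(",")
    # enumerate() supplies the index c, replacing the manual counter
    for c, key in enumerate(a1):
        result[key].append(splitted[c])

print(result)
# {'a': ['1', '2', '3'], 'b': ['20', '10', '40'], 'c': ['300', '300', '300']}
```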

Alternatively, you can transpose your array (after calling split on each string). Then you can build your dictionary in a one-line comprehension.
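One way to do the transpose without NumPy is `zip(*rows)`, which groups the first field of every row together, then the second fields, and so on. A minimal sketch:

```python
a1 = ["a", "b", "c"]
a2 = ["1,20,300", "2,10,300", "3,40,300"]

rows = [e.split(",") for e in a2]  # 2D list, one inner list per string
columns = zip(*rows)               # transpose: yields one tuple per column

result = {key: list(col) for key, col in zip(a1, columns)}
print(result)
# {'a': ['1', '2', '3'], 'b': ['20', '10', '40'], 'c': ['300', '300', '300']}
```

`zip()` stops at its shortest input, so if rows differ in length, any field past the end of the shortest row is silently dropped.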

Code-Apprentice
a1 = ["a", "b", "c"]
a2 = ["1,20,300", "2,10,300", "3,40,300"]
a2 = [item.split(',') for item in a2]

res = {}
for i in range(len(a1)):
    res[a1[i]] = [item[i] for item in a2]
res

{'a': ['1', '2', '3'], 'b': ['20', '10', '40'], 'c': ['300', '300', '300']}
galaxyan
  • This doesn't seem to work if a2 also contains more strings, like this: ["1,20,300", "2,10,300", "3,40,300", "1,1,1"]. My bad, I should have specified that in the question. – Saturnix Aug 26 '19 at 18:00
  • @Saturnix If a2 has more elements than a1, what do you want? If the length depends on a1, just swap a2 for a1 in the for loop. – galaxyan Aug 26 '19 at 18:02
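For what it's worth, an index loop driven by `len(a1)`, as in the answer above, is not limited to three rows: it reads field `i` from however many rows `a2` holds. Here is a sketch with the four-row input from the question. It adds `strip()` (not in the original answer) because the last row has spaces after its commas:

```python
a1 = ["a", "b", "c"]
a2 = ["1,20,300", "2,10,300", "3,40,300", "1, 20, 300, 4000"]

# strip() removes the spaces that follow the commas in the last row
rows = [[field.strip() for field in item.split(",")] for item in a2]

res = {}
for i, key in enumerate(a1):
    # one value per row; fields beyond len(a1), such as "4000", are never read
    res[key] = [row[i] for row in rows]

print(res)
# {'a': ['1', '2', '3', '1'], 'b': ['20', '10', '40', '20'], 'c': ['300', '300', '300', '300']}
```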

Use map, split, a NumPy array transpose, zip and dict:

import numpy as np
n = np.array(list(map(lambda x: x.split(','), a2))).T.tolist()

Out[245]: [['1', '2', '3'], ['20', '10', '40'], ['300', '300', '300']]

result = dict(zip(a1, n))

Out[247]: {'a': ['1', '2', '3'], 'b': ['20', '10', '40'], 'c': ['300', '300', '300']}
Andy L.

If you want lists of integers as your output, as your initial post suggests, you could do the following:

import numpy as np
dict(zip(a1, np.array([[int(j) for j in i.split(',')][:3] for i in a2]).T.tolist()))

Note that I used a slice in the inner list comprehension to make sure that every parsed row of a2 has the same length as a1 (three fields). This returns

Out[17]: {'a': [1, 2, 3, 1], 'b': [20, 10, 40, 20], 'c': [300, 300, 300, 300]}
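The same integer result can be sketched without NumPy. This version uses `zip(*...)` for the transpose and `len(a1)` in place of the hard-coded `[:3]` slice. `int()` accepts surrounding whitespace, so `" 20"` parses as `20`:

```python
a1 = ["a", "b", "c"]
a2 = ["1,20,300", "2,10,300", "3,40,300", "1, 20, 300, 4000"]

# Parse each row to ints, keeping only as many fields as there are keys
rows = [[int(f) for f in item.split(",")][:len(a1)] for item in a2]

result = {key: list(col) for key, col in zip(a1, zip(*rows))}
print(result)
# {'a': [1, 2, 3, 1], 'b': [20, 10, 40, 20], 'c': [300, 300, 300, 300]}
```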
lmo