Join lists with repeated values

Question

Here is the code I have so far; the explanation follows:

ll1 = [
'A',
'B',
'C',
'D'
]

l2 = [
['A', 10],
['B', 20],
['D', 5],
['A', 15],
['B', 30],
['C', 10],
['D', 15]
]

dc = dict(l2)
l3 = [[k, dc.get(k, 0)] for k in ll1]

The result is this:

['A', 15]
['B', 30]
['C', 10]
['D', 15]

The first list, ll1, holds a fixed set of keys, and the second list, l2, holds the values for those keys. This l2 is just one example: I'll receive the actual values later (also as a list), but they will always use the same keys as ll1. Every key needs to be shown; a key can be repeated, and some keys may have no value (e.g. item C in the first group).

But when the list is converted to a dict, earlier values for a repeated key are overwritten, so only the last value for each unique key survives.

How could it be done so that the result is similar to this one below?

['A', 10]
['B', 20]
['C', 0]
['D', 5]
['A', 15]
['B', 30]
['C', 10]
['D', 15]

Another example would be:

database_keys = [
'First Name',
'Last Name',
'Email',
'City'
]
database_input = [
['First Name', 'John'],
['Last Name', 'Doe'],
['Email', 'johndoe@test.com'],
['First Name', 'Jane'],
['Email', 'jane@test.com']
]

Output:
['First Name', 'John']
['Last Name', 'Doe']
['Email', 'johndoe@test.com']
['City', None]
['First Name', 'Jane']
['Last Name', None]
['Email', 'jane@test.com']
['City', None]

That expected output looks a whole lot like `l2` which you already have!? — user2390182, Jun 05 '16 at 01:39
Yeah, I forgot to mention that the values are not predetermined, they'll only have the same keys. I'll edit the question. — Edu C., Jun 05 '16 at 01:54
There you go, modified and gave another example. @schwobaseggl — Edu C., Jun 05 '16 at 02:02
are you at least guaranteed that the first value is present for each set? Like could there be just two `Email` entries right next to each other because the second is for another person? — Tadhg McDonald-Jensen, Jun 05 '16 at 02:12
@TadhgMcDonald-Jensen yes, the First Name is required and so is the e-mail, so there wouldn't be 2 e-mails next to each other. But if there were, wouldn't it behave like the example? As in, only the last value would be shown? — Edu C., Jun 05 '16 at 03:49
Every implementation has to make assumptions about the input data. My first implementation assumes that the order of the data always matches the order of the keys (with some keys missing); my second implementation (and [@MoonCheesez's answer](http://stackoverflow.com/a/37638279/5827215)) assumes that every entry starts with a certain key. — Tadhg McDonald-Jensen, Jun 05 '16 at 04:43

Tadhg McDonald-Jensen · Accepted Answer · 2016-06-05T04:07:43.940

I would use a generator to fill in the missing values: keep a cycle of the keys, and whenever the next expected key is not the one in the data, produce an empty value for it:

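import itertools

def fill_the_blanks(data, keys):
    # Walk the keys in order, forever: A, B, C, D, A, B, ...
    keys = itertools.cycle(keys)
    for name, value in data:
        k = next(keys)
        # Emit a placeholder for every expected key the data skipped.
        while name != k:
            yield [k, None]
            k = next(keys)
        yield [name, value]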


>>> from pprint import pprint
>>> pprint( list(fill_the_blanks(l2, ll1)) )
[['A', 10],
 ['B', 20],
 ['C', None],
 ['D', 5],
 ['A', 15],
 ['B', 30],
 ['C', 10],
 ['D', 15]]
>>> pprint( list(fill_the_blanks(database_input,database_keys)) )
[['First Name', 'John'],
 ['Last Name', 'Doe'],
 ['Email', 'johndoe@test.com'],
 ['City', None],
 ['First Name', 'Jane'],
 ['Last Name', None],
 ['Email', 'jane@test.com']]
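
Note that the second result stops at 'Email': the generator only emits a placeholder when a later key arrives, so the final record is not padded with its missing 'City'. Below is a minimal sketch of a variant that also pads the last record. The name fill_the_blanks_padded and the default parameter are hypothetical, not part of the original answer, and it makes the same assumption that the data follows the order of the keys.

```python
def fill_the_blanks_padded(data, keys, default=None):
    """Hypothetical variant of fill_the_blanks that also pads the final record."""
    n = len(keys)
    pos = 0  # running count of pairs emitted; pos % n is the next expected key
    for name, value in data:
        if name not in keys:
            # Without this check an unknown key would loop forever below.
            raise ValueError("unknown key: %r" % (name,))
        # Emit placeholders for every expected key the data skipped.
        while keys[pos % n] != name:
            yield [keys[pos % n], default]
            pos += 1
        yield [name, value]
        pos += 1
    # Finish the last, possibly partial, cycle of keys.
    while pos % n != 0:
        yield [keys[pos % n], default]
        pos += 1


database_keys = ['First Name', 'Last Name', 'Email', 'City']
database_input = [
    ['First Name', 'John'],
    ['Last Name', 'Doe'],
    ['Email', 'johndoe@test.com'],
    ['First Name', 'Jane'],
    ['Email', 'jane@test.com'],
]

padded = list(fill_the_blanks_padded(database_input, database_keys))
for pair in padded:
    print(pair)
```

Passing default=0 instead would give the 0 placeholder used in the question's first expected output.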

As an alternative, if you know that the first key ('First Name') always marks the beginning of an entry, you could use dict.fromkeys to create each entry with every key set to None, then fill it in until you reach the next first key:

def gen_dicts(data, keys):
    first_key = keys[0]
    entry = None  # no entry started yet
    for name, value in data:
        if name == first_key:
            if entry is not None:  # nothing to yield before the first entry
                yield entry
            entry = dict.fromkeys(keys)  # every key starts as None
        entry[name] = value
    if entry is not None:  # yield the last entry, unless data was empty
        yield entry

>>> from pprint import pprint
>>> pprint( list(gen_dicts(l2, ll1)) )
[{'A': 10, 'B': 20, 'C': None, 'D': 5}, {'A': 15, 'B': 30, 'C': 10, 'D': 15}]
>>> pprint( list(gen_dicts(database_input, database_keys)) )
[{'City': None,
  'Email': 'johndoe@test.com',
  'First Name': 'John',
  'Last Name': 'Doe'},
 {'City': None,
  'Email': 'jane@test.com',
  'First Name': 'Jane',
  'Last Name': None}]
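
If the flat list of pairs from the question is still needed, the dictionaries can be unrolled again in key order. This is a minimal sketch; flatten_records is a hypothetical helper name, and the records literal simply repeats the output shown above.

```python
def flatten_records(records, keys):
    """Turn a list of dicts back into [key, value] pairs, in key order."""
    return [[k, record.get(k)] for record in records for k in keys]


database_keys = ['First Name', 'Last Name', 'Email', 'City']
records = [
    {'First Name': 'John', 'Last Name': 'Doe',
     'Email': 'johndoe@test.com', 'City': None},
    {'First Name': 'Jane', 'Last Name': None,
     'Email': 'jane@test.com', 'City': None},
]

flat = flatten_records(records, database_keys)
for pair in flat:
    print(pair)
```

Because every dictionary already contains every key, the final record comes out fully padded, including its trailing 'City'.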

It worked wonderfully, thank you very much. I'll dig into dict.fromkeys and itertools.cycle, they should help me a lot. Again, thank you. — Edu C., Jun 05 '16 at 07:06
`help(dict.fromkeys)` is really straightforward, but looking into [iterators](http://stackoverflow.com/questions/9884132/what-exactly-are-pythons-iterator-iterable-and-iteration-protocols) first would help you more than going straight to `itertools.cycle`. Once you have a grasp on iteration, `cycle` is just "repeat iteration over and over again". — Tadhg McDonald-Jensen, Jun 05 '16 at 07:19

score 1 · Answer 2 · answered Jun 05 '16 at 02:56

Here's a dirty way:

l1 = [
'A',
'B',
'C',
'D',
]

l2 = [
['A', 10],
['B', 20],
['D', 5],

['A', 15],
['B', 30],
['C', 10],
['D', 15],

['A', 8],
]

# Assuming elements in l2 are ordered, try to make groups
# of the same length of l1.
l_aux = l1[:]
l3 = [[]]
for x in l2:
    if x[0] in l_aux:
        l3[-1].append(x)
        l_aux.remove(x[0])
        continue
    for y in l_aux:
        l3[-1].append([y, 'WHATEVER'])
    l3.append([x])
    l_aux = l1[:]
    l_aux.remove(x[0])
for y in l_aux:
    l3[-1].append([y, 'WHATEVER'])
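# Now the elements are grouped the way you want.
# Last step: put each group back into l1's key order and flatten the list.
# (Plain sorted(x) would only work while the keys happen to be alphabetical.)
l3 = [y for x in l3 for y in sorted(x, key=lambda pair: l1.index(pair[0]))]
print('\n'.join(str(x) for x in l3))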
# ['A', 10]
# ['B', 20]
# ['C', 'WHATEVER']
# ['D', 5]
# ['A', 15]
# ['B', 30]
# ['C', 10]
# ['D', 15]
# ['A', 8]
# ['B', 'WHATEVER']
# ['C', 'WHATEVER']
# ['D', 'WHATEVER']

feqwix · answered Jun 05 '16 at 02:56

This also doesn't take order into account: entries are only split when it sees a key that has already been used, so the input `l2 = [['A', 10],['C', 5], ['B', 15],['D', 30] ]` would most likely represent two separate entries, but your code treats it as one. – Tadhg McDonald-Jensen Jun 05 '16 at 03:48
Because of this it can be thrown out of sync: input like `l2 = [['A', 10],['C', 5], ['B', 15],['C', 30], ['A',1],['B',2],['C',3],['D',4] ]` produces interesting results that are unlikely to be correct. – Tadhg McDonald-Jensen Jun 05 '16 at 03:53
Never mind, the [OP just confirmed](http://stackoverflow.com/questions/37636607/join-lists-with-repeated-values/37637873#comment62757118_37636607) that every entry will contain the first key, so there would be no out-of-sync mayhem with their use case. – Tadhg McDonald-Jensen Jun 05 '16 at 03:55

Moon Cheesez · Answer 3 · 2016-06-05T07:07:08.813

The problem here is how dictionaries store values. A dictionary takes your key, calls its __hash__ method to decide where to store the entry, and then treats any key that has the same hash and compares equal as the same key. Two strings with the same value produce the same output when __hash__ed. For example:

>>> a = "foo"
>>> b = "foo"
>>> a == b
True
>>> a.__hash__()
-905768032644956145
>>> b.__hash__()
-905768032644956145

As you can see, they both have the same value when __hash__ed. So when a dictionary is asked to store the same key twice, it overwrites the previous value instead of creating a new key.
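
A minimal sketch of that overwrite behaviour, using the pairs from the question's first example. The names pairs and collapsed are only illustrative. The exact hash numbers shown above differ between interpreter runs, because Python 3 randomises string hashing per process by default, so the sketch compares hashes instead of printing them.

```python
pairs = [['A', 10], ['B', 20], ['D', 5],
         ['A', 15], ['B', 30], ['C', 10], ['D', 15]]

# Equal strings hash equally within one interpreter run,
# so the second 'A' lands on the existing key.
assert hash('A') == hash('A')

collapsed = dict(pairs)

# Seven pairs went in, but only one entry per unique key survives,
# and each one holds the last value seen for that key.
print(len(pairs), len(collapsed))
print(collapsed['A'], collapsed['B'], collapsed['D'])
```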

Looking at your first and second examples, you can use a list of dictionaries instead (assuming each record starts with "A" or "First Name", respectively). For the second example you could do something like this:

dc = []
for key, value in database_input:
    if key == "First Name":
        # "First Name" marks the start of a new record
        dc.append({key: value})
    else:
        dc[-1][key] = value

Then, to retrieve the "First Name" of the first person you entered from dc you could use this:

dc[0]["First Name"]

An extension of this is to store them as classes. Let's say we have a class called Person:

class Person(object):
    def __init__(self, personal_information):
        super(Person, self).__init__()
        self.first_name = personal_information["First Name"]  # required
        # Optional fields default to None when missing
        self.last_name = personal_information.get("Last Name")
        self.email = personal_information.get("Email")
        self.city = personal_information.get("City")

    def __repr__(self):
        # Just to make things look clean
        return "Person(" + self.first_name + ")"

Each dictionary already stored in dc can be passed straight to Person, which then holds all of that record's data:

people = []

for s in dc:
    people.append(Person(s))

When you want to access the first person's first name:

>>> people
[Person(John), Person(Jane)]
>>> people[0].first_name
'John'

Which data structure to use is up to you.

Thank you for your answer, now I get why the keys end up unique, and the code you posted worked very well. Although it didn't show the keys with no values, you helped a lot! — Edu C., Jun 05 '16 at 07:03
