How can I merge lists with alternating observations into a single one

Question

I have many lists exactly like the ones below, provided by a weather station.

How can I "merge" the two daily observations into a single record? (The fields filled in by the first observation of a day are never filled in by the second.)

['82294', '04/03/2002', '0000', '', '30.9', '', '', '', '26.1', '93', '1.554', '']
['82294', '04/03/2002', '1200', '24', '', '22', '', '', '', '', '', '']
['82294', '05/03/2002', '0000', '', '29.9', '', '', '', '25.62', '92.5', '0.863333', '']
['82294', '05/03/2002', '1200', '11', '', '23.2', '', '', '', '', '', '']
['82294', '06/03/2002', '0000', '', '31.6', '', '', '', '27.12', '87.5', '1.381333', '']
['82294', '06/03/2002', '1200', '0.2', '', '22.6', '', '', '', '', '', '']
['82294', '07/03/2002', '0000', '', '32.2', '', '', '', '27.6', '90.75', '1.899333', '']
['82294', '07/03/2002', '1200', '2', '', '24.6', '', '', '', '', '', '']
['82294', '08/03/2002', '0000', '', '29.3', '', '', '', '25.66', '95.25', '1.036', '']
['82294', '08/03/2002', '1200', '21', '', '24.4', '', '', '', '', '', '']
['82294', '09/03/2002', '0000', '', '31.5', '', '', '', '26.26', '95.75', '1.899333', '']
['82294', '09/03/2002', '1200', '23', '', '22.8', '', '', '', '', '', '']
['82294', '10/03/2002', '0000', '', '31.7', '', '', '', '26.94', '90.5', '2.072', '']

How would you define "merge" in this case? An example would be helpful indeed. — frogatto, Sep 06 '16 at 17:26
There are infinite ways of merging these lists. You should provide an outcome — rafaelc, Sep 06 '16 at 17:30

score 6 · Accepted Answer · edited May 23 '17 at 12:32

You can iterate over the rows pairwise, zip() each pair field by field, and use or to pick whichever of the two values is non-empty:

[[x or y for x, y in zip(item1, item2)] 
 for item1, item2 in zip(data[0::2], data[1::2])]

where data is your input list of lists.

Produces:

[
    ['82294', '04/03/2002', '0000', '24', '30.9', '22', '', '', '26.1', '93', '1.554', ''], 
    ['82294', '05/03/2002', '0000', '11', '29.9', '23.2', '', '', '25.62', '92.5', '0.863333', ''], 
    ['82294', '06/03/2002', '0000', '0.2', '31.6', '22.6', '', '', '27.12', '87.5', '1.381333', ''], 
    ['82294', '07/03/2002', '0000', '2', '32.2', '24.6', '', '', '27.6', '90.75', '1.899333', ''], 
    ['82294', '08/03/2002', '0000', '21', '29.3', '24.4', '', '', '25.66', '95.25', '1.036', ''], 
    ['82294', '09/03/2002', '0000', '23', '31.5', '22.8', '', '', '26.26', '95.75', '1.899333', '']
]

You may also want a better way to merge the 0000 and 1200 time fields, because with this approach 0000 is always chosen.
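Note that pairing data[0::2] with data[1::2] assumes every day has exactly two rows; zip() stops at the shorter slice, which is why the unpaired 10/03/2002 row is missing from the output above. A minimal sketch of an alternative, assuming rows for the same station and date are adjacent in the input, is to group on the first two fields with itertools.groupby. The function name merge_by_date and the three-row sample are only for illustration:

```python
from itertools import groupby

def merge_by_date(rows):
    """Merge adjacent rows that share station id and date (fields 0 and 1)."""
    merged = []
    for _, group in groupby(rows, key=lambda row: (row[0], row[1])):
        # zip(*group) walks the grouped rows column by column;
        # keep the first non-empty value in each column, or '' if none.
        merged.append([next((v for v in column if v), '')
                       for column in zip(*group)])
    return merged

rows = [
    ['82294', '04/03/2002', '0000', '', '30.9', '', '', '', '26.1', '93', '1.554', ''],
    ['82294', '04/03/2002', '1200', '24', '', '22', '', '', '', '', '', ''],
    ['82294', '10/03/2002', '0000', '', '31.7', '', '', '', '26.94', '90.5', '2.072', ''],
]

result = merge_by_date(rows)
for row in result:
    print(row)
```

The unpaired row passes through unchanged, and the time field still resolves to 0000 for paired rows because it is the first non-empty value.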

score 1 · Answer 2 · answered Sep 06 '16 at 17:49

You can also use pandas and its groupby() + apply():

import pandas as pd

df = pd.DataFrame(data, columns=['id', 'date', 'time', 'value1', 'value2', 'value3', 'value4', 'value5', 'value6', 'value7', 'value8', 'value9'])
df = df.groupby('date').apply(lambda x: x.max())

print(df.values.tolist())

Prints:

[
    ['82294', '04/03/2002', '1200', '24', '30.9', '22', '', '', '26.1', '93', '1.554', ''], 
    ['82294', '05/03/2002', '1200', '11', '29.9', '23.2', '', '', '25.62', '92.5', '0.863333', ''], 
    ['82294', '06/03/2002', '1200', '0.2', '31.6', '22.6', '', '', '27.12', '87.5', '1.381333', ''], 
    ['82294', '07/03/2002', '1200', '2', '32.2', '24.6', '', '', '27.6', '90.75', '1.899333', ''], 
    ['82294', '08/03/2002', '1200', '21', '29.3', '24.4', '', '', '25.66', '95.25', '1.036', ''], 
    ['82294', '09/03/2002', '1200', '23', '31.5', '22.8', '', '', '26.26', '95.75', '1.899333', ''], 
    ['82294', '10/03/2002', '0000', '', '31.7', '', '', '', '26.94', '90.5', '2.072', '']
]

Here, Series.max() works for merging the grouped items: the maximum of an empty string and a non-empty string is always the non-empty string. I do feel, though, that there should be a better (more appropriate, so to say) merging function.
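For reference, the string comparison that max() relies on can be checked in plain Python. This is just a quick illustration of the behaviour, not part of the pandas solution; it also shows where the trick would stop working if both rows ever had a value in the same field:

```python
# '' sorts before every non-empty string, so max() returns the filled-in value.
filled_right = max('', '30.9')
filled_left = max('24', '')

# When both values are non-empty, max() picks the lexicographically larger one.
time_field = max('0000', '1200')   # why the time column ends up as 1200
both_filled = max('9', '10')       # strings compare character by character

print(filled_right, filled_left, time_field, both_filled)
```

So the approach is safe only as long as, apart from the time field, the two observations never fill in the same column.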

score -1 · Answer 3 · DimKoim · 2016-09-06T17:54:50.823

Maybe something like this:

list_1=['82294', '04/03/2002', '0000', '', '30.9', '', '', '', '26.1', '93', '1.554', '']
list_2=['82294', '04/03/2002', '1200', '24', '', '22', '', '', '', '', '', '']
merged_list= list(set(list_1+list_2))

Update

merged_list = list([x for x in list_1 if x ])
merged_list.extend(x for x in list_2 if x)

edited Sep 06 '16 at 17:54

answered Sep 06 '16 at 17:32


This would lose data; it ignores the ordering and legitimate duplicates during the merge. – chepner Sep 06 '16 at 17:36
@chepner Agree about the latter but no one asked about the former. – DimKoim Sep 06 '16 at 17:51
The ordering is implied by the fact that paired measurements are never both non-empty. – chepner Sep 06 '16 at 17:55
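To make the point in these comments concrete, here is a small check comparing the filter-and-concatenate approach with a position-preserving merge on the same two rows. The variable names filtered and positional are only for illustration:

```python
list_1 = ['82294', '04/03/2002', '0000', '', '30.9', '', '', '', '26.1', '93', '1.554', '']
list_2 = ['82294', '04/03/2002', '1200', '24', '', '22', '', '', '', '', '', '']

# Filter-and-concatenate: keep the non-empty values of each list, one after the other.
filtered = [x for x in list_1 if x]
filtered.extend(x for x in list_2 if x)

# Position-preserving merge: for each field, take whichever value is non-empty.
positional = [x or y for x, y in zip(list_1, list_2)]

print(filtered)
print(positional)
print(filtered[3], positional[3])
```

Both results happen to have 12 items, but in the filtered list the id and date appear twice and the values no longer line up with their columns: index 3 holds 30.9 there, while the positional merge keeps 24 in that field.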
