List intersection and partial string matching in Python

Question

I have two lists. The first, named times, comes from my dataset and contains date-times in the format 'yyyy-mm-dd hh:mm'. Example:

'2010-01-01 00:00', '2010-01-01 00:15', '2010-01-01 00:30', ...,

The other, named year_and_month, is a list of all the unique year-month combinations. Example:

'2010-01', '2010-02', '2010-03', '2010-04',

I am trying to extract, for each year-month combination, all of its indices in the original dataset. I do it in about the worst way possible (I'm new to Python), namely:

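# For each year-month, collect the index j of every timestamp in times that contains it.
each_member_indices = []
for i in range(len(year_and_month)):
    item_ind = []
    for j in range(len(times)):  # len() works for both a list and a NumPy array
        if year_and_month[i] in times[j]:
            item_ind.append(j)
    each_member_indices.append(item_ind)  # one sublist per year-month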

Now, this takes far too long to run, so I wanted to optimise it a bit. I looked at some existing solutions such as "Find intersection of two lists?" and "Python: Intersection of full string from list with partial string", the problem being that

res_1 = [val for val in year_and_month if val in times]

yields an empty list, whereas

res_1 = [val for val in year_and_month if val in times[0]]

yields the first member at least.
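The difference comes from what `in` means for each type: on a list it tests whether some element is equal (`==`) to the value, while on a string it tests for a substring. A minimal sketch with made-up sample values:

```python
times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30']
year_and_month = ['2010-01', '2010-02']

# On a list, `in` compares whole elements with ==, so a partial string never matches.
print('2010-01' in times)      # False

# On a string, `in` checks for a substring, so the 'yyyy-mm' prefix matches.
print('2010-01' in times[0])   # True

# Hence the first comprehension finds nothing, and the second only ever checks times[0].
list_check = [val for val in year_and_month if val in times]
first_check = [val for val in year_and_month if val in times[0]]
print(list_check)   # []
print(first_check)  # ['2010-01']
```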

Any thoughts?

EDIT:

I only need the indices of the elements of the original dataset, times, that correspond to each unique year-month pair in the year_and_month list. As requested, a sample output would be

[[0, 1, 2, 3,...],[925, 926, ...],...]

The first sublist contains the indices for January 2010, the second for February 2010, and so on.

You are right! As I was looking at the solutions, I discovered that I get what I want with the for loop, but the list comprehension does not serve the same purpose. To answer your question, I am getting a list of lists: `each_member_indices` is `[[0,1,2,..], [924, 925,...],...]`, each sublist corresponding to a unique year-month pair, so for example the first sublist holds all the indices for January 2010. – Kots Jun 30 '17 at 11:31

score 1 · Answer 1 · answered Jun 30 '17 at 10:04


Maybe try using any()?

[val for val in year_and_month if any(val in t for t in times)]
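Note that this one-liner filters `year_and_month` down to the months that occur somewhere in `times`; it does not return indices. A minimal sketch on made-up sample data that keeps the same idea but pairs it with `enumerate` to collect the per-month indices the question asks for. The `startswith` check is a swapped-in stricter test than `in`, and this still scans all of `times` once per month, so it is no faster than the original loop:

```python
times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00']
year_and_month = ['2010-01', '2010-02', '2010-03', '2010-04']

# Months that appear somewhere in times (what the any() one-liner returns).
present = [val for val in year_and_month if any(val in t for t in times)]

# Indices per month: startswith() only matches the leading 'yyyy-mm' part.
each_member_indices = [
    [j for j, t in enumerate(times) if t.startswith(val)]
    for val in year_and_month
]
print(present)              # ['2010-01', '2010-02', '2010-03']
print(each_member_indices)  # [[0, 1], [2], [3], []]
```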


Wesley Bowman


**Note** that I didn't try your original code, and I'm not sure what output you are looking for – Wesley Bowman Jun 30 '17 at 10:05

Quite some caveats ;) Maybe a comment clarifying the question would be better – Chris_Rands Jun 30 '17 at 10:07

score 0 · Answer 2 · answered Jun 30 '17 at 10:04

Why not build a new structure, a dictionary that groups the indices by year-month?

result = {}
for i, v in enumerate(times):
    result.setdefault(v[:7], []).append(i)  # v[:7] is the 'yyyy-mm' prefix
for i in year_and_month:
    print(i, result[i])  # prints each year-month with all of its indices
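A quick check of this dictionary approach on made-up sample data, building the list-of-lists output from the question instead of printing. `result[i]` raises a `KeyError` for a month that never occurs in `times`; using `result.get(ym, [])` here is an adjustment for that case, not part of the answer as posted:

```python
times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00']
year_and_month = ['2010-01', '2010-02', '2010-03', '2010-04']

# Group indices by the 'yyyy-mm' prefix (first 7 characters) in one pass over times.
result = {}
for i, v in enumerate(times):
    result.setdefault(v[:7], []).append(i)

# dict.get with a default avoids a KeyError for months absent from times ('2010-04' here).
each_member_indices = [result.get(ym, []) for ym in year_and_month]
print(each_member_indices)  # [[0, 1], [2], [3], []]
```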

Eugene Yarmash · Accepted Answer · 2017-06-30T10:24:40.247


To do that in linear time, you could build a lookup dictionary mapping year and month combinations to indices. You can also use collections.defaultdict to make it a bit easier:

from collections import defaultdict

d = defaultdict(list)
for i, v in enumerate(times):
    d[v[:7]].append(i)

Then you can create the result list with a list comprehension:

result = [d[x] for x in year_and_month]

Demo:

>>> from collections import defaultdict
>>> times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00']
>>> year_and_month = ['2010-01', '2010-02', '2010-03', '2010-04']
>>> d = defaultdict(list)
>>> for i, v in enumerate(times):
...     d[v[:7]].append(i)
...     
>>> dict(d)
{'2010-01': [0, 1], '2010-02': [2], '2010-03': [3]}
>>> [d[x] for x in year_and_month]
[[0, 1], [2], [3], []]
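If `times` is already sorted chronologically (an assumption: the question's 15-minute timestamps suggest it but do not say so), a variant that skips the dictionary is `itertools.groupby`, which groups consecutive items sharing a key. This is a swapped-in technique, not part of the answer above; on unsorted input it would split one month into several groups:

```python
from itertools import groupby

times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00']

# groupby only merges *consecutive* items, so this relies on times being sorted.
groups = [
    (month, [i for i, _ in grp])
    for month, grp in groupby(enumerate(times), key=lambda pair: pair[1][:7])
]
print(groups)  # [('2010-01', [0, 1]), ('2010-02', [2]), ('2010-03', [3])]
```

Unlike the `defaultdict` lookup, months with no timestamps (such as '2010-04') simply do not appear, so the result would still need aligning with `year_and_month` if empty sublists are wanted.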

edited Jun 30 '17 at 10:24

answered Jun 30 '17 at 10:07

Eugene Yarmash


So if I want to extract '2010-01', I should be able to by writing `d['2010-01']`. However, when I do `result = [d[x] for x in year_and_month]`, this gives me a list where `len(result) == len(times)`. I would prefer a `result` list whose length equals the number of unique year-month combinations, i.e. like the result in your demo. Could this be a problem caused by my using Python 3? – Kots Jun 30 '17 at 12:16
Perhaps each element in `times` has a unique year-month, then? A [list comprehension](https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions) without an `if` clause creates a new list of the same size as the iterable it loops over, here `year_and_month`. – Eugene Yarmash Jun 30 '17 at 13:01
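Another way the `len(result) == len(times)` symptom could arise, offered as a guess rather than a diagnosis: if `year_and_month` was built with one entry per row of `times` (duplicates included), the comprehension produces one sublist per row. A minimal sketch on made-up data; `year_and_month_per_row` is a hypothetical name for such a non-deduplicated list, and `dict.fromkeys` removes the duplicates while keeping first-seen order (guaranteed for dicts from Python 3.7; in 3.6 it is a CPython implementation detail):

```python
from collections import defaultdict

times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00']

# Hypothetical mistake: one year-month entry per row of times, duplicates kept.
year_and_month_per_row = [t[:7] for t in times]

d = defaultdict(list)
for i, v in enumerate(times):
    d[v[:7]].append(i)

per_row = [d[x] for x in year_and_month_per_row]
print(len(per_row) == len(times))  # True: one sublist per row

# Deduplicate while keeping first-seen order.
year_and_month = list(dict.fromkeys(year_and_month_per_row))
result = [d[x] for x in year_and_month]
print(year_and_month)  # ['2010-01', '2010-02', '2010-03']
print(result)          # [[0, 1], [2], [3]]
```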

score 0 · Answer 4 · answered Jun 30 '17 at 10:08


Alright, this gives the common elements:

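ls = str(times)  # the whole list as one string, so `in` below is a substring search
r = [x for x in year_and_month if x in ls]  # months that occur somewhere in times
print(r)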


Yasin Yousif

