2

I have a list of numbers:

Data = [0,2,0,1,2,1,0,2,0,2,0,1,2,0,2,1,1,...]

And I have a list of tuples of two, which is all possible combinations of the individual numbers above:

Combinations = [(0,0),(0,1),(0,2),(1,0),(1,1),(1,2),(2,0),(2,1),(2,2)]

I want to try to find where each item in Combinations appears in Data and add the value after each occurrence to another list.

For example, for (0,2) I want to make a list [0,0,0,1] because those are the the values that fall immediately after (0,2) occurs in Data.

So far, I have:

any(Data[i:i+len(CurrentTuple)] == CurrentTuple for i in xrange(len(Data)-len(CurrentTuple)+1))

Where CurrentTuple is Combinations.pop(). The problem is that this only gives me a Boolean of whether the CurrentTuple occurs in Data. What I really need is the value after each occurrence in Data.

Does anyone have any ideas as to how this can be solved? Thanks!

  • If you need something faster (I imagine that if you are working with markov-chain this is the case) check also my answer below... Avoiding `append` makes list generation much faster. – Riccardo Petraglia Sep 22 '16 at 08:33

2 Answers2

0
sum([all(x) for x in (Data[i:i+len(CurrentTuple)] == CurrentTuple for i in xrange
(len(Data)-len(CurrentTuple)+1))])

What you did return a generator that produce the following list:

[array([False,  True], dtype=bool),
 array([ True, False], dtype=bool),
 array([False,  True], dtype=bool),
 ...
 array([False, False], dtype=bool),
 array([False, False], dtype=bool),
 array([False, False], dtype=bool),
 array([False,  True], dtype=bool)]

One of the array that you have in this list match the CurrentTuple only if both the bool in the array are True. The all return True only if all the elements of the list are True so the list generated by the [all(x) for x in ...] will contains True only if the twin of numbers match the CurrentTuple. A True is conted as 1 when you use sum. I hope it is clear.

If you want to compare only non-overlapping pairs:

[2,2,
 0,2,
 ...]

and keep the algorithm as general as possible, you can use the following:

sum([all(x) for x in (Data[i:i+len(CurrentTuple)] == CurrentTuple for i in xrange
(0,len(Data)-len(CurrentTuple)+1,len(CurrentTuple)))])

Despite much more cryptic, this code is much faster than any alternative that using append (look [Comparing list comprehensions and explicit loops (3 array generators faster than 1 for loop) to understand why).

Community
  • 1
  • 1
Riccardo Petraglia
  • 1,943
  • 1
  • 13
  • 25
0

You can use a dict to group the data to see where/if any comb lands in the original list zipping up pairs:

it1, it2 = iter(Data), iter(Data)
next(it2)

Combinations = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

d = {c: [] for c in Combinations}
ind = 2
for i, j in zip(it1, it2):
    if (i, j) in d and ind < len(Data):
        d[(i, j)].append(Data[ind])
    ind += 1
print(d)

Which would give you:

{(0, 1): [2, 2], (1, 2): [1, 0], (0, 0): [], (2, 1): [0, 1], (1, 1): [2], (2, 0): [1, 2, 1, 2], (2, 2): [], (1, 0): [2], (0, 2): [0, 0, 0, 1]}

You could also do it in reverse:

from collections import defaultdict

it1, it2 = iter(Data), iter(Data)
next(it2)
next_ele_dict = defaultdict(list)
data_iter = iter(Data[2:])
for ind, (i, j) in enumerate(zip(it1, it2)):
    if ind < len(Data) -2:
      next_ele_dict[(i, j)].append(next(data_iter))

def next_ele():
    for comb in set(Combinations):
        if comb in next_ele_dict:
           yield comb, next_ele_dict[comb]

print(list(next_ele()))

Which would give you:

 [((0, 1), [2, 2]), ((1, 2), [1, 0]), ((2, 1), [0, 1]), ((1, 1), [2]), ((2, 0), [1, 2, 1, 2]), ((1, 0), [2]), ((0, 2), [0, 0, 0, 1])]

Any approach is better than a pass over the Data list for every element in Combinations.

To work for arbitrary length tuples we just need to create the tuples based on the length:

from collections import defaultdict

n = 2

next_ele_dict = defaultdict(list)

def chunks(iterable, n):
    for i in range(len(iterable)-n):
        yield tuple(iterable[i:i+n])

data_iter = iter(Data[n:])
for tup in chunks(Data, n):
      next_ele_dict[tup].append(next(data_iter))

def next_ele():
    for comb in set(Combinations):
        if comb in next_ele_dict:
            yield comb, next_ele_dict[comb]


print(list(next_ele()))

You can apply it to whatever implementation you prefer, the logic will be the same as far as making the tuples goes.

Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
  • This one works only if you want to compare couples of numbers – Riccardo Petraglia Sep 21 '16 at 18:32
  • @RiccardoPetraglia *If you are only considering non-overlapping pairs, (0,2)*, it can easily be adapted for the other case but since the OP reckons only three cases of `(0, 2)` happens then they must not be ] – Padraic Cunningham Sep 21 '16 at 18:33
  • Yes, but what he wrote works also if you decide to compare "triples" of numbers – Riccardo Petraglia Sep 21 '16 at 18:34
  • @RiccardoPetraglia, where do you see that in the question? **I have a list of tuples of two** – Padraic Cunningham Sep 21 '16 at 18:35
  • @RiccardoPetraglia, the OP would probably want to first verify their expected output before we verify what is not there ;) – Padraic Cunningham Sep 21 '16 at 18:38
  • @PadraicCunningham Thank you! This is very helpful, however I did forget to mention in the question that I need this to work for tuples of any length... Is there a way to adapt: it1, it2 = iter(Data), iter(Data) next(it2) Combinations = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)] d = {c: [] for c in Combinations} ind = 2 for i, j in zip(it1, it2): if (i, j) in d and ind < len(Data): d[(i, j)].append(Data[ind]) ind += 1 print(d) to work for tuples of all lengths? Thanks again! – Santi Tellez Sep 21 '16 at 20:05
  • @SantiTellez, are the tuples always going to be the same length? – Padraic Cunningham Sep 21 '16 at 20:07
  • @PadraicCunningham Yes, the tuples will always be the same length. This code is to build a Markov Chain of nth order where the tuples are of n length, so they will always be the same. Thank you so much for helping me out – Santi Tellez Sep 21 '16 at 20:14
  • No worries, we can do it, I am just finishing off some work and then I will add it. Basically all we need to do is consider the the length of the tuples. – Padraic Cunningham Sep 21 '16 at 20:21
  • @SantiTellez, the last snippet should work for any length – Padraic Cunningham Sep 21 '16 at 20:34
  • @PadraicCunningham thank you so much! It worked perfectly – Santi Tellez Sep 21 '16 at 21:27