3

I am trying to, as the title suggests, loop through a list of data to create pairs and continue doing so until no more pairs can be made. For context, I am working with floats, and want to go through all elements in the list, and then compare them to every other element to see if they are within 0.03 above or below the current item.

If they are, then I am trying to create a tuple within the list of them which is made of a high and a low. I then want to continue looping through the entire list until all "ranges" that group together are grouped.

I've tried to frankenstein some code together, using what python knowledge I have and by breaking this up into smaller issues that can be individually looked at but haven't got anything to work so far. The nearest I have come is the incomplete code below:

s = []
t = ()
for item in items:
    for v in items:
      if v < (item + 0.03) and v > (item - 0.03):
         t = (item, v)
         s.append(t)

Though I am not sure where to insert the loop, or how to condition it and I am not yet replacing matching items with a tuple, this is just the rough start.

What the outcome should be is something like this (I'm not picky on the specifics as long as I can create all the groups):

Initial List: 1.05, 1.07, 1.18, 1.19, 1.22, 1.26, 1.30, 1.32

On the first pass this creates: [(1.07, 1.05), (1.19, 1.18), 1.22, 1.26, (1.30, 1.32)]

At this point paired numbers are in tuples where the first position is the high of the range, the second is the low of the range.

Next Pass (and last for this short example):

[(1.07, 1.05), (1.22, 1.18), 1.22, 1.26, (1.30, 1.32)]

Note that the previous high for the second tuple was replaced by the new high that fell within that range of 0.03, 1.22

This is just how I imagine it unfolding, but as long as I achieve that end result of finding all the ranges of pairs then I'll be successful.

To explain the why: I am looking for "zones" on a graph (or rather the data points of a line graph). I define similar points as 0.03 away from each other, and when it is done I'll be able to draw horizontal lines on the graph to show zones of activity. The zone doesn't have to have a set height, it can be large or small, as long as every point is 0.03 distance from the nearest neighbors. Also, if we had (1.01, 1.03) and (1.05, 1.08) the list would become (1.01, 1.08) so I know that this range or group goes from 1.01 to 1.08. (low and high respectively). 1.03 and 1.05 no longer matter as they are within that zone.

nghs
  • 129
  • 1
  • 15
  • 1
    What should the end list look like, and what does your current code produce? – Patrick Haugh Oct 29 '17 at 20:45
  • 2
    Just to make it clearer would you add a list of values as example and a list of tuples you want to obtain out of those values. – Marco Oct 29 '17 at 20:46
  • 1
    This task can be simplified by first sorting the data. – PM 2Ring Oct 29 '17 at 20:50
  • Thanks, PatrickHaugh & Marco, I've expanded on how I think it should be working out. – nghs Oct 29 '17 at 20:54
  • @PM2Ring, so sort the data from low to high first, or something along those lines? I'm not against that at all, but what would that then look like? Maybe `if current < (next + 0.03) or current > (next - 0.03)`? From my low experience, I think I'd still need to do the same thing (compare, and then loop back through until no pairs can be made) though right? – nghs Oct 29 '17 at 20:56
  • @PatrickHaugh My current code produces a massive list of basically `item[i], v[i]` from each true iteration so `item[0], v[0], item[0], v[1], item[0], v[3], item[1], v[1]`, pretty much pairing each item to each item that is within that range of itself, including pairing with itself. I'm wondering if I can solve this repetition by just removing both `item[i]` and `v[i]` whenever a match happens and they are put into a tuple in the list? – nghs Oct 29 '17 at 21:11
  • It's not completely clear what you want this algorithm to do. Why is `(1.22, 1.18)` valid? 1.22 - 1.18 = 0.04, which is greater than your 0.03 limit. It would help if you added a few more examples of typical input and desired output. – PM 2Ring Oct 30 '17 at 14:58
  • @PM2Ring because in the next cycle it is `[1.19, 1.18]` and now `1.22` is `0.03` away from the range (`1.19`) – nghs Oct 30 '17 at 15:43
  • Does that make sense at all? I'm basically expanding outward in each cycle, by a range of `0.03`, until no more expansions can be made. – nghs Oct 30 '17 at 15:44
  • I expanded a little more. But to explain the `why`: I am looking for "zones" on a graph (or rather the data points of a line graph). I define similar points as `0.03` away from each other, and when it is done I'll be able to draw horizontal lines on the graph to show zones of activity. The zone doesn't have to have a set height, it can be large or small, as long as every point is 0.03 distance from the nearest neighbors. Also, if we had `(1.01, 1.03)` and `(1.05, 1.08)` the list would become `(1.01, 1.08)` so I know that this range or group goes from 1.01 to 1.08. (low and high respectively) – nghs Oct 30 '17 at 16:43
  • in the example above 1.01 would be the bottom of that zone, and 1.08 would be the top, we don't care about 1.03 and 1.05 anymore, because they are in that range of the zone, so they're already covered. – nghs Oct 30 '17 at 16:45
  • I was going to extract them into their own list and then append them as single unit tuples afterward, but that's because I didn't know we could skip the middleman and just create a one item tuple during the process. – nghs Oct 30 '17 at 16:55
  • FYI I had to change `or` to `and` in the if statement, didn't make sense with or, as it should be between a range. – nghs Oct 30 '17 at 17:05

2 Answers2

1

I think you just want combination that has difference of < 0.03:

from itertools import combinations
a = [1.05, 1.07, 1.18, 1.19, 1.22, 1.26, 1.30, 1.32]
result = [i for i in combinations(a, 2) if abs(i[0]-i[1]) < 0.03]
halfer
  • 19,824
  • 17
  • 99
  • 186
zipa
  • 27,316
  • 6
  • 40
  • 58
  • Your result is rather different from the OP's: it doesn't have `(1.22, 1.18)` – PM 2Ring Oct 30 '17 at 16:57
  • That's certainly a cleaner way than I've been doing to make the pairs, but I also want to loop through it, combining all lists within a zone together until no more pairs can be made. I expanded on the question just a moment ago per the discussion in the comments. Thanks for your help with this! – nghs Oct 30 '17 at 16:58
  • @PM2Ring yeah, if I understand this solution correctly it just covers the first step, but it is a cleaner way to do the first step then I've been doing, so while not the 'correct answer', it does help! I'll upvote when I unlock the privilege. – nghs Oct 30 '17 at 17:00
  • I know, but his loop seems to aim for this result :) – zipa Oct 30 '17 at 17:00
  • 1
    @nghs `combinations` is a very handy function & I often use it, but it's not a good idea to use it for this particular task. We only need to compare adjacent items in the list, not every pair. If the list has 101 items, then zipa's code generates and tests 101*100/2=5050 pairs, but we really only need to perform 100 tests. – PM 2Ring Oct 30 '17 at 18:43
1

I believe the code below does what you want. ;)

We first run through the sorted data, testing if the current items is within the limit of the previous item. If it is, we add it to the current zone group, otherwise it becomes the start of a new group. We then go over those groups condensing them down to tuples of two or one items.

We have to be careful when performing the tests, due to the possibility of floating-point errors. Eg, 1.22 - 1.19 results in 0.030000000000000027 which is greater than 0.03. To deal with that I've increased the limit slightly to 0.03001. For further details on this important topic, please see Is floating point math broken?.

My code tests make_zones with the data given in the question and also with some randomly generated data.

from random import seed, sample

seed(42)

# Adjust limit to handle floating-point errors
lim = 0.03001

def make_zones(data):
    # Collect values into groups
    groups = []
    t = data[:1]
    for u in data[1:]:
        if u - t[-1] <= lim:
            t.append(u)
        else:
            groups.append(t)
            t = [u]
    if t:
        groups.append(t)

    print('\n1st pass', groups)

    # Convert lists into 1 or 2 item tuples
    return [(u[-1], u[0]) if len(u)>1 else tuple(u) for u in groups]

# Tests

data = [1.05, 1.07, 1.18, 1.19, 1.22, 1.26, 1.30, 1.32]
print('Data', data)
print('Zones', make_zones(data), '\n')

# Make some random data
num = 20
data = [u / 100 for u in sample(range(100, 150), num)]
data.sort()
print('Data', data)
print('Zones', make_zones(data))

output

Data [1.05, 1.07, 1.18, 1.19, 1.22, 1.26, 1.3, 1.32]

1st pass [[1.05, 1.07], [1.18, 1.19, 1.22], [1.26], [1.3, 1.32]]
Zones [(1.07, 1.05), (1.22, 1.18), (1.26,), (1.32, 1.3)] 

Data [1.01, 1.02, 1.05, 1.06, 1.07, 1.08, 1.13, 1.14, 1.15, 1.17, 1.27, 1.32, 1.34, 1.36, 1.37, 1.4, 1.44, 1.46, 1.47, 1.49]

1st pass [[1.01, 1.02, 1.05, 1.06, 1.07, 1.08], [1.13, 1.14, 1.15, 1.17], [1.27], [1.32, 1.34, 1.36, 1.37, 1.4], [1.44, 1.46, 1.47, 1.49]]
Zones [(1.08, 1.01), (1.17, 1.13), (1.27,), (1.4, 1.32), (1.49, 1.44)]
PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • Thanks, I'll take time to look through this and test it out after work. I appreciate the help! – nghs Oct 30 '17 at 19:26
  • with my initial data set this worked perfectly, but after expanding the range of information I am discovering that there are many tuples being created that overlap. I believe this is because we aren't looping through until no more pairs can be made. Any suggestions? – nghs Nov 01 '17 at 19:54
  • @nghs That's a bit odd. My strategy should group together all the data in a zone on the first pass. But you _do_ have to pass it sorted data, as shown in my example with the random data. – PM 2Ring Nov 01 '17 at 20:02
  • @nghs If your data _was_ sorted, then please paste it into your question & I'll have a look at it. BTW, Python's `sort` function is very good, and it's optimized to take advantage of sub-sequences in the data that are already sorted. – PM 2Ring Nov 01 '17 at 20:11
  • You're right, i switched from a sorted data set to an unsorted data set and it appears to be fine now that I've sorted it. – nghs Nov 03 '17 at 01:58