4

The goal is to find a generic method to solve the following task:

I have two Python lists of the same length, filled with zeros and ones:

detection = [0,0,1,0]     # only examples, they can be of any length
ground_truth = [0,1,0,0]  # and the ones can be at any indices

and an integer

offset = 1                # this number is also variable

The goal is to squash each 1 in detection, together with the offset elements on either side of it, into a single element, and then combine the ground_truth elements at the same indices with a logical OR, resulting in the new lists:

detection = [0,1]
ground_truth = [0,1]

Graphical explanation:

[image not available]

Background info: the detection and ground-truth values come from a binary classification of a time series. The idea is to have a flexible evaluation that counts a detection as a true positive (TP) if it lies within a certain number of time steps (= offset) of a positive in ground_truth.

Additional Example:

offset = 1
detection = [1,0,0,0,1,1,0]
ground_truth = [0,0,0,1,0,0,0]

would result in:

detection = [1,0,1]
ground_truth = [0,0,1]
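
For reference, one way to read the task is the following plain-Python sketch. It is only an illustration of the intended behaviour, not an efficient solution: the function name `squash` is made up here, and it assumes that windows which overlap or touch are merged into a single element.

```python
def squash(detection, ground_truth, offset):
    """Merge each detection with its +/- offset neighbours and OR ground_truth over the same spans."""
    n = len(detection)

    # Mark every index that lies within `offset` steps of a detection.
    mask = [0] * n
    for i, val in enumerate(detection):
        if val == 1:
            for j in range(max(0, i - offset), min(n, i + offset + 1)):
                mask[j] = 1

    new_detection, new_ground_truth = [], []
    i = 0
    while i < n:
        if mask[i] == 0:
            # Outside every window: copy the element unchanged.
            new_detection.append(0)
            new_ground_truth.append(ground_truth[i])
            i += 1
        else:
            # Inside a window: consume the whole run of marked indices.
            j = i
            while j < n and mask[j] == 1:
                j += 1
            new_detection.append(1)
            new_ground_truth.append(int(any(ground_truth[i:j])))
            i = j
    return new_detection, new_ground_truth


print(squash([0, 0, 1, 0], [0, 1, 0, 0], 1))                    # ([0, 1], [0, 1])
print(squash([1, 0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 0, 0], 1))  # ([1, 0, 1], [0, 0, 1])
```

Windows near the start or end of the list are simply cut off at the list boundary.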
gustavz
  • Why do detection and ground_truth get truncated from 4-element lists to 2-element ones? – Milan Cermak Jun 16 '20 at 11:27
  • Because the elements around values of 1 get combined / squashed. That is exactly the task of the question. – gustavz Jun 16 '20 at 11:32
  • How does it behave when detection is e.g. `[1, 0, 0, 1, 1, 0]`? – Milan Cermak Jun 16 '20 at 11:40
  • I'm not sure, but in pandas you could use `rolling()` with size 3, which gives a rolling window of 3 elements, and then check whether the middle element is `1` before running some code. Alternatively you would have to work with the slice `[ i : i+3 ]` in a `for`-loop that uses `range(len(...))`. – furas Jun 16 '20 at 12:22
  • In pandas you could use `shift(1)` and `shift(-1)` to put the value from the previous and next row into new columns of the same row, and then work with the data in that row. – furas Jun 16 '20 at 12:26
  • BTW: do you mean `OR` between values in both lists, or `OR` between values in the second list and the value `1`? In the second case `anything OR 1` always gives `1`, so there is no need to compute it. – furas Jun 16 '20 at 12:52
  • Each list should be regarded separately. Maybe the background info I added answers your question. – gustavz Jun 16 '20 at 14:04

2 Answers

0

My first idea is to use the slice [i-offset:i+offset+1].

If the lists have different lengths, you can take the shorter length:

shorter = min(len(detection), len(ground_truth))

To work with the lists separately, you first have to find the indexes.

I use [offset:shorter-offset] because I assumed that you don't want to check positions where there are not enough elements on the left or right (fewer than offset).

indexes = [i for i, val in enumerate(detection[offset:shorter-offset], offset) if val == 1]

Now you can use these indexes:

for i in indexes:
    #item = detection[i-offset:i] + detection[i+1:i+1+offset]
    # or

    item = detection[i-offset:i+offset+1]
    item.pop(offset) # remove value in the middle

    print('   detection item:', item)

I don't know what you are trying to do with the OR logic, so I skip it.


Code - with offset=2

detection    = [0,0,1,0,1,1,0,1,0,1,1]   # longer
ground_truth = [0,1,0,0,0,0,1,0]

#detection    = [0,0,1,0,0,0,1,0,0]       # shorter
#ground_truth = [0,0,1,0,1,1,0,1,0,1,1] 

print('   detection:', detection)
print('ground_truth:', ground_truth)

offset = 2
shorter = min(len(detection), len(ground_truth))

indexes = [i for i, val in enumerate(detection[offset:shorter-offset], offset) if val == 1]
print('indexes:', indexes)

for i in indexes:
    #item = detection[i-offset:i] + detection[i+1:i+1+offset]
    # or

    item = detection[i-offset:i+offset+1]
    item.pop(offset) # remove value in the middle

    print('   detection item:', item)

for i in indexes:
    #item = ground_truth[i-offset:i] + ground_truth[i+1:i+1+offset]
    # or

    item = ground_truth[i-offset:i+offset+1]
    item.pop(offset) # remove value in the middle

    print('ground_truth item:', item)

Result:

   detection: [0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
ground_truth: [0, 1, 0, 0, 0, 0, 1, 0]
indexes: [2, 4, 5]
   detection item: [0, 0, 0, 1]
   detection item: [1, 0, 1, 0]
   detection item: [0, 1, 0, 1]
ground_truth item: [0, 1, 0, 0]
ground_truth item: [0, 0, 0, 1]
ground_truth item: [0, 0, 1, 0]
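
If ones at index 0 or at the end of the list should be handled too, one option is to clamp the slice at the list boundaries instead of skipping those positions. The following is only a sketch of that variant: the helper name `windows_around_ones` is made up, and unlike the code above it keeps the middle element in each window.

```python
def windows_around_ones(values, reference, offset):
    """For each 1 in `reference`, return (index, slice of `values` within `offset` steps)."""
    result = []
    for i, val in enumerate(reference):
        if val == 1:
            # A negative start would wrap around, so clamp it at 0.
            # A stop past the end of the list is truncated by slicing itself.
            start = max(0, i - offset)
            result.append((i, values[start:i + offset + 1]))
    return result


detection = [1, 0, 0, 0, 1, 1, 0]
ground_truth = [0, 0, 0, 1, 0, 0, 0]

for i, window in windows_around_ones(ground_truth, detection, offset=1):
    print(i, window, any(window))
# 0 [0, 0] False
# 4 [1, 0, 0] True
# 5 [0, 0, 0] False
```

Here `any(window)` is one possible reading of the OR logic: it is True when ground_truth contains a 1 within offset steps of the detection.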

A second idea is to use shift() to move the value from the previous/next row into a new column of the same row. But given the new information I think it creates too many new columns, so I removed it.


I was wondering if it could be done with rolling(window=3), but I couldn't come up with a solution.
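
For what it's worth, a centered rolling maximum can at least produce a mask of the positions that lie within offset steps of a detection. It does not do the squashing itself. A minimal sketch, assuming pandas is available (the variable names are only illustrative):

```python
import pandas as pd

detection = [1, 0, 0, 0, 1, 1, 0]
offset = 1

# A centered window of 2*offset + 1 elements; min_periods=1 lets the
# window shrink at the edges instead of producing NaN there.
mask = (
    pd.Series(detection)
    .rolling(window=2 * offset + 1, center=True, min_periods=1)
    .max()
    .astype(int)
    .tolist()
)

print(mask)  # [1, 1, 0, 1, 1, 1, 1]
```

Each run of ones in `mask` would then still have to be collapsed into a single element.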


Doc: shift, apply, rolling

furas
  • Both solutions only work for these exact lists. The goal is to find a generic method for lists of any length with ones at any indices. Also, the offset is variable! – gustavz Jun 16 '20 at 14:18
  • You could have written that at the start. And you could create example data with different sizes. Create more examples to better describe the problem. As for the offset: `[ i : i + (2*offset) + 1 ]`. I think both versions can give more results if there are more `1`s, so the only problem can be the list sizes. But `min(len(detection), len(ground_truth))` could probably resolve that. – furas Jun 16 '20 at 14:30
  • If you want to work with the lists separately, then the first code should only find the indexes of `1` in the first list and later use these indexes for both lists. I still don't understand which elements you use in the `OR` logic. – furas Jun 16 '20 at 14:36
  • Yes, exactly, finding the indices of the `1`s is the first task! The offset defines a new range around those indices; those values should be squashed / combined to `1` in the `detection` list, and the same range of indices should be combined with a logical `OR` in the `ground_truth` list. – gustavz Jun 16 '20 at 14:39
  • Hi, thanks for the effort. But the lists are always of the same length; only that shared length is variable. Maybe the background info helps: it's the ground truth and the detection of a binary time series classification, so the ones and zeros are "True" / "False" outcomes of a machine learning model. – gustavz Jun 17 '20 at 06:19
  • The code should work even if you use lists of the same length. And you can use longer or shorter lists than I used in the example; it should also work. – furas Jun 17 '20 at 06:31
  • The code does not solve the problem. Check the additional example I added. Furthermore, you should also be able to handle ones at index 0 or at the end of the list. – gustavz Jun 17 '20 at 13:02
  • I wrote in the answer that I assumed such an index is skipped because there is no data on the left that you could use. And I wrote that I don't understand your `"or" logic`. You have to better describe how you use the `or` logic and what to do for index `0`. Simply put: you didn't add details, so I can't create the expected code. The current code only gets `[i-offset: i+offset]` because I don't know how you calculate the value with the `or` logic. – furas Jun 17 '20 at 13:58
0

I found the ultimate solution, with the help of a few sub-questions.

Code:

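import numpy as np

# Create Mask from Detection and Offset:
# 1 wherever a detection lies within `offset` steps
w = offset*2 + 1
mask = np.convolve(detection, np.ones(w), mode='same').clip(0, 1).astype(int)

# Keep every 0 and only the first element of each run of 1s
keep = ~((np.diff(mask, prepend=0) == 0) & (mask == 1))

# Create Soft Detection
soft_detection = mask[keep].tolist()

# Create Soft Ground Truth: OR over each kept segment. The segment starts
# come from `keep`, so runs of 0s are not squashed and both lists match in length.
idx = np.flatnonzero(keep)
soft_ground_truth = np.bitwise_or.reduceat(ground_truth, idx).tolist()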
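
As a quick check, the steps can be wrapped in a function and run on the examples from the question. This is only a sketch: the function name `soft_match` is made up, and it assumes that `2*offset + 1` is not larger than the list length, because `np.convolve` with `mode='same'` returns an array as long as the longer of its two inputs. The segment starts are taken from the kept positions, so runs of zeros are left alone and both outputs have the same length.

```python
import numpy as np


def soft_match(detection, ground_truth, offset):
    """Squash detection windows and OR ground_truth over the same spans."""
    w = 2 * offset + 1
    # 1 wherever a detection lies within `offset` steps.
    mask = np.convolve(detection, np.ones(w), mode="same").clip(0, 1).astype(int)
    # Keep every 0 and only the first element of each run of 1s.
    keep = ~((np.diff(mask, prepend=0) == 0) & (mask == 1))
    soft_detection = mask[keep].tolist()
    # Each kept index starts a segment that runs up to the next kept index.
    idx = np.flatnonzero(keep)
    soft_ground_truth = np.bitwise_or.reduceat(np.asarray(ground_truth), idx).tolist()
    return soft_detection, soft_ground_truth


print(soft_match([0, 0, 1, 0], [0, 1, 0, 0], 1))                    # ([0, 1], [0, 1])
print(soft_match([1, 0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 0, 0], 1))  # ([1, 0, 1], [0, 0, 1])
print(soft_match([0, 0, 0, 1, 0], [1, 0, 0, 0, 0], 1))              # ([0, 0, 1], [1, 0, 0])
```

The third call has two leading zeros outside any window; they stay separate elements in both outputs.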
gustavz