I have a dataframe that contains many rows (on the order of 60000) with three fields:

  1. shift: a string of 96 characters indicating the location or 'r' (no location; e.g. 'rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrraaabbbbbbbrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr')
  2. start: the specified starting point (1-indexed)
  3. stop: the specified stopping point (1-indexed)

Now I want a mapping for both location 'a' and location 'b'. The mapping shows, for each point in time, which rows contain 'a' or 'b' at that point. We thus have:

mapping = {}
mapping['a'] = [[]]*96 # list of length 96 with, initially, an empty list for the row indexes.
mapping['b'] = [[]]*96 # list of length 96 with, initially, an empty list for the row indexes.

for index, row in pd_shifts.iterrows():
    for t in range(row['start']-1,row['stop']):
        loc = row['shift'][t] # either 'a' or 'b'
        if loc != 'r': # 'r' can be ignored.
            mapping[loc][t].append(index)

I use the loops above to find loc at moment t and append the row index to mapping[loc][t]. It seems like an easy job. However, each index is added as many times as 'a' or 'b' occurs in the string. A snippet of the output:
1535,1535,1536,1536,1536,1536,1536,1537,1537,

What is happening here? Why is each index appended as many times as 'a' occurs in the shift?

Attempts

I have checked that each row has a unique index and that each row is iterated only once.
Also, each start point and stop point (and every point in between) is visited once (checked with print(t)).

Robin Kramer-ten Have