I have a DataFrame with many rows (on the order of 60,000) and three fields:
- shift: a string of 96 characters indicating the location, or 'r' for no location (e.g. 'rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrraaabbbbbbbrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr')
- start: the specified starting point (1-indexed)
- stop: the specified stopping point (1-indexed)
Now I want a mapping for both locations 'a' and 'b' that shows, for each point in time, which rows contain 'a' or 'b'. I build it like this:

    mapping = {}
    mapping['a'] = [[]]*96  # list of length 96 with, initially, an empty list for the row indexes
    mapping['b'] = [[]]*96  # list of length 96 with, initially, an empty list for the row indexes

    for index, row in pd_shifts.iterrows():
        for t in range(row['start']-1, row['stop']):
            loc = row['shift'][t]  # 'a', 'b' or 'r'
            if loc != 'r':  # 'r' can be ignored
                mapping[loc][t].append(index)

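To reproduce the symptom without the 60,000-row DataFrame, the sketch below swaps `pd_shifts.iterrows()` for a small hypothetical list of `(index, shift, start, stop)` tuples. The helper `make_shift` and all values are made up for illustration; the mapping itself is built the same way as in the loop above.

```python
N_SLOTS = 96

def make_shift(a_from, a_to, b_from, b_to):
    """Hypothetical helper: 96-char shift string, 'r' except the given 0-based ranges."""
    chars = ['r'] * N_SLOTS
    for t in range(a_from, a_to):
        chars[t] = 'a'
    for t in range(b_from, b_to):
        chars[t] = 'b'
    return ''.join(chars)

# Made-up stand-in for pd_shifts rows: (index, shift, start, stop), start/stop 1-indexed.
rows = [
    (1535, make_shift(33, 35, 35, 40), 34, 40),  # 2 x 'a', then 5 x 'b'
    (1536, make_shift(33, 38, 38, 40), 34, 40),  # 5 x 'a', then 2 x 'b'
]

mapping = {}
mapping['a'] = [[]] * N_SLOTS  # same construction as in the question
mapping['b'] = [[]] * N_SLOTS

for index, shift, start, stop in rows:
    for t in range(start - 1, stop):
        loc = shift[t]
        if loc != 'r':
            mapping[loc][t].append(index)

print(mapping['a'][33])  # [1535, 1535, 1536, 1536, 1536, 1536, 1536]
print(mapping['a'][0])   # slot 0 is never touched, yet shows the same contents
```

Both toy rows only cover t = 33..39 (0-based), yet every slot of `mapping['a']` reports the same list, with each index repeated once per 'a' in its shift.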
I use the nested for-loops above to find the location loc at moment t and append the row index to mapping[loc][t]. This seems like an easy job. However, each index is added as many times as 'a' (or 'b') occurs in the string. A snippet of the output:
1535,1535,1536,1536,1536,1536,1536,1537,1537,
What is happening here? Why is each index appended as many times as 'a' occurs in the shift?
Attempts
I have checked that each row has a unique index, and that each row is iterated only once.
Also, each start point, stop point, and every point in between is visited exactly once (checked with print(t)).
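These checks cover the rows and the loop bounds, but not whether the 96 inner lists are separate objects. A standalone check in plain Python (no DataFrame involved) points at a likely culprit: `[[]] * 96` repeats a reference to one list, whereas a list comprehension creates a new list per slot.

```python
N_SLOTS = 96

shared = [[]] * N_SLOTS                     # 96 references to ONE list object
independent = [[] for _ in range(N_SLOTS)]  # 96 distinct list objects

print(shared[0] is shared[95])            # True
print(independent[0] is independent[95])  # False

# Appending through one slot is visible through every slot of `shared`.
shared[10].append(1535)
independent[10].append(1535)
print(shared[0])        # [1535]
print(independent[0])   # []
print(independent[10])  # [1535]
```

If this is the cause, building the mapping with `mapping['a'] = [[] for _ in range(96)]` (and the same for `'b'`) should make each `append` touch only slot `t`.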