I have a dataframe df
with the following ids (in Col
).
The last occurrence of A/B/C represents the start, and the last occurrence of X is the end. I should ignore any other A,B,C between start and end (e.g. rows 8 and 9).
I have to find start and end records from this data and assign a number to each of these occurrences. The column count
is my desired output:
Col ID
P
Q
A
A
A 1
Q 1
Q 1
B 1
C 1
S 1
S 1
X 1
X 1
X 1
Q
Q
R
R
C
C 2
D 2
E 2
B 2
K 2
D 2
E 2
E 2
X 2
X 2
This code:
lc1 = df.index[df.Col.eq('A') & df.Col.ne(df.Col.shift(-1))]
would give me an array of all the last occurrences of Index values of 'A', in this case [5]
.
lc1 = df.index[df.Col.eq('C') & df.Col.ne(df.Col.shift(-1))] # [20]
lc2 = df.index[df.Col.eq('X') & df.Col.ne(df.Col.shift(-1))] # [14,29]
I would use iloc
to print the count values:
df.iloc[5:14]['count'] = 1
df.iloc[20:29]['count'] = 2
How can I find the indices of A/B/C together and print the count values of each start and end occurrence?