-1

I have a dataframe like this

Date A B Stat
1-jan M a start
1-jan K 0 0
1-jan T b start
1-jan N a end
1-jan W 0 0
1-jan R b end

Actually, a and b represents activities that starts with start and ends with end value of Stat column. The Act column represents activity number. I want to create a new dataframe based on B and Stat columns so that my output will be:

Date A B Stat Act
1-jan M a start 1
1-jan K 0 0 1
1-jan T b start 1
1-jan N a end 1
1-jan T b start 2
1-jan N a end 2
1-jan W 0 0 2
1-jan R b end 2

The Act column represents activity number. Any help will be really appreciated.

Nick ODell
  • 15,465
  • 3
  • 32
  • 66
  • How did your b start and a end switch places in the output? – kztd Mar 27 '19 at 03:14
  • @kztd thank you. But Actually I need a loop that starts from first row, stores its index with value of column B (e.g a) and Stat (e.g. start). Then checks for same value in Column B in the next rows (i.e a) but with opposite Stat value (i.e. end). So when found then extract the set of rows and name it Act 1. Similarly, take second row and check its similar value in column B with opposite value in column Stat then save it with Act 2 and so on. – Rafi Ullah Khan Mar 27 '19 at 07:29

1 Answers1

0

Sounds like you're looking for cumsum on 'start' being in the Stat column. pretty much the same as this question

cols = ["Date", "A", "B", "Stat"]
strs = ["1-jan M a start",
"1-jan K 0 0",
"1-jan T b start",
"1-jan N a end",
"1-jan W 0 0",
"1-jan R b end"
       ]

list = []
for str in strs:
    items = str.split(" ")
    list.append(items)
print(list)
df = pd.DataFrame(list, columns=cols)
df["StartCount"] = (df['Stat'] == 'start').cumsum()
print(df)

>

    Date  A  B   Stat  StartCount
0  1-jan  M  a  start           1
1  1-jan  K  0      0           1
2  1-jan  T  b  start           2
3  1-jan  N  a    end           2
4  1-jan  W  0      0           2
5  1-jan  R  b    end           2
kztd
  • 3,121
  • 1
  • 20
  • 18
  • Thank you. but sorry the problem still not solved. I want value 1 for the set of rows whose first and last rows have same column **B** values but opposite column **Stat** values. – Rafi Ullah Khan Mar 27 '19 at 07:35