This is tricky. I have a pandas DataFrame of machine log data. The data already has an index, but the DataFrame contains several jobs. I want to give each job an index of its own so that I can compare the jobs with each other. In other words, I want another column holding an index that starts at zero, runs to the end of the job, and resets to zero at the start of the next job. Or do I have to do this line by line?
Please look at http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples and learn how to ask a good pandas question. You need to show your data and your expected output. We can't construct examples from paragraphs of explanation. – cs95 Sep 08 '17 at 07:07
1 Answer
I think you need set_index with cumcount, which numbers the rows within each category:
df = df.set_index(df.groupby('Job Columns').cumcount(), append=True)
Sample:
import numpy as np
import pandas as pd

np.random.seed(456)
df = pd.DataFrame({'Jobs': np.random.choice(['a', 'b', 'c'], size=10)})
# solution with sorting: group each job's rows together first
df1 = df.sort_values('Jobs').reset_index(drop=True)
df1 = df1.set_index(df1.groupby('Jobs').cumcount(), append=True)
print(df1)
    Jobs
0 0    a
1 1    a
2 2    a
3 0    b
4 1    b
5 2    b
6 3    b
7 0    c
8 1    c
9 2    c
# solution with no sorting: rows keep their original order
df2 = df.set_index(df.groupby('Jobs').cumcount(), append=True)
print(df2)
    Jobs
0 0    b
1 1    b
2 0    c
3 0    a
4 1    c
5 2    c
6 1    a
7 2    b
8 2    a
9 3    b
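Since the question asks for another column rather than a second index level, the same cumcount result can also be assigned directly. Here is a minimal sketch on made-up data; the column names 'value', 'job_idx', and 'run_idx' are only for illustration. It also assumes that, in a log, a job label might reappear later and should then start from zero again. If that does not apply to your data, the first assignment alone is enough.

```python
import pandas as pd

# Hypothetical log data; 'value' stands in for whatever the machine logs.
df = pd.DataFrame({
    'Jobs': ['a', 'a', 'b', 'b', 'b', 'a'],
    'value': [10, 11, 20, 21, 22, 12],
})

# Per-job counter as an ordinary column: counts rows per job label,
# so the late 'a' row continues the count of the earlier 'a' rows.
df['job_idx'] = df.groupby('Jobs').cumcount()

# Assumption: each consecutive block of the same label is a separate job run.
# A new run starts whenever the label differs from the previous row.
run_id = (df['Jobs'] != df['Jobs'].shift()).cumsum()

# Counter that resets to 0 at the start of every consecutive run.
df['run_idx'] = df.groupby(run_id).cumcount()

print(df)
```

With a counter column in place, rows at the same position of different jobs can be lined up by filtering or pivoting on that column.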

jezrael
That solved the problem. You are a pandas genius, I think. Thanks a lot! – user3591675 Sep 08 '17 at 07:17