
Okay, this is tricky. I have a pandas DataFrame of machine log data. The data already has an index, but the DataFrame contains several different jobs. I want to give each job an index of its own so that I can compare the jobs with each other. So I want another column with an index that starts at zero, counts up to the end of the job, and then resets to zero when the next job begins. Or do I have to do this line by line?

user3591675
Please look at http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples and learn how to ask a good pandas question. You need to show your data and your expected output. We can't construct examples from paragraphs of explanation. – cs95 Sep 08 '17 at 07:07

1 Answer


I think you need set_index with cumcount to number the rows within each job category (replace 'Job Columns' with the name of your job column):

df = df.set_index(df.groupby('Job Columns').cumcount(), append=True)
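The question asks for an extra column rather than an extra index level. A minimal sketch of that variant, assuming the job labels live in a column named `Jobs` and using a hypothetical output column `job_idx`: `GroupBy.cumcount` returns a Series aligned to the original index, so it can be assigned directly without touching the existing index.

```python
import pandas as pd

# Hypothetical log data: one row per log line, labelled with its job
df = pd.DataFrame({'Jobs': ['b', 'b', 'c', 'a', 'c', 'c', 'a', 'b', 'a', 'b']})

# cumcount numbers rows 0, 1, 2, ... within each job and keeps the original row order
df['job_idx'] = df.groupby('Jobs').cumcount()
print(df)
```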

Sample:

import numpy as np
import pandas as pd

np.random.seed(456)
df = pd.DataFrame({'Jobs':np.random.choice(['a','b','c'], size=10)})

#solution with sorting
df1 = df.sort_values('Jobs').reset_index(drop=True)
df1 = df1.set_index(df1.groupby('Jobs').cumcount(), append=True)
print (df1)
    Jobs
0 0    a
1 1    a
2 2    a
3 0    b
4 1    b
5 2    b
6 3    b
7 0    c
8 1    c
9 2    c
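Since the goal in the question is to compare jobs with each other, one way to use a per-job counter is to pivot so that each job becomes a column and rows line up by position within the job. A minimal sketch, assuming a hypothetical numeric `value` column and the hypothetical `job_idx` column name; `DataFrame.pivot` needs each (`job_idx`, `Jobs`) pair to be unique, which `cumcount` guarantees.

```python
import pandas as pd

# Hypothetical log data with one numeric reading per row ('value' is an assumption)
df = pd.DataFrame({
    'Jobs':  ['a', 'a', 'b', 'b', 'b', 'c', 'c'],
    'value': [1.0, 2.0, 10.0, 20.0, 30.0, 5.0, 6.0],
})
df['job_idx'] = df.groupby('Jobs').cumcount()

# One column per job, one row per position within the job; shorter jobs get NaN
wide = df.pivot(index='job_idx', columns='Jobs', values='value')
print(wide)
```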

#solution with no sorting
df2 = df.set_index(df.groupby('Jobs').cumcount(), append=True)
print (df2)
    Jobs
0 0    b
1 1    b
2 0    c
3 0    a
4 1    c
5 2    c
6 1    a
7 2    b
8 2    a
9 3    b
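Both solutions above number rows by job label, so if the same job name appears again later in the log (a second run of job `a`, say), its count continues from where the first run stopped instead of resetting to zero. If, as an assumption, each new job is a contiguous block of rows and labels can repeat, a common alternative is to number the blocks first by comparing each label with the previous row via `shift`, then count within each block. The names `run_id` and `job_idx` are hypothetical.

```python
import pandas as pd

# Hypothetical log where job 'a' runs twice, separated by a run of job 'b'
df = pd.DataFrame({'Jobs': ['a', 'a', 'b', 'b', 'b', 'a', 'a']})

# A new run starts whenever the label differs from the previous row;
# cumsum turns those start markers into run numbers 1, 1, 2, 2, 2, 3, 3
run_id = (df['Jobs'] != df['Jobs'].shift()).cumsum()

# Count rows within each run, so the index resets to 0 at every new job
df['job_idx'] = df.groupby(run_id).cumcount()
print(df)
```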
jezrael