
I have a dataframe with 20 columns. Four of them are listed here as an example:

is_guarantee: 0 or 1
hotel_star: 0, 1, 2, 3, 4, 5
order_status: 40, 60, 80
journey (Label): 0, 1, 2

    is_guarantee  hotel_star  order_status  journey
0              0           5            60        0
1              1           5            60        0
2              1           5            60        0
3              0           5            60        1
4              0           4            40        0
5              0           4            40        1
6              0           4            40        1
7              0           3            60        0
8              0           2            60        0
9              1           5            60        0
10             0           2            60        0
11             0           2            60        0


But the system needs an occurrence matrix in the following format as input:

[Image: required occurrence matrix format]

Can anybody help?

import numpy as np
import pandas as pd

# Random sample data with the same columns and value ranges as above
df1 = pd.DataFrame(index=range(0, 20))
df1['is_guarantee'] = np.random.choice([0, 1], df1.shape[0])
df1['hotel_star'] = np.random.choice([0, 1, 2, 3, 4, 5], df1.shape[0])
df1['order_status'] = np.random.choice([40, 60, 80], df1.shape[0])
df1['journey'] = np.random.choice([0, 1, 2], df1.shape[0])
DataHolic
  • I want to see your data edited in the question as _text_. I can't copy and paste a picture into my terminal, and I don't want to type it from scratch. Please make everyone's life easy, post your data and expected output in your question as text. No data = no help. – cs95 Dec 18 '17 at 11:26
  • @jezrael... No one is here to persecute you, least of all me. I told you that I respect your knowledgeability. Unfortunately, sometimes you do things that could be considered unhealthy for the site. That is not my opinion. Anyway I have reopened the question, enjoy. – cs95 Dec 18 '17 at 11:41
  • I should also mention that encouraging low quality questions by answering them counts as the very same unhealthy habits I mentioned earlier. – cs95 Dec 18 '17 at 11:48
  • @cᴏʟᴅsᴘᴇᴇᴅ - I agree with you. I answer this kind of question only if it is interesting, otherwise not. – jezrael Dec 18 '17 at 11:51
  • @jezrael, okay, as long as you understand, I have no complaints. :-) – cs95 Dec 18 '17 at 11:52
  • Sorry. See if this okay? df1 = pd.DataFrame(index=range(0,20)) df1['is_guarantee'] = np.random.choice([0,1], df1.shape[0]) df1['hotel_star'] = np.random.choice([0,1,2,3,4,5], df1.shape[0]) df1['order_status'] = np.random.choice([40,60,80], df1.shape[0]) df1['journey '] = np.random.choice([0,1,2], df1.shape[0]) – DataHolic Dec 19 '17 at 02:23

1 Answer


I think you need to:

  • reshape with melt, count with groupby and size, then reshape again with unstack
  • divide each row by its row sum and join the MultiIndex levels into a single index:

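# Count how often each (column, value) pair occurs for each journey label
df = (df.melt('journey')
       .astype(str)
       .groupby(['variable', 'journey','value'])
       .size()
       .unstack(1, fill_value=0))

# Convert counts to row-wise percentages and flatten the MultiIndex
# into "column = value" labels
df = (df.div(df.sum(axis=1), axis=0)
        .mul(100)
        .add_prefix('journey_')
        .set_index(df.index.map(' = '.join))
        .rename_axis(None, axis=1))

print(df)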

                    journey_0  journey_1
hotel_star = 2     100.000000   0.000000
hotel_star = 3     100.000000   0.000000
hotel_star = 4      33.333333  66.666667
hotel_star = 5      80.000000  20.000000
is_guarantee = 0    66.666667  33.333333
is_guarantee = 1   100.000000   0.000000
order_status = 40   33.333333  66.666667
order_status = 60   88.888889  11.111111
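
For comparison, the same table can probably also be built with pd.crosstab and normalize='index'. This is only a sketch under a few assumptions, not the method used above: the dataframe is rebuilt from the 12 sample rows in the question, and the names long_df and result are introduced here purely for illustration.

```python
import pandas as pd

# The 12 sample rows shown in the question
df = pd.DataFrame({
    'is_guarantee': [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    'hotel_star':   [5, 5, 5, 5, 4, 4, 4, 3, 2, 5, 2, 2],
    'order_status': [60, 60, 60, 60, 40, 40, 40, 60, 60, 60, 60, 60],
    'journey':      [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
})

# Long format: one row per (journey, column name, value)
long_df = df.melt('journey')

# Count co-occurrences of (column, value) with journey;
# normalize='index' makes each row sum to 1, then scale to percent
result = pd.crosstab(
    [long_df['variable'], long_df['value']],
    long_df['journey'],
    normalize='index',
).mul(100)

# Flatten the (variable, value) MultiIndex into "variable = value" labels
result.index = [f'{var} = {val}' for var, val in result.index]
result.columns = [f'journey_{j}' for j in result.columns]

print(result)
```

Because the values stay numeric here instead of being cast to str, the rows within each column are sorted numerically rather than lexically, which matters once a column has values with different digit counts.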
jezrael