1

I have a dataframe that discretize the customers into different Q's, which looks like:

    CustomerID_num  Q1  Q2  Q3  Q4  Q5  Country
0   12346           1   0   0   0   0   United Kingdom
2   12347           0   0   0   0   1   Iceland
9   12348           0   1   0   0   0   Finland
13  12349           0   0   0   0   1   Italy
14  12350           0   1   0   0   0   Norway

What I want to do is adding a new column, Q, to the dataframe which shows which sector this customer is in, so it looks like:

    CustomerID_num  Q1  Q2  Q3  Q4  Q5  Q    Country
0   12346           1   0   0   0   0   1    United Kingdom
2   12347           0   0   0   0   1   5    Iceland
9   12348           0   1   0   0   0   2    Finland
13  12349           0   0   0   0   1   5    Italy
14  12350           0   1   0   0   0   2    Norway

The only way I can think about is using for loop but it will give me a mess. Any other way to do this?

WilliamL
  • 13
  • 2

3 Answers3

3

One option is to dump down into numpy:

Filter for just the Q columns:

cols = df.filter(like = 'Q')

Get the column positions that are equal to 1:

_, positions = cols.to_numpy().nonzero()
df.assign(Q = positions + 1)
    CustomerID_num  Q1  Q2  Q3  Q4  Q5         Country  Q
0            12346   1   0   0   0   0  United Kingdom  1
2            12347   0   0   0   0   1         Iceland  5
9            12348   0   1   0   0   0         Finland  2
13           12349   0   0   0   0   1           Italy  5
14           12350   0   1   0   0   0          Norway  2
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
0

Here are a few other options

d = df.loc[:,lambda x: x.columns.str.startswith('Q')]

Option 1:

np.where(d)[-1]+1

Option 2:

np.argmax(d.to_numpy(),axis=1)+1

Option 3:

d.set_axis(range(d.shape[1]),axis=1).idxmax(axis=1)+1
rhug123
  • 7,893
  • 1
  • 9
  • 24
-1
df.loc[df["Q1"] == 1, "Q"] = 1
df.loc[df["Q2"] == 1, "Q"] = 2
df.loc[df["Q3"] == 1, "Q"] = 3
df.loc[df["Q4"] == 1, "Q"] = 4
df.loc[df["Q5"] == 1, "Q"] = 5

This is a possible solution using loc from pandas here is the documentation https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

When the condition is true loc set the value for the entire column named "Q"

  • Even if this does solve the problem, this answer does not follow clean coding practices. For instance, if the OP wishes to add another 20 countries, should he copy and paste these lines an additional 20 times? – Hutch Sep 23 '22 at 22:07