I asked a related question recently, and while the answer was simple then (I had failed to use a specific column), this time I don't have that column. Here is the OP. None of the extra answers provided there actually work either :/

The problem is with a multilabel data frame where you want to isolate rows that contain 1 for a given class and 0 for all others. Here is the code I have so far, but it loops forever and crashes Colab.

In this case I want just the Action row, but I'm also trying to loop so that I append all rows where Action is 1 and every column in column_list is 0, then the rows where History is 1 and all others are 0, and so on.

Again, the options provided at the link give me pandas' "truth value is ambiguous" error.

Index | Drama | Western | Action | History
    0 |     1 |       1 |      0 |       0
    1 |     0 |       0 |      0 |       1
    2 |     0 |       0 |      1 |       0


# Column list to be popped
column_list = list(balanced_df.columns)[1:]

single_labels = []
i=0

# 28 columns total
while i < 27:
  # define/reset the full column list at the start of each iteration
  column_list = list(balanced_df.iloc[:,1:])
  # Pop column name at index i
  x = column_list.pop(i)

  # storing the results in a list of lists
  # Filters for the popped column where the column is 1 & the remaining columns are set to 0
  single_labels.append(balanced_df[(balanced_df[x] == 1) & (balanced_df[column_list]==0)])

  # increment the column index for the next iteration
  i+=1

The output here would be something like

single_labels[0]

Index | Drama | Western | Action | History
    2 |     0 |       0 |      1 |       0


single_labels[1]
Index | Drama | Western | Action | History
    1 |     0 |       0 |      0 |       1
Digital Moniker
  • what's your desired result here? – Paul H Apr 02 '21 at 19:56
  • From the comments in the other question, `df.loc[df['Western'].eq(1) & df.sum(axis='columns').eq(1)]` should do it – Paul H Apr 02 '21 at 19:59
  • Sorry, it's not clear. The result would be a list of lists containing rows of the df: at list index 0, the rows where the Action column is 1 and all other columns are 0; at list index 1, the rows where History is 1 and all other columns are 0; and so on. – Digital Moniker Apr 02 '21 at 20:00
  • type out the dataframe you want to see and put in the question – Paul H Apr 02 '21 at 20:01
  • Okay, that solution worked too. Do you want to post it, and I'll accept it? Thanks – Digital Moniker Apr 02 '21 at 20:02

1 Answer

You don't need a loop. You rarely need loops with pandas. If you're selecting rows based on conditions, you should use boolean indexing.

In your case, that's:

df.loc[df.sum(axis='columns').eq(1)]

As an example:

import pandas

pandas.DataFrame({
    'A': [1, 0, 0, 0, 0, 1, 1, 0, 0],
    'B': [0, 1, 0, 0, 1, 0, 1, 0, 0],
    'C': [0, 0, 1, 0, 1, 0, 0, 1, 0],
    'D': [0, 0, 0, 1, 0, 1, 0, 1, 0],
}).loc[lambda df: df.sum(axis='columns').eq(1)].values.tolist()

Which outputs:

[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
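
If the frame also carries a non-label column, as `balanced_df` in the question appears to (its label columns start at position 1), the row sum should probably be restricted to the label columns. The sketch below assumes that layout and uses a small made-up frame; `label_cols`, `single`, and `label_of_row` are illustrative names, not from the question. It also shows one loop-free way to get a separate sub-frame per label, which seems to be what the original loop was after.

```python
import pandas as pd

# Hypothetical stand-in for the question's balanced_df:
# one non-label column followed by 0/1 label columns.
balanced_df = pd.DataFrame({
    "Index":   [0, 1, 2, 3],
    "Drama":   [1, 0, 0, 0],
    "Western": [1, 0, 0, 0],
    "Action":  [0, 0, 1, 1],
    "History": [0, 1, 0, 0],
})

# Assumption: every column after the first is a label column.
label_cols = list(balanced_df.columns[1:])

# Keep rows where exactly one label column is set.
single = balanced_df.loc[balanced_df[label_cols].sum(axis="columns").eq(1)]

# In a single-label row, idxmax names the one column holding the 1.
label_of_row = single[label_cols].idxmax(axis="columns")

# One sub-frame per label, keyed by label name.
single_labels = {label: rows for label, rows in single.groupby(label_of_row)}

print(single_labels["Action"])
```

A dict keyed by label name is used here instead of a positional list so that labels with no single-label rows are simply absent rather than shifting the indices.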
Paul H
  • I am trying to build a list of lists that contains only the rows that are not multilabel. Don't I need to loop through each column for this? Your original answer is correct, but I need to add a variable where I see Western so that the next time it loops it captures History, and so on... – Digital Moniker Apr 02 '21 at 20:08
  • @DigitalMoniker nope. No loops. You can add `.values.tolist()` to the end of the command above if you want. – Paul H Apr 02 '21 at 20:11
  • Wow, I just plotted that with seaborn and I see there are no multilabels... that's amazing. Is it because of `.eq`? I'm not familiar with it but I'll check it out. Thanks! – Digital Moniker Apr 02 '21 at 20:17
  • @DigitalMoniker if you're using seaborn, you don't need a list of lists. – Paul H Apr 02 '21 at 20:18
  • I just used seaborn to investigate the dataframe that was returned by your original answer. I mentioned a list of lists only because I had used that technique before to capture filtered df results based on multiple different conditions. – Digital Moniker Apr 02 '21 at 20:22
  • My guess is that you're better off leaving this as a dataframe, but it's hard to know – Paul H Apr 02 '21 at 20:31
  • 100% and your answer is perfect. The list of lists was just a result of my amateur logical thought process for data manipulation. Thanks a lot for the answer. – Digital Moniker Apr 02 '21 at 20:35