Find which columns contain a certain value for each row in a dataframe

Question

I have a dataframe, df, shown below. Each row is a story and each column is a word that appears in the corpus of stories. A 0 means the word is absent in the story while a 1 means the word is present.

I want to find which words are present in each story (i.e. col val == 1). How can I go about finding this (preferably without for-loops)?

Thanks!

Does this help: [link](https://stackoverflow.com/questions/65165229/how-to-find-column-number-by-looking-up-a-value) — sammywemmy, Dec 06 '20 at 08:23
@sammywemmy thank you! It doesn't seem to work for my case. In the solution you linked, the desired value only appeared once in the row, but in mine it will appear many times (i.e. there are many words in a story). The order of rows also doesn't matter much to me! — jo_, Dec 06 '20 at 08:30
@JoHe Please don't post images. Images are discouraged on StackOverflow. Please take time to read [`how-to-make-good-reproducible-pandas-examples`](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Shubham Sharma, Dec 06 '20 at 08:34

David Erickson · Accepted Answer · 2020-12-06T08:41:35.443

Assuming you are just trying to look at one story, you can filter for the story (let's say story 34972) and transpose the dataframe with:

df_34972 = df[df.index == 34972].T

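and then you can collect the words whose value equals 1 into a list (after the transpose, the words are the index and the single column is labelled 34972):

df_34972[df_34972[34972] == 1].index.tolist()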
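As a quick check of the single-story case, here is a minimal sketch on made-up data. The words, the story ids, and the 0/1 values are all hypothetical, since the original dataframe was only posted as an image. It selects the row with .loc instead of filtering and transposing, which gives the same word-indexed view.

```python
import pandas as pd

# Hypothetical toy data: rows are story ids, columns are words (1 = present).
df = pd.DataFrame(
    {"cat": [1, 0, 1], "dog": [0, 0, 1], "fish": [1, 1, 0]},
    index=[34972, 34973, 34974],
)

# Select one story as a Series: its index holds the words, its values the 0/1 flags.
story = df.loc[34972]

# Keep only the entries equal to 1 and read the words off the index.
words_34972 = story[story == 1].index.tolist()
print(words_34972)  # ['cat', 'fish']
```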

If you want to do this for all stories at once, a slightly different technique is needed. Following the link that SammyWemmy provided, you can melt() the dataframe and keep only the rows whose value is 1. From there you can .groupby() the story column, which is named 'index' after reset_index() in the example below:

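# melt() names the former column labels 'variable' and the cell contents 'value'
df = df.reset_index().melt(id_vars='index')
df = df[df['value'] == 1]
# one list of words per story
df.groupby('index')['variable'].apply(list)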
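The all-stories approach can be tried end to end with the sketch below. The toy dataframe is hypothetical and it assumes the index is unnamed, so that reset_index() produces a column called 'index'. The intermediate names long_df, present, and words_by_story are only for illustration.

```python
import pandas as pd

# Hypothetical toy data: rows are story ids, columns are words (1 = present).
df = pd.DataFrame(
    {"cat": [1, 0, 1], "dog": [0, 0, 1], "fish": [1, 1, 0]},
    index=[34972, 34973, 34974],
)

# Wide -> long: one row per (story, word) pair.
# The story id lands in 'index', the word in 'variable', the flag in 'value'.
long_df = df.reset_index().melt(id_vars="index")

# Keep only the pairs where the word is present.
present = long_df[long_df["value"] == 1]

# Collect the words of each story into a list.
words_by_story = present.groupby("index")["variable"].apply(list)
print(words_by_story)
```

Note that a story with no words present drops out at the filtering step, so it will not appear in the result at all.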
