
I have a dataframe, df, shown below. Each row is a story and each column is a word that appears in the corpus of stories. A 0 means the word is absent in the story while a 1 means the word is present.

[Image: screenshot of the dataframe]

I want to find which words are present in each story (i.e. col val == 1). How can I go about finding this (preferably without for-loops)?

Thanks!

jo_
  • Does this help: [link](https://stackoverflow.com/questions/65165229/how-to-find-column-number-by-looking-up-a-value) – sammywemmy Dec 06 '20 at 08:23
  • @sammywemmy thank you! It doesn't seem to work as well. In the solution you linked, the desired value only appeared once in the row, but in mine it will appear many times (i.e. there are many words in a story). The order of rows also doesn't matter as much to me! – jo_ Dec 06 '20 at 08:30
  • @JoHe Please don't post images. Images are discouraged on StackOverflow. Please take time to read [`how-to-make-good-reproducible-pandas-examples`](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Shubham Sharma Dec 06 '20 at 08:34

1 Answer


Assuming you are just trying to look at one story, you can filter for the story (let's say story 34972) and transpose the dataframe with:

df_34972 = df[df.index == 34972].T

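and then you can collect the words whose value is 1 into a list (after transposing, the words are the index and the single column is labelled 34972):

df_34972[df_34972[34972] == 1].index.tolist()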
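As a minimal, self-contained sketch of the single-story case: the toy dataframe below is hypothetical (the words `cat`, `dog`, `fish` and the neighbouring story ids are made up for illustration), and it assumes the story id is an integer index label.

```python
import pandas as pd

# Hypothetical toy data: rows are story ids, columns are words (1 = present).
df = pd.DataFrame(
    {"cat": [1, 0, 1], "dog": [0, 1, 1], "fish": [0, 0, 1]},
    index=[34972, 34973, 34974],
)

# Keep only story 34972 and transpose, so the words become the index
# and the single remaining column is labelled 34972.
df_34972 = df[df.index == 34972].T

# Filter the rows (words) whose value is 1 and return their labels.
words_in_story = df_34972[df_34972[34972] == 1].index.tolist()
print(words_in_story)
```

If the transpose is not needed for anything else, `df.loc[34972]` should give the same row as a Series, which can be filtered the same way.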

If you want this for every story, the technique is slightly different. Following the link that sammywemmy provided, you can melt() the dataframe and keep only the rows whose value is 1. From there you can .groupby() the story column, which is 'index' (after reset_index()) in the example below:

df = df.reset_index().melt(id_vars='index')
df = df[df['value'] == 1]
df.groupby('index')['variable'].apply(list)
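The all-stories steps can be sketched end to end as follows. The toy dataframe is again hypothetical, and the sketch assumes the index is unnamed, so that `reset_index()` produces a column called `index` and `melt()` uses its default `variable` and `value` column names.

```python
import pandas as pd

# Hypothetical toy data: rows are story ids, columns are words (1 = present).
df = pd.DataFrame(
    {"cat": [1, 0, 1], "dog": [0, 1, 1], "fish": [0, 0, 1]},
    index=[34972, 34973, 34974],
)

# Wide to long: one row per (story, word) pair, with columns
# 'index' (story id), 'variable' (word) and 'value' (0 or 1).
long_df = df.reset_index().melt(id_vars="index")

# Keep only the pairs where the word is present in the story.
present = long_df[long_df["value"] == 1]

# Collect the words of each story into a list, keyed by story id.
words_per_story = present.groupby("index")["variable"].apply(list)
print(words_per_story)
```

Note that a story containing none of the words has no rows left after the filter, so it will not appear in the result at all.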
David Erickson