
I have a dataframe with multiple columns containing phrases. What I would like to do is:

  1. For each row (observation), identify the column that contains a string that appears in a pre-made list of words.
  2. Using this information, create a new variable in the dataframe that holds the value from the matching column. (In this example, "lst" is my list of words.)
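A minimal sketch of steps 1 and 2, assuming pandas and a plain substring test (a cell matches if any word from the list occurs anywhere in it). The helper `first_match`, the extra column `matched_col`, and the sample phrases are hypothetical; cells that are not strings (None/NaN) are skipped, and a row with no match gets None in both new columns.

```python
import pandas as pd

lst = ["apple", "oranges"]  # hypothetical word list

# Hypothetical data; a missing cell is left as None
df = pd.DataFrame({
    "col1": ["apple sauce", "small pears", None],
    "col2": ["green grapes", "Big oranges", "rye bread"],
})

def first_match(row, words):
    """Return (column name, cell value) for the first cell containing any of `words`."""
    for col, value in row.items():
        # Non-string cells (None/NaN) are skipped
        if isinstance(value, str) and any(w in value for w in words):
            return col, value
    return None, None

matches = [first_match(row, lst) for _, row in df.iterrows()]
df["matched_col"] = [col for col, _ in matches]  # step 1: which column matched
df["New_var"] = [value for _, value in matches]  # step 2: the value in that column
print(df)
```

Keeping the column name as well as the value makes it easy to check which column each row's match came from.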

For example, here is the starting dataframe:

[Image: starting dataframe]

And I would like to end up with this:

[Image: final dataframe]

New_var is the new variable. For observation 1 it contains the response from col1, because the "apple" in "apple sauce" matches "apple" in the list. For observation 2 it would contain "Big oranges", because that phrase matches "oranges" from the list.
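The example relies on "apple" matching as a word inside "apple sauce". A plain substring test (`"apple" in text`) would also match something like "pineapple juice", which may not be wanted. If whole-word, case-insensitive matching is the goal (an assumption), a regular expression with `\b` word boundaries is one way to get it. Only "apple sauce" and "Big oranges" come from the question; the other cells and the helper `first_word_match` are made up for illustration.

```python
import re
import pandas as pd

lst = ["apple", "oranges"]

# \b anchors each list word at word boundaries; re.escape guards against special characters
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, lst)) + r")\b", re.IGNORECASE)

df = pd.DataFrame({
    "Observation": [1, 2, 3],
    "col1": ["apple sauce", None, "pineapple juice"],
    "col2": ["red grapes", "Big oranges", None],
})
text_cols = ["col1", "col2"]

def first_word_match(row):
    # First cell in which some list word appears as a whole word, else None
    for value in row[text_cols]:
        if isinstance(value, str) and pattern.search(value):
            return value
    return None

df["New_var"] = df.apply(first_word_match, axis=1)
print(df)
```

Row 3 gets no value: "pineapple juice" contains "apple" as a substring but not as a separate word.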

I have tried doing this with a list comprehension, following this link: List Comprehension, but without success. I would like to do this in Python. Any suggestions? I am relatively new to the language.

Thank you very much. If I have posted inappropriately, or if the answer already exists somewhere and I have missed it, I would appreciate any guidance in the right direction.

Ben

1 Answer


Let's take a list of words and a small data frame in the shape you described:

import pandas as pd

lst = ['a','m','n','o','p']

df = pd.DataFrame({'Observation': [1], 'col1': ['ab'], 'col2': ['dc'], 'col3': ['ef'], 'col4': ['yz']})
df
   Observation col1 col2 col3 col4
0            1   ab   dc   ef   yz

Check which values in the data frame contain an item from the list:

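# For each row, keep the first string cell that contains any item from lst (None if no cell matches)
df['New_var'] = [next((x for x in row if isinstance(x, str) and any(b in x for b in lst)), None) for row in df.values]
df
   Observation col1 col2 col3 col4 New_var
0            1   ab   dc   ef   yz      ab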
Abdul Quddus
This looks like it would work! However, when I run it on my dataframe, I get the error ValueError: Length of values does not match length of index. My dataframe consists of 547 rows and 20 columns, each containing either NAs or strings of text. Any ideas? Could it be because of the presence of NAs? – Ben May 09 '18 at 12:24
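A likely cause, assuming the comprehension was used as written: it iterates over `df.values[0]`, which is only the first row, and keeps every matching cell in it. The resulting list therefore has one entry per match in row 0, not one entry per row (547 here), and pandas rejects the assignment. NAs can cause a second problem. `str(nan)` is `'nan'`, which contains `'n'` and `'a'` from the example list, so empty cells would count as matches. The three-row frame below is hypothetical and only reproduces both effects. The final statement builds exactly one value per row and skips non-string cells.

```python
import numpy as np
import pandas as pd

lst = ['a', 'm', 'n', 'o', 'p']
df = pd.DataFrame({'col1': ['ab', 'xy', np.nan], 'col2': ['pq', 'zz', 'am']})

# Only the first row is scanned, and every matching cell in it is kept
first_row_matches = [x for x in df.values[0] if any(b for b in lst if b in str(x))]
print(len(first_row_matches), len(df))  # 2 values for 3 rows

try:
    df['bad'] = first_row_matches
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# NaN becomes the string 'nan', which contains 'n' and 'a'
print(str(np.nan))

# One value per row: the first string cell containing any item of lst, else None
df['New_var'] = [
    next((x for x in row if isinstance(x, str) and any(b in x for b in lst)), None)
    for row in df.values
]
print(df)
```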