
I have two DataFrames, df1 and df:

df1

    import pandas as pd

    df1 = pd.DataFrame({'ids': [101, 102, 103], 'vals': ['apple', 'java', 'python']})

   ids    vals
0  101   apple
1  102    java
2  103  python

df

    df = pd.DataFrame({'TEXT_DATA': ['apple a day keeps doctor away',
                                     'apple tree in my farm',
                                     'python is not new language',
                                     'Learn python programming',
                                     'java is second language']})

                       TEXT_DATA
0  apple a day keeps doctor away
1          apple tree in my farm
2     python is not new language
3       Learn python programming
4        java is second language

What I want to do is update the column values based on the filtered data and map the matched ids to a new column, so that my output is:

                       TEXT_DATA      NEW_COLUMN
0  apple a day keeps doctor away      101
1          apple tree in my farm      101
2     python is not new language      103
3       Learn python programming      103
4        java is second language      102

I tried matching using

df[df['TEXT_DATA'].str.contains("apple")]

Is there any way I can do this?

Andre_k

3 Answers


You could do something like this:

my_words = {'python': 103, 'apple': 101, 'java': 102}
for word in my_words:
    # assign the id to NEW_COLUMN on the rows whose text contains the word
    df.loc[df['TEXT_DATA'].str.contains(word, na=False), 'NEW_COLUMN'] = my_words[word]
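
If you prefer not to hard-code the mapping, a small variant (just a sketch, assuming the df1 and df defined in the question) builds the dictionary from df1 directly:

# build the word -> id mapping from df1 instead of typing it out
my_words = dict(zip(df1['vals'], df1['ids']))
for word, word_id in my_words.items():
    # rows whose TEXT_DATA contains the word get that id in NEW_COLUMN
    df.loc[df['TEXT_DATA'].str.contains(word, na=False), 'NEW_COLUMN'] = word_id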
NoSplitSherlock
  • This gives me the error "ValueError: cannot index with vector containing NA / NaN values". If possible, could you show me an output for this? – Andre_k Mar 13 '19 at 13:59
  • Edited answer to remedy NaN values. – NoSplitSherlock Mar 13 '19 at 14:03
  • "my_words = {'python': 103, 'apple': 101, 'java': 102} for word in my_words.keys(): df.loc[df['TEXT_DATA'].str.contains(word, na=False), ['TEXT_DATA']] = my_words[word] " overwrites the value, it doesn't add another column – Andre_k Mar 13 '19 at 14:07
  • Output from your code " TEXT_DATA 0 101 1 101 2 103 3 103 4 102" – Andre_k Mar 13 '19 at 14:07
  • 1
    I edited the answer, but this is something you could really have researched yourself by looking at the answer for a couple of minutes. – NoSplitSherlock Mar 13 '19 at 14:08

First, extract the values in df1['vals'] as a regex pattern. Then create a new column holding the extraction result, and finally merge both dataframes.

# build an alternation pattern from the keywords, e.g. 'apple|java|python'
extr = '|'.join(df1['vals'])
# pull the first matching keyword out of each row's text
df['vals'] = df['TEXT_DATA'].str.extract('(' + extr + ')', expand=False)
# bring in the ids by joining on the extracted keyword
newdf = pd.merge(df, df1, on='vals', how='left')

To select just the relevant fields in the result, index with the column names:

newdf[['TEXT_DATA','ids']]
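
If you want the output to use the NEW_COLUMN name shown in the question, a possible final step (a sketch, assuming the newdf built above; the result name is just illustrative) is:

# rename ids to NEW_COLUMN and keep only the two columns from the desired output
result = newdf.rename(columns={'ids': 'NEW_COLUMN'})[['TEXT_DATA', 'NEW_COLUMN']]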
YusufUMS

You could use a Cartesian product of both dataframes and then select the relevant rows and columns.

# Cartesian product: every TEXT_DATA row paired with every (ids, vals) row
tmp = df.assign(key=1).merge(df1.assign(key=1), on='key').drop(columns='key')
# keep only the pairs where the keyword actually occurs in the text
result = (tmp.loc[tmp.apply(lambda x: x.vals in x.TEXT_DATA, axis=1)]
             .drop(columns='vals')
             .reset_index(drop=True))
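
As an aside, on pandas 1.2 or newer the key=1 trick can be replaced by a cross merge; a minimal sketch of the same idea, assuming the df and df1 from the question:

# pandas >= 1.2: how='cross' builds the Cartesian product directly
tmp = df.merge(df1, how='cross')
result = (tmp.loc[tmp.apply(lambda x: x.vals in x.TEXT_DATA, axis=1)]
             .drop(columns='vals')
             .reset_index(drop=True))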
Serge Ballesta