
I have a dataframe that has two similar phrases, for example 'Hello World' and 'Hello World 1'. I want to match only the 'Hello World' string.

I am currently using dataframe['Phrase'].str.match('Hello World'), but this returns True for both 'Hello World' and 'Hello World 1'. Is there a way to match only the exact phrase?

Luke Wild

3 Answers


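You can use a regular expression for this, as long as the pattern has to cover the whole string:

import re

phrase_to_find = 'Hello World'
phrases = ['Hello World', 'Hello World 1']

# Escape the target so any regex metacharacters are treated literally.
pattern = re.escape(phrase_to_find)

for phrase in phrases:
    # fullmatch succeeds only if the entire phrase matches the pattern.
    if re.fullmatch(pattern, phrase):
        print('Found the following match: {}'.format(phrase))

Note that a word boundary (\b) on its own is not enough here: \bHello World\b still matches inside 'Hello World 1', because there is a word boundary between 'World' and the following space.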
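The regex approach can also be applied to the whole column at once rather than in a Python loop. The following is a minimal sketch, assuming pandas 1.1 or later (where Series.str.fullmatch is available); the sample frame and the names target, exact and matches are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sample data mirroring the phrases in the question.
dataframe = pd.DataFrame(
    {'Phrase': ['Hello World', 'Hello World 1', 'Something else']}
)

target = 'Hello World'

# Series.str.fullmatch is True only where the entire string matches the pattern.
# re.escape keeps any regex metacharacters in the target literal.
exact = dataframe['Phrase'].str.fullmatch(re.escape(target))

# Use the boolean Series as a mask to keep only the exact matches.
matches = dataframe[exact]
print(matches)
```

For a plain literal like this one, the regex adds nothing over an equality test; it becomes useful when the exact match needs a real pattern, such as optional whitespace or alternatives.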

Pitto
  • Hi @Luke Wild , is my solution useful for you? If so please consider upvoting and / or selecting it as answer. Thanks! – Pitto Oct 05 '20 at 13:52

All you need is to do an equality test:

dataframe['Phrase'] == 'Hello World'

This returns a boolean Series, just like your str.match call, but it is True only where the value equals 'Hello World' exactly.

Example:

a.csv

Phrase,Other_field
Hello World,1
Hello World 1,2
Something else,3

The dataframe:

>>> import pandas as pd
>>> dataframe = pd.read_csv('a.csv')

>>> dataframe
           Phrase  Other_field
0     Hello World            1
1   Hello World 1            2
2  Something else            3

Your str.match call (which only anchors the pattern at the start of the string, so it behaves like a prefix match):

>>> dataframe['Phrase'].str.match('Hello World')
0     True
1     True
2    False
Name: Phrase, dtype: bool

The exact match:

>>> dataframe['Phrase'] == 'Hello World'
0     True
1    False
2    False
Name: Phrase, dtype: bool
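
The boolean Series from the equality test is usually used as a mask to select rows. Here is a small sketch of that step; it builds the frame inline instead of reading a.csv, and the names mask, exact_rows and stripped_mask are illustrative only.

```python
import pandas as pd

# Same data as a.csv above, built inline so the example is self-contained.
dataframe = pd.DataFrame({
    'Phrase': ['Hello World', 'Hello World 1', 'Something else'],
    'Other_field': [1, 2, 3],
})

# Element-wise equality gives one boolean per row.
mask = dataframe['Phrase'] == 'Hello World'

# Indexing with the mask keeps only the rows where it is True.
exact_rows = dataframe[mask]
print(exact_rows)

# Assumption: if the column may carry stray leading or trailing whitespace,
# stripping it first keeps the comparison exact on the visible text.
stripped_mask = dataframe['Phrase'].str.strip() == 'Hello World'
```
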
alani

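Use a regular expression anchored at both ends of the string.

import re

...
...

# data_frame_string is assumed to be a single value taken from the Phrase column.
if re.search(r'^Hello World$', data_frame_string):
    # The string is exactly 'Hello World'; do whatever you need with it here.
    ...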
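
One detail worth knowing about the anchored pattern: in Python, $ also matches just before a trailing newline, so '^Hello World$' accepts 'Hello World\n'. The sketch below compares it with re.fullmatch on a made-up list of candidates; whether the difference matters depends on how clean the data is.

```python
import re

# Hypothetical inputs, including one with a trailing newline.
candidates = ['Hello World', 'Hello World 1', 'Hello World\n']

# '^...$' with re.search: $ matches at the end or just before a final newline.
search_hits = [s for s in candidates if re.search(r'^Hello World$', s)]

# re.fullmatch requires the pattern to consume the entire string.
fullmatch_hits = [s for s in candidates if re.fullmatch(r'Hello World', s)]

print(search_hits)     # includes the string with the trailing newline
print(fullmatch_hits)  # only the exact string
```
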
    
C. Sanchez