I have a dataframe (df) containing names with abbreviations like below:

Name
ABC CO
XYZ CO LTD 
S.A.L.P, S.P.A.
XXX L.P 
NUR YER SAN.TIC.LTD 
BAAB TERMINALS LTD.

I have to replace the abbreviations with their full words, using a lookup list. Below is my approach:

import pandas as pd
repl = {'CO' : 'COMPANY','LTD' : 'LIMITED','L.P' : 'LIMITED PARTNERSHIP','LTD.' : 'LIMITED','.LTD' : 'LIMITED'}
repl = {rf'\b{k}\b': v for k, v in repl.items()}

df2 = df['Name'].replace(repl, regex=True)
df2

Below is the output:

0                        ABC COMPANY
1                XYZ COMPANY LIMITED
2    S.A.LIMITED PARTNERSHIP, S.P.A.
3            XXX LIMITED PARTNERSHIP
4            NUR YER SAN.TIC.LTD 
5             BAAB TERMINALS LIMITED.
Name: Name, dtype: object

Here, S.A.L.P must not be replaced: the L.P inside it is part of a longer string, not a separate abbreviation.

Expected output :

    0                        ABC COMPANY
    1                XYZ COMPANY LIMITED
    2                    S.A.L.P, S.P.A.
    3            XXX LIMITED PARTNERSHIP
    4            NUR YER SAN.TIC.LIMITED
    5             BAAB TERMINALS LIMITED.
    Name: Name, dtype: object

The code should replace L.P with LIMITED PARTNERSHIP only when it appears as a separate string, not when it is part of another string. Can anyone help me with this issue? Thanks.

2 Answers

Put spaces before and after the words, e.g. for L.P:

repl = {'CO' : 'COMPANY','LTD' : 'LIMITED',' L.P ' : ' LIMITED PARTNERSHIP '}
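As a rough sketch of how the space-padded key behaves (using `re.sub` on plain strings rather than the dataframe; the helper name `expand_padded` is made up for illustration), note that it only matches when there really is a space on both sides, so an abbreviation at the very end of a name with no trailing space would be missed:

```python
import re

# Space-padded key and replacement, as suggested above.
key, full = ' L.P ', ' LIMITED PARTNERSHIP '

# Escape the key so that '.' only matches a literal dot.
pattern = re.escape(key)

def expand_padded(name):
    """Hypothetical helper: replace the padded key wherever it occurs."""
    return re.sub(pattern, full, name)

print(expand_padded('XXX L.P '))         # XXX LIMITED PARTNERSHIP 
print(expand_padded('S.A.L.P, S.P.A.'))  # S.A.L.P, S.P.A. (unchanged)
print(expand_padded('XXX L.P'))          # XXX L.P (unchanged: no trailing space)
```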
gtomer

You may be able to use this regex with look-arounds, which make sure there is no non-whitespace character immediately before or after the key:

import re
repl = {rf'(?<!\S){re.escape(k)}(?!\S)': v for k, v in repl.items()}

Here:

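  • (?<!\S): negative lookbehind; asserts that the previous character is not a non-whitespace character, so the match starts at the beginning of the string or right after whitespace
  • (?!\S): negative lookahead; asserts that the next character is not a non-whitespace character, so the match ends at the end of the string or right before whitespace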
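To see what these look-arounds do on the sample names, here is a minimal sketch that applies the patterns one after another with `re.sub` on plain strings. The helper `expand` and the list `names` are only for illustration and are not part of the original code:

```python
import re

# Abbreviation table from the question.
repl = {'CO': 'COMPANY', 'LTD': 'LIMITED', 'L.P': 'LIMITED PARTNERSHIP',
        'LTD.': 'LIMITED', '.LTD': 'LIMITED'}

# Escape each key and require whitespace (or a string boundary) on both sides.
patterns = {rf'(?<!\S){re.escape(k)}(?!\S)': v for k, v in repl.items()}

def expand(name):
    """Hypothetical helper: apply every pattern in turn to one name."""
    for pattern, full in patterns.items():
        name = re.sub(pattern, full, name)
    return name

names = ['ABC CO', 'S.A.L.P, S.P.A.', 'XXX L.P ', 'abc .LTD',
         'NUR YER SAN.TIC.LTD ']
for n in names:
    print(repr(expand(n)))
# 'ABC COMPANY'
# 'S.A.L.P, S.P.A.'
# 'XXX LIMITED PARTNERSHIP '
# 'abc LIMITED'
# 'NUR YER SAN.TIC.LTD '
```

The last name is left unchanged because LTD is preceded by a dot rather than whitespace. The same `patterns` dictionary should also work with `df['Name'].replace(patterns, regex=True)` as in the question, though that is untested here.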
anubhava
  • Hey, thanks for the suggestion. It almost solved my problem, but I have a record like 'abc .LTD', and the corresponding lookup list has '.LTD : LIMITED' to replace .LTD with LIMITED, but it didn't work on this record. – Vijaya Bhaskar Oct 12 '20 at 07:32
  • Can you please edit question and show expected matches? – anubhava Oct 12 '20 at 08:10
  • Also try my updated code – anubhava Oct 12 '20 at 08:24
  • Hi there, I tried the updated code but no luck. I have also updated the input and expected output. Any help would be appreciated. Thanks. – Vijaya Bhaskar Oct 12 '20 at 12:24
  • You asked for `S.A.L.P must not replaced with L.P` but then want `TIC.LTD` to become `TIC.LIMITED`? Both requirements are contradictory in nature. – anubhava Oct 12 '20 at 14:25