
I have the following DataFrame named mydf:

        A                  B
0       3de (1ABS)      Adiran
1       3SA (SDAS)      Adel
2       7A (ASA)        Ronni
3       820 (SAAa)      Emili

I want to remove the " (xxxx)" part and keep the rest of each value in column A, so that the DataFrame (mydf) looks like this:

        A          B
0       3de      Adiran
1       3SA      Adel
2       7A       Ronni
3       820      Emili

I have tried :

print(mydf['A'].apply(lambda x: re.sub(r" \(.+\)", "", x)))

but then I get a Series object back and not a dataframe object.

I have also tried to use replace:

df.replace([' \(.*\)'], [""], regex=True)

but it didn't change anything.

What am I doing wrong?

Thank you!

EdChum
Nastya
  • Possible duplicate of [Pandas DataFrame: remove unwanted parts from strings in a column](http://stackoverflow.com/questions/13682044/pandas-dataframe-remove-unwanted-parts-from-strings-in-a-column) – Martin Nov 25 '16 at 16:36

1 Answer


You can use the str.split() method:

In [3]: df.A = df.A.str.split(r'\s+\(').str[0]

In [4]: df
Out[4]:
     A                   B
0  3de              Adiran
1  3SA                Adel
2   7A               Ronni
3  820               Emili

Or use the str.extract() method:

In [9]: df.A = df.A.str.extract(r'([^\(\s]*)', expand=False)

In [10]: df
Out[10]:
     A                   B
0  3de              Adiran
1  3SA                Adel
2   7A               Ronni
3  820               Emili
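
As a side note on the two attempts in the question: neither apply() nor replace() modifies the DataFrame in place, so the result has to be assigned back. The snippet below is only a minimal sketch of that idea, assuming the sample data from the question; the name `cleaned` and the anchored pattern `\s*\(.*\)$` are illustrative choices, not part of the original post.

```python
import pandas as pd

# Sample data copied from the question
mydf = pd.DataFrame({
    "A": ["3de (1ABS)", "3SA (SDAS)", "7A (ASA)", "820 (SAAa)"],
    "B": ["Adiran", "Adel", "Ronni", "Emili"],
})

# DataFrame.replace() returns a new DataFrame and leaves mydf untouched,
# so the result must be kept (here in the hypothetical name `cleaned`).
cleaned = mydf.replace(r" \(.*\)", "", regex=True)

# Column-only variant: compute the cleaned Series and assign it back
# to the column, so mydf stays a DataFrame with column A updated.
mydf["A"] = mydf["A"].str.replace(r"\s*\(.*\)$", "", regex=True)

print(mydf)
```

Assigning the Series back to `mydf["A"]` keeps the result a DataFrame, which addresses the "I get a Series object back" part of the question.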
Nickil Maveli
MaxU - stand with Ukraine
  • Thanks. It works. What is the difference between using df['A'] and using df.A? From my understanding, df['A'] returns a Series object, but df.A just modifies column A in the df. – Nastya Nov 25 '16 at 22:35
  • @Nastya, please consider [accepting](http://meta.stackexchange.com/a/5235) an answer if you think it has answered your question – MaxU - stand with Ukraine Nov 25 '16 at 22:39