3

So I have a data set with over 500 rows where one of the columns has values like this:

df:

         column1

 0    a{'...'}  
 1    b{'...'}
 2    c{'...'}  
 3    d{'...'}  

I want to remove everything within and including the {}.

I have been looking at the question "Pandas delete parts of string after specified character inside a dataframe" and tried the solutions there, but I keep getting errors (and I am aware that StringIO is now io.StringIO).

I've tried

df.column1 = df.column1.str.split('{')[0]

but I get the error message KeyError: 0, and I don't really understand what that means.

I've also tried:

df.column1 = df.column1.str.split(pat='{')

But this only seems to delete the '{', so I'm left with

      column1

 0    a'...'}   
 1    b'...'}
 2    c'...'}   
 3    d'...'}   

Also, I'm not sure if it's important, but the column is an object type. Can anyone tell me what I'm doing wrong and how to fix the issue?

Brian
Aongoose

4 Answers

7

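You can use str.replace. Pass regex=True so the pattern is treated as a regular expression (recent pandas versions default to a literal match):

df['column1'].str.replace(r"\{.*\}", "", regex=True)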
Out[385]: 
0    a
1    b
2    c
3    d
Name: column1, dtype: object
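
A minimal, self-contained sketch of the same replacement, assuming sample data shaped like the question's. The frame below is made up for illustration. Note that str.replace returns a new Series, so the result has to be assigned back to keep the change.

```python
import pandas as pd

# Sample data assumed to mirror the question's column
df = pd.DataFrame({"column1": ["a{'...'}", "b{'...'}", "c{'...'}", "d{'...'}"]})

# regex=True makes pandas interpret the pattern as a regular expression;
# the result is assigned back because str.replace does not modify in place
df["column1"] = df["column1"].str.replace(r"\{.*\}", "", regex=True)

result = df["column1"].tolist()
print(result)
```

If a value could contain more than one {...} group, a non-greedy pattern such as r"\{.*?\}" would remove each group separately instead of everything between the first '{' and the last '}'.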
BENY
5

A little late (@Wen's solution is great), but you can use pandas.Series.str.split() as in your original attempt. You were close; you just need to set expand=True, which returns a DataFrame with one column per piece, so [0] selects the first piece for every row.

df["column1"] = df["column1"].str.split("{", expand=True)[0]
#  column1
#0       a
#1       b
#2       c
#3       d
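
As for the KeyError: 0 in the question, a likely explanation is that without expand=True, str.split returns a Series of lists, and [0] on that Series is a lookup of the index label 0 rather than "first element of each list". The sketch below assumes a frame whose index has no label 0 (the index values 10 and 11 are made up) and shows the .str[0] accessor as an alternative to expand=True.

```python
import pandas as pd

# Hypothetical frame whose index does not contain the label 0
df = pd.DataFrame({"column1": ["a{'...'}", "b{'...'}"]}, index=[10, 11])

# Without expand=True this is a Series holding one list per row
parts = df["column1"].str.split("{")

try:
    parts[0]          # label lookup on the index, not a per-row operation
    raised = False
except KeyError:
    raised = True

# .str[0] takes the first element of each row's list
first = parts.str[0].tolist()
print(raised, first)
```

If the index did contain 0, parts[0] would return the single list for that row, and assigning it to the column would fail or misbehave in a different way.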
pault
4

You can also use pandas.DataFrame.replace and pass a dictionary that specifies what to do for various columns.

Using @Wen's regex pattern (written as a raw string so the backslashes reach the regex engine unchanged):

df.replace(dict(column1={r'\{.*\}': ''}), regex=True)

  column1
0       a
1       b
2       c
3       d

In the spirit of @pault, you can also use pandas.Series.str.extract

df.column1.str.extract(r'([^{]+)', expand=False)

  column1
0       a
1       b
2       c
3       d
piRSquared
  • @Aongoose you can upvote many answers (when you have 15+ rep) but you can only accept one answer. By accepting my answer, you un-accepted Wen's answer. That may not have been your intention. If not, feel free to accept Wen's answer again by clicking on the checkmark. – piRSquared Apr 13 '18 at 16:53
  • yup that's my bad thanks for the heads up @piRSquared – Aongoose Apr 13 '18 at 17:11
0

Using .apply

import pandas as pd
df = pd.DataFrame({"a":["a{'...'}", "b{'...'}"]})
df["a"] = df["a"].apply(lambda x: x.split('{')[0])  # keep the text before the first '{'
print(df)
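
One caveat with the lambda above: it calls x.split on every value, so a missing value (NaN or None) in the column would raise an AttributeError. A hedged sketch of a more defensive variant follows; strip_braces is a hypothetical helper name, and the sample data with a missing entry is made up for illustration.

```python
import pandas as pd

def strip_braces(value):
    """Hypothetical helper: keep the text before the first '{'."""
    if isinstance(value, str):
        # partition returns (before, separator, after); take the part before '{'
        return value.partition("{")[0]
    # Leave non-string values (e.g. NaN or None) untouched
    return value

# Made-up sample with a missing value and a value without braces
df = pd.DataFrame({"a": ["a{'...'}", None, "c"]})
df["a"] = df["a"].apply(strip_braces)

cleaned = df["a"].tolist()
print(cleaned)
```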
Rakesh