Pandas remove all of a string in a column after a character

Question

So I have a data set with over 500 rows where one of the columns has values like this:

df:

         column1

 0    a{'...'}  
 1    b{'...'}
 2    c{'...'}  
 3    d{'...'}

I want to remove everything within and including the {}.

I have been looking at the question "Pandas delete parts of string after specified character inside a dataframe" and tried the solutions there, but I keep getting errors (and I am aware that StringIO is now io.StringIO).

I've tried

df.column1 = df.column1.str.split('{')[0]

but I get the error KeyError: 0, and I don't really understand what that means.

I've also tried:

df.column1 = df.column1.str.split(pat='{')

But this only seems to delete the '{', so I'm left with

      column1

 0    a'...'}   
 1    b'...'}
 2    c'...'}   
 3    d'...'}

Also, I'm not sure if it's important, but the column is of object dtype. Can anyone tell me what I'm doing wrong and how to fix it?
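On the KeyError: one plausible cause, assuming the real DataFrame's index does not contain the label 0 (for example after rows were filtered), is that str.split returns a Series whose elements are lists, and [0] on that Series is a lookup by index label, not "the first piece of every row". The same fact likely explains the second attempt: assigning the result of str.split back to the column stores lists, which can look as if only the '{' was removed. A minimal sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data; the index deliberately has no label 0
df = pd.DataFrame(
    {"column1": ["a{'...'}", "b{'...'}", "c{'...'}", "d{'...'}"]},
    index=[10, 11, 12, 13],
)

# str.split gives one list per row, e.g. ['a', "'...'}"]
parts = df["column1"].str.split("{")
print(parts.tolist())

# parts[0] looks up the row LABELLED 0, so it raises KeyError here
try:
    parts[0]
except KeyError as exc:
    print("KeyError:", exc)

# .str[0] takes element 0 of each row's list instead
first = parts.str[0]
print(first.tolist())  # ['a', 'b', 'c', 'd']
```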

score 7 · Accepted Answer · answered Apr 13 '18 at 15:22


You can use str.replace with a regular expression:

df['column1'].str.replace(r"\{.*\}", "", regex=True)
Out[385]: 
0    a
1    b
2    c
3    d
Name: column1, dtype: object
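This call relies on the pattern being treated as a regex. That was the default when this answer was written, but since pandas 2.0 str.replace defaults to regex=False, so passing regex=True explicitly is the safer choice. The sketch below (hypothetical data mirroring the question) also shows that .* is greedy; the value "a{x}b{y}" is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"column1": ["a{'...'}", "b{'...'}", "c{'...'}", "d{'...'}"]})

# regex=True makes the pattern a regular expression, not a literal string
df["column1"] = df["column1"].str.replace(r"\{.*\}", "", regex=True)
print(df["column1"].tolist())  # ['a', 'b', 'c', 'd']

# Greedy .* runs from the first '{' to the LAST '}' in a value;
# non-greedy .*? removes each {...} group separately
s = pd.Series(["a{x}b{y}"])
print(s.str.replace(r"\{.*\}", "", regex=True).tolist())   # ['a']
print(s.str.replace(r"\{.*?\}", "", regex=True).tolist())  # ['ab']
```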


BENY


score 5 · Answer 2 · answered Apr 13 '18 at 15:46

A little late (@Wen's solution is great), but you can use pandas.Series.str.split() as in your original attempt. You were close; you just need to set expand=True so that the pieces become columns of a DataFrame, and then take column 0.

df["column1"] = df["column1"].str.split("{", expand=True)[0]
#  column1
#0       a
#1       b
#2       c
#3       d
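A variant of the same idea that skips building an intermediate DataFrame: split at most once with n=1 and take element 0 of each resulting list with .str[0]. This is a sketch on hypothetical data; values without a '{' come back unchanged.

```python
import pandas as pd

df = pd.DataFrame({"column1": ["a{'...'}", "b{'...'}", "c{'...'}", "d{'...'}"]})

# n=1: split only at the first '{'; .str[0]: keep the part before it
df["column1"] = df["column1"].str.split("{", n=1).str[0]
print(df["column1"].tolist())  # ['a', 'b', 'c', 'd']
```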

score 4 · Answer 3 · answered Apr 13 '18 at 16:24


You can also use pandas.DataFrame.replace and pass a dictionary that specifies what to do for various columns.

Using @Wen's regex pattern:

df.replace(dict(column1={r'\{.*\}': ''}), regex=True)

  column1
0       a
1       b
2       c
3       d
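The outer dictionary key picks the column and the inner dictionary maps a pattern to its replacement, so other columns are left alone. DataFrame.replace also returns a new DataFrame rather than modifying df in place. A sketch with a hypothetical second column named "other":

```python
import pandas as pd

# Hypothetical frame: "other" exists only to show it is not touched
df = pd.DataFrame({
    "column1": ["a{'...'}", "b{'...'}"],
    "other": ["x{keep}", "y{keep}"],
})

# {column: {pattern: replacement}}; regex=True enables pattern matching
cleaned = df.replace({"column1": {r"\{.*\}": ""}}, regex=True)
print(cleaned)
```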

In the spirit of @pault, you can also use pandas.Series.str.extract:

df.column1.str.extract(r'([^\{]+)', expand=False)

  column1
0       a
1       b
2       c
3       d
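One detail of the unanchored pattern ([^\{]+): it captures the first run of non-'{' characters anywhere in the value, so a value that starts with '{' yields text from inside the braces. Anchoring with ^ and using * avoids that. A sketch with hypothetical values ("plain" and "{x}" are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"column1": ["a{'...'}", "b{'...'}", "plain"]})

# ^ anchors at the start; [^{]* takes everything before the first '{'
# (a value with no '{' is kept whole)
df["column1"] = df["column1"].str.extract(r"^([^{]*)", expand=False)
print(df["column1"].tolist())  # ['a', 'b', 'plain']

# Unanchored vs anchored on a value that starts with '{'
s = pd.Series(["{x}"])
print(s.str.extract(r"([^{]+)", expand=False).tolist())   # ['x}']
print(s.str.extract(r"^([^{]*)", expand=False).tolist())  # ['']
```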


piRSquared


@Aongoose you can upvote many answers (when you have 15+ rep) but you can only accept one answer. By accepting my answer, you un-accepted Wen's answer. That may not have been your intention. If not, feel free to accept Wen's answer again by clicking on the checkmark. – piRSquared Apr 13 '18 at 16:53
yup that's my bad thanks for the heads up @piRSquared – Aongoose Apr 13 '18 at 17:11

score 0 · Answer 4 · answered Apr 13 '18 at 15:26


Using .apply:

import pandas as pd

df = pd.DataFrame({"a": ["a{'...'}", "b{'...'}"]})
# keep everything before the first '{' in each value
df["a"] = df["a"].apply(lambda x: x.split('{')[0])
print(df)
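The lambda calls x.split directly, so it assumes every value is a string; a missing value (NaN, a float) would raise AttributeError. The .str methods pass NaN through, and a hypothetical helper can do the same with apply. A sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["a{'...'}", "b{'...'}", np.nan]})

def before_brace(x):
    # Hypothetical helper: text before the first '{'; non-strings (e.g. NaN) pass through
    if isinstance(x, str):
        return x.split("{", 1)[0]
    return x

df["a"] = df["a"].apply(before_brace)
print(df)
```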


Rakesh

