
I'm trying to remove the 3rd and 4th characters from every string in a column of a DataFrame. They are different characters each time, so I don't know how to do it with a regex. For example, if my DataFrame is:

    A          B    C
    '32435'    3    5
    '45243'    2    4

I'm trying to turn it into:

    A          B    C
    '325'      3    2
    '453'      2    4
Inder
AVL
  • Why did `C` in the first row become 2 instead of 5? – sacuL Jul 22 '18 at 16:44
  • Welcome to StackOverflow! To help others answer your question better, consider including details explaining _what you have tried_ or _the research you have done_ and _why it hasn't worked_. You might want to read [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) for more information. – ricky3350 Jul 22 '18 at 16:46
  • `df['A'].apply(lambda a: a[:2] + a[4:])` should do the trick. Regex is not needed here. Have a look at Python slicing: https://stackoverflow.com/questions/509211/understanding-pythons-slice-notation – Viktor Jul 22 '18 at 16:49
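To make the slicing suggestion concrete, here is a small sketch in plain Python and pandas. The helper name `drop_third_and_fourth` and the 7-character test value are hypothetical, not from the question. For comparison it also shows the regex approach the question asks about, using `Series.str.replace` with `regex=True`.

```python
import pandas as pd

def drop_third_and_fourth(s):
    # Hypothetical helper: s[:2] keeps indices 0-1 and s[4:] keeps index 4
    # onward, so indices 2 and 3 (the 3rd and 4th characters) are skipped.
    return s[:2] + s[4:]

print(drop_third_and_fourth('32435'))    # 325
print(drop_third_and_fourth('1234567'))  # 12567
# Out-of-range slice bounds are clipped rather than raising IndexError.
print(drop_third_and_fourth('123'))      # 12

# The question's example data (column C as originally posted).
df = pd.DataFrame({'A': ['32435', '45243'], 'B': [3, 2], 'C': [5, 4]})
by_slice = df['A'].apply(drop_third_and_fourth)

# Regex alternative: ^(..) captures the first two characters, the next ..
# matches the 3rd and 4th, and the replacement keeps only the captured group.
by_regex = df['A'].str.replace(r'^(..)..', r'\1', regex=True)

print(by_slice.tolist())  # ['325', '453']
print(by_regex.tolist())  # ['325', '453']
```

One difference worth knowing: the regex only matches strings with at least four characters, so a short value like `'123'` is left unchanged by it, while slicing turns it into `'12'`.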

3 Answers

df['A']=df['A'].str[:2]+df['A'].str[-1]
Pyd
  • I think it should be `df['A'].str[4:]` if there are more than 5 digits in a string. But I like that it is completely vectorized (without `apply`). – Viktor Jul 22 '18 at 16:53
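A quick sketch of the difference described in the comment, assuming a hypothetical 7-character value `'1234567'` that is not in the question's data. With `.str[-1]` only the last character survives after the first two; with `.str[4:]` everything from index 4 onward is kept.

```python
import pandas as pd

df = pd.DataFrame({'A': ['32435', '45243', '1234567']})

# Form used in the answer: first two characters plus the last one.
last_only = df['A'].str[:2] + df['A'].str[-1]

# Form suggested in the comment: first two characters plus index 4 onward.
from_fifth = df['A'].str[:2] + df['A'].str[4:]

print(last_only.tolist())   # ['325', '453', '127']
print(from_fifth.tolist())  # ['325', '453', '12567']
```

Both forms agree on 5-character strings, which is why the question's example alone does not reveal the difference.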

You can use the following code for this:

DF["A"] = DF["A"].map(lambda x: str(x)[:2] + str(x)[4:])

This will give column A as:

"325"
"453"

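This avoids regex entirely. It may also be faster than the other methods suggested, but that depends on the data, so time the options on your own DataFrame before relying on it.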

Inder
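To check the speed comparison for yourself, a rough `timeit` sketch follows. The test data (40,000 copies of the two example strings) is an assumption, the three functions are hypothetical wrappers around the approaches on this page, and no timings are shown because they depend on the machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: the question's two strings repeated 20,000 times each.
df = pd.DataFrame({'A': ['32435', '45243'] * 20_000})

def with_map():
    # This answer's approach (using [4:] so longer strings keep their tail).
    return df['A'].map(lambda x: str(x)[:2] + str(x)[4:])

def with_str_accessor():
    # Vectorized .str slicing, as in the other answer and its comment.
    return df['A'].str[:2] + df['A'].str[4:]

def with_apply():
    # Series.apply, as suggested in the question's comments.
    return df['A'].apply(lambda a: a[:2] + a[4:])

# The approaches must produce the same values before timings mean anything.
assert with_map().tolist() == with_str_accessor().tolist() == with_apply().tolist()

for fn in (with_map, with_str_accessor, with_apply):
    seconds = timeit.timeit(fn, number=5)
    print(f'{fn.__name__}: {seconds:.3f} s for 5 runs')
```

Comparing values with `.tolist()` rather than `Series.equals` avoids false mismatches if the approaches return different string dtypes.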

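If column A is not guaranteed to hold 5-character strings, it might be helpful to make the function robust first:

import pandas as pd

df = pd.DataFrame({'A':['32435','45243','123']})

def stripstring(s):
    # Slicing never raises IndexError (out-of-range bounds are clipped),
    # so short strings like '123' are already safe; the try/except instead
    # guards against non-string values such as NaN, which raise TypeError.
    try:
        return s[:2] + s[4:]
    except TypeError:
        return s

df['A'] = df['A'].apply(stripstring)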

Output:

     A
0  325
1  453
2   12
jcp
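A built-in alternative to a hand-written guard is pandas' `Series.str.slice_replace`, which replaces a positional slice with a replacement string (an empty string when `repl` is omitted). This is a swapped-in technique, not something proposed in the thread; the sketch below assumes jcp's example data plus a hypothetical missing value to show that the `.str` accessor passes missing values through instead of raising.

```python
import numpy as np
import pandas as pd

# jcp's example plus a hypothetical missing value.
df = pd.DataFrame({'A': ['32435', '45243', '123', np.nan]})

# Replace positions 2 and 3 (stop is exclusive) with an empty string.
# Short strings are handled like slicing, and missing values stay missing.
df['A'] = df['A'].str.slice_replace(2, 4)

print(df['A'].tolist()[:3])  # ['325', '453', '12']
print(pd.isna(df['A'].iloc[3]))  # True
```

Because it runs through the `.str` accessor, this stays vectorized in the same sense as the `.str[:2] + .str[4:]` answer, without a Python-level `try`/`except`.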