
I'm new to Pandas and I'd like to ask your advice. Let's take this dataframe:

import pandas as pd

df_test = pd.DataFrame({'Dimensions': ['22.67x23.5', '22x24.6', '45x56', 'x23x56.22', '46x23x', '34x45'],
                        'Other': [59, 29, 73, 56, 48, 22]})

I want to detect the values that start with "x" (line 4) or end with "x" (line 5) and strip that "x", so that my dataframe looks like this:

Dimensions  Other
22.67x23.5  59
22x24.6     29
45x56       73
23x56.22    56
46x23       48
34x45       22

I tried to write a function and apply it to the column:

def remove_x(x):
    if (x.str.match('^[a-zA-Z]') == True):
        x = x[1:]
        return x
    if (x.str.match('.*[a-zA-Z]$') == True):
        x = x[:-1]
        return x

If I apply this function to the column

df_test['Dimensions'] = df_test['Dimensions'].apply(remove_x)

I get the error 'str' object has no attribute 'str'. I removed .str from the function and re-ran everything, but without success.

What should I do? Thanks for any suggestions. If there is another way to do this, I'm interested.

Tom
Tim Dunn

2 Answers

Just use str.strip:

df_test['Dimensions'] = df_test['Dimensions'].str.strip('x')

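For more general patterns, you can use str.replace with a regular expression (recent pandas versions need regex=True, since the pattern is otherwise treated as a literal string):

df_test['Dimensions'] = df_test['Dimensions'].str.replace(r'^x|x$', '', regex=True)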

Output:

   Dimensions  Other
0  22.67x23.5     59
1     22x24.6     29
2       45x56     73
3    23x56.22     56
4       46x23     48
5       34x45     22
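
The two calls are not fully equivalent: str.strip('x') removes every leading and trailing "x", while the anchored pattern removes at most one "x" from each end. A minimal sketch of the difference, assuming a recent pandas and a made-up Series named s (these values are not from the question's data):

```python
import pandas as pd

# Hypothetical sample values with repeated 'x' at the ends
s = pd.Series(['xx23x56', '46x23xx', '34x45'])

# strip('x') removes all leading and trailing 'x' characters
stripped = s.str.strip('x')

# The anchored regex removes at most one 'x' from each end
replaced = s.str.replace(r'^x|x$', '', regex=True)

print(stripped.tolist())
print(replaced.tolist())
```

For the data in the question both approaches give the same result, since each value has at most one stray "x" per end.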
Quang Hoang

@QuangHoang's answer is better (simpler and more efficient), but here's what went wrong in your approach. Inside your function you use the .str accessor, which only exists on a pandas Series (or Index). But when you call df_test['Dimensions'].apply(remove_x), the values passed to remove_x are the individual elements of df_test['Dimensions'], i.e. plain Python str values, which have no .str attribute. So you should write the function as if x is an incoming str.

Here's how you could implement that (avoiding any regex):

def remove_x(x):
    # x is a plain str here, so use ordinary str methods
    if x.startswith('x'):
        x = x[1:]   # drop one leading 'x'
    if x.endswith('x'):
        x = x[:-1]  # drop one trailing 'x'
    return x

More idiomatically:

def remove_x(x):
    return x.strip('x')

Or even:

df_test['Dimensions'] = df_test['Dimensions'].apply(lambda x : x.strip('x'))

All that said, it is better to avoid apply here and use the built-in string methods shown by Quang.
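
To see why the original function failed, it can help to check what apply actually hands to the function. A small sketch using three of the Dimensions values from the question (the helper name show_type and the variable names are made up for illustration):

```python
import pandas as pd

dims = pd.Series(['22.67x23.5', 'x23x56.22', '46x23x'])

def show_type(value):
    # apply calls this once per element; each value is a plain Python str
    return type(value).__name__

types = dims.apply(show_type)

# Element-wise apply and the vectorized .str accessor give the same values
via_apply = dims.apply(lambda v: v.strip('x'))
via_accessor = dims.str.strip('x')

print(types.tolist())
print(via_apply.tolist() == via_accessor.tolist())
```

Since every element arrives as a str, calling x.str inside the function raises the AttributeError from the question.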

Tom