If statement to remove and replace a string

Question

I'm trying to remove the parts of a string that keep it a string, so that it can become an integer. I also need to take into account the different formats the strings come in.

I've tried to put this into a function; here's what I have done:

import numpy as np

def rem(x):
    data = []
    for i in x:
        if "m" in i:
            data.append(i.replace(".00m", '000000'))
        elif "Th" in i:
            data.append(i.replace("Th.", '000'))
    return data
    
data_array = np.array(['£67.50m', '£63.00m', '£49.50m','£90Th.', '£720Th.'], dtype=object)

rem(data_array)
>['£67.50m', '£63000000', '£49.50m', '£90000', '£720000']

How would I take into account that the digits before the m can be anything from 0-9 (.50m, .20m, ...), not just .00m?

I have tried this in my bigger dataframe but I get the following error:

TypeError: argument of type 'float' is not iterable

I'm assuming that is because the function does not take .50m, .20m, ... into account?

Using @Ptit Xav's suggestion:

import re

def rem(x):
    data = []
    for i in x:
        if "m" in i:
            xi = re.sub(r"[^\d]", "", i)   # "£67.50m" -> "6750"
            data.append(int(xi)*10000)     # assumes exactly two decimals
        elif "Th" in i:
            hi = re.sub(r"[^\d]", "", i)
            data.append(int(hi)*1000)
    return data

Does this answer your question? [Removing all non-numeric characters from string in Python](https://stackoverflow.com/questions/1249388/removing-all-non-numeric-characters-from-string-in-python) — Mahrkeenerh, Nov 05 '21 at 09:52
@Mahrkeenerh No, as that only removes strings. I'm asking to remove and replace — me.limes, Nov 05 '21 at 09:53
You should remove unwanted char, convert to float then multiply by 1000 or 1000000, and then convert back to string adding the currency at the beginning. — Ptit Xav, Nov 05 '21 at 09:55
you're asking to remove and replace what exactly? Seems like you didn't finish your sentence — Mahrkeenerh, Nov 05 '21 at 09:56
@PtitXav I managed to get it work with the smaller example although with my larger dataframe I still get the error about floats. Have you any ideas on whats causing this? — me.limes, Nov 05 '21 at 10:02
You must keep the dot and use float values for multiplication. — Ptit Xav, Nov 05 '21 at 10:09

score 0 · Answer 1 · answered Nov 05 '21 at 10:16

You can use the sub function from the re module:

import numpy as np
import re

def rem(x):
    data = []
    for i in x:
        if "m" in i:
            data.append(re.sub(r"(\.\d+m)", '000000', i))
        elif "Th" in i:
            data.append(i.replace("Th.", '000'))
    return data
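
For reference, here is a quick check of what this substitution returns on the sample data from the question. A plain list stands in for the NumPy array, and the variable names sample and result are only illustrative. Note that the pattern replaces the whole ".<digits>m" part, so £67.50m comes out as £67000000 rather than £67500000:

```python
import re

def rem(x):
    data = []
    for i in x:
        if "m" in i:
            # Replace ".<digits>m" with six zeros (the decimals are discarded)
            data.append(re.sub(r"(\.\d+m)", "000000", i))
        elif "Th" in i:
            data.append(i.replace("Th.", "000"))
    return data

sample = ['£67.50m', '£63.00m', '£49.50m', '£90Th.', '£720Th.']
result = rem(sample)
print(result)
# ['£67000000', '£63000000', '£49000000', '£90000', '£720000']
```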

score 0 · Answer 2 · answered Nov 05 '21 at 10:17

I replaced this code:

data.append(i.replace(".00m", '000000'))

With:

data.append(i.split(".")[0] + "000000")

The output is:

>['£67000000', '£63000000', '£49000000', '£90000', '£720000']

vasek 9876


Ptit Xav · Answer 3 · 2021-11-05T10:30:19.513

With conversion to float, keeping the decimal point:

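import re

def rem(x):
    data = []
    for i in x:
        if "m" in i:
            xi = re.sub(r"[^\d.]", "", i)  # keep digits and the decimal point
            data.append("{}{:.0f}".format(i[0], float(xi) * 1000000))
        elif "Th" in i:
            hi = re.sub(r"[^\d.]", "", i)
            data.append("{}{:.0f}".format(i[0], float(hi) * 1000))
    return data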
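
Since the question asks for integers in the end, a possible variant of the same idea is sketched below. It is only an illustration under some assumptions: the names MULTIPLIERS, AMOUNT_RE and to_int are made up for this example, the only suffixes handled are m and Th., and Decimal from the standard library is swapped in for float to sidestep binary rounding.

```python
import re
from decimal import Decimal

# Hypothetical suffix table: "m" for millions, "Th." for thousands
MULTIPLIERS = {"m": 1_000_000, "Th.": 1_000}
# Optional non-digit prefix (currency sign), a number, then a known suffix
AMOUNT_RE = re.compile(r"^\D*(\d+(?:\.\d+)?)(m|Th\.)$")

def to_int(value):
    """Return the amount as an int, or None if the format is not recognised."""
    match = AMOUNT_RE.match(value)
    if match is None:
        return None
    number, suffix = match.groups()
    return int(Decimal(number) * MULTIPLIERS[suffix])

amounts = [to_int(v) for v in ['£67.50m', '£63.00m', '£49.50m', '£90Th.', '£720Th.']]
print(amounts)
# [67500000, 63000000, 49500000, 90000, 720000]
```

Returning None for unrecognised strings is a design choice for the sketch; raising an error would work just as well if silent gaps are not wanted.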

edited Nov 05 '21 at 10:30

answered Nov 05 '21 at 10:24


score 0 · Answer 4 · answered Nov 05 '21 at 10:58

I think you can make it a little more robust by replacing if "m" in i: and elif "Th" in i: with regular expressions.

import re
import warnings
import numpy as np

RE_ENDS_M = re.compile(r'\.(\d{2})m$')
RE_ENDS_TH = re.compile(r'Th\.$')

def rem(x):
    data = []
    for i in x: 
        if RE_ENDS_M.search(i):
            data.append(re.sub(RE_ENDS_M, r"\g<1>0000", i))
        elif RE_ENDS_TH.search(i):
            data.append(re.sub(RE_ENDS_TH, '000', i))
        else:
            warnings.warn("Ignoring data: %s" % i) 
    return data
    
data_array = np.array(
    ['£67.50m', '£63.00m', '£49.50m','£90Th.', '£720Th.', '1€50'],
    dtype=object
)

print(rem(data_array))

# Outputs:
# UserWarning: Ignoring data: 1€50
#  warnings.warn("Ignoring data: %s" % i)
# ['£67500000', '£63000000', '£49500000', '£90000', '£720000']
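
The TypeError: argument of type 'float' is not iterable from the question is what Python raises when the in operator is applied to a float. The larger dataframe is not shown, so this is an assumption, but a likely cause is missing values, which pandas typically stores as float NaN. A minimal sketch that reuses the same regular expressions and skips non-string entries (rem_safe is a hypothetical name, and appending None to keep positions aligned is just one option):

```python
import re
import warnings

RE_ENDS_M = re.compile(r'\.(\d{2})m$')
RE_ENDS_TH = re.compile(r'Th\.$')

def rem_safe(x):
    """Hypothetical variant of rem() that tolerates non-string entries."""
    data = []
    for i in x:
        if not isinstance(i, str):
            # e.g. float('nan') from a missing cell; '"m" in i' would raise TypeError
            warnings.warn("Ignoring non-string data: %r" % (i,))
            data.append(None)
        elif RE_ENDS_M.search(i):
            data.append(RE_ENDS_M.sub(r"\g<1>0000", i))
        elif RE_ENDS_TH.search(i):
            data.append(RE_ENDS_TH.sub("000", i))
        else:
            warnings.warn("Ignoring data: %s" % i)
            data.append(None)
    return data

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warnings for this demo
    cleaned = rem_safe(['£67.50m', float('nan'), '£90Th.'])
print(cleaned)
# ['£67500000', None, '£90000']
```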
