How to split and remove a number in the middle of a string in Python?

Question

DataFrame

# |Name                |Price    |24h   |Volume(24h)
50|Maker50MKR          |$1,096.96|4,52  |$351,617,227
36|Decentraland36MANA  |$0.9754  |4,11  |$265,949,302
47|Bitcoin SV47BSV     |$60.38   |4,08  |$50,895,114
86|1inch Network861INCH|$0.7637  |3,74  |$72,279,229
38|Hedera38HBAR        |$0.07594 |3,72  |$58,825,304

Desired Result

# |Name         |Ticker|Price    |24h   |Volume(24h)
50|Maker        |MKR   |$1,096.96|4,52  |$351,617,227
36|Decentraland |MANA  |$0.9754  |4,11  |$265,949,302
47|Bitcoin SV   |BSV   |$60.38   |4,08  |$50,895,114
86|1inch Network|1INCH |$0.7637  |3,74  |$72,279,229
38|Hedera       |HBAR  |$0.07594 |3,72  |$58,825,304

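The problems are:

- The embedded number is not fixed (it can be anything from 0 to 100), so its digit count varies.
- The number can overlap with digits in the name or ticker (e.g. 1inch / 1INCH).
- The ticker has no fixed length or format.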

Users on StackOverflow want to help you but not to write code for you. Please provide your approaches so far. — Jacob, Aug 01 '22 at 13:34
https://stackoverflow.com/questions/430079/how-to-split-strings-into-text-and-number — Dion, Aug 01 '22 at 13:38
The *overlap with ticker name (e.g. 1inch)* is going to throw a wrench in your plans, although it looks like it always has two numbers. Checking some more, the number in the string appears to match the `#` column. If that is the case, you can simply call `str.split()` on the name, using the `#` column value as the separator. — Edo Akse, Aug 01 '22 at 13:44
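A minimal sketch of the comment's idea in plain Python, using the sample rows from the question. It assumes the rank appears in the name exactly where the name ends and the ticker begins. `split_on_rank` is a hypothetical helper; because `str.partition` splits at the first occurrence, a rank whose digits also appear earlier in the name would be split in the wrong place.

```python
# Sample (rank, name) pairs copied from the question's table.
rows = [
    (50, "Maker50MKR"),
    (36, "Decentraland36MANA"),
    (47, "Bitcoin SV47BSV"),
    (86, "1inch Network861INCH"),
    (38, "Hedera38HBAR"),
]


def split_on_rank(rank, name):
    """Split name at the first occurrence of str(rank).

    Returns (name_part, ticker); raises ValueError if the rank is absent.
    """
    before, sep, after = name.partition(str(rank))
    if not sep:
        raise ValueError(f"rank {rank} not found in {name!r}")
    return before, after


result = [split_on_rank(rank, name) for rank, name in rows]
for name_part, ticker in result:
    print(f"{name_part:<14}{ticker}")
```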

score 2 · Accepted Answer · answered Aug 01 '22 at 14:12

Creating a simple DataFrame from your dataset:

import pandas as pd

simple_dict = {
    "#" : [50, 36, 47, 86, 38],
    "Name" : ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV", "1inch Network861INCH", "Hedera38HBAR"],
    "Price" : ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"]
}
df = pd.DataFrame(simple_dict)

>>> df

	#	Name	Price
0	50	Maker50MKR	$1,096.96
1	36	Decentraland36MANA	$0.9754
2	47	Bitcoin SV47BSV	$60.38
3	86	1inch Network861INCH	$0.7637
4	38	Hedera38HBAR	$0.07594

Following Edo Akse's comment (the number in each name matches the `#` column), split each name on that value:

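updated_dict = {}
for i, row in df.iterrows():
    # the rank in "#" is embedded in Name, so split on its string form
    ans = row["Name"].split(str(row["#"]))
    row.loc["Name"] = ans[0]    # text before the rank
    row.loc["Ticker"] = ans[1]  # text after the rank
    updated_dict[i] = row

# each dict key becomes a column, so rows and columns come out swapped
new_df = pd.DataFrame(updated_dict)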

>>> new_df

	0	1	2	3	4
#	50	36	47	86	38
Name	Maker	Decentraland	Bitcoin SV	1inch Network	Hedera
Price	$1,096.96	$0.9754	$60.38	$0.7637	$0.07594
Ticker	MKR	MANA	BSV	1INCH	HBAR

To get the expected orientation, transpose the result with `.T`:

>>> new_df.T

	#	Name	Price	Ticker
0	50	Maker	$1,096.96	MKR
1	36	Decentraland	$0.9754	MANA
2	47	Bitcoin SV	$60.38	BSV
3	86	1inch Network	$0.7637	1INCH
4	38	Hedera	$0.07594	HBAR
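If you would rather not transpose at all, a variant of the same loop (a sketch, not the answerer's code) can collect the modified rows in a list: `pd.DataFrame` turns a list of Series into one row per Series and uses each Series' name as its index label. The small `df` below is assumed to match the one built above.

```python
import pandas as pd

# Same data as the df built above (assumed).
df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

rows = []
for _, row in df.iterrows():
    # split Name on the rank, exactly as in the answer
    name, ticker = row["Name"].split(str(row["#"]))
    row["Name"] = name
    row["Ticker"] = ticker  # adds a new label to this row's Series
    rows.append(row)

# A list of Series is read row-wise, so no .T is needed.
new_df = pd.DataFrame(rows)
print(new_df)
```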

Sometimes, instead of an answer crammed into a single line full of chained methods, a detailed step-by-step answer like this really helps. — Sanju Halder, Aug 01 '22 at 14:16
In my case, I get `'NoneType' object has no attribute 'split'` because the DataFrame has some None values. I just added `.dropna()`, i.e. `pd.DataFrame(...).dropna()`, which deletes all rows containing None. Thanks for your answer. — wjudho, Aug 02 '22 at 07:13
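Note that `.dropna()` with no arguments removes a row if *any* column is missing. A sketch, using hypothetical data, of restricting it to the `Name` column with `subset=` so that rows with gaps elsewhere survive:

```python
import pandas as pd

# Hypothetical data: row 1 has no Name, row 2 is missing only its Price.
df = pd.DataFrame({
    "#": [50, 36, 47],
    "Name": ["Maker50MKR", None, "Bitcoin SV47BSV"],
    "Price": ["$1,096.96", "$0.9754", None],
})

# Only rows whose Name is missing are dropped; row 2 is kept.
clean = df.dropna(subset=["Name"])

# Split the remaining names on their rank.
parts = [name.split(str(rank)) for name, rank in zip(clean["Name"], clean["#"])]
clean = clean.assign(Name=[p[0] for p in parts], Ticker=[p[1] for p in parts])
print(clean)
```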

score 0 · Answer 2 · answered Aug 01 '22 at 14:10 by MoRe (edited Aug 01 '22 at 14:28)

# split each Name on the string form of its "#" value; the result has columns 0 (name part) and 1 (ticker)
pd.DataFrame(df.apply(lambda x: x.Name.split(str(x["#"])), axis=1).values.tolist())

score 0 · Answer 3 · answered Aug 01 '22 at 14:14

You can split the current name into name and ticker using the value in the `#` column. The code below is probably not the most efficient, but it does what you need.

Perhaps a pandas guru can optimize this. I would be very interested in that as well.

# assumes df already holds the question's data
# insert an empty Ticker column right after Name
df.insert(df.columns.get_loc("Name") + 1, "Ticker", None)

for index, row in df.iterrows():
    # split Name on the value of the '#' column and write both parts back into df
    df.at[index, "Name"], df.at[index, "Ticker"] = row["Name"].split(str(row["#"]))

print(df)

Resulting df:

    #           Name Ticker      Price   24h   Volume(24h)
0  50          Maker    MKR  $1,096.96  4,52  $351,617,227
1  36   Decentraland   MANA    $0.9754  4,11  $265,949,302
2  47     Bitcoin SV    BSV     $60.38  4,08   $50,895,114
3  86  1inch Network  1INCH    $0.7637  3,74   $72,279,229
4  38         Hedera   HBAR   $0.07594  3,72   $58,825,304
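One common way to avoid `iterrows()` (which builds a new Series for every row and is usually slow) is a list comprehension over `zip` of the two columns. A sketch under the same assumption that every name contains its rank between name and ticker; unlike the loop above, it passes `maxsplit=1`, so each name yields at most two pieces even if the rank digits recur in the ticker.

```python
import pandas as pd

# Data from the question (24h and Volume columns omitted for brevity).
df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# Split each name at the first occurrence of its rank (at most two pieces).
parts = [name.split(str(rank), 1) for name, rank in zip(df["Name"], df["#"])]

# Write the name part back and insert the ticker right after Name.
df["Name"] = [p[0] for p in parts]
df.insert(df.columns.get_loc("Name") + 1, "Ticker", [p[1] for p in parts])
print(df)
```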

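score -1 · Answer 4 · answered Aug 01 '22 at 13:42 by ArrowRise

import re

# keep only the leading run of letters and whitespace in Name (this drops the ticker rather than extracting it);
# a name that starts with a digit, such as "1inch Network861INCH", gives no match, so .group() raises AttributeError
df['Name'] = df['Name'].apply(lambda name: re.search(r"^[a-zA-Z\s]+", name).group())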
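The regex above fails on names that start with a digit. A regex variant (not the answerer's code) can anchor on the `#` value instead. It assumes the ticker is the trailing run of uppercase letters and digits directly after the rank; `split_name` is a hypothetical helper that returns `(name, None)` when the pattern does not match.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
})


def split_name(name, rank):
    """Split '<name><rank><TICKER>' into (name, ticker).

    Assumption: the ticker is the trailing run of uppercase letters/digits
    directly after the rank. Returns (name, None) if the pattern does not match.
    """
    # (.*?) is non-greedy, so the first position where the rank is followed
    # by an uppercase/digit tail running to the end of the string wins.
    match = re.fullmatch(rf"(.*?){re.escape(str(rank))}([A-Z0-9]+)", name)
    if match is None:
        return name, None
    return match.group(1), match.group(2)


pairs = [split_name(name, rank) for name, rank in zip(df["Name"], df["#"])]
df["Name"] = [p[0] for p in pairs]
df["Ticker"] = [p[1] for p in pairs]
print(df)
```

Because the ticker must be uppercase, the pattern also skips an earlier occurrence of the rank digits inside a lowercase name: `split_name("1inch Network11INCH", 1)` returns `("1inch Network", "1INCH")`.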
