Current DataFrame:

# |Name                |Price    |24h   |Volume(24h)
50|Maker50MKR          |$1,096.96|4,52  |$351,617,227
36|Decentraland36MANA  |$0.9754  |4,11  |$265,949,302
47|Bitcoin SV47BSV     |$60.38   |4,08  |$50,895,114
86|1inch Network861INCH|$0.7637  |3,74  |$72,279,229
38|Hedera38HBAR        |$0.07594 |3,72  |$58,825,304

Desired result:

# |Name         |Ticker|Price    |24h   |Volume(24h)
50|Maker        |MKR   |$1,096.96|4,52  |$351,617,227
36|Decentraland |MANA  |$0.9754  |4,11  |$265,949,302
47|Bitcoin SV   |BSV   |$60.38   |4,08  |$50,895,114
86|1inch Network|1INCH |$0.7637  |3,74  |$72,279,229
38|Hedera       |HBAR  |$0.07594 |3,72  |$58,825,304

The problems are:

  • the number embedded in the name has no fixed number of digits (it can be anything from 0 to 100)
  • it can overlap with a name or ticker that itself contains digits (e.g. 1inch)
  • tickers have no fixed length
wjudho
  • Users on StackOverflow want to help you but not to write code for you. Please provide your approaches so far. – Jacob Aug 01 '22 at 13:34
  • What have you tried so far? – Dion Aug 01 '22 at 13:37
  • https://stackoverflow.com/questions/430079/how-to-split-strings-into-text-and-number – Dion Aug 01 '22 at 13:38
  • The *overlap with ticker name (e.g. 1inch)* is going to throw a wrench in your plans, although the number always seems to have 2 digits. Checking some more, it looks like the number in the string matches the `#` column. If that is the case, you can simply call `str.split()` with the `#` column value as the separator. – Edo Akse Aug 01 '22 at 13:44
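A quick illustration of the comment's point, using the `1inch Network` row from the question. It assumes, as the comment suggests, that the number embedded in `Name` is always the `#` value:

```python
import re

# Sample values copied from the question's "Name" and "#" columns.
name, rank = "1inch Network861INCH", 86

# Splitting on any run of digits cannot tell the rank apart from the
# digits that belong to the coin name or the ticker.
naive = re.split(r"\d+", name)

# Splitting on the known rank separates the two parts cleanly;
# maxsplit=1 leaves any later occurrence of the number inside the ticker.
coin, ticker = name.split(str(rank), 1)

print(naive)         # ['', 'inch Network', 'INCH']
print(coin, ticker)  # 1inch Network 1INCH
```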

4 Answers


Creating a simple DataFrame from your dataset:

import pandas as pd

simple_dict = {
    "#" : [50, 36, 47, 86, 38],
    "Name" : ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV", "1inch Network861INCH", "Hedera38HBAR"],
    "Price" : ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"]
}
df = pd.DataFrame(simple_dict)
>>> df
    #                  Name      Price
0  50            Maker50MKR  $1,096.96
1  36    Decentraland36MANA    $0.9754
2  47       Bitcoin SV47BSV     $60.38
3  86  1inch Network861INCH    $0.7637
4  38          Hedera38HBAR   $0.07594

Following Edo Akse's comment on the question (How to split & remove a number in the middle of string in a python?), split each name on its own `#` value:

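updated_dict = {}
for i, row in df.iterrows():
    # split the name on its own "#" value, e.g. "Maker50MKR" -> ["Maker", "MKR"]
    ans = row["Name"].split(str(row["#"]))
    row.loc["Name"] = ans[0]
    row.loc["Ticker"] = ans[1]
    updated_dict[i] = row

# each dict value is one row, but DataFrame(dict) turns every key into a column
new_df = pd.DataFrame(updated_dict)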
>>> new_df
                0             1           2              3         4
#              50            36          47             86        38
Name        Maker  Decentraland  Bitcoin SV  1inch Network    Hedera
Price   $1,096.96       $0.9754      $60.38        $0.7637  $0.07594
Ticker        MKR          MANA         BSV          1INCH      HBAR

To display it the right way round, transpose it with `.transpose()` or `.T`:

>>> new_df.T
    #           Name      Price  Ticker
0  50          Maker  $1,096.96     MKR
1  36   Decentraland    $0.9754    MANA
2  47     Bitcoin SV     $60.38     BSV
3  86  1inch Network    $0.7637   1INCH
4  38         Hedera   $0.07594    HBAR
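The transpose is only needed because `pd.DataFrame(dict)` treats each key as a column. A sketch, assuming the same sample data, that stores each row as a plain dict and passes `orient="index"` to `DataFrame.from_dict` so the keys become row labels directly. It also adds `maxsplit=1` to `split()` and places `Ticker` next to `Name`, as in the desired result:

```python
import pandas as pd

# Same sample data as in the answer above (subset of the question's columns).
df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

rows = {}
for i, row in df.iterrows():
    # split on this row's own "#" value, at most once
    name, ticker = row["Name"].split(str(row["#"]), 1)
    rows[i] = {"#": row["#"], "Name": name, "Ticker": ticker, "Price": row["Price"]}

# orient="index" treats each outer key as a row label, so no .T is needed.
new_df = pd.DataFrame.from_dict(rows, orient="index")
print(new_df)
```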
Galaxy
  • Sometimes, instead of a one-line answer full of chained methods, a detailed step-by-step answer like this really helps – Sanju Halder Aug 01 '22 at 14:16
  • In my case, I got `'NoneType' object has no attribute 'split'` because the DataFrame has some None values. I just added `.dropna()`, as in `pd.DataFrame(...).dropna()`, which deletes all rows containing None. Thanks for your answer – wjudho Aug 02 '22 at 07:13
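If only missing names cause the error, a narrower option is `dropna(subset=["Name"])`, which keeps rows that are missing other fields. A small sketch on a hypothetical frame (not the question's data) showing the difference:

```python
import pandas as pd

# Hypothetical frame: row 1 lacks a Name, row 2 lacks only a Price.
df = pd.DataFrame({
    "#": [50, 36, 47],
    "Name": ["Maker50MKR", None, "Bitcoin SV47BSV"],
    "Price": ["$1,096.96", "$0.9754", None],
})

all_complete = df.dropna()             # drops every row with any missing value
has_name = df.dropna(subset=["Name"])  # drops only rows without a Name

print(all_complete.index.tolist())  # [0]
print(has_name.index.tolist())      # [0, 2]
```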
pd.DataFrame(df.apply(lambda x: x.Name.split(str(x["#"])), axis=1).values.tolist())
MoRe
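The one-liner returns a separate frame whose columns are labelled 0 and 1. A sketch of labelling those columns and writing them back into the original frame, assuming the question's sample data (the 24h and volume columns are left out for brevity):

```python
import pandas as pd

# Sample rows from the question (24h and volume columns omitted).
df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# Same idea as the one-liner: split each Name on its own "#" value
# (maxsplit=1 added), then give the two resulting columns names.
split = pd.DataFrame(
    df.apply(lambda x: x["Name"].split(str(x["#"]), 1), axis=1).tolist(),
    index=df.index,
    columns=["Name", "Ticker"],
)

# Write the parts back, placing Ticker right after Name.
df["Name"] = split["Name"]
df.insert(df.columns.get_loc("Name") + 1, "Ticker", split["Ticker"])
print(df)
```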

You can split the current name into name and ticker based on the `#` column. The code below is probably neither the cleanest nor the most efficient, but it does what you need.

Perhaps a pandas guru can optimize this; I would be very interested in that as well.

# insert Ticker column
df.insert(df.columns.get_loc("Name")+1, "Ticker", None)


for index, row in df.iterrows():
    # split the name on the value in the '#' column and write both parts back
    df.at[index, "Name"], df.at[index, "Ticker"] = row["Name"].split(str(row["#"]))

print(df)

Resulting df:

    #           Name Ticker      Price   24h   Volume(24h)
0  50          Maker    MKR  $1,096.96  4,52  $351,617,227
1  36   Decentraland   MANA    $0.9754  4,11  $265,949,302
2  47     Bitcoin SV    BSV     $60.38  4,08   $50,895,114
3  86  1inch Network  1INCH    $0.7637  3,74   $72,279,229
4  38         Hedera   HBAR   $0.07594  3,72   $58,825,304
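In answer to the request for an optimization, one common option (a sketch, not benchmarked here) is to skip `iterrows()` and split with a plain Python comprehension over `zip(df["Name"], df["#"])`, then assign each new column once. It assumes the same sample data and, like the loop above, that every name contains its `#` value:

```python
import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# Split every Name on its own "#" value in one pass over plain Python
# objects; this avoids iterrows(), which builds a new Series for every row.
# maxsplit=1 yields exactly two parts whenever the number is present.
names, tickers = zip(*(
    name.split(str(num), 1) for name, num in zip(df["Name"], df["#"])
))

df["Name"] = list(names)
df.insert(df.columns.get_loc("Name") + 1, "Ticker", list(tickers))
print(df)
```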
Edo Akse
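import re

# keeps only the leading letters/whitespace; re.search returns None (so .group()
# raises AttributeError) for names starting with a digit, e.g. "1inch Network861INCH"
df['Name'] = df['Name'].apply(lambda name: re.search(r"^[a-zA-Z\s]+", name).group())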
ArrowRise
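To show what the regex above does on the question's data, and one way to keep a regex while still using the `#` column: `split_name` is a hypothetical helper, and falling back to `(name, None)` when the number is absent is an assumption of this sketch.

```python
import re


def split_name(name, num):
    """Hypothetical helper: return (coin, ticker) using the row's "#" value."""
    # (.+?) stops at the first occurrence of the number; (.+) takes the rest.
    m = re.fullmatch(rf"(.+?){re.escape(str(num))}(.+)", name)
    return m.groups() if m else (name, None)


# The answer's pattern keeps only leading letters and whitespace, so it
# drops the ticker and finds nothing when the name starts with a digit.
print(re.search(r"^[a-zA-Z\s]+", "Maker50MKR").group())    # Maker
print(re.search(r"^[a-zA-Z\s]+", "1inch Network861INCH"))  # None

print(split_name("Maker50MKR", 50))            # ('Maker', 'MKR')
print(split_name("1inch Network861INCH", 86))  # ('1inch Network', '1INCH')
```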