
I have a FIFA dataset that contains information about football players. One of its columns is the player's value, but it is stored as a string such as "€300K" or "€50M". How can I strip the euro sign and the "M"/"K" suffixes and express all the values in the same unit?

import numpy as np
import pandas as pd

location = r'C:\Users\bemrem\Desktop\Python\fifa\fifa_dataset.csv'

_dataframe = pd.read_csv(location)

_dataframe = _dataframe.dropna()
_dataframe = _dataframe.reset_index(drop=True)
_dataframe = _dataframe[['Name', 'Value', 'Nationality', 'Age', 'Wage',
                         'Overall', 'Potential']]

_array = ['Belgium', 'France', 'Brazil', 'Croatia', 'England',' Portugal', 
'Uruguay', 'Switzerland', 'Spain', 'Denmark']

_dataframe = _dataframe.loc[_dataframe['Nationality'].isin(_array)]
_dataframe = _dataframe.reset_index(drop=True) 


print(_dataframe.head())
print()
print(_dataframe.tail())

I tried to convert the Value column but failed. This is what I get:

           Name   Value Nationality  Age   Wage  Overall  Potential
0        Neymar   €123M      Brazil   25  €280K       92         94
1     L. Suárez    €97M     Uruguay   30  €510K       92         92
2     E. Hazard  €90.5M     Belgium   26  €295K       90         91
3  Sergio Ramos    €52M       Spain   31  €310K       90         90
4  K. De Bruyne    €83M     Belgium   26  €285K       89         92

              Name Value Nationality  Age Wage  Overall  Potential
4931    A. Kilgour  €40K     England   19  €1K       47         56
4932      R. White  €60K     England   18  €2K       47         65
4933     T. Sawyer  €50K     England   18  €1K       46         58
4934     J. Keeble  €40K     England   18  €1K       46         56
4935  J. Lundstram  €60K     England   18  €1K       46         64

But I want my output to look like this, with Value as a number in millions:

           Name   Value Nationality  Age   Wage  Overall  Potential
0        Neymar   123      Brazil   25  €280K       92         94
1     L. Suárez    97     Uruguay   30  €510K       92         92
2     E. Hazard  90.5     Belgium   26  €295K       90         91
3  Sergio Ramos    52       Spain   31  €310K       90         90
4  K. De Bruyne    83     Belgium   26  €285K       89         92

              Name Value Nationality  Age Wage  Overall  Potential
4931    A. Kilgour  0.04     England   19  €1K       47         56
4932      R. White  0.06     England   18  €2K       47         65
4933     T. Sawyer  0.05     England   18  €1K       46         58
4934     J. Keeble  0.04     England   18  €1K       46         56
4935  J. Lundstram  0.06     England   18  €1K       46         64
  • What is the desired output? I know you described it in words, but I am not sure I understand it completely. Perhaps you can post your desired output the same way you posted your `_dataframe.head()`. – KenHBS Dec 27 '18 at 19:45
  • I edited the question. –  Dec 27 '18 at 19:54
  • Possible duplicate of [Pandas Extract Number from String](https://stackoverflow.com/questions/37683558/pandas-extract-number-from-string) if you are sure that the 'value' column is always denoted in millions – KenHBS Dec 27 '18 at 19:56
  • Not exactly, because the letters in the Value column also carry numerical meaning. If I apply that method, it will give the same result for €40K and €40M. –  Dec 27 '18 at 20:01
  • Possible duplicate of [Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe](https://stackoverflow.com/questions/39684548/convert-the-string-2-90k-to-2900-or-5-2m-to-5200000-in-pandas-dataframe) – Lucas H Dec 28 '18 at 16:40

1 Answer


I do not have enough reputation to flag the question as a duplicate. However, I believe the answer linked below will solve your particular problem, and it also handles strings that contain no "K" or "M".

You will also need to replace $ with € in the regex.

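[Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe](https://stackoverflow.com/questions/39684548/convert-the-string-2-90k-to-2900-or-5-2m-to-5200000-in-pandas-dataframe)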
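For reference, here is a minimal sketch of that idea adapted to the euro sign and to the "millions" unit shown in the desired output. It is not the code from the linked answer: the helper name `value_to_millions` and the `SUFFIX_DIVISOR` mapping are hypothetical, and it assumes every entry looks like `€<number>` followed by an optional `K` or `M`.

```python
import pandas as pd

# Hypothetical mapping: what to divide by to express a value in millions.
SUFFIX_DIVISOR = {"K": 1000.0, "M": 1.0}


def value_to_millions(values: pd.Series) -> pd.Series:
    """Convert strings such as '€90.5M' or '€40K' to floats in millions."""
    # Capture the numeric part and the optional K/M suffix after the euro sign.
    parts = values.str.extract(
        r"^€(?P<number>\d+(?:\.\d+)?)(?P<suffix>[KM])?$"
    )
    number = parts["number"].astype(float)
    # Assumption: a value without a suffix is in plain euros.
    divisor = parts["suffix"].map(SUFFIX_DIVISOR).fillna(1_000_000.0)
    # Strings that do not match the pattern end up as NaN.
    return number / divisor


df = pd.DataFrame(
    {
        "Name": ["Neymar", "E. Hazard", "A. Kilgour"],
        "Value": ["€123M", "€90.5M", "€40K"],
    }
)
df["Value"] = value_to_millions(df["Value"])
print(df)
```

On your data this would be `_dataframe['Value'] = value_to_millions(_dataframe['Value'])`. The same function should work for the Wage column if you want it in the same unit.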

Lucas H