1

I have a dataframe that looks like this with 13000 rows

print(df)
           Date       Price   Nation
0    01/01/2018    -5.000,73  Spain
1    01/01/2018        15,60  Italy
3    01/01/2018        14,13  Italy
4    01/01/2018        12,53  Spain
5    01/01/2018        16,64  Italy
6    01/01/2018        22,48  Italy
7    01/01/2018        24,30  Italy
8    01/01/2018        24,88  Spain
9    01/01/2018        31,40  Italy
10   01/01/2018        18,74  Italy

Price Column is a non-null object I removed all the white spaces and dropped al the empty rows I tried to work only with the price column converting from a Series to a String

string=df['Price'].to_string()
print(string)
0        -5.000,73
1            15,60
3            14,13
4            12,53
5            16,64
6            22.48
7            24.30
8            24.88
9            31.40
10           18.74

When I tried

string=string.strip('.')
string=string.replace(',','.')
float(string.strip().strip("'"))

The system returned me an error message:

ValueError: could not convert string to float: '0
-5.000.73\n1            15.60\n3            14.13\n4            12.53\n5            16.64\n6            22.48\n7            24.30\n8            24.88\n9            31.40\n10         

The data are loaded from a csv, and the delimiter is ,

  1. How can I easily convert to a float, considering that \n7, \n10,..., \n160000 are not regularly increased?
  2. Why I get this kind of problem?
  3. If I open my csv with Jupyter or on excel I cannot see /n32, but only something like this
    "01/01/2018","16,60","Spain", I know that /n is used for giving a vertical space, but I don't know how to deal with it, I noticed that is related with the index

I read this questions, tried other solutions, but none solved my problem:

numpy.loadtxt, ValueError: could not convert string to float

ValueError: could not convert string to float: '-0,274697\n'

How do I parse a string to a float or int in Python?

Python convert string to float

Python. ValueError could not convert string to float:

ValueError: could not convert string to float in Pyspark

Andrea Ciufo
  • 359
  • 1
  • 3
  • 19
  • 1
    Just one remark: there is no `n7` in your string. There is `\n7`. Thta is different. `\n` is one character - the newline. – zvone Oct 05 '18 at 18:11
  • 1
    Please post a few more lines from the price column. Your error shows additional entries that might be needed to determine what is going on. Try posting 0 through 20. – artomason Oct 05 '18 at 18:13
  • 1
    I'm on my phone so can't code, but fix the comma issue in the dataframe before converting back to python – roganjosh Oct 05 '18 at 18:15
  • 2
    I see -5.000.73, is this even a valid float? What are you trying to do? – pvy4917 Oct 05 '18 at 18:16
  • 3
    Unlike writing money for other denominations, you need to use periods for decimal points, and no commas. As @Prazy said, -5.000.73 is not a valid float number, and 5,000 will give you (5,0) due to how Python intepreters work (example: 5,132 will give you a tuple of `(5,132)`) – jhomr Oct 05 '18 at 18:18
  • 1
    Also you should avoid using `string` as a variable name – jhomr Oct 05 '18 at 18:20
  • I loaded the csv, price column could be positive or negative. I tried to use only periods, I read a question about it, but when I first stripped the period and then replaced the commas with the periods, the system returned this weird output `-5.000.73` – Andrea Ciufo Oct 05 '18 at 18:23
  • For -5.000,73 you need to remove a ',' (comma). I feel, the comma there is dividing them into units, hundreds...? – pvy4917 Oct 05 '18 at 18:23
  • The comma in the raw csv divides units from decimal – Andrea Ciufo Oct 05 '18 at 18:24
  • so 15,60 is equal to 15.60? – pvy4917 Oct 05 '18 at 18:26
  • So, after '.' (point) there is 000,73. Is it like tenth (0) hundredth(0) thousandth (0), ten-thousandth (7) and Hundred-thousandth (3)? – pvy4917 Oct 05 '18 at 18:29
  • @jhomr then the way he is coding is wrong. – pvy4917 Oct 05 '18 at 18:30
  • That is my assumption. Deleted my old comment as I was unsure if I was already telling you things you knew ;) – jhomr Oct 05 '18 at 18:30
  • @Andrea Ciufo I'm fairly sure your issue is that 5.000 should be 5000 and not 5,000, and the numbers after that have commas instead of decimals – jhomr Oct 05 '18 at 18:35
  • Yes 15,60 is equal (after I replaced `,` with `.`) to 15.60 , like fifteen euros and 60 cents and, always from the raw csv -5.000,73 is minus five thousand euros and 73 cents, Yes but I don't know why the code does not strip and then replace – Andrea Ciufo Oct 05 '18 at 18:39
  • 2
    `strip()` only affects starting and ending characters – jhomr Oct 05 '18 at 18:43
  • If you can edit the Csv in excel, choose the column and change the formatting to proper ####.## form then pass that Csv to your code. – jhomr Oct 05 '18 at 18:46

2 Answers2

3

strip() only removes leading and trailing characters. You can achieve that by using replace() instead:

string = "-5.000,73"
string = string.replace('.', '')
string = string.replace(',', '.')
# float(string.strip().strip("'"))
float(string)

string.strip(s[, chars])

Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.

More information can be found here

Also, I'm not sure why you are trying to strip apostrophe's as well, but the syntax for that should be string.replace(''', ''); I would wait until you are done stripping characters to convert it too a float using float().

On a side note, you can use print statements in your code to see what is going wrong. As soon as I pasted your original code. It was apparent that the first period was not being removed from the string.

Community
  • 1
  • 1
artomason
  • 3,625
  • 5
  • 20
  • 43
  • 1
    But, string = "-5.000,73" string = string.replace('.', '') # Will replace . after 5 with , string = string.replace(',', '.') in your code replace both comma's with '.' – pvy4917 Oct 05 '18 at 19:22
  • 1
    instead you can use this string = string.replace(',', '.',1) string = -5.00,73 – pvy4917 Oct 05 '18 at 19:24
  • 1
    @Prazy you will need to be a little more specific. The OP did not give any real data to work with other than what he has posted. In this case, I assume he is trying to remove the period, and convert the comma into a period in order to represent currency in some form. – artomason Oct 05 '18 at 19:29
  • 1
    What you write as the answer will output -5.000.73 (which is not valid). – pvy4917 Oct 05 '18 at 19:55
  • 1
    @Prazy directly cutting and pasting the code produces -5000.73 when you append a print statement to the end. – artomason Oct 05 '18 at 19:57
  • 1
    Ah did not see a blank in this. Sorry, string = string.replace('.', '') – pvy4917 Oct 05 '18 at 20:11
  • @AndreaCiufo if this solved your issue, please consider marking it as the answer. – artomason Oct 06 '18 at 12:59
  • @artomason sure I will try the code I also waited for other answers. I didn't put more real data because I followed this suggestion [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) that a user gave me – Andrea Ciufo Oct 06 '18 at 16:15
1

Convert a list of numbers to float

    numbers = [
    '₹ 524.00',
    '₹ 5,000.00',
    '₹ 485.00',
    '₹ 800.00',
    '₹ 236.00',
    '₹ 2,000.00',
    '₹ 630.00',
    '₹ 385.00',
    '₹ 500.00',
    '₹ 629.00',
    '₹ 301.00',
    '₹ 141.00',
]

numbers = [(num[2:]) for num in numbers]
numbers = [float(num.replace(',', '').strip()) for num in numbers]
print(f'Amounts in the list: {numbers}')
grey
  • 209
  • 3
  • 6