
In working with some of our data, I had to perform a pretty basic conditional combination of columns. After filling null values, I attempted to add two columns in the assignment of a new variable. One of the columns ended up with the object dtype, which is not at all unprecedented. What I found, however, was that seemingly valid values would not convert to float (e.g. 4,789.67). After much searching, it seems that every solution I have seen points to the existence of an irregular character (which does not describe my case). Consequently, I experimented in IPython to recreate the error, and I was successful. I do not understand, however, why I got this error:

TEST

z='4,534.07' #initial assignment
print z
print type(z) #checked type
print repr(z) #tried to reveal hidden characters
print repr(z.replace("'","")) #tried to remove excess quotes
print z[1:-1] #tried again to remove excess quotes
print float(z) #failed conversion attempt

OUTPUT

4,534.07
<type 'str'>
'4,534.07'
'4,534.07'
,534.0


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-8a3c46ebe6ab> in <module>()
      6 print z[1:-1]
      7 print z
----> 8 print float(z)

ValueError: invalid literal for float(): 4,534.07

The solutions I have seen for the basic conversion question invariably suggest float(x) for converting a string x to a float. I would be very grateful if anyone could explain what I have missed. (I have not had this happen before.)

I have been using the Enthought platform:


Release notes Canopy 1.0.0.1160

Canopy 1.0.0

First release. See Documentation Browser, Canopy Users Guide for release notes describing what's new and any known issues and workarounds


Thanks

Marvin Ward Jr
  • `the existence of an irregular character`. Do you think the comma is a regular character for a number? – Bibhas Debnath Apr 24 '13 at 20:14
  • I think the problem is the 'comma' within your number.remove it and try again – Jerry Meng Apr 24 '13 at 20:16
  • @MarkRansom: It's slightly different because it's about floats rather than ints (which means that, e.g., the `locale`-based answer needs to use `locale.atof` rather than `locale.atoi`)… but yeah, I think it's close enough to be a dup. – abarnert Apr 24 '13 at 20:28
    @abarnert in that case try this one: http://stackoverflow.com/questions/6633523/how-can-i-convert-a-string-with-dot-and-comma-into-a-float-number-in-python – Mark Ransom Apr 24 '13 at 20:35
  • @MarkRansom: I already voted to close with your first link, and I'm not sure how to change it… but yeah, I think it probably should be marked as a dup of your second link instead. – abarnert Apr 24 '13 at 20:40

2 Answers


The only problem is that you have to remove the comma. 4,534.07 is not a valid float literal, but 4534.07 is.

(That's exactly what the ValueError: invalid literal for float(): 4,534.07 is telling you, except that it's missing the "did you mean…?" suggestion.)

So:

z='4,534.07'
print float(z.replace(',', ''))
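For reference, here is a small check of what float() does and does not accept (Python 3 syntax; the question uses Python 2 print statements, but float() treats these particular inputs the same way). It is a sketch to illustrate the point above, not an exhaustive list of the literal grammar.

```python
# Strings that float() accepts: a plain decimal, surrounding whitespace,
# a leading sign, and exponent notation.
accepted = ['4534.07', '  4534.07  ', '-4534.07', '4.53407e3']
for s in accepted:
    print(repr(s), '->', float(s))

# A thousands separator is not part of the float literal syntax,
# so float() raises ValueError instead of guessing what was meant.
error = None
try:
    float('4,534.07')
except ValueError as exc:
    error = exc
    print('ValueError:', exc)
```

The exact wording of the error message differs between Python 2 and Python 3, but the exception type is ValueError in both.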

Also, all those attempts to "remove excess quotes" do nothing because there are no quotes in the string. Of course there are quotes when you print out the repr of the string, but that doesn't mean they're in the string itself, it means that the repr of any string is enclosed in an extra pair of quotes. Since those quotes aren't in the string, they can't influence any function you call on that string (unless that function does something really, really stupid, like calling repr on its argument to build up a string to call eval on…).
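A quick way to convince yourself that the quotes belong to repr() and not to the string is to inspect the string directly. This is a minimal sketch in Python 3 syntax using the value from the question.

```python
z = '4,534.07'

# The quotes shown by repr() are display syntax, not characters in z.
print(repr(z))        # the value wrapped in quotes
print(len(z))         # counts only the characters 4 , 5 3 4 . 0 7
print("'" in z)       # no quote character is actually present

# Because there is nothing to strip, this replace() returns an equal string.
print(z.replace("'", "") == z)
```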

Plus, even if the problem were excess quotes, just print z[1:-1] or print z.replace("'", "") wouldn't actually remove them from z, it would just print out what it would look like if you had done so. To actually change the value of z, you have to assign something to it. For example, if you add print z.replace(',', '') to your existing code, the float(z) will still fail. But if you add z = z.replace(',', ''), then the float(z) will succeed.
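The difference between calling replace() and assigning its result can be shown in a few lines (Python 3 syntax). Strings are immutable, so str methods always return a new string and never modify the original in place.

```python
z = '4,534.07'

# str.replace() returns a new string; z itself is not modified.
z.replace(',', '')
unchanged = z
print(z)          # still contains the comma, so float(z) would still fail

# Assigning the result back to z is what actually changes its value.
z = z.replace(',', '')
value = float(z)
print(value)
```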

abarnert
  • Whoever downvoted, care to explain why? – abarnert Apr 24 '13 at 20:25
  • Wow, quick response. I assumed that commas could be processed because I had seen this before. (I literally just did the same thing with another dataset that contained commas, and I had no issue.) Thanks for the rapid fire corrections. UPDATE: Thanks for the extra explanation as well. These kinds of things are helpful for a novice. – Marvin Ward Jr Apr 24 '13 at 21:13
  • Oh, and looking back at the code, the first thing I did was attempt to remove the commas >> for x in df.columns: x.replace(",",""). As you may have guessed, this didn't work for me so I looked elsewhere. – Marvin Ward Jr Apr 24 '13 at 21:29
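The loop in the last comment has two problems: iterating over df.columns yields the column labels rather than the values, and each replace() result is discarded. The sketch below uses a plain Python list of hypothetical sample values as a stand-in for one column (Python 3 syntax, no pandas required). In pandas the same idea is usually written per column with the .str.replace accessor followed by .astype(float), assigning the result back to the column.

```python
# Hypothetical sample values standing in for one column of the data.
raw_values = ['4,534.07', '4,789.67', '12.50']

# This mirrors the loop in the comment: each replace() result is thrown
# away, so raw_values is unchanged afterwards.
for x in raw_values:
    x.replace(',', '')

# Building a new list from the returned strings keeps the cleaned values.
numbers = [float(x.replace(',', '')) for x in raw_values]
print(numbers)
```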

I would use re to replace anything that isn't a digit or a dot, like so:

>>> import re
>>> float(re.sub(r'[^0-9.]', '', '1.234,567'))
1.234567

If you care about signs, include - and + in your pattern:

float(re.sub(r'[^-+\d.]', '', '-1.234,567'))
Meitham
  • Why would you do this? Just to make sure that, say, negative numbers (or anything else you forgot about) are interpreted incorrectly? – abarnert Apr 24 '13 at 20:18
  • @abarnert good point about negative numbers, I will update the answer to include signs. However, re is more trusted than replace as the result is guaranteed to be a valid input to float. – Meitham Apr 24 '13 at 20:22
  • And now you won't handle `1e6`. (Again, or anything else you forgot about.) And of course it now passes things that aren't valid floats, like `3-4`. So "re is more trusted than replace" by who, exactly? – abarnert Apr 24 '13 at 20:24
  • The point is that trying to write a whitelist instead of a blacklist is no better (neither one is a real parser, obviously), and it requires more information, so why bother? (And, secondarily, even if you _do_ want a whitelist or blacklist by characters, there is no reason you need a regex instead of just string operations, because there are no regular patterns beyond characters there.) – abarnert Apr 24 '13 at 20:27
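The objections raised in these comments can be checked with a small sketch (Python 3 syntax). The helper names are hypothetical: strip_to_whitelist uses the signed pattern from the answer, strip_to_whitelist_no_re shows that the same character whitelist needs no regex (for ASCII input), and strip_commas is the replace-based fix from the first answer.

```python
import re

# Hypothetical helper names, for illustration only.
def strip_to_whitelist(s):
    # Keep only digits, '.', '+' and '-' (the signed pattern from the answer).
    return re.sub(r'[^-+\d.]', '', s)

def strip_to_whitelist_no_re(s):
    # Same character whitelist for ASCII input, using plain string operations.
    allowed = set('0123456789.+-')
    return ''.join(c for c in s if c in allowed)

def strip_commas(s):
    # Blacklist approach: remove only the thousands separator.
    return s.replace(',', '')

# Exponent notation: the whitelist silently drops the 'e'.
print(strip_to_whitelist('1e6'))    # the string 16, so float() would give 16.0
print(float(strip_commas('1e6')))   # 1000000.0

# '3-4' passes through both filters unchanged; only float() rejects it.
rejected = []
for cleaner in (strip_to_whitelist, strip_commas):
    try:
        float(cleaner('3-4'))
    except ValueError:
        rejected.append(cleaner.__name__)
print(rejected)
```

Neither filter validates its output; in both cases float() is still the only real parser, which is the point of the last comment.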