I've got an ID column with mixed datatypes, which is causing me issues when I pivot. Some IDs are floats, so I try to cast them to ints and then to strings. If I cast the column as a whole, the string subset throws an error, since it is illogical to cast a string to an int.

I also know that mutating a datatype whilst iterating over a column is a bad idea. Has anyone got any ideas?

Here's a visual representation:

ID

  1. Str
  2. Int
  3. Float

I'm trying to cast them all to strings, and I also want to drop the '.0' ending from the floats. Any ideas?

  • cast to str then format the strings as you want. – Mohamed Ali JAMAOUI Sep 21 '17 at 08:14
  • can you post the code? do you have a data frame with all these datatypes in one column? what type is the id column? I want to recreate your problem but I am not sure what to do – Michail N Sep 21 '17 at 08:21
  • Provide some example data, provide what traceback you're receiving. I don't understand what you mean by 'it is illogical to cast a string to an int'. What do you mean by this? Do you mean you have some strings which aren't strings representing numbers? – greg_data Sep 21 '17 at 08:44
  • Have a look at [this question](https://stackoverflow.com/questions/15891038/pandas-change-data-type-of-columns/28648923#28648923), it sounds like it may be similar to yours. You could also try applying a lambda function to the column that tries to cast floats to strings, and ints to strings, with whatever error handling and formatting particulars you'd like. – charlesreid1 Sep 21 '17 at 08:44

1 Answer

Assuming you have a column that consists of integers, floats, and strings, which are all read in as strings from a file, you'll have something like this:

s = pd.Series(['10', '20', '30.4', '40.7', 'text', 'more text', '50.0'])

in which case, you can apply a function to convert the floats to integers, then a second function to convert the integers (back) to strings:

import pandas as pd

def print_type(x):
    print(type(x))
    return x

def to_int(x):
    try:
        # x parses as a number; int() truncates any fractional part (30.4 -> 30)
        return int(pd.to_numeric(x))
    except ValueError:
        # x is a non-numeric string, so leave it unchanged
        return x

def to_str(x):
    return str(x)

s = pd.Series(['10', '20', '30.4', '40.7', 'text', 'more text', '50.0'])

s2 = s.apply(to_int).apply(to_str)

print("Series s:")
print(s)
print("\nSeries s2:")
print(s2)
print("\nData types of series s2:")
print(s2.apply(print_type))

Here is the output, showing that, in the end, each number has been converted to a string version of an integer (int() truncates, so '30.4' becomes '30'):

Series s:
0           10
1           20
2         30.4
3         40.7
4         text
5    more text
6         50.0
dtype: object

Series s2:
0           10
1           20
2           30
3           40
4         text
5    more text
6           50
dtype: object

Data types of series s2:
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
0           10
1           20
2           30
3           40
4         text
5    more text
6           50
dtype: object
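
If the column holds real Python strings, ints, and floats (as in the question) rather than values read in as strings, and the non-integer floats should be kept as they are, a per-element check may be enough. This is only a minimal sketch: `id_to_str` and the sample `ids` series are hypothetical names made up for illustration.

```python
import pandas as pd

def id_to_str(x):
    """Hypothetical helper: render one ID as a string, dropping a trailing '.0'."""
    # Whole-number floats such as 12.0 become '12'.
    if isinstance(x, float) and x.is_integer():
        return str(int(x))
    # Everything else (str, int, non-integer float) is stringified unchanged.
    return str(x)

# Assumed sample data: one string, one int, one whole float, one fractional float.
ids = pd.Series(['abc', 7, 12.0, 3.5], dtype=object)
ids_str = ids.apply(id_to_str)
print(ids_str.tolist())  # ['abc', '7', '12', '3.5']
```

Unlike `to_int` above, this leaves 3.5 as '3.5' instead of truncating it, and it does not touch strings that merely look like floats (the string '50.0' stays '50.0').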

Not sure if that's what you're after, but if not, hopefully it will give you an idea of how to get started. This is using Pandas 0.19.2:

In [1]: import pandas as pd

In [2]: print(pd.__version__)
0.19.2
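
For a long column, the same idea can probably be done without a Python-level function per element. The sketch below assumes the goal is to strip '.0' only from whole numbers and to keep everything else as its original string; the variable names `nums`, `whole`, and `out` are mine. It relies on `pd.to_numeric(..., errors='coerce')`, which turns non-numeric values into NaN instead of raising.

```python
import pandas as pd

s = pd.Series(['10', '20', '30.4', '40.7', 'text', 'more text', '50.0'])

# Non-numeric entries become NaN rather than raising an error.
nums = pd.to_numeric(s, errors='coerce')

# True only where the value parsed as a number with no fractional part.
whole = nums.notna() & (nums % 1 == 0)

# Start from the plain string form, then overwrite the whole numbers.
out = s.astype(str)
out[whole] = nums[whole].astype('int64').astype(str)

print(out.tolist())  # ['10', '20', '30.4', '40.7', 'text', 'more text', '50']
```

One caveat: the values pass through float64 on the way, so very large integer IDs (beyond about 2**53) could lose precision. For those, a string-based approach is safer.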
charlesreid1