How to cast column in Pandas with multiple datatypes?

Question

I've got an ID column with mixed datatypes, which is causing me issues when I pivot. Some of the IDs are floats, so I try to cast them to ints and then to strings. If I cast the column as a whole, the string subset throws an error, since it is illogical to cast a string to an int.

I also know that mutating a datatype whilst iterating over a column is a bad idea. Has anyone got any ideas?

Here's a visual representation:

ID

Str
Int
Float

I'm trying to cast them all to strings. I also want the '.0' ending of the floats removed. Any ideas?

can you post the code? do you have a data frame with all these datatypes in one column? what type is the id column? I want to recreate your problem but I am not sure what to do — Michail N, Sep 21 '17 at 08:21
Provide some example data, provide what traceback you're receiving. I don't understand what you mean by 'it is illogical to cast a string to an int'. What do you mean by this? Do you mean you have some strings which aren't strings representing numbers? — greg_data, Sep 21 '17 at 08:44
Have a look at [this question](https://stackoverflow.com/questions/15891038/pandas-change-data-type-of-columns/28648923#28648923), it sounds like it may be similar to yours. You could also try applying a lambda function to the column that tries to cast floats to strings, and ints to strings, with whatever error handling and formatting particulars you'd like. — charlesreid1, Sep 21 '17 at 08:44

score 0 · Answer 1 · answered Sep 21 '17 at 09:03

Assuming you have a column that consists of integers, floats, and strings, which are all read in as strings from a file, you'll have something like this:

s = pd.Series(['10', '20', '30.4', '40.7', 'text', 'more text', '50.0'])

in which case, you can apply a function to convert the floats to integers, then a second function to convert the integers (back) to strings:

import pandas as pd

def print_type(x):
    print(type(x))
    return x

def to_int(x):
    try:
        # x parses as a number; int() truncates any fractional part,
        # so '30.4' becomes 30 and '50.0' becomes 50
        return int(pd.to_numeric(x))
    except ValueError:
        # x could not be parsed as a number, so return it unchanged
        return x

def to_str(x):
    return str(x)

s = pd.Series(['10', '20', '30.4', '40.7', 'text', 'more text', '50.0'])

s2 = s.apply(to_int).apply(to_str)

print("Series s:")
print(s)
print("\nSeries s2:")
print(s2)
print("\nData types of series s2:")
print(s2.apply(print_type))

Here is the output, showing that, in the end, each number has been converted to a string version of an integer (note that int() truncates, so '30.4' becomes '30'):

Series s:
0           10
1           20
2         30.4
3         40.7
4         text
5    more text
6         50.0
dtype: object

Series s2:
0           10
1           20
2           30
3           40
4         text
5    more text
6           50
dtype: object

Data types of series s2:
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
0           10
1           20
2           30
3           40
4         text
5    more text
6           50
dtype: object
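
Because int() truncates, the approach above also turns '30.4' into '30'. If the goal is only to drop a trailing '.0' and leave true fractions alone, a vectorised variant might look like the sketch below. The names nums, whole and s3 are made up for illustration, and it assumes the values fit comfortably in a 64-bit integer:

```python
import pandas as pd

s = pd.Series(['10', '20', '30.4', '40.7', 'text', 'more text', '50.0'])

# Non-numeric strings become NaN instead of raising an error
nums = pd.to_numeric(s, errors='coerce')

# True only where the value parsed as a number and has no fractional part
whole = nums.notna() & (nums % 1 == 0)

# Start from the original strings and overwrite only the whole numbers
s3 = s.copy()
s3.loc[whole] = nums[whole].astype('int64').astype(str)

print(s3.tolist())
# ['10', '20', '30.4', '40.7', 'text', 'more text', '50']
```

Here '30.4' and '40.7' are left untouched, and only '50.0' loses its '.0'.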

Not sure if that's what you're after, but if not, hopefully it will give you an idea of how to get started. This is using Pandas 0.19.2:

In [1]: import pandas as pd

In [2]: print(pd.__version__)
0.19.2
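
If the column holds actual Python objects of different types (str, int and float, as the question describes) rather than strings read from a file, a per-element function can check the type directly. This is a minimal sketch under that assumption. The helper id_to_str and the sample values are hypothetical, not taken from the question:

```python
import pandas as pd

def id_to_str(x):
    """Return x as a string, dropping '.0' from whole-number floats."""
    # Only floats with no fractional part go through int();
    # NaN and inf fail is_integer() and fall through to str(x)
    if isinstance(x, float) and x.is_integer():
        return str(int(x))
    return str(x)

# Hypothetical ID column mixing str, int and float objects
ids = pd.Series(['abc', 7, 12.0, 3.5], dtype=object)

ids_str = ids.map(id_to_str)

print(ids_str.tolist())
# ['abc', '7', '12', '3.5']
```

Nothing is mutated while iterating: map builds a new Series, so the original column can be replaced in a single assignment afterwards.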
