How do I change data-type of pandas data frame to string with a defined format?

Question

I'm starting to tear my hair out with this - so I hope someone can help. I have a pandas DataFrame that was created from an Excel spreadsheet using openpyxl. The resulting DataFrame looks like:

print image_name_data
     id           image_name
0  1001  1001_mar2014_report
1  1002  1002_mar2014_report
2  1003  1003_mar2014_report

[3 rows x 2 columns]

…with the following datatypes:

print image_name_data.dtypes
id            float64
image_name     object
dtype: object

The issue is that the numbers in the id column are, in fact, identification numbers and I need to treat them as strings. I've tried converting the id column to strings using:

image_name_data['id'] = image_name_data['id'].astype('str')

This seems a bit ugly but it does produce a variable of type 'object' rather than 'float64':

print image_name_data.dyptes
id            object
image_name    object
dtype: object

However, the strings that are created have a decimal point, as shown:

print image_name_data
       id           image_name
0  1001.0  1001_mar2014_report
1  1002.0  1002_mar2014_report
2  1003.0  1003_mar2014_report

[3 rows x 2 columns]

How can I convert a float64 column in a pandas DataFrame to a string with a given format (in this case, for example, '%10.0f')?

score 63 · Accepted Answer · edited Dec 19 '18 at 18:44

63

I'm unable to reproduce your problem but have you tried converting it to an integer first?

image_name_data['id'] = image_name_data['id'].astype(int).astype('str')

Then, regarding your more general question you could use map (as in this answer). In your case:

image_name_data['id'] = image_name_data['id'].map('{:.0f}'.format)

edited Dec 19 '18 at 18:44

ahajib

12,838
29
79
120

answered Mar 09 '14 at 00:08

exp1orer

11,481
7
38
51

Ta-dah! Both suggestions seem to work perfectly. Thanks very much! I'm afraid I don't have a high enough reputation to rate this answer - but would if I could. – user1718097 Mar 09 '14 at 00:45
@user1718097 Glad to hear it. I'm also new to SO but I think that you can mark it as the "best answer" or something. – exp1orer Mar 09 '14 at 01:01
4

Converting to int first fails if there are any NaN/null values (error message is "*** ValueError: Cannot convert NA to integer"). I have data that is either int or missing, but astype('str') happily adds '.0' to every number... not sure how to prevent this. – John Prior Nov 22 '14 at 00:24

score 8 · Answer 2 · answered Nov 05 '16 at 14:59

8

If you could reload this, you might be able to use dtypes argument.

pd.read_csv(..., dtype={'COL_NAME':'str'})

answered Nov 05 '16 at 14:59

smishra

3,122
29
31

score 2 · Answer 3 · answered Nov 24 '14 at 19:50

I'm putting this in a new answer because no linebreaks / codeblocks in comments. I assume you want those nans to turn into a blank string? I couldn't find a nice way to do this, only do the ugly method:

s = pd.Series([1001.,1002.,None])
a = s.loc[s.isnull()].fillna('')
b = s.loc[s.notnull()].astype(int).astype(str)
result = pd.concat([a,b])

How do I change data-type of pandas data frame to string with a defined format?

3 Answers3

Linked