Convert a pandas column (some values are str, some are int) to a certain format

Question

I have a pretty messy dataset. For example, one column has data such as

import pandas as pd

df = pd.DataFrame(dict(col1=[
    421991.003,'421991.0029999995','0421991.003','0421991.0W5'
]))

Expected str values after reformatting each value in the column:

'0421991.003','0421991.003','0421991.003','0421991.0W5'

The first value is a number (a float), the others are strings. How can I convert them all into strings in the format XXXXXXX.XXX?
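A quick way to confirm which cells are numbers and which are strings, using the example frame from the question. This is only a diagnostic sketch: pandas stores a column that mixes numbers and strings with dtype object, and each cell keeps its own Python type, so only the first cell is not a str.

```python
import pandas as pd

df = pd.DataFrame(dict(col1=[
    421991.003, '421991.0029999995', '0421991.003', '0421991.0W5'
]))

# A column mixing numbers and strings is stored with dtype object
print(df['col1'].dtype)  # object

# Check each cell individually: True where the cell is a Python str
is_str = df['col1'].map(lambda v: isinstance(v, str)).tolist()
print(is_str)  # [False, True, True, True]
```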

Does this answer your question? [Convert Columns to String in Pandas](https://stackoverflow.com/questions/22005911/convert-columns-to-string-in-pandas) — busybear, Jun 26 '20 at 19:45
What is your expected output? What should `'421991.0029999995'` look like after it is converted to XXXXXXX.XXX format? Should the .XXX part be floored to .002 or rounded up to .003? — Phillyclause89, Jun 26 '20 at 19:49
@busybear Hi, since my column has multiple data types, it is somewhat different from that question. — TensorFrozen, Jun 26 '20 at 19:54
Can you clarify what exactly the issue is? Please provide a minimal reproducible example, as well as the current and expected output. — AMC, Jun 26 '20 at 20:03
@AMC So basically I don't know how to convert values like .0029999X to .003 when the column also contains other string data. Is it possible to say: if a value can be interpreted as a number, convert it to a string with 3 decimal places; if it contains letters (A, B, C, ...), keep it as it is? — TensorFrozen, Jun 26 '20 at 20:06

score 0 · Answer 1 · answered Jun 26 '20 at 20:18

For float values you can use the built-in round() function, since you need 3 decimal places:

num = 421991.0029999995
print(round(num, 3))

round() rounds to the nearest value with 3 decimal places, so 421991.0029999995 becomes 421991.003, while a value like 421991.0024 stays at 421991.002. Note that it operates on the binary float and rounds exact ties to even, so a borderline value such as 421991.0025 is not guaranteed to round up to .003. The string values in the column need to be converted with float() first.
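A short illustration of why values ending in 5 do not always round up with the built-in round(): it works on the binary float value, and the Python documentation's own example round(2.675, 2) gives 2.67. If "half always rounds up" is required, one option is to swap in the standard decimal module and round the decimal string itself. The helper name round_half_up is hypothetical, not part of the answer above.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round() operates on the binary float: 2.675 is stored slightly below 2.675
print(round(2.675, 2))  # 2.67
print(round(421991.0029999995, 3))  # 421991.003

def round_half_up(text):
    """Hypothetical helper: round a numeric string to 3 decimals, ties away from zero."""
    # Decimal parses the string exactly, so no binary representation error creeps in
    return str(Decimal(text).quantize(Decimal('0.001'), rounding=ROUND_HALF_UP))

print(round_half_up('421991.0025'))        # 421991.003
print(round_half_up('421991.0024999'))     # 421991.002
print(round_half_up('421991.0029999995'))  # 421991.003
```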

score 0 · Answer 2 · answered Jun 26 '20 at 20:19

Probably not the most elegant solution, but this should work if all your data looks like one of the examples you gave:

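import numpy as np

data = [421991.003, '421991.0029999995', '0421991.003', '0421991.0W5']

data = np.array(data).astype(str)  # turn every value, including the float, into a string
formatted_data = []
for d in [d.split('.') for d in data]:
    d[0] = d[0].zfill(7)  # left-pad the integer part to 7 digits
    try:
        # treat the digits after '.' as a fraction and round it to 3 places
        d[1] = '{0:.3f}'.format(float(d[1]) / (10**len(d[1])))[2:]
    except (ValueError, IndexError):
        pass  # non-numeric fraction (e.g. '0W5') or no '.': keep as is
    formatted_data.append('.'.join(d))

print(formatted_data)
# ['0421991.003', '0421991.003', '0421991.003', '0421991.0W5']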
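One caveat with this loop: it rounds only the digits after the '.', so a carry never reaches the integer part. For a fraction such as '9996', '{0:.3f}'.format(0.9996) gives '1.000', and slicing with [2:] keeps only '000'. A minimal sketch that rounds the whole number first and then pads the integer part (format_value is a hypothetical name, not from the answer):

```python
def format_value(value):
    """Hypothetical helper: round the whole number before splitting, so a carry reaches the integer part."""
    try:
        text = '{:.3f}'.format(float(value))
    except (TypeError, ValueError):
        return str(value)  # non-numeric, such as '0421991.0W5': keep as is
    integer_part, _, fraction = text.partition('.')
    return integer_part.zfill(7) + '.' + fraction  # pad the integer part to 7 digits

data = [421991.003, '421991.0029999995', '0421991.003', '0421991.0W5', '421991.9996']
result = [format_value(d) for d in data]
print(result)
# ['0421991.003', '0421991.003', '0421991.003', '0421991.0W5', '0421992.000']
```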

score 0 · Answer 3 · answered Jun 26 '20 at 20:35


I actually figured this out: I can pass my own function to apply().

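def change(x):
    x = str(x)  # numeric cells such as 421991.003 are floats, so convert first
    if len(x) > 12:  # long float-like strings such as '421991.0029999995'
        x = str(round(float(x), 3))
    return '0' + x if len(x) == 10 else x  # pad '421991.003' to '0421991.003'

df['col1'] = df['col1'].apply(change)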
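For larger frames, a vectorized variant may be easier to read than length checks. This is a sketch, not the approach from the answer above: it assumes the column is named col1 as in the question, and that anything pd.to_numeric cannot parse (such as '0421991.0W5') should be kept unchanged. The format '{:011.3f}' means a total width of 11 characters (7 integer digits, '.', 3 decimals), zero-padded on the left.

```python
import pandas as pd

df = pd.DataFrame(dict(col1=[
    421991.003, '421991.0029999995', '0421991.003', '0421991.0W5'
]))

# Parse every cell as a number; cells that are not numbers become NaN
numbers = pd.to_numeric(df['col1'], errors='coerce')
# Format only the numeric cells; NaN is passed through untouched
formatted = numbers.map('{:011.3f}'.format, na_action='ignore')
# Where parsing failed, fall back to the original value
df['col1'] = formatted.where(numbers.notna(), df['col1'])
print(df['col1'].tolist())
# ['0421991.003', '0421991.003', '0421991.003', '0421991.0W5']
```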

answered by TensorFrozen
