I have a pretty messy dataset. For example, one column has data such as

import pandas as pd

df = pd.DataFrame(dict(col1=[
    421991.003,'421991.0029999995','0421991.003','0421991.0W5'
]))

Expected str values after reformatting each value in the column:

'0421991.003','0421991.003','0421991.003','0421991.0W5'

The first value is a float, the others are str. How can I convert them all to str in the format XXXXXXX.XXX?

Phillyclause89
  • Does this answer your question? [Convert Columns to String in Pandas](https://stackoverflow.com/questions/22005911/convert-columns-to-string-in-pandas) – busybear Jun 26 '20 at 19:45
  • What is your expected output? What should `'421991.0029999995'` look like after it is converted to XXXXXXX.XXX format? Should the .XXX part be floored to .002 or rounded up to .003? – Phillyclause89 Jun 26 '20 at 19:49
  • @Phillyclause89 should be '0421991.003' – TensorFrozen Jun 26 '20 at 19:53
  • @busybear Hi, since I have multiple data types in the column, it is somewhat different from that question. – TensorFrozen Jun 26 '20 at 19:54
  • Can you clarify what exactly the issue is? Please provide a [mcve], as well as the current and expected output. – AMC Jun 26 '20 at 20:03
  • So basically I don't know how to convert these .0029999X values to .003 while there is other str-type data in the column. Is it possible to specify that if a value can be interpreted as a number, it is converted to a str with 3 decimals, and if it includes A, B, C, ..., it is kept as-is? @AMC – TensorFrozen Jun 26 '20 at 20:06

3 Answers

0

For float values you can solve this with the built-in round() function, passing 3 as the number of decimals you need.

num = 421991.0029999995
print(round(num, 3))

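round() rounds to the nearest value at the third decimal place, so the third decimal goes up when the fourth decimal is 5 or greater. For example, a number above 421991.0025 is rounded to 421991.003. A value sitting exactly on the halfway point may round either way, because most decimal fractions cannot be represented exactly as binary floats.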

A number below 421991.0025 is rounded down to 421991.002.
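Since the question asks for a str in the form XXXXXXX.XXX, rounding and zero-padding can also be done in one step with a format specification. This is only a minimal sketch for a single float value; the variable name `formatted` is just for illustration.

```python
num = 421991.0029999995

# "011.3f": fixed-point with 3 decimals, total width 11 (7 digits + '.' + 3 digits),
# left-padded with zeros.
formatted = format(num, "011.3f")
print(formatted)  # 0421991.003
```

Unlike str(round(num, 3)), this always keeps three decimals and the leading zero, but it only works for values that are already numbers.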

Ahnaf
0

Probably not the most elegant solution, but this should work if all your data looks like one of the examples you gave:

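import numpy as np

# Sample values taken from the question (one float, three strings)
data = [421991.003, '421991.0029999995', '0421991.003', '0421991.0W5']

# Convert everything to str so each value can be split on the decimal point
data = np.array(data).astype(str)
formatted_data = []
for d in [d.split('.') for d in data]:
    # Left-pad the integer part with zeros to 7 digits
    d[0] = d[0].zfill(7)
    try:
        # Round the fractional digits to 3 places, then drop the leading '0.'
        d[1] = '{0:.3f}'.format(float(d[1]) / (10**len(d[1])))[2:]
    except (ValueError, IndexError):
        # Non-numeric or missing fractional part (e.g. '0W5') is kept unchanged
        pass
    formatted_data.append('.'.join(d))

print(formatted_data)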
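One caveat with formatting the fractional part on its own: a fraction that rounds up to 1.000 (for example .9996) is not carried over into the integer part. A possible variation, sketched below, converts the whole value to float instead and falls back to plain zero-padding when that fails. The helper name `normalize` and the extra sample '421991.9996' are assumptions added for illustration, not part of the question.

```python
def normalize(value):
    """Hypothetical helper: return value as a zero-padded 'XXXXXXX.XXX' string."""
    text = str(value)
    try:
        number = float(text)
    except ValueError:
        # Not a number (e.g. '0421991.0W5'): keep the text, only pad to width 11.
        return text.zfill(11)
    # Numeric: round to 3 decimals and zero-pad to width 11 in one step.
    return format(number, "011.3f")


samples = [421991.003, "421991.0029999995", "0421991.003", "0421991.0W5", "421991.9996"]
results = [normalize(s) for s in samples]
print(results)
```

The last sample shows the carry: it comes out as '0421992.000' rather than '0421991.000'.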
0

I actually figured this out. I found out I can pass my own function to apply() through a lambda:

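def change(x):
    # Work on the string form so float entries are handled too
    x = str(x)
    # Long values such as '421991.0029999995' are rounded to 3 decimals
    if len(x) > 12:
        x = str(round(float(x), 3))
    # Values with only 6 integer digits (length 10) get a leading zero
    return '0' + x if len(x) == 10 else x

df['xxx'] = df['xxx'].apply(lambda x: change(x))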
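The length checks above depend on the exact shape of the sample values. For comparison, here is a rough sketch that decides by whether a value parses as a number instead, using pd.to_numeric with errors='coerce'. The column name `col1` comes from the question; `col1_fmt` and the intermediate variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(col1=[
    421991.003, '421991.0029999995', '0421991.003', '0421991.0W5'
]))

# String form of every value, used as the fallback for non-numeric entries.
as_text = df['col1'].astype(str)

# Numeric form; entries that cannot be parsed (e.g. '0421991.0W5') become NaN.
numeric = pd.to_numeric(as_text, errors='coerce')

# Format parsed values as zero-padded XXXXXXX.XXX strings.
formatted = numeric.map('{:011.3f}'.format)

# Keep the formatted value where parsing worked, else the zero-padded original text.
df['col1_fmt'] = formatted.where(numeric.notna(), as_text.str.zfill(11))
print(df['col1_fmt'].tolist())
```

With the question's data this should give '0421991.003' for the first three rows and leave '0421991.0W5' unchanged.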