
I am using Python 3.

I have a column of decimal numbers called “CIPCODE” in a dataframe called “data”. The integer part of this column ranges from 1 to 60.

I want to format it such that:

The first condition: if the integer part is between 1 and 9 (inclusive), add a zero in front of the number, for example:

4.2021 becomes 04.2021

25.3434 remains 25.3434

So basically, there should always be 2 digits before the decimal point.

The second condition: there should always be 4 digits after the decimal point, for example:

51.201 becomes 51.2010

34.5555 remains 34.5555

I have tried the following:

data['CIPCODE'] = data['CIPCODE'].astype(str).str.zfill(7)

But this only pads zeros on the left, before the decimal point; it never adds trailing zeros.
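A quick check of what str.zfill(7) does to the string form of the sample values (the sample data below is assumed, taken from the examples above): zfill left-pads the whole string to 7 characters and knows nothing about the decimal point, so it cannot add trailing zeros, and a 6-character value such as 51.201 picks up an unwanted leading zero.

```python
import pandas as pd

# Sample values taken from the examples in the question (assumed data)
data = pd.DataFrame({'CIPCODE': [4.2021, 25.3434, 51.201, 34.5555]})

# str.zfill left-pads the *entire* string with '0' up to the given width;
# it does not treat the decimal point specially
padded = data['CIPCODE'].astype(str).str.zfill(7)
print(padded.tolist())
# ['04.2021', '25.3434', '051.201', '34.5555']
```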

analyst92
  • Warm welcome to SO. Please try to use correct upper case letters, e.g. in the beginning of your title, sentences or the word "I". This would be gentle to your readers. Please read [How to ask](https://stackoverflow.com/help/how-to-ask) and [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Then update your question with code to show us what you have tried so far. – buhtz Jun 01 '23 at 06:58

3 Answers


You can use Python's format specification directly (07.4f means 4 digits after the decimal point, zero-padded to a total width of 7 characters, the point included):

df['formatted'] = df['CIPCODE'].apply(lambda x: f'{x:07.4f}')

Output:

   CIPCODE formatted
0   4.2021   04.2021
1  25.3434   25.3434
2  12.3000   12.3000
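A minimal sanity check, assuming the integer part really stays between 1 and 60 as the question states: a regex for exactly two digits, a point and four digits confirms that 07.4f satisfies both conditions at the edges of that range. Keep in mind that values with more than four decimal places are rounded by the f format, not truncated.

```python
import re

# Exactly 2 digits, a '.', then exactly 4 digits
PATTERN = re.compile(r'\d{2}\.\d{4}')

# Edge and sample values, assuming the integer part stays within 1..60
samples = [1.0, 4.2021, 9.5, 10.0, 51.201, 60.0]
formatted = [f'{x:07.4f}' for x in samples]

for s in formatted:
    # fullmatch rejects anything that is not exactly the required shape
    assert PATTERN.fullmatch(s), s

print(formatted)
# ['01.0000', '04.2021', '09.5000', '10.0000', '51.2010', '60.0000']
```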
mozway

Use Python string formatting with a custom format specification:

import pandas as pd

data = pd.DataFrame({'CIPCODE':[4.2021,25.3434,51.201,34.5555]})

data['CIPCODE'] = data['CIPCODE'].apply('{:07.4f}'.format)

print(data)
   CIPCODE
0  04.2021
1  25.3434
2  51.2010
3  34.5555

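In {:07.4f}, the 0 means pad with zeros, 7 is the minimum total width (including the decimal point), .4 is the number of decimal places to display, and f selects fixed-point notation.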
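To see what each part of the spec does on its own, you can compare a few variants; this sketch only uses the built-in format mini-language, and the values are the sample ones from the question.

```python
x = 4.2021

no_width = f'{x:.4f}'        # 4 decimals, no minimum width
space_pad = f'{x:7.4f}'      # width 7, padded with spaces on the left
zero_pad = f'{x:07.4f}'      # width 7, the leading 0 pads with zeros instead
trailing = f'{51.201:.4f}'   # .4 also adds trailing zeros when needed

print(repr(no_width), repr(space_pad), repr(zero_pad), repr(trailing))
# '4.2021' ' 4.2021' '04.2021' '51.2010'
```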
Corralien
jezrael

This might not be the fastest way, but you can put this logic into a function and apply it to the column.

Example:

import pandas as pd
from collections import deque

df = pd.DataFrame({'CIPCODE': [4.2021, 25.3434, 51.201, 34.5555]}) 

def format_cipcode(code):
    d = deque(code)
    # Left side: a '.' in second position means a one-digit integer part
    if d[1] == '.':
        d.appendleft('0')
    # Right side: append '0' until there are 4 digits after the '.'
    decimals = len(code) - code.index('.') - 1
    d.extend('0' * (4 - decimals))
    return ''.join(d)


df['CIPCODE'] = df['CIPCODE'].astype(str).apply(format_cipcode)

print(df['CIPCODE'])

Output:

0    04.2021
1    25.3434
2    51.2010
3    34.5555
Name: CIPCODE, dtype: object

In this case I used a deque, which is a list-like container that you can append to on both the left and the right. For the left side, check whether the second character of the string is a . and, if so, prepend a 0. For the right side, count the digits after the . and append 0s until there are four, so values like 12.3 are handled as well as 51.201.
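The same left/right padding idea can also be sketched without a per-row Python function by using the pandas .str accessor: split on the '.', zfill the integer part to 2 characters, and ljust the decimal part to 4 characters with '0'. This assumes every string contains exactly one '.', has at most four decimal digits (ljust pads but never truncates), and is not in scientific notation.

```python
import pandas as pd

df = pd.DataFrame({'CIPCODE': [4.2021, 25.3434, 51.201, 34.5555]})

# Split the string form into integer and decimal parts (columns 0 and 1);
# a single-character pattern like '.' is treated as a literal, not a regex
parts = df['CIPCODE'].astype(str).str.split('.', expand=True)

# Left side: pad the integer part to 2 digits with leading zeros
left = parts[0].str.zfill(2)
# Right side: pad the decimal part to 4 digits with trailing zeros
right = parts[1].str.ljust(4, '0')

df['CIPCODE'] = left + '.' + right
print(df['CIPCODE'].tolist())
# ['04.2021', '25.3434', '51.2010', '34.5555']
```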

Kenneth Breugelmans