-1

I am currently reading in an excel sheet of data. In the original file the DOB column is formatted like [20050506]. I would like to change the format of this column to [2005-May-06]. enter image description here

I am semi-successful doing this with the code below: enter image description here

We can see that it does not like the date formatted underlined in the red...

How can I default any date that only has a year specified [i.e 20060000 -> 2006-00-00] to be defaulted to [i.e 20060101 -> 2006-Jan-01]?

PickleRick
  • 61
  • 5
  • I'm really confused... Firstly, the question in the title [has already been asked](/q/38067704), but it doesn't match the question in the body, which is about the data, not its formatting. Secondly, that `to_datetime` shouldn't work since the `format` doesn't match the input data you're showing, unless maybe you already changed the data in another cell? Thirdly, even if it did work, the result should be shown as ISO-8601, unless maybe your environment is set up to show dates as `%Y-%b-%d`? (1/2) – wjandrea Mar 31 '23 at 04:29
  • (2/2) Please [edit] to clarify and make a [mre] including the data as text, [not a picture](//meta.stackoverflow.com/q/285551). You can use [code formatting](/editing-help#code) for it. If you want more tips, check out [How to ask a good question](/help/how-to-ask) and [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Mar 31 '23 at 04:30

1 Answers1

1

You can define a function and apply to the DOB column to get the desired result:

import pandas as pd

# Function to apply to DOB
def to_date(x):
    # Get date
    val = pd.to_datetime(x, format='%Y-%m-%d', errors='coerce')
    # Only year is supplied
    if pd.isnull(val):
        # Get the year substring
        val = pd.to_datetime(x[:4], format='%Y', errors='coerce')
    return val

# This is my toy sample of your dataframe
data = {'DOB':['20070102','20060924','20050110','20050708','20060919','20060000']}
df = pd.DataFrame(data)

# Apply function to DOB
df['DOB'] = df['DOB'].apply(to_date)

df['DOB'] now looks like this:

         DOB
0 2007-01-02
1 2006-09-24
2 2005-01-10
3 2005-07-08
4 2006-09-19
5 2006-01-01
Marcelo Paco
  • 2,732
  • 4
  • 9
  • 26
  • hypothetically there are over 1,000 dob's to be changed will this still work?????? I see you listed out each dob within my dataframe but I want to be able to not have to list them out like you've done 'data' – PickleRick Mar 30 '23 at 23:53
  • I created the toy datraframe so that I can show you how the solution would work. Yes it will work with over 1,000 dob's. Your dataframe already has the `DOB` column. All you'd have to do is add the `to_date` function and call it as shown in the answer `df['DOB'] = df['DOB'].apply(to_date)`. – Marcelo Paco Mar 31 '23 at 00:31