17

I have a dataframe that consists of separate columns for year, month and day. I tried to combine these individual columns into one date using:

df['myDt']=pd.to_datetime(df[['year','month','day']])

only to get the following error: "to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing". Not sure what this means....I'm already supplying the relevant columns. On checking the datatypes, I found that they Year, Month and Day columns are int64. Would that be causing an issue? Thanks, Chet

Thank you all for posting. As suggested, I'm posting the sample data set first: Value mm yy dd Date
2018-11-30 88.550067 11 2018 1 2018-12-31 88.906290 12 2018 1 2019-01-31 88.723000 1 2019 1 2019-02-28 89.509179 2 2019 1 2019-03-31 90.049161 3 2019 1 2019-04-30 90.523100 4 2019 1 2019-05-31 90.102484 5 2019 1 2019-06-30 91.179400 6 2019 1 2019-07-31 90.963570 7 2019 1 2019-08-31 92.159170 8 2019 1

The data source is:https://www.quandl.com/data/EIA/STEO_NGPRPUS_M I imported the data as follows: 1. import quandl (used conda install first) 2. Used Quandl's Python code:

data=quandl.get("EIA/STEO_NGPRPUS_M", authtoken="TOKEN","2005-01-01","2005-12-31") 4. Just to note, the original data comes only with the Value column, and DateTime as index. I extracted and created the mm,yy and dd columns (month, year, and dd is a column vector set to 1) All I'm trying to do is create another column called "first of the month" - so for each day of each month, the column will just show "MM/YY/1". I'm going to try out all the suggestions below shortly and get back to you guys. Thanks!!

Chet
  • 421
  • 1
  • 4
  • 8

5 Answers5

23

Solution

You could use datetime.datetime along with .apply().

import datetime

d = datetime.datetime(2020, 5, 17)
date = d.date()

For pandas.to_datetime(df)

It looks like your code is fine. See pandas.to_datetime documentation and How to convert columns into one datetime column in pandas?.

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
pd.to_datetime(df[["year", "month", "day"]])

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

What if your YEAR, MONTH and DAY columns have different headers?

Let's say your YEAR, MONTH and DAY columns are labeled as yy, mm and dd respectively. And you prefer to keep your column names unchanged. In that case you could do it as follows.

import pandas as pd

df = pd.DataFrame({'yy': [2015, 2016],
                   'mm': [2, 3],
                   'dd': [4, 5]})
df2 = df[["yy", "mm", "dd"]].copy()
df2.columns = ["year", "month", "day"]
pd.to_datetime(df2)

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
CypherX
  • 7,019
  • 3
  • 25
  • 37
  • 2
    A good package for processing date time is `lubridate`. Use e.g. the `ymd()` function for your issue. – André Müller Sep 24 '19 at 03:45
  • @CypherX - your suggestion looked the easiest, so I started with that. I've seen the link you shared yesterday. I found the issue...the column names HAVE to be "year", "month", "day", and not my own abbreviations (I was using 'yy', 'mm','dd'). I tried that and it worked. – Chet Sep 25 '19 at 02:57
  • @Chet Thank you for accepting my answer. You need not change your original column names. Since, `pd.to_datetime()` requires `year`, `month`, `day`, you could try something like what I suggested in the last part that I added now. – CypherX Sep 25 '19 at 17:05
  • @CypherX - thanks sir, I'll take a look at that and let you know! – Chet Sep 26 '19 at 16:14
  • @CypherX - this is interesting.....while it works, it's also an extra step....maybe this is something that GitHub should take note of......it shouldn't matter what the column names are as long as the vectors are provided in the right order (e.g. excel's date (year,month,day) doesn't care about column names.....just a thought. Thanks for this very interesting suggestion. – Chet Sep 28 '19 at 20:32
  • ```df2 = df[["yy", "mm", "dd"]].copy()``` results in ```AttributeError: 'method' object has no attribute 'columns'``` creating a new df without copy works for me – brygid Apr 29 '21 at 12:05
6

Here is a two liner:

df['dateInt']=df['year'].astype(str) + df['month'].astype(str).str.zfill(2)+ df['day'].astype(str).str.zfill(2)
df['Date'] = pd.to_datetime(df['dateInt'], format='%Y%m%d')

Output

    year  month day dateInt     Date
0   2015    5   20  20150520    2015-05-20
1   2016    6   21  20160621    2016-06-21
2   2017    7   22  20170722    2017-07-22
3   2018    8   23  20180823    2018-08-23
4   2019    9   24  20190924    2019-09-24
Grant Shannon
  • 4,709
  • 1
  • 46
  • 36
  • 1
    Thanks sir. Thanks for your help - I ended up trying CypherX's code below, and I realized my error was that I was using my own abbreviations for the column names - Python wants them to be specific. I changed accordingly and it works now. – Chet Sep 25 '19 at 03:00
4

You should use the apply method as follows:

from datetime import datetime
df['myDt'] = df.apply(lambda row: datetime.strptime(f"{int(row.year)}-{int(row.month)}-{int(row.day)}", '%Y-%m-%d'), axis=1)

Running Example:

>>> d = {'year': list(range(2015, 2020)), 'month': list(range(5, 10)), 'day': >> list(range(20, 25))}
>> df = pd.DataFrame(d)
>> df

    year    month   day myDt
0   2015    5       20  2015-05-20
1   2016    6       21  2016-06-21
2   2017    7       22  2017-07-22
3   2018    8       23  2018-08-23
4   2019    9       24  2019-09-24
lmiguelvargasf
  • 63,191
  • 45
  • 217
  • 228
  • Thanks sir. This gave me an error that I can't figure out. I tried the code suggested by @CypherX at the bottom and that worked. Seems like Python wants very specific names for columns; I can't use my own abbreviations. Learning! – Chet Sep 25 '19 at 02:59
  • @Chet, no worries. I wonder what is the error you are getting, if you can provide some feedback I will give you further help. – lmiguelvargasf Sep 25 '19 at 03:01
  • @Imiguelvargasf - the error I get sir is pasted below. Please bear in mind that I'm VERY new to Python....so I don't understand all that's going on...it's all trial and error for me at this time. " ValueError: ("time data '2005.0-1.0-1.0' does not match format '%Y-%m-%d'", 'occurred at index 2005-01-31 00:00:00') " – Chet Sep 25 '19 at 03:09
  • 1
    @Chet, I updated my answer. The problem is that `year`, `month`, and `day` columns are `float`s, and they should be `int`s. Use `int` built-in to fix it. – lmiguelvargasf Sep 25 '19 at 03:15
  • 1
    @Imiguelvargasf - thanks sir....this works for me as well. It's a little hard to read (given my newbie status)..but I'm keeping this in my repository of useful code. Thanks!! – Chet Sep 28 '19 at 20:33
0
#Add and calculate a new Calculated_Date column

df['Calculated_Date'] = df[['year', 'month', 'day']].apply(lambda x: '{}-{}-{}'.format(x[0], x[1], x[2]), axis=1)

df['Calculated_Date'].head()


#Parse your Calculated_Date column into a datetime obj (not needed; but if you need to parse)

df['Calculated_Date'] = pd.to_datetime(df['Calculated_Date'])

df['Calculated_Date'].head()
Flair
  • 2,609
  • 1
  • 29
  • 41
Vinit
  • 1
  • 2
0

Improving the answer from @lmiguelvargasf, sometimes you want to save as datetime format. Furthermore, using apply (IMHO) is better if other column is exist with some value (something like sales for the example).

import datetime

df['dt'] = df.apply(lambda row: datetime.datetime(int(row.yy),
                                                  int(row.mm),
                                                  int(row.dd)), axis=1)
df.head()

Note: my example only working if the yy value is in 2022 for example. If your yy value is 21, you need to modify such as 2000 + int(row.yy).

Muhammad Yasirroni
  • 1,512
  • 12
  • 22