Convert year and month columns to datetime in Python and generate a CSV

Question

I am new to Python and pandas and am running into several issues. I cannot build a datetime column with pandas, and I want to write a CSV file from the data in my DataFrame. Specifically, I want to combine the year and month columns below into a single datetime column and write the result to CSV.

Here is my code below:

# -*- coding: utf-8 -*-
import pandas as pd
df = pd.DataFrame({
        'S.No.':[1,2,3,4,5],
        'YEAR':['1967-68','1968-69','1969-70','1970-71','1971-72'],
        'JUNE':['77.19','415.48','236.71','108.38','6.19'],
        'JULY':['76.19','435.48','26.71','138.38','9.19'],
        'AUGUST':['75.19','415.48','226.71','78.38','3.19']
        })

Months that are not present as columns should be filled with the value 0. For the data above, the output CSV should look like this:

datetime;year;month;day;data 
1967-01-01 00:00:00;1967;1;1;0.0
1967-02-01 00:00:00;1967;2;1;0.0 
1967-03-01 00:00:00;1967;3;1;0.0
(contd. )
........
........ 
1967-06-01 00:00:00;1967;6;1;77.19
1967-07-01 00:00:00;1967;7;1;76.19
(contd. )
......... 
1968-06-01 00:00:00;1968;6;1;415.48
1968-07-01 00:00:00;1968;7;1;435.48
......... 
1972-12-01 00:00:00;1972;12;1;0.0

Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. — MaxU - stand with Ukraine, Jan 14 '18 at 10:13
yep, it looks much better now. Could you also post your desired data set based on that sample input data set - for the output/resulting data set the format isn't important - it could be picture as well... — MaxU - stand with Ukraine, Jan 14 '18 at 10:31
now i'm confused. How did you get value `0.0` in the first row? where the 1968 year comes from? What should happen to `'S.No.'` column? Could you reduce your input data set to 2-3 rows and post a full output data set. Currently it's not very clear — MaxU - stand with Ukraine, Jan 14 '18 at 10:47
@MaxU, in input you have only 3 months data with year and while generating output date time should be in monthly format and, if month is not in column its value should be 0.0. — Binod Bhandary, Jan 14 '18 at 10:50
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/163136/discussion-between-vinod-bhandary-and-maxu). — Binod Bhandary, Jan 14 '18 at 10:53

MaxU - stand with Ukraine · Answer 1 · 2018-01-14T10:02:46.703

3

Try this:

df['Date'] = pd.to_datetime(df[['year','month','day']])

From the pd.to_datetime() docs:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’] or plurals of the same.
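A minimal sketch of the call quoted above, assuming the data has already been reshaped so that each row carries its own year, month and day. The frame `parts` and its values are made up for illustration; the frame in the question is not in this shape, since its months are column names and its years are ranges like '1967-68'.

```python
import pandas as pd

# Hypothetical long-format frame: the column names must be
# 'year', 'month', 'day' (or their plurals) for the assembly to work.
parts = pd.DataFrame({
    'year': [1967, 1967, 1968],
    'month': [6, 7, 6],
    'day': [1, 1, 1],
})

# Combine the three columns into a single datetime64 column.
parts['Date'] = pd.to_datetime(parts[['year', 'month', 'day']])
print(parts)
```

Column names such as 'YEAR' or 'JUNE' would need to be renamed or reshaped first, because the assembly relies on the recognised key names.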

edited Jan 14 '18 at 10:02

answered Jan 14 '18 at 10:00


@vinodbhandary, can you post a small reproducible data set as text/CSV/Python code so we could copy and paste it? – MaxU - stand with Ukraine Jan 14 '18 at 10:02
first image is taken from excel file and second is expected output in csv. – Binod Bhandary Jan 14 '18 at 10:08
@vinodbhandary, do you expect us to type your data set manually from those pictures in order to try to help you? ;-) – MaxU - stand with Ukraine Jan 14 '18 at 10:09
no, i didnt mean that, actually i want to convert excel columns into date time, years given in rows , but months have only column name.. – Binod Bhandary Jan 14 '18 at 10:13

score 1 · Answer 2 · answered Jan 14 '18 at 16:56

Although I am not a Python developer, I can help with the logic. I hope this PHP snippet is useful to you.

$yrs = ['1967-68','1968-69','1969-70','1970-71','1971-72'];
$new_yrs = [];
foreach ($yrs as $yr) {
    // get first two character of string
    $prefix = substr($yr, 0,2);
    // remove first two character of string
    $postfix = substr($yr, 2);
    // break the string to 2 pcs
    $pcs = explode('-', $postfix);
    // get unique dates
    $new_yrs[$prefix.$pcs[0]] = $prefix.$pcs[0];
    $new_yrs[$prefix.$pcs[1]] = $prefix.$pcs[1];
}

Output: [1967 => "1967", 1968 => "1968", 1969 => "1969", 1970 => "1970", 1971 => "1971", 1972 => "1972"]
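For readers working in Python, the same logic might look roughly like the sketch below. The function name `split_year_ranges` is made up for illustration, and the code assumes every label has the form 'YYYY-YY' with both years in the same century (a label such as '1999-00' would not be handled correctly).

```python
def split_year_ranges(year_ranges):
    """Expand labels like '1967-68' into the distinct four-digit years they cover."""
    seen = {}
    for label in year_ranges:
        century = label[:2]                 # e.g. '19'
        start, end = label[2:].split('-')   # e.g. '67', '68'
        for suffix in (start, end):
            # dict keys keep insertion order and drop duplicates
            seen[century + suffix] = None
    return list(seen)


years = split_year_ranges(['1967-68', '1968-69', '1969-70', '1970-71', '1971-72'])
print(years)
```

A dict is used instead of a set so the years come back in the order they first appear.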

You are wrong bro, it's not duplicating year. One in array index and other is value. — Madan Adhikari, Jan 16 '18 at 06:50

score 1 · Answer 3 · answered Jan 18 '18 at 04:11

After a lot of trial and error I found a solution to this problem. Here is my code:

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import datetime

def month_string_to_number(string):
    m = {
        'jan':1,
        'feb':2,
        'mar':3,
        'apr':4,
        'may':5,
        'jun':6,
        'jul':7,
        'aug':8,
        'sep':9,
        'oct':10,
        'nov':11,
        'dec':12
        }
    s = string.strip()[:3].lower()
    try:
        return m[s]
    except KeyError:
        # only an unknown month abbreviation should be reported as a bad month
        raise ValueError('Not a month')

def years_conv(years):
    new_year=[]
    for year in years:
        prefix = year[:2]
        postfix =year[2:]
        pcs    = postfix.split('-')
        year1= prefix + pcs[0]
        new_year.append(year1)
#        year2= prefix + pcs[1]
#        new_year.append(year2)
    return new_year  


def year_conv(year):
    prefix = year[:2]
    postfix =year[2:]
    pcs    = postfix.split('-')
    year1= prefix + pcs[0]
    return year1

def timeseries(data, months):
    newdate=[]
    newdata=[]
    newyear=[]
    for idx1, rfrows in data.iterrows():
        year = year_conv(rfrows['YEAR'])
        for month in months:
            d ='1'
            mnt =month_string_to_number(month)
            newdate.append("-".join((year,str(mnt),d))) 
            newdata.append(rfrows[month]) 
            newyear.append(year) 

    dfdata = {'Data': newdata,'Datetime': newdate,'Year':newyear}        
    df = pd.DataFrame(dfdata)  
    df['Data'] = df['Data'].fillna(0)
    df = df[['Datetime','Year','Data']]    
    df['Datetime'] = pd.DatetimeIndex(df['Datetime']).floor('d')
#    df.set_index('Datetime', inplace=True)
    df.index =df['Datetime']


    df.index = df.index.to_period('M')
    # every month from 1967-01 through 2010-12
    idx = pd.period_range('1967-01', '2010-12', freq='M')

    # reindex onto the full monthly range; months without data are filled with 0
    o_d = df.reindex(idx, fill_value=0)
    # change PeriodIndex back to DatetimeIndex
    o_d['Datetime'] = o_d.index.to_timestamp()
    year_month =o_d['Datetime']
    o_d['Year'], o_d['Month'],o_d['Day'] = o_d['Datetime'].dt.year, o_d['Datetime'].dt.month,o_d['Datetime'].dt.day
    o_d['Datetime'] = pd.DatetimeIndex(year_month).floor('d')
    o_d = o_d[['Datetime','Year','Month','Day','Data']] 
    return o_d

def combine_stations_Data(sd):
    data=sd['lat']+sd['laval']+sd['lon']+sd['lovol']
    return data 


rainfall_data = pd.DataFrame({
        'S.No.':[1,2,3,4,5],
        'YEAR':['1967-68','1968-69','1969-70','1970-71','1971-72'],
        'JUNE':['77.19','25.0','236.71','108.38','6.19'],
        'JULY':['76.19','435.48','26.71','138.38','9.19'],
        'AUGUST':['75.19','415.48','226.71','78.38','3.19']
        })
#years = rainfall_data['YEAR'].tolist()
#rainfall_data.set_index('YEAR', inplace = 'True')
#months = rainfall_data[['JUNE', 'JULY', 'AUGUST']].mean(axis = 1).tolist()
months = ['JUNE', 'JULY', 'AUGUST']

new_rainfall_data = timeseries(rainfall_data,months)  

new_rainfall_data.to_csv('./StationsData/test.csv', index=False ,sep=';')
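The reshaping done by the loops above could also be sketched with `DataFrame.melt` followed by a reindex. This is an alternative under stated assumptions, not the code from this answer: it uses a two-row subset of the data, the names `month_numbers`, `tidy` and `full_range` are made up for illustration, each 'YYYY-YY' label is mapped to its first year (as `year_conv` does), and the date range is limited to 1967-1968.

```python
import pandas as pd

rainfall = pd.DataFrame({
    'YEAR': ['1967-68', '1968-69'],
    'JUNE': ['77.19', '25.0'],
    'JULY': ['76.19', '435.48'],
    'AUGUST': ['75.19', '415.48'],
})
month_numbers = {'JUNE': 6, 'JULY': 7, 'AUGUST': 8}

# Wide to long: one row per (year label, month name) pair.
tidy = rainfall.melt(id_vars='YEAR', value_vars=list(month_numbers),
                     var_name='month_name', value_name='data')
tidy['year'] = tidy['YEAR'].str[:4].astype(int)   # first year of each label
tidy['month'] = tidy['month_name'].map(month_numbers)
tidy['day'] = 1
tidy['data'] = tidy['data'].astype(float)
tidy['datetime'] = pd.to_datetime(tidy[['year', 'month', 'day']])

# Reindex onto every month start in the range; absent months become 0.0.
full_range = pd.date_range('1967-01-01', '1968-12-01', freq='MS')
series = tidy.set_index('datetime')['data'].reindex(full_range, fill_value=0.0)

result = series.rename_axis('datetime').reset_index()
result['year'] = result['datetime'].dt.year
result['month'] = result['datetime'].dt.month
result['day'] = result['datetime'].dt.day
result = result[['datetime', 'year', 'month', 'day', 'data']]

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = result.to_csv(sep=';', index=False, date_format='%Y-%m-%d %H:%M:%S')
print(csv_text)
```

Passing a file path as the first argument to `to_csv` writes the same text to disk; the target directory has to exist beforehand.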
