
I am new to Python and pandas, so I am running into a lot of issues right now. I can't figure out how to build datetimes with pandas. I want to convert the season (YEAR) and month columns below into a single datetime column and write the result to a CSV file.

Here is my code below:

# -*- coding: utf-8 -*-
import pandas as pd
df = pd.DataFrame({
        'S.No.':[1,2,3,4,5],
        'YEAR':['1967-68','1968-69','1969-70','1970-71','1971-72'],
        'JUNE':['77.19','415.48','236.71','108.38','6.19'],
        'JULY':['76.19','435.48','26.71','138.38','9.19'],
        'AUGUST':['75.19','415.48','226.71','78.38','3.19']
        })

From this data, every missing month should be filled with a value of 0, and the output should be a semicolon-separated CSV file like the sample below:

datetime;year;month;day;data 
1967-01-01 00:00:00;1967;1;1;0.0
1967-02-01 00:00:00;1967;2;1;0.0 
1967-03-01 00:00:00;1967;3;1;0.0
(contd. )
........
........ 
1967-06-01 00:00:00;1967;6;1;77.19
1967-07-01 00:00:00;1967;7;1;76.19
(contd. )
......... 
1968-06-01 00:00:00;1968;6;1;415.48
1968-07-01 00:00:00;1968;7;1;435.48
......... 
1972-12-01 00:00:00;1972;12;1;0.0
Binod Bhandary

3 Answers


Try this:

df['Date'] = pd.to_datetime(df[['year','month','day']])

From the pd.to_datetime() docs:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’]) or plurals of the same

MaxU - stand with Ukraine
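pd.to_datetime can only assemble dates from columns that already hold numeric year/month/day parts, and the question's DataFrame has none: YEAR holds season labels such as '1967-68' and the months are column names. The minimal sketch below therefore uses a small hypothetical frame (`parts`) that already has those columns, and also shows the plural spellings mentioned in the quoted docs.

```python
import pandas as pd

# Hypothetical frame: year/month/day already split out as integers.
parts = pd.DataFrame({'year': [1967, 1967, 1968],
                      'month': [6, 7, 6],
                      'day': [1, 1, 1]})
parts['Date'] = pd.to_datetime(parts[['year', 'month', 'day']])
print(parts)

# Plural column names are accepted too, as the quoted docs say.
plural = pd.to_datetime(pd.DataFrame({'years': [1967], 'months': [6], 'days': [1]}))
print(plural.iloc[0])  # 1967-06-01 00:00:00
```

To apply this to the question's data, the year would first have to be taken from the first four characters of YEAR and the month number derived from each month column's name.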

Though I am not a Python developer, I can help with the logic. I hope this is helpful.

$yrs = ['1967-68','1968-69','1969-70','1970-71','1971-72'];
$new_yrs = [];
foreach ($yrs as $yr) {
    // get the first two characters of the string (the century, e.g. "19")
    $prefix = substr($yr, 0, 2);
    // drop the first two characters (e.g. "67-68" remains)
    $postfix = substr($yr, 2);
    // split the remainder into two pieces
    $pcs = explode('-', $postfix);
    // store each full year once (array keys are unique)
    $new_yrs[$prefix.$pcs[0]] = $prefix.$pcs[0];
    $new_yrs[$prefix.$pcs[1]] = $prefix.$pcs[1];
}

Output: [1967 => "1967", 1968 => "1968", 1969 => "1969", 1970 => "1970", 1971 => "1971", 1972 => "1972"]
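For comparison, here is a sketch of the same logic in Python, the language the question uses; `season_years` is a hypothetical helper name. Like the PHP version, it reuses the first two digits as the century for both halves, so a season such as '1999-00' would yield '1900' rather than '2000'.

```python
def season_years(seasons):
    """Expand season labels like '1967-68' into the unique full years they cover."""
    years = {}  # dict used as an insertion-ordered set, like the PHP array keys
    for season in seasons:
        prefix = season[:2]                    # first two characters, e.g. '19'
        first, second = season[2:].split('-')  # rest split in two, e.g. '67', '68'
        years[prefix + first] = True
        years[prefix + second] = True
    return list(years)


print(season_years(['1967-68', '1968-69', '1969-70', '1970-71', '1971-72']))
# ['1967', '1968', '1969', '1970', '1971', '1972']
```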


After a lot of practice I found a solution to this problem. Here is my code:

# -*- coding: utf-8 -*-
import pandas as pd

def month_string_to_number(string):
    m = {
        'jan':1,
        'feb':2,
        'mar':3,
        'apr':4,
        'may':5,
        'jun':6,
        'jul':7,
        'aug':8,
        'sep':9,
        'oct':10,
        'nov':11,
        'dec':12
        }
    s = string.strip()[:3].lower()  # e.g. 'JUNE' -> 'jun'
    try:
        return m[s]
    except KeyError:
        raise ValueError('Not a month: %r' % string)

def years_conv(years):
    new_year=[]
    for year in years:
        prefix = year[:2]
        postfix =year[2:]
        pcs    = postfix.split('-')
        year1= prefix + pcs[0]
        new_year.append(year1)
#        year2= prefix + pcs[1]
#        new_year.append(year2)
    return new_year  


def year_conv(year):
    # '1967-68' -> '1967': keep the first year of the season
    prefix = year[:2]
    postfix = year[2:]
    pcs = postfix.split('-')
    return prefix + pcs[0]

def timeseries(data, months):
    newdate = []
    newdata = []
    for idx1, rfrows in data.iterrows():
        year = year_conv(rfrows['YEAR'])
        for month in months:
            mnt = month_string_to_number(month)
            # zero-padded ISO date string, e.g. '1967-06-01'
            newdate.append('%s-%02d-01' % (year, mnt))
            newdata.append(rfrows[month])

    df = pd.DataFrame({'Datetime': pd.to_datetime(newdate), 'Data': newdata})
    # the values are strings such as '77.19'; convert them to floats (bad values -> 0.0)
    df['Data'] = pd.to_numeric(df['Data'], errors='coerce').fillna(0.0)
    # index by monthly period so the data can be aligned to a full calendar
    df.index = df['Datetime'].dt.to_period('M')

    # every month from 1967-01 to 2010-12 (hard-coded; adjust to your data)
    idx = pd.period_range('1967-01', '2010-12', freq='M')

    # reindex to the full calendar; months with no data get 0.0
    o_d = df[['Data']].reindex(idx, fill_value=0.0)
    # change the PeriodIndex back to timestamps (first day of each month)
    o_d['Datetime'] = o_d.index.to_timestamp()
    o_d['Year'] = o_d['Datetime'].dt.year
    o_d['Month'] = o_d['Datetime'].dt.month
    o_d['Day'] = o_d['Datetime'].dt.day
    o_d = o_d[['Datetime', 'Year', 'Month', 'Day', 'Data']]
    return o_d

def combine_stations_Data(sd):
    data=sd['lat']+sd['laval']+sd['lon']+sd['lovol']
    return data 


rainfall_data = pd.DataFrame({
        'S.No.':[1,2,3,4,5],
        'YEAR':['1967-68','1968-69','1969-70','1970-71','1971-72'],
        'JUNE':['77.19','25.0','236.71','108.38','6.19'],
        'JULY':['76.19','435.48','26.71','138.38','9.19'],
        'AUGUST':['75.19','415.48','226.71','78.38','3.19']
        })
#years = rainfall_data['YEAR'].tolist()
#rainfall_data.set_index('YEAR', inplace = 'True')
#months = rainfall_data[['JUNE', 'JULY', 'AUGUST']].mean(axis = 1).tolist()
months = ['JUNE', 'JULY', 'AUGUST']

new_rainfall_data = timeseries(rainfall_data,months)  

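# the ./StationsData folder must already exist;
# date_format writes the full '1967-01-01 00:00:00' form shown in the question
new_rainfall_data.to_csv('./StationsData/test.csv', index=False, sep=';',
                         date_format='%Y-%m-%d %H:%M:%S')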
Binod Bhandary
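The output format in the question can also be produced without iterating over rows. The sketch below is an alternative to the loop-based solution above, not the author's method: it reshapes the month columns with DataFrame.melt, assembles dates with pd.to_datetime, and fills missing months with reindex. It assumes that each season label starts with its four-digit first year, that the month columns are exactly JUNE, JULY and AUGUST (`MONTHS` is a hypothetical mapping to extend for other columns), and, to match the question's sample ending at 1972-12, that the calendar runs to December of the year after the last season starts.

```python
import pandas as pd

df = pd.DataFrame({
    'S.No.': [1, 2, 3, 4, 5],
    'YEAR': ['1967-68', '1968-69', '1969-70', '1970-71', '1971-72'],
    'JUNE': ['77.19', '415.48', '236.71', '108.38', '6.19'],
    'JULY': ['76.19', '435.48', '26.71', '138.38', '9.19'],
    'AUGUST': ['75.19', '415.48', '226.71', '78.38', '3.19'],
})

# Hypothetical mapping from month column name to month number.
MONTHS = {'JUNE': 6, 'JULY': 7, 'AUGUST': 8}

# One row per (season, month) pair with columns YEAR, name, data.
long_df = df.melt(id_vars='YEAR', value_vars=list(MONTHS),
                  var_name='name', value_name='data')

# Assumption: the first four characters of a season label are its first year.
start_year = long_df['YEAR'].str[:4].astype(int)
long_df['datetime'] = pd.to_datetime(pd.DataFrame({
    'year': start_year,
    'month': long_df['name'].map(MONTHS),
    'day': 1,
}))
series = long_df.set_index('datetime')['data'].astype(float)

# Full monthly calendar; the last season ('1971-72') is assumed to end in 1972.
first_year = start_year.min()
last_year = start_year.max() + 1
full = pd.date_range(f'{first_year}-01-01', f'{last_year}-12-01', freq='MS')
series = series.reindex(full, fill_value=0.0)  # missing months -> 0.0

out = pd.DataFrame({
    'datetime': full,
    'year': full.year,
    'month': full.month,
    'day': full.day,
    'data': series.to_numpy(),
})
csv_text = out.to_csv(sep=';', index=False, date_format='%Y-%m-%d %H:%M:%S')
print('\n'.join(csv_text.splitlines()[:8]))
```

To write a file instead of returning a string, pass a path as the first argument of to_csv.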