Replace year on pandas dataframe with variable of Timestamp format

Question

I have created the following df with the following code:

import pandas as pd
df = pd.read_table('https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data', sep=r"\s+", parse_dates=[[0,1,2]])

If we run the following command:

type(df['Yr_Mo_Dy'][0])

We'll see that the observations under ['Yr_Mo_Dy'] are of type pandas._libs.tslibs.timestamps.Timestamp.

What I am trying to do is the following: whenever the year in ['Yr_Mo_Dy'] is >= 2061, I want to subtract 100 from it; otherwise I keep the year as it is and continue with the iteration.

I have tried the following code:

for i in list(range(df.shape[0])):
    # assign all the observations under df['Yr_Mo_Dy'] to ts
    ts = df['Yr_Mo_Dy'][i]

    if df['Yr_Mo_Dy'][i].year >=2061:
        # replace the year in ts by year - 100
        ts.replace(year=df['Yr_Mo_Dy'][i].year - 100)
    else:
        continue

But the loop does nothing. I feel it has something to do with the variable assignment ts = df['Yr_Mo_Dy'][i], yet I cannot figure out another way of getting this done.

I am trying to assign a variable after each loop iteration considering the answer I saw in this post.

score 0 · Accepted Answer · answered Sep 26 '18 at 13:40

You should aim to avoid manual loops for vectorisable operations.

In this case, you can use numpy.where to create a conditional series:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': pd.to_datetime(['2018-01-01', '2080-11-30',
                                        '1955-04-05', '2075-10-09'])})

df['B'] = np.where(df['A'].dt.year >= 2061,
                   df['A'] - pd.DateOffset(years=100), df['A'])

print(df)

           A          B
0 2018-01-01 2018-01-01
1 2080-11-30 1980-11-30
2 1955-04-05 1955-04-05
3 2075-10-09 1975-10-09
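
As for why the original loop appears to do nothing: Timestamp objects are immutable, so ts.replace(...) returns a new Timestamp and the result is discarded unless it is assigned back. The sketch below illustrates this and then shows one possible way to update the column in place with a boolean mask instead of creating a second column. The three sample dates are made up for illustration and only stand in for the 'Yr_Mo_Dy' column from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's 'Yr_Mo_Dy' column.
df = pd.DataFrame({'Yr_Mo_Dy': pd.to_datetime(['2061-01-01',
                                               '2078-12-31',
                                               '1971-01-01'])})

# Timestamp.replace returns a NEW Timestamp; the original is left unchanged.
ts = df['Yr_Mo_Dy'][0]
shifted = ts.replace(year=ts.year - 100)
print(ts.year, shifted.year)  # 2061 1961

# Vectorised in-place update: select the affected rows with a boolean mask
# and write the shifted dates back into the same column.
mask = df['Yr_Mo_Dy'].dt.year >= 2061
df.loc[mask, 'Yr_Mo_Dy'] = df.loc[mask, 'Yr_Mo_Dy'] - pd.DateOffset(years=100)

print(df)
```

Writing back through df.loc keeps the column as datetime64 and avoids the per-row Python loop.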

clever solution, just did not know the `.DateOffset()` method — BCArg, Sep 26 '18 at 13:55
