I have created a DataFrame with the following code:

import pandas as pd
df = pd.read_table('https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data', sep=r"\s+", parse_dates=[[0, 1, 2]])

If we run the following command:

type(df['Yr_Mo_Dy'][0])

We'll see that the observations under ['Yr_Mo_Dy'] are of type pandas._libs.tslibs.timestamps.Timestamp.

What I am trying to do is the following: whenever a date in ['Yr_Mo_Dy'] has a year >= 2061, I want to subtract 100 years from it; otherwise I keep the year as it is and continue with the iteration.

I have tried the following code:

for i in list(range(df.shape[0])):
    # assign all the observations under df['Yr_Mo_Dy'] to ts
    ts = df['Yr_Mo_Dy'][i]

    if df['Yr_Mo_Dy'][i].year >=2061:
        # replace the year in ts by year - 100
        ts.replace(year=df['Yr_Mo_Dy'][i].year - 100)
    else:
        continue

But the loop does nothing. I feel it has something to do with the variable assignment ts = df['Yr_Mo_Dy'][i], yet I cannot figure out another way of getting this done.

I am trying to assign a variable on each loop iteration, based on the answer I saw in this post.

jpp
BCArg

1 Answer

You should aim to avoid manual loops for vectorisable operations.
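
As for why the loop in the question appears to do nothing: as far as I can tell, Timestamp.replace returns a new Timestamp instead of modifying ts, and the loop never stores that result back into the DataFrame. A minimal illustration (the names ts and shifted are only for this sketch):

```python
import pandas as pd

ts = pd.Timestamp('2061-01-01')

# replace() returns a new Timestamp; ts itself is left unchanged
shifted = ts.replace(year=ts.year - 100)

print(ts.year)       # 2061
print(shifted.year)  # 1961
```

So even a loop-based fix would have to write the returned value back to the column.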

In this case, you can use numpy.where to create a conditional series:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': pd.to_datetime(['2018-01-01', '2080-11-30',
                                        '1955-04-05', '2075-10-09'])})

df['B'] = np.where(df['A'].dt.year >= 2061,
                   df['A'] - pd.DateOffset(years=100), df['A'])

print(df)

           A          B
0 2018-01-01 2018-01-01
1 2080-11-30 1980-11-30
2 1955-04-05 1955-04-05
3 2075-10-09 1975-10-09
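
If you would rather overwrite the original column than add a new one, something along these lines should work. This is only a sketch: it uses Series.mask in place of numpy.where, and the three sample dates are made up to stand in for the Yr_Mo_Dy column from the question.

```python
import pandas as pd

# Hypothetical sample standing in for the real 'Yr_Mo_Dy' column
df = pd.DataFrame({'Yr_Mo_Dy': pd.to_datetime(['2061-01-01', '1971-01-02',
                                               '2078-12-31'])})

dates = df['Yr_Mo_Dy']

# mask() swaps in the shifted date wherever the condition is True
# and keeps the original value everywhere else
df['Yr_Mo_Dy'] = dates.mask(dates.dt.year >= 2061,
                            dates - pd.DateOffset(years=100))

print(df)
```

The column keeps its datetime dtype, so .dt accessors continue to work afterwards.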
jpp