2

I have a dataframe called jobs. [screenshot of dataframe]

I need to add a new column ‘year’ to the jobs dataframe. This column should contain the corresponding year for each post_date (which is already a column). For example, for the post_date value 2017-08-16 the ‘year’ value should be 2017.

I am unsure how to insert a new column while also pulling data from a pre-existing column.

jo.jersey
  • The answer to this will depend on how you have the date value stored. Can you share a reproducible example? To add a new column (not dealing with the date, but generally): jobs['newcolname'] = jobs['post_date'] + some calculations – Kris Nov 20 '19 at 01:37
  • The first DataFrame will hold the raw data. After that, create another DataFrame with an additional 'year' column: read only the post_date column from the first DataFrame and split it by "-"; index 0 of the result gives the year, which can be added to the newly created DataFrame. – Shankar Saran Singh Nov 20 '19 at 01:41
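
The string-splitting idea from the last comment can be sketched as follows. This is only an illustration, assuming post_date is stored as text in YYYY-MM-DD form; the sample rows are made up, and a second DataFrame is not strictly needed because the new column can be assigned in place.

```python
import pandas as pd

# Hypothetical sample data; the real jobs dataframe is not shown in the question.
jobs = pd.DataFrame({"post_date": ["2017-08-16", "2018-01-03"]})

# Split each string on "-", keep the first piece (the year), and convert it to an integer.
jobs["year"] = jobs["post_date"].str.split("-").str[0].astype(int)

print(jobs)
```

This only works while every value really is a string in that layout; parsing the column as dates is more robust.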

3 Answers

2

Use dt.year:

jobs['year'] = pd.to_datetime(jobs['post_date'], errors='coerce').dt.year
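
A small sketch of how this line behaves, assuming post_date holds strings in YYYY-MM-DD form (the sample rows, including the deliberately bad one, are invented for illustration). With errors='coerce', values that cannot be parsed become NaT, so the resulting year is missing for those rows.

```python
import pandas as pd

# Hypothetical sample data; the last value is intentionally unparseable.
jobs = pd.DataFrame({"post_date": ["2017-08-16", "2018-01-03", "not a date"]})

# Unparseable values become NaT instead of raising an error.
parsed = pd.to_datetime(jobs["post_date"], errors="coerce")
jobs["year"] = parsed.dt.year

# Optional: a nullable integer dtype keeps the years as integers despite the missing value.
jobs["year_int"] = jobs["year"].astype("Int64")

print(jobs)
```

If no rows are expected to be invalid, dropping errors='coerce' makes bad values raise an error instead of silently turning into missing years.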
U13-Forward
1

I would begin by transforming the column post_date into date format. After doing this, you could use a simple function to extract the year.

jobs["post_date"] = pd.to_datetime(jobs["post_date"])

should be enough to change it into a datetime type. If it isn't, you should pass a strptime-style format string (the format argument of pd.to_datetime) to tell pandas the specific format of the "post_date" column, so that it is read as a date. After that, do the following:

jobs["year"] = jobs["post_date"].dt.year
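
As an illustration of the explicit-format case, here is a minimal sketch assuming the dates were stored in day/month/year order. That layout and the sample values are assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data in day/month/year order, which is ambiguous without a format.
jobs = pd.DataFrame({"post_date": ["16/08/2017", "03/01/2018"]})

# The format argument takes strftime/strptime-style directives.
jobs["post_date"] = pd.to_datetime(jobs["post_date"], format="%d/%m/%Y")

# Once the column has a datetime dtype, the .dt accessor exposes the year.
jobs["year"] = jobs["post_date"].dt.year

print(jobs)
```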
0

If I understand your question correctly, you want to add a new column of year values to the existing dataframe, derived from a column already in it. To extract only the year, post_date first has to be a datetime column; you can then use the pandas .dt accessor to pull out just the year from your post_date column. Have a look at this or this. To store these year values, you can simply do this:

jobs['year'] = jobs['post_date'].dt.year
Dulaj Kulathunga