
I have the following DataFrame, `movies`:

| Movies | Release Date |
| -------- | -------------- |
| Star Wars: Episode VII - The Force Awakens (2015) | December 16, 2015 |
| Avengers: Endgame (2019) | April 24, 2019 |

I am trying to add a new column that holds the year, using `split` to pull it out of the release date.

```python
import pandas as pd

movies = pd.DataFrame({'Movies': ['Star Wars: Episode VII - The Force Awakens (2015)', 'Avengers: Endgame (2019)'],
                       'Release Date': ['December 16, 2015', 'April 24, 2019']})
movies["year"] = 0
movies["year"] = movies["Release Date"].str.split(",")[1]
movies["year"]
```

Expected output:

| Movies | year |
| -------- | -------------- |
| Star Wars: Episode VII - The Force Awakens (2015) | 2015            |
| Avengers: Endgame (2019)   | 2019            |

But instead I get:

> ValueError: Length of values does not match length of index 

bo_
Please check [ask] and post a [mre] as well as the full traceback. Also https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – buran Jan 23 '22 at 17:02

2 Answers


Using `str.extract`, we can target the four-digit year:

```python
df["year"] = df["Release Date"].str.extract(r'\b(\d{4})\b')
```
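A minimal self-contained sketch of this approach on the question's sample data (with the missing closing parenthesis on the second title restored). It names the frame `movies` as in the question and passes `expand=False`, which is an addition to the one-liner above: with a single capture group, it makes `str.extract` return a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample data from the question (closing parenthesis restored on the second title).
movies = pd.DataFrame({
    "Movies": ["Star Wars: Episode VII - The Force Awakens (2015)",
               "Avengers: Endgame (2019)"],
    "Release Date": ["December 16, 2015", "April 24, 2019"],
})

# \b(\d{4})\b captures a run of exactly four digits bounded by word boundaries,
# so "16" and "24" are skipped and only the year is captured.
# expand=False returns a Series (one capture group), which fits a single column.
movies["year"] = movies["Release Date"].str.extract(r"\b(\d{4})\b", expand=False)

print(movies[["Movies", "year"]])
```

The extracted years are strings; `.astype(int)` converts them if every row matched.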
Tim Biegeleisen

Explanation

  1. `movies["Release Date"].str.split(",")` returns a Series of the lists returned by `split()`.

  2. `movies["Release Date"].str.split(",")[1]` returns the element of this Series with label 1, i.e. the whole list from the second row, not the second item of each list.

    This is obviously not what you want.
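To make the two points above concrete, here is a minimal sketch. The third row (`"May 4, 1977"`) is a hypothetical value added so the frame has more rows than the two-item list; with exactly two rows, the assignment would not raise and would silently put `'April 24'` and `' 2019'` into the column.

```python
import pandas as pd

# Two dates from the question plus one hypothetical row.
movies = pd.DataFrame({
    "Release Date": ["December 16, 2015", "April 24, 2019", "May 4, 1977"],
})

# A Series whose values are lists, one list per row.
parts = movies["Release Date"].str.split(",")

# [1] is a label lookup on the Series: it returns the single list stored in
# row 1, not the second element of every list.
print(parts[1])  # ['April 24', ' 2019']

# Assigning that 2-item list to a 3-row frame is a length mismatch.
raised = False
try:
    movies["year"] = parts[1]
except ValueError as exc:
    raised = True
    print(exc)
```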

Solutions

  • Keep using `Series.str.split`, but then apply a function that takes the second item of each row's list, for example:

    movies["Release Date"].str.split(",").map(lambda x: x[1])

  • Use a different approach, such as the `str.extract` one suggested by @Tim Biegeleisen.
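A minimal sketch of the first solution on the question's sample data. It adds `.str.strip()` to remove the space left after splitting on `","` (an addition to the one-liner above), and also shows `.str.get(1)`, which pandas provides for picking an item out of list-valued rows; treat that as an alternative, not something the answer itself uses.

```python
import pandas as pd

movies = pd.DataFrame({
    "Movies": ["Star Wars: Episode VII - The Force Awakens (2015)",
               "Avengers: Endgame (2019)"],
    "Release Date": ["December 16, 2015", "April 24, 2019"],
})

# Each value becomes a list such as ['December 16', ' 2015'].
parts = movies["Release Date"].str.split(",")

# Option 1 (the answer's approach): take item 1 of each list with map,
# then strip the leading space that followed the comma.
year_map = parts.map(lambda x: x[1]).str.strip()

# Option 2: the .str accessor also indexes into list values row by row.
year_get = parts.str.get(1).str.strip()

movies["year"] = year_get
print(movies[["Movies", "year"]])
```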

hpchavaz