1

In below example i have data of movies :

I want to split the title column values into 2 new column i.e one of the new column ( " movie title ") will take = toy story as row value and another new column ("year) will take =1995 as row value.

how to perform this operation on whole dataframe?

                            title  \
0                    Toy Story (1995)   
1                      Jumanji (1995)   
2             Grumpier Old Men (1995)   
3            Waiting to Exhale (1995)   
4  Father of the Bride Part II (1995) 
rafaelc
  • 57,686
  • 15
  • 58
  • 82
Vishal Suryavanshi
  • 355
  • 1
  • 4
  • 15

5 Answers5

1

Try to use str accessor with split using a regex that looks for a space and open paranthesis or close paranthesis:

df.title.str.split('\s\(|\)',expand=True)
  .drop(2, axis=1)
  .set_axis(['movies','year'], inplace=False, axis=1)

Output:

                        movies  year
0                    Toy Story  1995
1                      Jumanji  1995
2             Grumpier Old Men  1995
3            Waiting to Exhale  1995
4  Father of the Bride Part II  1995
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
1

If you don't want a regex solution,

df.title.str.rstrip(')').str.split('(').apply(pd.Series)


    0                           1
0   Toy Story                   1995
1   Jumanji                     1995
2   Grumpier Old Men            1995
3   Waiting to Exhale           1995
4   Father of the Bride Part II 1995
rafaelc
  • 57,686
  • 15
  • 58
  • 82
0

You can simply create a new dataframe (df_new) that splits the string in the title column of the old dataframe (df) and pass that as a list to the new dataframe constructor:

df_new = pd.DataFrame(df['title'].str.split(r' (?=\()').tolist(), columns=['movie title','year'])

Yields:

                   movie title    year
0                    Toy Story  (1995)
1                      Jumanji  (1995)
2             Grumpier Old Men  (1995)
3            Waiting to Exhale  (1995)
4  Father of the Bride Part II  (1995)
rahlf23
  • 8,869
  • 4
  • 24
  • 54
0

Using regex:

df[['movie_title', 'year']] = df.title.str.extract('(.*)\s\((\d+)', expand=True)

Sample Data:

df = pd.DataFrame({'title': 
                  ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)',
                   'Waiting to Exhale (1995)', 'Father of the Bride Part II (1995)', 
                   'Hello (Goodbye) (1995)'
                  ]})

df[['movie_title', 'year']] = df.title.str.extract('(.*)\s\((\d+)', expand=True)

Output:

                                title                  movie_title  year
0                    Toy Story (1995)                    Toy Story  1995
1                      Jumanji (1995)                      Jumanji  1995
2             Grumpier Old Men (1995)             Grumpier Old Men  1995
3            Waiting to Exhale (1995)            Waiting to Exhale  1995
4  Father of the Bride Part II (1995)  Father of the Bride Part II  1995
5              Hello (Goodbye) (1995)              Hello (Goodbye)  1995

The regular expression we use is: '(.*)\s\((\d+)'.

The first part of the expression is: (.*). The parenthesis indicate it's a capturing group, and what's inside the parenthesis indicates what we want to capture. In this case .* indicates that we greedily want to capture everything. After the capturing group, we have \s\( which is literally interpreted as a space folllwed by an open parenthesis, so ' ('. Because the capturing group before this is greedy, we will capture everything up until the final ' (' if there are multiple such matches. (for instance see how it captures Hello (Goodbye) (1995) properly.

Finally, we add a second capturing group with \d+ as the thing we want to capture, which captures all of the numbers after the final ' (' in your title, which for us will be the 4 digit year.

ALollz
  • 57,915
  • 7
  • 66
  • 89
0
df.apply(lambda x: x.str.split('('),axis=1)
df2 = pd.DataFrame({'movie title':df['title'].apply(lambda x: x[0][:-1]), 'year':df['title'].apply(lambda x: x[1][:-1])})
print(df2)

Result

                    movie title year
0                    Toy Story  1995
1                      Jumanji  1995
2             Grumpier Old Men  1995
3            Waiting to Exhale  1995
4  Father of the Bride Part II  1995
Rushabh Mehta
  • 1,529
  • 1
  • 13
  • 29