
You can use this to create the dataframe:

import pandas as pd
dataframe = pd.DataFrame({'release' : ['7 June 2013', '2012', '31 January 2013',
                                       'February 2008', '17 June 2014', '2013']})

I am trying to split the data and save it into 3 columns named "day", "month" and "year", using this command:

dataframe[['day','month','year']] = dataframe['release'].str.rsplit(expand=True) 

As the resulting dataframe shows, this works when the string has three parts, but whenever it has fewer than three, the values are saved in the wrong columns.
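A quick way to see what goes wrong, using the sample data above: with expand=True, pandas pads the shorter rows with missing values on the right, so a lone year lands in the first column rather than the last.

```python
import pandas as pd

dataframe = pd.DataFrame({'release': ['7 June 2013', '2012', '31 January 2013',
                                      'February 2008', '17 June 2014', '2013']})

# rsplit(expand=True) returns columns 0, 1, 2; rows with fewer tokens are
# padded on the RIGHT, so '2012' ends up in column 0 (mapped to 'day').
parts = dataframe['release'].str.rsplit(expand=True)
print(parts)
```

Splitting from the right only changes where a limited number of splits happens (the n parameter); it does not change which side gets padded, which is why split and rsplit give the same result here.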

I have tried both split and rsplit, and they give the same result. Is there a way to get the values into the right columns?

The last token is the year and it is present in every row, so it should be saved first; then the month if it is present (otherwise nothing), and the day in the same way.
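The right-to-left rule described above can be written as a small plain-Python helper. This is a minimal sketch; split_release is a hypothetical name, and it assumes every value has between one and three whitespace-separated tokens with the year last.

```python
import pandas as pd

def split_release(text):
    """Hypothetical helper: return (day, month, year), filling from the right."""
    tokens = text.split()
    year = tokens[-1]                                 # always present
    month = tokens[-2] if len(tokens) >= 2 else None  # present in 2- and 3-token values
    day = tokens[-3] if len(tokens) >= 3 else None    # present only in 3-token values
    return day, month, year

dataframe = pd.DataFrame({'release': ['7 June 2013', '2012', '31 January 2013',
                                      'February 2008', '17 June 2014', '2013']})

# zip(*...) turns the per-row (day, month, year) tuples into three column lists.
dataframe['day'], dataframe['month'], dataframe['year'] = map(
    list, zip(*(split_release(s) for s in dataframe['release'])))
print(dataframe)
```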

Zero

2 Answers


You could split each string, reverse the tokens so the year comes first, and expand them into columns:

In [17]: dataframe[['year', 'month', 'day']] = dataframe['release'].apply(
                                                    lambda x: pd.Series(x.split()[::-1]))
In [18]: dataframe
Out[18]:
           release  year     month  day
0      7 June 2013  2013      June    7
1             2012  2012       NaN  NaN
2  31 January 2013  2013   January   31
3    February 2008  2008  February  NaN
4     17 June 2014  2014      June   17
5             2013  2013       NaN  NaN
Zero
  • Thanks for the answer, it works perfectly on small datasets. But when I use it on a large dataset (2 million records), it takes a lot of time and memory. – user2965412 Oct 16 '16 at 21:04
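Building a pd.Series for every row is what makes the apply approach slow. One vectorized alternative is a single regular expression passed to str.extract, with the day and month groups optional. This sketch assumes every value looks like "[day] [month] year" with a four-digit year; anything else comes back as NaN. It has not been benchmarked here on 2 million rows, but it avoids creating one object per row.

```python
import pandas as pd

dataframe = pd.DataFrame({'release': ['7 June 2013', '2012', '31 January 2013',
                                      'February 2008', '17 June 2014', '2013']})

# Optional "day " and "month " prefixes, then a mandatory 4-digit year.
# Assumption: the input strictly follows "[day] [month] year".
pattern = r'^(?:(?P<day>\d{1,2})\s+)?(?:(?P<month>[A-Za-z]+)\s+)?(?P<year>\d{4})$'

# Named groups become column names; groups that did not match are NaN.
parts = dataframe['release'].str.extract(pattern)
dataframe = dataframe.join(parts)
print(dataframe)
```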

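Try reversing the tokens of each string before expanding them into columns. (A DataFrame has no reverse() method, and reversing the columns of the padded rsplit output would still put the year in the wrong column for the shorter strings.)

dataframe[['year','month','day']] = pd.DataFrame(dataframe['release'].str.split().str[::-1].tolist(), index=dataframe.index)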
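Instead of reversing anything, each token list can also be indexed from the end with negative positions, which stays vectorized and does not depend on how many columns the split produces. This is a sketch using the question's sample data; it relies on str.get returning NaN when a list is too short for the requested position.

```python
import pandas as pd

dataframe = pd.DataFrame({'release': ['7 June 2013', '2012', '31 January 2013',
                                      'February 2008', '17 June 2014', '2013']})

tokens = dataframe['release'].str.split()  # Series of token lists

# -1 is always the year; -2 and -3 are NaN for values with fewer tokens.
dataframe['year'] = tokens.str.get(-1)
dataframe['month'] = tokens.str.get(-2)
dataframe['day'] = tokens.str.get(-3)
print(dataframe)
```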