1

I have an Excel table with the following data

Please note that I have a single column where the date, month, and time are given in the following format.

I wish to sort out the rows with respect to the date and time (i.e Jan-1-1.0, Jan-2-2.0, Jan-1-3.0) and looking for ways to do in Python Pandas DataFrame. (dates are in French)

Kindly provide your suggestions.

Date-heure

Vendredi 03 novembre 10.0
Vendredi 03 novembre 5.0
Vendredi 03 novembre 18.0
Vendredi 03 novembre 24.0
Samedi 04 novembre 1.0
Samedi 04 novembre 2.0
Samedi 04 novembre 4.0
Samedi 04 novembre 5.0
Samedi 04 novembre 7.0
Samedi 04 novembre 13.0
Samedi 04 novembre 21.0
Vendredi 20 avril 1.0
Dimanche 05 novembre 2.0
Dimanche 05 novembre 8.0

Thank you for your prompt response. In my excel the cell is of Date. And when I loaded as a DataFrame, it shows me a datatype as

pandas.core.series.Series

And I just could not sort it out. Also please note I have a time as well in the same cell.

Presenting you all the dtypes here as below;

Date_heure                                                      object
Heure                                                            int64
Industrie (MW)                                                   int64
Tertiaire Chauffage (MW)                                         int64
Tertiaire Climatisation (MW)                                     int64
Tertiaire Autres usages (MW)                                     int64
Résidentiel Chauffage (MW)                                       int64
Résidentiel Eau chaude (MW)                                      int64
dtype: object

Thank you.

mari
  • 167
  • 4
  • 15
  • Can you show what happens when you load this into a df? The `dtype` should be sniffed correctly using `pd.read_excel` into `datetime` so it should be sortable unless these are really strings – EdChum Sep 12 '16 at 10:41
  • Thank you for your prompt response. In my excel the cell is of Date. And when I loaded as a DataFrame, it shows me a datatype as pandas.core.series.Series – mari Sep 12 '16 at 10:58
  • edit your question with the new information what does `df.dtypes` show? – EdChum Sep 12 '16 at 10:58
  • I have just added the output here EdChum. Thank you. – mari Sep 12 '16 at 11:02
  • It looks your dates are in fact string, I don't know if the hour component will throw it off but you can use `dateparser` module to parse the strings: http://stackoverflow.com/questions/26294333/parse-french-date-in-python so after importing try `df['Date_heure'].apply(dateparser.parse)` – EdChum Sep 12 '16 at 11:06
  • what should be the date here: `Vendredi 03 novembre 24.0` as this looks like it should be '3rd November 00:00' instead of 4th November? – EdChum Sep 12 '16 at 11:27

1 Answers1

0

OK you can use dateparser to parse your strings and then construct a TimedeltaIndex to add the hour component:

In [36]:
import dateparser
t="""Date-heure
Vendredi 03 novembre 10.0
Vendredi 03 novembre 5.0
Vendredi 03 novembre 18.0
Vendredi 03 novembre 24.0
Samedi 04 novembre 1.0
Samedi 04 novembre 2.0
Samedi 04 novembre 4.0
Samedi 04 novembre 5.0
Samedi 04 novembre 7.0
Samedi 04 novembre 13.0
Samedi 04 novembre 21.0
Vendredi 20 avril 1.0
Dimanche 05 novembre 2.0
Dimanche 05 novembre 8.0"""
df = pd.read_csv(io.StringIO(t))
df['date-time'] = df['Date-heure'].str.split().str[:-1].str.join(' ').apply(dateparser.parse) + pd.TimedeltaIndex((df['Date-heure'].str.rsplit().str[-1]).astype(float), unit='H')
df

Out[36]:
                   Date-heure           date-time
0   Vendredi 03 novembre 10.0 2016-11-03 10:00:00
1    Vendredi 03 novembre 5.0 2016-11-03 05:00:00
2   Vendredi 03 novembre 18.0 2016-11-03 18:00:00
3   Vendredi 03 novembre 24.0 2016-11-04 00:00:00
4      Samedi 04 novembre 1.0 2016-11-04 01:00:00
5      Samedi 04 novembre 2.0 2016-11-04 02:00:00
6      Samedi 04 novembre 4.0 2016-11-04 04:00:00
7      Samedi 04 novembre 5.0 2016-11-04 05:00:00
8      Samedi 04 novembre 7.0 2016-11-04 07:00:00
9     Samedi 04 novembre 13.0 2016-11-04 13:00:00
10    Samedi 04 novembre 21.0 2016-11-04 21:00:00
11      Vendredi 20 avril 1.0 2016-04-20 01:00:00
12   Dimanche 05 novembre 2.0 2016-11-05 02:00:00
13   Dimanche 05 novembre 8.0 2016-11-05 08:00:00

So this:

df['Date-heure'].str.split().str[:-1].str.join(' ').apply(dateparser.parse) + pd.TimedeltaIndex((df['Date-heure'].str.rsplit().str[-1]).astype(float), unit='H')

is the line that should interest you, here I call apply on your strings to apply dateparser.parse but this will only give you the date as it doesn't understand the float value, so I then rsplit the string to get the hour and cast to float and then construct a timedeltaindex.

After which I can use sort_values to sort the df:

In [37]:
df.sort_values('date-time')

Out[37]:
                   Date-heure           date-time
11      Vendredi 20 avril 1.0 2016-04-20 01:00:00
1    Vendredi 03 novembre 5.0 2016-11-03 05:00:00
0   Vendredi 03 novembre 10.0 2016-11-03 10:00:00
2   Vendredi 03 novembre 18.0 2016-11-03 18:00:00
3   Vendredi 03 novembre 24.0 2016-11-04 00:00:00
4      Samedi 04 novembre 1.0 2016-11-04 01:00:00
5      Samedi 04 novembre 2.0 2016-11-04 02:00:00
6      Samedi 04 novembre 4.0 2016-11-04 04:00:00
7      Samedi 04 novembre 5.0 2016-11-04 05:00:00
8      Samedi 04 novembre 7.0 2016-11-04 07:00:00
9     Samedi 04 novembre 13.0 2016-11-04 13:00:00
10    Samedi 04 novembre 21.0 2016-11-04 21:00:00
12   Dimanche 05 novembre 2.0 2016-11-05 02:00:00
13   Dimanche 05 novembre 8.0 2016-11-05 08:00:00
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • Dear EdChum, Thank you very much for your efforts in trying to help me. Now I am able to sort the data as i exactly wished. Thank you again for your time and efforts. They helped me on time. – mari Sep 12 '16 at 12:06