0

I have a Code which I run in Jupyter Notebook,

This is the resulting DataFrame output which I get :-

    LOCATION        DATE   DAKOTA HURRI SPITFIRE
MyIdx                   
176 Duxford     10-Jul-2004     D   H   S
177 Cirencester 10-Jul-2004     D   H   S
178 Brize Nortn 10-Jul-2004     D   H   S
74  Shrivenham  10-Jun-2004     D   H   S
257 Campbletown 15-Aug-2004     D   --  S
258 Sunderland  15-Aug-2004     D   --  S
261 Scampton    15-Aug-2004     D   --  S
200 Fairford    15-Jul-2004     D   --  SS
22  Tilford     15-May-2004     D   --  S
23  Abingdon    15-May-2004     D   --  S
24  Hyde Heath  15-May-2004     D   --  S

Could I Moderator tidy the output layout for me, if that is okay ?

These are the two key parts of the Code I am filtering Rows by Date with :-

(df3['DATE'].str.contains('-10$|15$'))  

and

display.sort_values(by=['DATE'])

The First line of Code, is to filter the DataFrame Row Output by two days the 10th of the Month and the 15th.

it correctly outputs the earliest days in the DataFrame Output first, i.e. 10 before 15, but not in the month order I want :-

I want 10th June 2004 first then the 10th of July/s then the 15th of May's then the 15th of July Rows etc. How do I modify that line of Code, so that I can filter to get that order, without changing the index position of the Rows via code, which I know how to do ?

I mean add something to either lines of Code, so that the Earlier month with an the earlier day, is shown 'favoured' before the later month with the same day ? i.e. 10-Jun-2004 is shown before 10-Jul-2004 , 15-May-2004 is shown before 15-Jul-2004 Rows then. But still dates with day 10 , showing before day 15 Rows.

So the Rows shown, are in the Date Order Like this :-

10-Jun-2004
10-Jul-2004
15-May-2004
15-Jul-2004
15-Aug-2004

The Date output is from this line of Code :-

display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')

Any help I could be given, would be much appreciated

Best Regards

Eddie Winch

Edward Winch
  • 47
  • 2
  • 9
  • Your sample data is difficult to parse, please see [How to provide a reproducible copy of your DataFrame using `df.head(10).to_clipboard()`](https://stackoverflow.com/questions/52413246), then **[edit] your question**, and paste the clipboard into a code block. – Trenton McKinney Aug 20 '20 at 23:12

1 Answers1

0

Consider breaking out columns for day, month, and year separately and then sort on those as needed. It'll be easier to use the numeric month for sorting (you could keep the displayed date as you have it if that's how you want to display it).

Like:

import pandas as pd
data = [
{"Name": "Alice", "date": "15-May-2004", "Rating": 55},
{"Name": "Bob", "date": "10-Jun-2004", "Rating": 11},
{"Name": "Chanel", "date": "15-Aug-2004", "Rating": 33},
{"Name": "Del", "date": "10-Jul-2004", "Rating": 44},
{"Name": "Erin", "date": "15-Jul-2004", "Rating": 22},
]
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df['date_day'] = df.apply(lambda row: row.date.day, axis=1)
df['date_month'] = df.apply(lambda row: row.date.month, axis=1)
df['date_year'] = df.apply(lambda row: row.date.year, axis=1)
df = df.sort_values(by=["date_day", "date_month"])

result:

    Name    date        Rating  date_day    date_month  date_year
1   Bob     2004-06-10  11      10          6           2004
3   Del     2004-07-10  44      10          7           2004
0   Alice   2004-05-15  55      15          5           2004
4   Erin    2004-07-15  22      15          7           2004
2   Chanel  2004-08-15  33      15          8           2004

Another approach without adding columns is to use the key argument to sort_values to get the sorting you want:

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by="date", key=lambda col: 100 * col.dt.day + col.dt.month)
Josh
  • 136
  • 6
  • Hi Josh, many thanks for your reply, The Output I get, is actually from filtering a Web Page Dataset via Jupyter Notebook in Pandas, rather than my own Dataset. But many thanks four your helpful info, very much appreciated. – Edward Winch Aug 20 '20 at 23:47
  • What could I add to the Line of Code :- display.sort_values(by=['DATE']), i.e. with a regular expression i.e. using .where for example ? etc to achieve what I want ? – Edward Winch Aug 21 '20 at 00:46
  • I'm not sure if it matters how you get the DaaFrame - if you have access to call methods on it, you should be able to use the approach above. I may be missing something there. I've amended the answer with another approach if you want to avoid adding columns to the dataframe. – Josh Aug 21 '20 at 06:15
  • Hi Josh, I get the Error :- TypeError: sort_values() got an unexpected keyword argument 'key' when I run my Code, with that Code Line. – Edward Winch Aug 21 '20 at 10:52
  • You'll need to provide more information, potentially as a different question. I believe the answer above addresses the question of "how to do a specific sort based on day then month" in pandas. – Josh Aug 21 '20 at 13:40
  • Hi Josh, Many apologies, there was nothing wrong, with your lines of Code. I was using an early version of Pandas, and you cant use sort_values with a key Argument, in versions of Pandas below v1.1. So I upgraded to the latest Version, and now I get the correct DataFrame Output, when I use your lines of Code. Many thanks for all your help, you have sorted it ! )) Best Regards Eddie Winch – Edward Winch Aug 21 '20 at 23:33
  • Hi Edward, happy to hear that. If the answer worked for you, please feel free to mark it accepted! Cheers – Josh Aug 25 '20 at 05:01
  • Hi Josh, I have marked your answer as accepted, many thanks once again for all your help. Regards Eddie )) – Edward Winch Aug 30 '20 at 20:33