1

I have a data frame with a weekday column that contains the name of the weekdays and a time column that contains hours on these days. How can I combine these 2 columns, so they can be also sortable?

I have tried the string version but it is not sortable based on weekdays and hours.

This is the sample table how it looks like.

weekday time
Monday 12:00
Monday 13:00
Tuesday 20:00
Friday 10:00

This is what I want to get.

weekday_hours
Monday 12:00
Monday 13:00
Tuesday 20:00
Friday 10:00
  • Does this answer your question? [Combine two columns of text in pandas dataframe](https://stackoverflow.com/questions/19377969/combine-two-columns-of-text-in-pandas-dataframe) – deadshot Apr 09 '21 at 21:58
  • @deadshot op is not asking to just concat to two columns, they also wanted them sorted. – Umar.H Apr 09 '21 at 22:51

3 Answers3

0

Asumming that df is your initial dataframe

import json
datas = json.loads(df.to_json(orient="records"))
final_data = {"weekday_hours": []}
for data in datas:
    final_data["weekday_hours"].append(data['weekday'] + ' ' + data['time'])
final_df = pd.DataFrame(final_data)
final_df

Ouptput:

enter image description here

Kumar Shivam Ray
  • 337
  • 2
  • 10
0

you first need to create a datetime object of 7 days at an hourly level to sort by. In a normal Data warehousing world you normally have a calendar and a time dimension with all the different representation of your date data that you can merge and sort by, this is an adaptation of that methodology.

import pandas as pd

df1 = pd.DataFrame({'date' : pd.date_range('01 Jan 2021', '08 Jan 2021',freq='H')})
df1['str_date'] = df1['date'].dt.strftime('%A %H:%M')

print(df1.head(5))

    date      str_date
0 2021-01-01 00:00:00  Friday 00:00
1 2021-01-01 01:00:00  Friday 01:00
2 2021-01-01 02:00:00  Friday 02:00
3 2021-01-01 03:00:00  Friday 03:00
4 2021-01-01 04:00:00  Friday 04:00

Then create your column to merge on.

df['str_date'] = df['weekday'] + ' ' +  df['time'] 

df2 = pd.merge(df[['str_date']],df1,on=['str_date'],how='left')\
                            .sort_values('date').drop('date',1)


print(df2)

 str_date
3   Friday 10:00
0   Monday 12:00
1   Monday 13:00
2  Tuesday 20:00
Umar.H
  • 22,559
  • 7
  • 39
  • 74
0

Based on my understanding of the question, you want a single column, "weekday_hours," but you also want to be able to sort the data based on this column. This is a bit tricky because "Monday" doesn't provide enough information to define a valid datetime. Parsing using pd.to_datetime(df['weekday_hours'], format='%A %H:%M' for example, will return 1900-01-01 <hour::minute::second> if given just weekday and time. When sorted, this only sorts by time.

One workaround is to use dateutil to parse the dates. In lieu of a date, it will return the next date corresponding to the day of the week. For example, today (9 April 2021) dateutil.parser.parse('Friday 10:00') returns datetime.datetime(2021, 4, 9, 10, 0) and dateutil.parser.parse('Monday 10:00') returns datetime.datetime(2021, 4, 12, 10, 0). Therefore, we need to set the "default" date to something corresponding to our "first" day of the week. Here is an example starting with unsorted dates:

    import datetime
import dateutil
import pandas as pd

weekdays = ['Friday', 'Monday', 'Monday', 'Tuesday']
times = ['10:00', '13:00', '12:00', '20:00', ]
df = pd.DataFrame({'weekday' : weekdays, 'time' : times})

df2 = pd.DataFrame()
df2['weekday_hours'] = df[['weekday', 'time']].agg(' '.join, axis=1)

amonday = datetime.datetime(2021, 2, 1, 0, 0) # assuming week starts monday
sorter = lambda t: [dateutil.parser.parse(ti, default=amonday) for ti in t]
print(df2.sort_values('weekday_hours', key=sorter))

Produces the output:

  weekday_hours
2   Monday 12:00
1   Monday 13:00
3  Tuesday 20:00
0   Friday 10:00

Note there are probably more computationaly efficient ways if you are working with a lot of data, but this should illustrate the idea of a sortable weekday/time pair.

gil
  • 81
  • 4