1

Consider the dataframe from the following dictionary. 2 of the columns are 'datetime', 'date_at_which_value_is_needed'. I want to create a new column that contains those values of datetime column as a list/series which have the same date as the value in the 'date_at_which_value_is_needed' column. Is there any way to do this without loops?

  {'datetime': {667: Timestamp('2019-11-08 10:00:00+0000', tz='UTC'),
      673: Timestamp('2019-11-08 16:00:00+0000', tz='UTC'),
      679: Timestamp('2019-11-08 22:00:00+0000', tz='UTC'),
      685: Timestamp('2019-11-09 04:00:00+0000', tz='UTC'),
      691: Timestamp('2019-11-11 10:00:00+0000', tz='UTC'),
      697: Timestamp('2019-11-11 16:00:00+0000', tz='UTC'),
      703: Timestamp('2019-11-11 22:00:00+0000', tz='UTC'),
      709: Timestamp('2019-11-12 04:00:00+0000', tz='UTC'),
      715: Timestamp('2019-11-12 10:00:00+0000', tz='UTC'),
      721: Timestamp('2019-11-12 16:00:00+0000', tz='UTC'),
      727: Timestamp('2019-11-12 22:00:00+0000', tz='UTC'),
      733: Timestamp('2019-11-13 04:00:00+0000', tz='UTC'),
      739: Timestamp('2019-11-13 10:00:00+0000', tz='UTC'),
      745: Timestamp('2019-11-13 16:00:00+0000', tz='UTC'),
      751: Timestamp('2019-11-13 22:00:00+0000', tz='UTC'),
      757: Timestamp('2019-11-14 04:00:00+0000', tz='UTC'),
      763: Timestamp('2019-11-14 10:00:00+0000', tz='UTC'),
      769: Timestamp('2019-11-14 16:00:00+0000', tz='UTC'),
      775: Timestamp('2019-11-14 22:00:00+0000', tz='UTC'),
      780: Timestamp('2019-11-15 04:00:00+0000', tz='UTC')},
     'date_at_which_value_is_needed': {667: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      673: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      679: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      685: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      691: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      697: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      703: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      709: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      715: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      721: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      727: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      733: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      739: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      745: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      751: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      757: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      763: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      769: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      775: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      780: Timestamp('2019-11-12 00:00:00+0000', tz='UTC')},
     'c': {667: 64.6475,
      673: 65.005,
      679: 65.0075,
      685: 65.0075,
      691: 65.0225,
      697: 65.5875,
      703: 65.6,
      709: 65.5625,
      715: 65.355,
      721: 65.475,
      727: 65.425,
      733: 65.0375,
      739: 65.9017,
      745: 66.1875,
      751: 66.15,
      757: 66.075,
      763: 65.695,
      769: 65.625,
      775: 65.66,
      780: 65.9525}}

For example, for the last row (index 780), new column would contain the list:

[Timestamp('2019-11-12 04:00:00+0000', tz='UTC'), Timestamp('2019-11-12 10:00:00+0000', tz='UTC'), Timestamp('2019-11-12 16:00:00+0000', tz='UTC'), Timestamp('2019-11-12 22:00:00+0000', tz='UTC')]
Umang Garg
  • 95
  • 8
  • Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) – sushanth Dec 28 '20 at 09:17
  • Can you give an example for one or two rows? Not really clear of what you expect to see in that new column – Ignacio Alorre Dec 28 '20 at 09:28
  • @IgnacioAlorre, for e.g., in the new column the last row should be the list/series: [Timestamp('2019-11-12 04:00:00+0000', tz='UTC'), Timestamp('2019-11-12 10:00:00+0000', tz='UTC'), Timestamp('2019-11-12 16:00:00+0000', tz='UTC'), Timestamp('2019-11-12 22:00:00+0000', tz='UTC')] – Umang Garg Dec 28 '20 at 09:30
  • @sushanth can you please tell the command I should use? that question has multiple answers, i tried one of them but it gave some unexpected results – Umang Garg Dec 28 '20 at 09:32

1 Answers1

1

Try this:

import pandas as pd
from pandas import Timestamp

data = {'datetime': {667: Timestamp('2019-11-08 10:00:00+0000', tz='UTC'),
      673: Timestamp('2019-11-08 16:00:00+0000', tz='UTC'),
      679: Timestamp('2019-11-08 22:00:00+0000', tz='UTC'),
      685: Timestamp('2019-11-09 04:00:00+0000', tz='UTC'),
      691: Timestamp('2019-11-11 10:00:00+0000', tz='UTC'),
      697: Timestamp('2019-11-11 16:00:00+0000', tz='UTC'),
      703: Timestamp('2019-11-11 22:00:00+0000', tz='UTC'),
      709: Timestamp('2019-11-12 04:00:00+0000', tz='UTC'),
      715: Timestamp('2019-11-12 10:00:00+0000', tz='UTC'),
      721: Timestamp('2019-11-12 16:00:00+0000', tz='UTC'),
      727: Timestamp('2019-11-12 22:00:00+0000', tz='UTC'),
      733: Timestamp('2019-11-13 04:00:00+0000', tz='UTC'),
      739: Timestamp('2019-11-13 10:00:00+0000', tz='UTC'),
      745: Timestamp('2019-11-13 16:00:00+0000', tz='UTC'),
      751: Timestamp('2019-11-13 22:00:00+0000', tz='UTC'),
      757: Timestamp('2019-11-14 04:00:00+0000', tz='UTC'),
      763: Timestamp('2019-11-14 10:00:00+0000', tz='UTC'),
      769: Timestamp('2019-11-14 16:00:00+0000', tz='UTC'),
      775: Timestamp('2019-11-14 22:00:00+0000', tz='UTC'),
      780: Timestamp('2019-11-15 04:00:00+0000', tz='UTC')},
     'date_at_which_value_is_needed': {667: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      673: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      679: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      685: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      691: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      697: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      703: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      709: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      715: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      721: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      727: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      733: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      739: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      745: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      751: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      757: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      763: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      769: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      775: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      780: Timestamp('2019-11-12 00:00:00+0000', tz='UTC')},
     'c': {667: 64.6475,
      673: 65.005,
      679: 65.0075,
      685: 65.0075,
      691: 65.0225,
      697: 65.5875,
      703: 65.6,
      709: 65.5625,
      715: 65.355,
      721: 65.475,
      727: 65.425,
      733: 65.0375,
      739: 65.9017,
      745: 66.1875,
      751: 66.15,
      757: 66.075,
      763: 65.695,
      769: 65.625,
      775: 65.66,
      780: 65.9525}}

# Converting the dictionaries into a dataframe    
datesDf = pd.DataFrame.from_dict(data)
# Selecting the date part of the datetime column
datesDf['date'] = datesDf['datetime'].apply(lambda x: x.date())
datesDf['date_needed'] = datesDf['date_at_which_value_is_needed'].apply(lambda x: x.date())

# Creating a new dataframe grouping dates by datetime
datesGrouped = datesDf.groupby('date')['datetime'].apply(list).to_frame()

# Joining original dataframe with new one after the grouping
result = datesDf.merge(datesGrouped, how='left', left_on='date_needed', right_on='date')

# Formating the result
result = result.drop(['date', 'date_needed'], axis = 1).rename(columns={"datetime_x": "datetime", "datetime_y": "datetime_col"})
Ignacio Alorre
  • 7,307
  • 8
  • 57
  • 94