0

Please, I have a pandas dataframe containing intraday data for 2 stocks. The index is a time series sampled by minute (i.e. 1/1/2017 9:30, 1/1/2017 9:31, 1/1/2017 9:32, ...). There are only two columns "Price A", "Price B". Total number of rows = 52000. I need to create a new column in which I store the 9.30 am value for every day. Assuming for 1/1/2017, the 9:30 am "Price A" is 150, I would need to store this value in a new column called "Open A" for every row that has the same day. For example:enter image description here

Sample input:

                     Price A  Price B
date                                 
2017-01-01 09:30:00      150        1
2017-01-01 09:31:00      153        2
2017-01-01 09:31:00      149        3
2017-01-01 09:31:00      151        4
2017-02-01 09:30:00      145        1
2017-02-01 09:31:00      139        2
2017-02-01 09:31:00      142        3
2017-02-01 09:31:00      149        4

I tried to simply use:

for ind in df.index: df['Open A'][ind] = 2

just to make a test but this seems to be taking forever. I also tried to read what's available here: How to iterate over rows in a DataFrame in Pandas? but it doesn't seem to be of help. does anybody have a suggestion? Thanks

cs95
  • 379,657
  • 97
  • 704
  • 746
Andrea
  • 113
  • 1
  • 4
  • 10
  • 1
    I'm not 100% sure I follow. But, you may want a `df.groupby(pd.Grouper(freq='1D'))['Price A'].transform('first')` – cs95 Dec 29 '17 at 14:55
  • Do you mind converting this picture into some text that I can copy and run in a terminal to whip up an answer for you? It would help everyone. – cs95 Dec 29 '17 at 14:56
  • 1
    Is your index in the DataFrame actually accurate? The data you show there seem to have years of `17` and `217`... (not to mention it doesn't appear to actually be sampled (duplicate minutes occurring)) - if that's the source data (in Excel maybe) - you really should post your actual DataFrame and a reproducible sample in a code block and not as an image... – Jon Clements Dec 29 '17 at 14:56
  • @cᴏʟᴅsᴘᴇᴇᴅ - hi guys, my honest and serious apologizes, it's my intention to learn how to clearly upload my data. To answer your question, I do have a perfectly sampled data with frequency at minute level and 9:30 is indeed the first value (row) for each day. May I ask the logic of .transform? – Andrea Dec 29 '17 at 16:16
  • How it works is, it computes the output for each group, and then replicates each value so that the result is the same length as the original dataframe. – cs95 Dec 29 '17 at 16:17

1 Answers1

1

If needed, set your index to datetime -

df.index = pd.to_datetime(df.index, errors='coerce')

df

                     Price A  Price B
date                                 
2017-01-01 09:30:00      150        1
2017-01-01 09:31:00      153        2
2017-01-01 09:31:00      149        3
2017-01-01 09:31:00      151        4
2017-02-01 09:30:00      145        1
2017-02-01 09:31:00      139        2
2017-02-01 09:31:00      142        3
2017-02-01 09:31:00      149        4

An assumption here is that your day's recordings start at 9:30, making our job really easy.

Use groupby with a pd.Grouper + transform + first -

df['Open A'] = df.groupby(pd.Grouper(freq='1D'))['Price A'].transform('first')    
df

                     Price A  Price B  Open A
date                                         
2017-01-01 09:30:00      150        1     150
2017-01-01 09:31:00      153        2     150
2017-01-01 09:31:00      149        3     150
2017-01-01 09:31:00      151        4     150
2017-02-01 09:30:00      145        1     145
2017-02-01 09:31:00      139        2     145
2017-02-01 09:31:00      142        3     145
2017-02-01 09:31:00      149        4     145
cs95
  • 379,657
  • 97
  • 704
  • 746
  • thank you all so much for all the help so far. Please, could anybody point me to where I can learn how to upload the code and current pandas dataframe I have here? I feel bad for having uploaded a not useful picture. thanks again, Andrea – Andrea Dec 29 '17 at 16:04
  • @Coldspeed thank you so much, your answer is perfect. only thing, " paste the output into your question.". I tried but the output was a mess, not formatted. I thought there was a different way to do it. – Andrea Dec 29 '17 at 16:26
  • @Andrea Once pasted, highlight the data and hit ctrl+K. It's all mentioned in the Tour. – cs95 Dec 29 '17 at 16:26