I have a large dataframe. It contains the columns 'Date', 'Time', 'Ticker', 'Open', 'High', 'Low', 'Close'.
Edit.. I added 2 days worth of sample
Date Time Ticker Open High Low Close
0 01/02/18 2:15 PM USD/JPY 112.315 112.325 112.250 112.270
1 01/02/18 2:45 PM USD/JPY 112.270 112.290 112.260 112.275
2 01/02/18 3:15 PM USD/JPY 112.265 112.280 112.245 112.255
3 01/02/18 3:45 PM USD/JPY 112.265 112.295 112.265 112.275
4 01/02/18 4:15 PM USD/JPY 112.265 112.360 112.265 112.360
5 01/02/18 4:45 PM USD/JPY 112.310 112.310 112.280 112.290
6 01/02/18 5:15 PM USD/JPY 112.270 112.280 112.230 112.230
7 01/02/18 5:45 PM USD/JPY 112.225 112.230 112.180 112.180
8 01/02/18 6:15 PM USD/JPY 112.210 112.280 112.210 112.225
9 01/02/18 6:45 PM USD/JPY 112.245 112.360 112.245 112.330
10 01/02/18 7:15 PM USD/JPY 112.330 112.355 112.290 112.290
11 01/02/18 7:45 PM USD/JPY 112.290 112.345 112.270 112.340
12 01/02/18 8:15 PM USD/JPY 112.340 112.340 112.310 112.340
13 01/03/18 2:15 PM USD/JPY 112.510 112.540 112.460 112.480
14 01/03/18 2:45 PM USD/JPY 112.480 112.480 112.420 112.440
15 01/03/18 3:15 PM USD/JPY 112.440 112.490 112.440 112.480
16 01/03/18 3:45 PM USD/JPY 112.485 112.525 112.485 112.515
17 01/03/18 4:15 PM USD/JPY 112.515 112.520 112.470 112.490
18 01/03/18 4:45 PM USD/JPY 112.520 112.565 112.520 112.540
19 01/03/18 5:15 PM USD/JPY 112.540 112.560 112.540 112.560
20 01/03/18 5:45 PM USD/JPY 112.560 112.625 112.555 112.615
21 01/03/18 6:15 PM USD/JPY 112.610 112.700 112.600 112.675
22 01/03/18 6:45 PM USD/JPY 112.675 112.725 112.660 112.710
23 01/03/18 7:15 PM USD/JPY 112.710 112.730 112.670 112.700
24 01/03/18 7:45 PM USD/JPY 112.700 112.780 112.695 112.765
25 01/03/18 8:15 PM USD/JPY 112.765 112.765 112.700 112.710
What I'm doing first is to find the highest (.idmax) value in the 'High' column for each date.
Edit, 2 days worth of output
#High grouped by Date
data = data.loc[data.groupby('Date')['High'].idxmax()]
Date Time Ticker Open High Low Close
4 01/02/18 4:15 PM USD/JPY 112.265 112.360 112.265 112.360
24 01/03/18 7:45 PM USD/JPY 112.700 112.780 112.695 112.765
The next step is where I get tripped up. I'd like to add one column, 'Open', for if 'Time' == 2:15, to the end of my dataframe. I have some code that finds this value when I create a new dataframe.
#New dataframe just for 2:15 pricing
df2 = data.loc[data['Time']=='2:15 PM']
Date Time Ticker Open High Low Close
0 01/02/18 2:15 PM USD/JPY 112.315 112.325 112.250 112.270
13 01/03/18 2:15 PM USD/JPY 112.510 112.540 112.460 112.480
I've tried to use a .merge function.
#trying to merge data and df2
frames = [data, df2]
result = pd.merge(data, df2, on='Date', how='outer')
result.dropna(how='all')
It adds to my dataframe but as a separate row.
I have also tried a concat function without any success.
result = pd.concat((frames), axis = 1)
Ideally I'd have the following columns:
'Date' 'Time' 'Ticker' 'Open' 'High' 'Low' 'Close---- which I can get via the data dataframe.
But to start I'd just like to add to the end the 'Open' value from the dataframe when the 'Time' == '2:15'
Edit..
This would be the desired output print(result)
Date Time Ticker Open High Low Close 2:15 price
4 01/02/18 4:15 PM USD/JPY 112.265 112.360 112.265 112.360 112.315
24 01/03/18 7:45 PM USD/JPY 112.700 112.780 112.695 112.765 112.510