0

I have a large dataframe. It contains the columns 'Date', 'Time', 'Ticker', 'Open', 'High', 'Low', 'Close'.

Edit.. I added 2 days worth of sample

      Date     Time   Ticker     Open     High      Low    Close
0     01/02/18  2:15 PM  USD/JPY  112.315  112.325  112.250  112.270
1     01/02/18  2:45 PM  USD/JPY  112.270  112.290  112.260  112.275
2     01/02/18  3:15 PM  USD/JPY  112.265  112.280  112.245  112.255
3     01/02/18  3:45 PM  USD/JPY  112.265  112.295  112.265  112.275
4     01/02/18  4:15 PM  USD/JPY  112.265  112.360  112.265  112.360
5     01/02/18  4:45 PM  USD/JPY  112.310  112.310  112.280  112.290
6     01/02/18  5:15 PM  USD/JPY  112.270  112.280  112.230  112.230
7     01/02/18  5:45 PM  USD/JPY  112.225  112.230  112.180  112.180
8     01/02/18  6:15 PM  USD/JPY  112.210  112.280  112.210  112.225
9     01/02/18  6:45 PM  USD/JPY  112.245  112.360  112.245  112.330
10    01/02/18  7:15 PM  USD/JPY  112.330  112.355  112.290  112.290
11    01/02/18  7:45 PM  USD/JPY  112.290  112.345  112.270  112.340
12    01/02/18  8:15 PM  USD/JPY  112.340  112.340  112.310  112.340
13    01/03/18  2:15 PM  USD/JPY  112.510  112.540  112.460  112.480
14    01/03/18  2:45 PM  USD/JPY  112.480  112.480  112.420  112.440
15    01/03/18  3:15 PM  USD/JPY  112.440  112.490  112.440  112.480
16    01/03/18  3:45 PM  USD/JPY  112.485  112.525  112.485  112.515
17    01/03/18  4:15 PM  USD/JPY  112.515  112.520  112.470  112.490
18    01/03/18  4:45 PM  USD/JPY  112.520  112.565  112.520  112.540
19    01/03/18  5:15 PM  USD/JPY  112.540  112.560  112.540  112.560
20    01/03/18  5:45 PM  USD/JPY  112.560  112.625  112.555  112.615
21    01/03/18  6:15 PM  USD/JPY  112.610  112.700  112.600  112.675
22    01/03/18  6:45 PM  USD/JPY  112.675  112.725  112.660  112.710
23    01/03/18  7:15 PM  USD/JPY  112.710  112.730  112.670  112.700
24    01/03/18  7:45 PM  USD/JPY  112.700  112.780  112.695  112.765

25 01/03/18 8:15 PM USD/JPY 112.765 112.765 112.700 112.710

What I'm doing first is to find the highest (.idmax) value in the 'High' column for each date.

Edit, 2 days worth of output

#High grouped by Date
data = data.loc[data.groupby('Date')['High'].idxmax()]

      Date     Time   Ticker     Open     High      Low    Close
4     01/02/18  4:15 PM  USD/JPY  112.265  112.360  112.265  112.360
24    01/03/18  7:45 PM  USD/JPY  112.700  112.780  112.695  112.765

The next step is where I get tripped up. I'd like to add one column, 'Open', for if 'Time' == 2:15, to the end of my dataframe. I have some code that finds this value when I create a new dataframe.

#New dataframe just for 2:15 pricing
df2 = data.loc[data['Time']=='2:15 PM']

          Date     Time   Ticker     Open     High      Low    Close
0     01/02/18  2:15 PM  USD/JPY  112.315  112.325  112.250  112.270
13    01/03/18  2:15 PM  USD/JPY  112.510  112.540  112.460  112.480

I've tried to use a .merge function.

#trying to merge data and df2
frames = [data, df2]
result = pd.merge(data, df2, on='Date', how='outer')
result.dropna(how='all')

It adds to my dataframe but as a separate row.

I have also tried a concat function without any success.

result = pd.concat((frames), axis = 1)

Ideally I'd have the following columns:

'Date' 'Time' 'Ticker' 'Open' 'High' 'Low' 'Close---- which I can get via the data dataframe.

But to start I'd just like to add to the end the 'Open' value from the dataframe when the 'Time' == '2:15'

Edit..

This would be the desired output print(result)

      Date     Time   Ticker     Open     High      Low    Close     2:15 price
4     01/02/18  4:15 PM  USD/JPY  112.265  112.360  112.265  112.360    112.315
24    01/03/18  7:45 PM  USD/JPY  112.700  112.780  112.695  112.765    112.510
  • You want to add a new column with the value the asset on the same day at 14:15 p ? – ifly6 Aug 15 '19 at 17:18
  • Yes that is correct. I want to learn how to use certain row values and basically pass that value into the dataframe – blankslatecoder Aug 15 '19 at 19:21
  • Opens, closes, highs, and lows, however, are a daily measure, are they not? – ifly6 Aug 15 '19 at 19:24
  • Not with this data. I pulled it for every 30mins, so it will spit out open, close, high, low for every 30min period. i would just like to add the open for one row, in this case 2:15 – blankslatecoder Aug 15 '19 at 19:36

2 Answers2

0

Could you describe a little more? For example, in the Open column, if time != 2:15pm, what do you want it be listed as?

In general, you could do something like:

df['Open'] = df['Time'].apply(lambda x: 'Yes' if x == '2:15 PM' else 'No')
Andrew Hamel
  • 336
  • 1
  • 13
  • Perhaps I am misunderstanding your question. – Andrew Hamel Aug 15 '19 at 17:22
  • Thanks for the response. I only want the open column for the 2:15 row, which each 'date' there will only be one 2:15 row if that makes sense. – blankslatecoder Aug 15 '19 at 17:35
  • I've updated my question with better actual examples, can you please take a look? I'm still learning python in general and have little experience with lambda. If i have 2 dataframes (data = the entire dataframe, df2= only the data where time is ==215) would a lambda be what I should use to essentially add a column to the new dataframe that has data.loc[data.groupby('Date')['High'].idxmas() ? I'd just like to learn how to take 2 dataframes with a similar 'key' and create a new one based on a certain value in the dataframe. – blankslatecoder Aug 19 '19 at 16:44
0

Select all the elements at 14.15 p and then merge them back in as a separate row.

price_df = data.loc[data['Time'] == '2:15 PM', ['Date', 'Open']].rename(columns={'Open': 'value_at1415'})
with_cols_df = data.merge(price_df, on='Date', how='left')

You could also use a map between date to a generated set where the index is set to the date. I believe this code below should do that, though I've not tested it.

df['value_at1415'] = data['Date'].map(data[data['Time'] == '2:15 PM'].set_index('Date')['Open'])
ifly6
  • 5,003
  • 2
  • 24
  • 47
  • I updated my question with better examples of the data I'm working with, can you please take a look? I've tried the merge function, but still not looking correct? result = pd.merge(data, df2, on='Date') – blankslatecoder Aug 19 '19 at 16:41
  • You need to rename the column and reattach it. Or map your index. – ifly6 Aug 19 '19 at 16:42
  • I'm sorry, could you elaborate further? I tried this mapping, and here it my output, its not populating the '1415' column with the data, just NaN eventhough it should (updated my example) Date Time Ticker ... Low Close value_at1415 4 01/02/18 4:15 PM USD/JPY ... 112.265 112.360 NaN 24 01/03/18 7:45 PM USD/JPY ... 112.695 112.765 NaN – blankslatecoder Aug 19 '19 at 17:23
  • What do you get when you try the merge? – ifly6 Aug 19 '19 at 17:25
  • result= pd.merge(data, df2, on='Date') print (result) Date Time Ticker Open High Low Close 4 01/02/18 4:15 PM USD/JPY 112.265 112.360 112.265 112.360 24 01/03/18 7:45 PM USD/JPY 112.700 112.780 112.695 112.765 Doesn't look like it brings in the df2 at all, which is just the 2:15 Open Price – blankslatecoder Aug 19 '19 at 17:57
  • Output isn't input, what do you _get_ when you try the code I have above? Do you create a `price_df`? What are the columns in that `price_df`? Are they renamed correctly? When you merge them, do you invoke the proper parameters? If we include a validation step, are merge keys unique in `price_df`? – ifly6 Aug 19 '19 at 17:59
  • Ok so this works, thank you. I need to go back and look at my code to figure out what the difference is, your's is much simpler which I appreciate. I defined a dataframe just for the 2:15 prices, but didn't rename the columns, also my merge looks different. Thank you for clarifying this. – blankslatecoder Aug 19 '19 at 19:26