How to get statistics of one column of a dataframe using data from a second column?

Question

I'm trying to write a program to give a deeper analysis of stock trading data but am coming up against a wall. I'm pulling all trades for a given timeframe and creating a new CSV file in order to use that file as the input for a predictive neural network.

The dataframe I currently have has three columns: (1) the price of the stock; (2) the number of shares sold at that price; and (3) the unix timestamp of that particular trade. I'm having trouble getting an accurate statistical analysis of the data. For example, if I use .median(), the program treats each row as a single observation and ignores the fact that each price may have been traded hundreds of times, as recorded in the volume column.

As an example, this is the partial trading history for one of the stocks that I'm trying to analyze.

 0   227.60   40  1570699811183
 1   227.40   27  1570699821641
 2   227.59   50  1570699919891
 3   227.60   10  1570699919891
 4   227.36  100  1570699967691
 5   227.35  150  1570699967691 . . .

To better understand the issue, I've also grouped it by price and summed the other columns with groupby('p').sum(). I realize this means the timestamp is useless, but it makes visualization easier.

 227.22     2   1570700275307
 227.23   100   1570699972526
 227.25   100   4712101657427
 227.30   105   4712101371199
 227.33    50   1570700574172
 227.35  4008  40838209836171 . . .
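The summed timestamps in the output above can be avoided by aggregating each column separately. The following is only a sketch: the column name 'p' comes from the groupby call above, while 'v' (volume) and 't' (timestamp) are assumed names, and the rows are the six sample trades listed earlier.

```python
import pandas as pd

# Six sample trades from the question. Column names "v" and "t" are
# assumptions; only "p" appears in the original groupby call.
trades = pd.DataFrame({
    "p": [227.60, 227.40, 227.59, 227.60, 227.36, 227.35],
    "v": [40, 27, 50, 10, 100, 150],
    "t": [1570699811183, 1570699821641, 1570699919891,
          1570699919891, 1570699967691, 1570699967691],
})

# Sum only the volume column, so timestamps are never added together.
volume_by_price = trades.groupby("p")["v"].sum()

# Named aggregation: total volume plus the latest timestamp per price.
summary = trades.groupby("p").agg(
    volume=("v", "sum"),
    last_trade=("t", "max"),
)
```

Here `summary` keeps one meaningful timestamp per price level (the most recent trade) instead of a sum of timestamps.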

Is there any way to use the trade volume column as a weight when performing a statistical analysis of the price column? I've considered creating a new dataframe where each price is listed once for every share traded at that price, but am not sure how to do this.
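One way to sketch the idea described above, assuming the columns are named 'p' (price), 'v' (volume), and 't' (timestamp), which are illustrative names rather than the real ones: Series.repeat expands each price once per share, after which the ordinary pandas statistics apply. The helper weighted_quantile is hypothetical and shows an alternative that avoids building the expanded Series when volumes are large.

```python
import numpy as np
import pandas as pd

# Six sample trades from the question; column names are assumed.
trades = pd.DataFrame({
    "p": [227.60, 227.40, 227.59, 227.60, 227.36, 227.35],
    "v": [40, 27, 50, 10, 100, 150],
    "t": [1570699811183, 1570699821641, 1570699919891,
          1570699919891, 1570699967691, 1570699967691],
})

# Option 1: list each price once per share traded, then use the
# usual statistics on the expanded Series.
expanded = trades["p"].repeat(trades["v"]).reset_index(drop=True)
median_expanded = expanded.median()


# Option 2 (hypothetical helper): a weighted quantile that never
# materializes the expanded data.
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight reaches q of the total."""
    values = np.asarray(values)
    weights = np.asarray(weights)
    order = np.argsort(values)
    cumulative = np.cumsum(weights[order])
    return values[order][np.searchsorted(cumulative, q * cumulative[-1])]


median_weighted = weighted_quantile(trades["p"], trades["v"], 0.5)

# For comparison, the unweighted median treats every row equally.
median_unweighted = trades["p"].median()
```

On these six rows both options agree, while the unweighted median is pulled toward the higher prices. The two options can differ slightly when the total volume is even, because pandas averages the two middle values and this helper returns the lower one.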

Thanks in advance for any help!

have you tried making your own custom function that deals with it rather than relying on pandas functions? Or tried something like this? https://stackoverflow.com/questions/46086663/how-to-get-mean-and-standard-deviation-from-a-frequency-distribution-table-in-py?rq=1 — Hiten, Oct 30 '19 at 03:54
If you want to maintain the same shape as the original dataframe when using aggregations you can use the transform method. https://pbpython.com/pandas_transform.html — Michael Gardner, Oct 30 '19 at 04:02
Hiten - I think that other post probably has the answer I'm looking for. I'll play around with it tonight. As for the shape, it really doesn't matter because everything will be written to a CSV file and the memory overwritten on the next loop. — Robert Petrowsky, Oct 30 '19 at 22:41
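The frequency-table approach from the linked question could look roughly like the sketch below, which treats volume as the weight for each price. The arrays hold the six sample trades from the question, and the variable names are illustrative only.

```python
import numpy as np

# Prices and volumes of the six sample trades from the question.
prices = np.array([227.60, 227.40, 227.59, 227.60, 227.36, 227.35])
volumes = np.array([40, 27, 50, 10, 100, 150])

# Volume-weighted mean price.
weighted_mean = np.average(prices, weights=volumes)

# Volume-weighted population variance and standard deviation.
weighted_var = np.average((prices - weighted_mean) ** 2, weights=volumes)
weighted_std = np.sqrt(weighted_var)
```

These values match what .mean() and .std(ddof=0) would return on a dataframe where every price is repeated once per share traded, without having to build that dataframe.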
