2

I'm trying to use Python (3.5.1) to sort my data in bins and found some old threads that are related to this. However, so far I've only been able to see how to sort into a pre-defined number of bins and retrive the number of datapoints in each bin (e.g. thread 1, thread 2, thread 3 thread 4) which is not quite what I had in mind.

So my situation is this: I have a Panda dataframe with two columns of data. It looks something like this

4646.06    1.69
4886.33    1.17
4989.14    1.93
4992.14    1.00
5057.03    1.36
6417.99    1.15
6418.01    1.26
6418.02    1.04
6418.03    1.34
6419.01    1.20
6419.02    1.09
6422.24    2.01
......    .....

There are some 200 entries like these in the two columns. As you can see the data is separated by a variable interval and occasionally there are multiple numbers bundled together.

What I want: is to bin every value in the colums such that values between .8 and .4 is binned together and taken as an average. For example in the above a series like 17.99 and the three 18.something values belong to the same measurement and I need the mean of them and the corresponding entries in the second column which are to replace the original entries. So far I've done this by exporting to excel, manually find the mean and then reload it into a dataframe which cuts the number of entries in half. As long as the number of entries is small this is possible but if I at one point include more it will take to much time by hand so I would really like to do it automatically somehow.

This is where I'm stuck. I can't just define a set of bins from the beginning to the end since my data is not uniformly separated. Neither can I enter all the bins I want because this too becomes impractical at longer sets of data or make them automatically as in this example because the bins are not regularly spaced. So just to be clear a dataframe like the above would instead become

4646.06    1.69
4886.33    1.17
4989.14    1.93
4992.14    1.00
5057.03    1.36
(6417.99 + 6418.01 + 6418.02 + 6418.03)/4    (1.15 + 1.26 + 1.04 + 1.34)/4
(6419.01 + 6419.02)/2    (1.20 + 1.09)/2
6422.24    2.01
......    .....

I am really at a loss on how to do this - if it is even possible. Any advice would be greatly appreciated.

Community
  • 1
  • 1
AspicioAstra
  • 143
  • 5
  • what do you mean with the `values between 0.8 and 0.4` – Laurens Koppenol Dec 06 '16 at 14:58
  • and how do you know which values belong to the same measurement? – Laurens Koppenol Dec 06 '16 at 15:01
  • Sorry, I know it sounds strange. The first column represents a unit of time and every interval from say 1.8 to 2.4 can contain any number of measurements but I only need an average (most often only one measurements is available but occasionally there are more). Then from 2.4 to 2.8 it is impossible to take any measurements and then the pattern repeats. So I know points like 17.99 and 18.03 belongs to the interval 17.80 - 18.40 and then there is a break before the next at 19.01 (which belongs to the 18.80 - 19.40 interval). – AspicioAstra Dec 06 '16 at 15:38
  • And these time intervals are fixed? So you could add a column which states to which interval a certain measurement belongs right? – Laurens Koppenol Dec 07 '16 at 07:43

0 Answers0