
I have data like this:

2010-08-27 00:00:00 SW
2010-08-27 00:15:00 SSW
2010-08-27 00:30:00 SSW
2010-08-27 00:45:00 SSE
2010-08-27 01:00:00 NNE

and so on.

So here is my question: how can I write a function in Python that computes the mean of all this data? The normal mean in pandas does not work, since this column is a string. Maybe NumPy has some vector option to calculate this. Hope someone can help me. Thanks a lot!

AKX
Martín
  • How do you expect the mean to be computed? As an example, what is the mean of "N" and "S"? What is the mean of your sample data? – not_speshal Oct 26 '21 at 16:30
  • You'd probably map SW/SSW/... to degrees, then compute the mean, then optionally map back to a direction. – AKX Oct 26 '21 at 16:32
  • You'd probably also need a wind speed to calculate the average, no? Because wind from the south and wind from the north don't always happen at the same speed, so the average wind direction will need to be weighted by the speeds. – Pranav Hosangadi Oct 26 '21 at 16:33
  • @not_speshal points out a good question that didn't even occur to me on first blush. I don't think "average wind direction" has any meaning. If you have four readings of N, S, E, W, what is the average? Perhaps the mode would be a more useful measure. Count the entries and find the most common – Tim Roberts Oct 26 '21 at 16:45
  • This library will give you an easy way to switch between names and degrees: https://pypi.org/project/compassheadinglib/ – match Oct 26 '21 at 17:26
  • At first I thought of what AKX said, and I do have wind speed. But as I am computing a mean per day (95 readings per day), I reckon I am going to count the number of times each direction appears and return the most frequent one, as Tim said. Thank you for answering so quickly! – Martín Oct 26 '21 at 19:21

1 Answer


This is a non-trivial problem because you are actually trying to compute means on a circular domain rather than on an interval. There is a whole field, Directional Statistics, devoted to problems like this.

To calculate a mean you need to choose a range for your angles, say [0°, 360°). If your data consists of 30° and 330°, the arithmetic mean of these numbers is (30° + 330°)/2 = 180°, but intuitively the average of the two should be 0°. You can get around this by choosing your range carefully: with angles in the range [-180°, 180°), the two data points become 30° and -30°, giving a sensible mean of 0°.
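The wrap-around effect is easy to reproduce. The snippet below is only an illustrative sketch using the two angles from the paragraph above; the variable names are made up for the example.

```python
from statistics import mean

angles = [30, 330]  # degrees, expressed in the range [0, 360)

# Plain arithmetic mean: lands on 180, directly opposite the intuitive answer.
naive = mean(angles)

# Re-express each angle in [-180, 180) before averaging, then map back.
shifted = [((a + 180) % 360) - 180 for a in angles]  # 30 and -30
recentred = mean(shifted) % 360

print(naive, recentred)
```

Shifting the range only helps here because both readings sit close to 0°; data clustered around 180° would need a different range.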

The simplest thing is probably to calculate the mode of your data: which direction occurs the most? This does not depend on the range you take. Indeed, you would not need to calculate an angle at all; just find which string ("NNE", "SW", etc.) occurs the most.
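A minimal sketch of the mode approach, using only the standard library and the five sample readings from the question. The function name `most_common_direction` is an invention for this example.

```python
from collections import Counter

# The five sample readings from the question.
readings = ["SW", "SSW", "SSW", "SSE", "NNE"]

def most_common_direction(directions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(directions).most_common(1)[0][0]

print(most_common_direction(readings))
```

If the data already lives in a pandas column, `Series.mode()` should give the same kind of answer; it returns a Series because several values can tie for most frequent.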

If you want something a bit more sophisticated, first take the mode, giving you a starting direction. Let's say "SSW" = 202.5° is the most common. Then choose your range to be 180° either side of this: [22.5°, 382.5°]. Express all angles in this range and calculate the mean. This still leaves the problem of what to do with a measurement in precisely the opposite direction: do we class NNE as 22.5° or 382.5°? The simplest option is to just reject this value.
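One way this could be sketched, assuming the usual 16-point compass with 22.5° steps (so N = 0° and SSW = 202.5°). The names `POINTS`, `DEGREES` and `mode_centred_mean` are hypothetical. Instead of building the shifted range explicitly, the code measures each reading as a signed offset from the mode, which amounts to the same thing.

```python
from collections import Counter

# Assumption: 16-point compass, N = 0 degrees, clockwise in 22.5 degree steps.
POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
DEGREES = {name: i * 22.5 for i, name in enumerate(POINTS)}

def mode_centred_mean(directions):
    """Mean bearing in degrees, computed in a range centred on the mode."""
    mode = Counter(directions).most_common(1)[0][0]
    centre = DEGREES[mode]
    offsets = []
    for d in directions:
        # Signed offset from the mode, in [-180, 180).
        offset = (DEGREES[d] - centre + 180) % 360 - 180
        if offset == -180:
            # Exactly opposite the mode: ambiguous, so the reading is rejected.
            continue
        offsets.append(offset)
    return (centre + sum(offsets) / len(offsets)) % 360

readings = ["SW", "SSW", "SSW", "SSE", "NNE"]
print(mode_centred_mean(readings))
```

With the sample readings the mode is SSW, the NNE reading is exactly opposite and is dropped, and the remaining four average to a bearing slightly south of SSW.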

A completely different approach is available if you have wind speed data as well, so that each reading looks like "SSW, 3 mph". This makes things easier: map each reading to a point in the plane, x = 3 cos θ, y = 3 sin θ, where θ is the angle for SSW, and take the average of these points. The result is another point in the plane, which you can convert back to a speed and bearing.
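The vector average could look roughly like the sketch below. It assumes each reading has already been converted to a pair (bearing in degrees, speed); the function name `mean_wind` is hypothetical. The angle is treated the same way going in and coming out, so the choice of compass convention does not change the resulting bearing.

```python
import math

def mean_wind(readings):
    """readings: iterable of (bearing_degrees, speed) pairs.

    Returns (bearing_degrees, speed) of the averaged vector.
    """
    x = y = 0.0
    n = 0
    for bearing, speed in readings:
        theta = math.radians(bearing)
        x += speed * math.cos(theta)
        y += speed * math.sin(theta)
        n += 1
    x /= n
    y /= n
    mean_speed = math.hypot(x, y)
    mean_bearing = math.degrees(math.atan2(y, x)) % 360
    return mean_bearing, mean_speed

# Two readings of equal speed, 90 degrees apart.
bearing, speed = mean_wind([(0.0, 2.0), (90.0, 2.0)])
print(bearing, speed)
```

When the readings nearly cancel (for example equal winds from opposite directions), the averaged speed is close to zero and the bearing carries little meaning, so it is worth checking the speed before trusting the direction. Passing a speed of 1 for every reading turns this into an unweighted average of directions.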

A prior SO question on the topic: How do you calculate the average of a set of circular data?

Salix alba