5

I'm looking for the best C or C++ code to encode and decode decimal latitude and longitude values from/to double/char. I'd prefer the code convert from double to char[] and vice-versa rather than c++ strings.

If you have a code snippet that would be great too.

To clarify: I need to convert from a string Degrees/Minutes/Seconds to double and back to string. I have 300 million records, so speed is a big concern.

See: http://en.wikipedia.org/wiki/Geographic_coordinate_conversion

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
amanda
  • 601
  • 2
  • 8
  • 14
  • 1
    So are you writing C or C++? Why does it need to be "the fastest"? Why not the cleanest or most elegant, or most straightforward, or most flexible? (Which aren't mutually exclusive at all.) Why avoid `std::string`? Why even mention C when you're probably using C++? Why are you focused on speed instead of correctness? – GManNickG Oct 25 '10 at 20:57
  • Wow... maybe his primary concern is speed? Have you ever heard of information hiding / encapsulation? It doesn't matter how the function is implemented, as long as it works. – Sam Dufel Oct 25 '10 at 21:00
  • And what's the problem at all? Converting between C-string and double given the value format for latitude and longitude, or the handling of different formats, or the calculation of latitude and longitude values regarding other value spaces, or...? – Flinsch Oct 25 '10 at 21:03
  • @Sam: That's not suppose to be a primary concern. Maybe after the code is clean and working, the program can be *profiled* and optimized. Pretty clear the former hasn't happened yet. @amanda: To convert between formats lexically, see [this answer](http://stackoverflow.com/questions/1243428/convert-string-to-int-with-bool-fail-in-c/1243435#1243435). – GManNickG Oct 25 '10 at 21:03
  • @Sam: I don't think that referring to "amanda" as a "him" reflects well on your attention to details. – abelenky Oct 25 '10 at 21:25
  • The fastest: because I have about 300 million to process. I need to convert from Degrees/Minutes/Seconds in string format to double. – amanda Oct 25 '10 at 21:29
  • @amanda -- Show a real example of what the string would look like. – Benjamin Lindley Oct 25 '10 at 21:33
  • Please see http://en.wikipedia.org/wiki/Geographic_coordinate_conversion for samples of strings – amanda Oct 25 '10 at 21:37
  • @GMan: The reason I was trying to avoid std::string is because it must end up as a zero terminated char at some point. – amanda Oct 25 '10 at 21:41
  • @amanda: That's what `std::string::c_str` is for... Please get it working *first* in a clean manner, *then* figure out what parts can be sped up with a profiler. – GManNickG Oct 25 '10 at 21:42
  • @GMan: we have working code, we are looking for faster code and don't really want to re-invent the wheel. – amanda Oct 25 '10 at 21:45
  • @amanda: What code do you have now? – GManNickG Oct 25 '10 at 21:46
  • @abelenky: unfortunately there is not a set standard, if you look the link provided we have all those possible formats in the records. – amanda Oct 25 '10 at 21:47
  • Here is a copy of the possible formats: * 40:26:46N,79:56:55W * 40:26:46.302N 79:56:55.903W * 40°26'47"N 79°58'36"W * 40d 26' 47" N 79d 58' 36" W * 40.446195N 79.948862W * 40.446195, -79.948862 * 40° 26.7717, -79° 56.93172 – amanda Oct 25 '10 at 21:48
  • @GMan: We're using GeographicLib::GeoCoords::DSM now. See: http://geographiclib.sourceforge.net/ – amanda Oct 25 '10 at 21:51

5 Answers5

6

Working with the OP(amanda) via email, we've developed a fast function based on a large switch-case statement.

amanda reports that is runs somewhere around 15x faster than the code they had been using.
Considering this is run over 300 million records, that should be a pretty substantial time savings.

I found the problem to be very interesting.

Here is the code:

/* WARNING:  These values are very important, as used under the "default" case. */
#define INT_PART 3
#define DEC_PART 2

double Str2LatLong(char* coord)
//double LLStr::Str2LL(char* coord)
{
    int sign = +1;
    double val;

    int i = 0;  /* an index into coord, the text-input string, indicating the character currently being parsed */

    int p[9] = {0,0,1,  /* degrees */
                0,0,1,  /* minutes */
                0,0,1   /* seconds */
               };
    int* ptr = p;   /* p starts at Degrees. 
                       It will advance to the Decimal part when a decimal-point is encountered,
                       and advance to minutes & seconds when a separator is encountered */
    int  flag = INT_PART; /* Flips back and forth from INT_PART and DEC_PART */

    while(1)
    {
        switch (coord[i])
        {
            /* Any digit contributes to either degrees,minutes, or seconds */
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                *ptr = 10* (*ptr) + (coord[i] - '0');
                if (flag == DEC_PART)  /* it'd be nice if I could find a clever way to avoid this test */
                {
                    ptr[1] *= 10;
                }
                break;

            case '.':     /* A decimal point implies ptr is on an integer-part; advance to decimal part */
                flag = DEC_PART; /* after encountering a decimal point, we are now processing the Decimal Part */
                ptr++;  /* ptr[0] is now the Decimal piece; ptr[1] is the Denominator piece (powers of 10) */
                break;

            /* A Null terminator triggers return (no break necessary) */
            case '\0':
                val = p[0]*3600 + p[3]*60 + p[6];             /* All Integer math */
                if (p[1]) val += ((double)p[1]/p[2]) * 3600;  /* Floating-point operations only if needed */
                if (p[4]) val += ((double)p[4]/p[5]) *   60;  /* (ditto) */
                if (p[7]) val += ((double)p[7]/p[8]);         /* (ditto) */
                return sign * val / 3600.0;                 /* Only one floating-point division! */

            case 'W':
            case 'S':
                sign = -1;
                break;

            /* Any other symbol is a separator, and moves ptr from degrees to minutes, or minutes to seconds */
            default:
                /* Note, by setting DEC_PART=2 and INT_PART=3, I avoid an if-test. (testing and branching is slow) */
                ptr += flag;
                flag = INT_PART; /* reset to Integer part, we're now starting a new "piece" (degrees, min, or sec). */
        }
        i++;
    }

    return -1.0;  /* Should never reach here! */
}
abelenky
  • 63,815
  • 23
  • 109
  • 159
  • This was an awesome speed improvement over anything we had tried. As abelenky said about 15 times faster on a full run of the data on a development server. Kudos to abelenky! – amanda Nov 02 '10 at 19:25
2

Here's the code I developed:

double Str2LatLong(char* coord)
{
    // char* testInput = "47,26'14\"";

    int i = 0;
    int parts[3] = {0};  // parts[0] is degrees, parts[1] is minutes, parts[2] is seconds
    int* pCurr = parts;

    do
    {
        if (coord[i] == '\0')
        {
            break;
        }
        if (!isdigit(coord[i]))
        {
            *pCurr++; // skip from parts[0] ==> [1], or [1] ==> [2]
            continue;
        }
        *pCurr = 10* (*pCurr) + coord[i] - '0';
        ++i;
    } while (1);

    return parts[0] + ((double)parts[1])/60.0 + ((double)parts[2])/3600.0;
}

Because it is written for speed, there is NO input validation.
You must supply proper input, or it will mess up badly.

I kept everything to integer math, and sequential memory as best I could.
It doesn't check for "proper" delimiters. Rather, anytime something is not a digit, it assumes that is the transition from degrees to minutes, or minutes to seconds.

It is only at the very last line that it converts to double with some very simple floating point operations.

I suspect you'll want to modify it to handle positive/negative values and North/South, East/West indicators, and decimal places after the seconds. But I think this code is a good foundation for a really fast conversion routine.

I hope this will test out very fast. Please let me know how it goes.

abelenky
  • 63,815
  • 23
  • 109
  • 159
1

The hard part is going to be representing all the format variations with a single grammar. Once you do, you can use a lexer generator tool to spit out a highly optimized DFA that will be competitive with the best hand-tuned code.

Ben Voigt
  • 277,958
  • 43
  • 419
  • 720
  • Whilst your answer doesn't actually answer the question--and so it probably should have been a comment--I really appreciated the high-level thinking. – Jeremy W. Murphy Aug 12 '21 at 06:49
0

C++ strings and streams are very much a good idea, but if you absolutely can't use them, the following code might be useful. The first function writes the two doubles into an existing C string. The second function reads two doubles out of an existing C string and returns them by pointer:

void CoordinatesToString(double lat, double long, char *buffer, size_t len) {
    assert(buffer != NULL);
    assert(len > 0);

    snprintf(buffer, len, "%f, %f", lat, long);
    buffer[len - 1] = 0; /* ensure null terminated */
}

int StringToCoordinates(const char *string, double *outLatitude, double *outLongitude) {
    assert(string != NULL);
    assert(outLatitude != NULL);
    assert(outLongitude != NULL);

    return (2 == sscanf(string, "%f, %f", outLatitude, outLongitude));
}

Usage:

char buffer[128];
CoordinatesToString(90.0, -33.0, buffer, 128);

double lat, lng;
if (StringToCoordinates(buffer, &lat, &lng)) {
    printf("lat: %f long %f\n", lat, lng);
}

BUT. C++ strings are designed for this sort of use. They don't suffer from potential overflow problems inherent to char arrays, and you don't have to manually manage their memory--they resize to fit their contents as needed. Is there a reason why you want to avoid std::string when you say you're otherwise okay with a C++ solution?

Jonathan Grynspan
  • 43,286
  • 8
  • 74
  • 104
  • I don't think I made my question clear. Longitude and Latitude can be stored in a Degrees/Minutes/Seconds string format. I need to convert from that to decimal degrees stored in a double. Very sorry. – amanda Oct 25 '10 at 21:28
  • as the poster said her primary concern is speed, snprintf and sscanf are HORRIBLE choices. While they're incredibly flexible, they are amazingly slow, as they handle all manner of input and output. – abelenky Oct 25 '10 at 22:05
  • The OP did *not* mention speed when I originally posted this response. As originally posted, @amanda asked only for a solution that did not use `std::string`. Raw speed was not part of the question. – Jonathan Grynspan Oct 25 '10 at 22:19
  • 1
    The title "What's the fastest..." hasn't been edited, unless it was a ninja edit. – Ben Voigt Oct 26 '10 at 05:12
  • Yes, but just about every question ever posted on Stack Overflow wants the fastest possible answer without regard for readability or portability, because everyone wants premature optimization. In the absence of the question body explaining why speed is important, and in the presence of a question that is otherwise fairly basic to answer, would you assume the user is an optimizing guru and really knows for certain that this function will be called "300 million times", given that he or she has not stated it would be at the time you answer? – Jonathan Grynspan Oct 26 '10 at 08:23
0

Here's another variation.

The logic is the same, but it uses a switch-case statement for better organization, fewer break/continue statements, and a faster return.

You should also be able to enhance this one with

case 'N':
case 'S':
case 'E':
case 'W': 

as needed.

double Str2LatLong(char* coord)
{
    // char* testInput = "47,26'14\"";

    int i = 0;
    int parts[3] = {0};
    int* pCurr = parts;

    while(1)
    {
        switch (coord[i])
        {
            /* Any digit contributes to either degrees,minutes, or seconds (as *pCurr) */
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                *pCurr = 10* (*pCurr) + coord[i] - '0';
                break;

            /* A Null terminator triggers return (no break necessary) */
            case '\0':
                return parts[0] + ((double)parts[1])/60.0 + ((double)parts[2])/3600.0;

            /* Any other symbol moves pCurr from degrees to minutes, or minutes to seconds */
            default:
                *pCurr++;
        }
        i++;
    }

    return -1.0;  /* Should never reach this point! */
}
abelenky
  • 63,815
  • 23
  • 109
  • 159
  • This one is very promising! On a development server it seems to be about 17.6 times faster than anything we've tried. However is does not handle decimal strings such as: 50-50-50.2433N so I'm unsure if it's faster or just fast because it's not working for all cases. – amanda Oct 26 '10 at 17:51
  • I think making it handle decimals would be pretty trivial. Making it handle *all* cases may be a bit challenging, but not insurmountable. – abelenky Oct 26 '10 at 18:31
  • Out of 300mil, those are the only ones that came out incorrect. Since the code is using integers, I'm not sure how you could graft on the decimals without changing to floats or am I missing something? Not looking for code, just direction! – amanda Oct 26 '10 at 18:34