in C, split name=value pairs into arrays

Question

I am programming an Arduino, where programs are written in C. I receive the response to an HTTP GET request as a single string, and I want to separate the name/value pairs returned in the body of the response into a C multi-dimensional array so I can iterate over it and update settings.

Here is the sample data I receive that I want to work on:

HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (Ubuntu) mod_ssl/2.2.8 OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
Accept-Ranges: bytes
Content-Length: 123
Connection: close
Content-Type: text/plain

var1=red&var2=green&var3=up&var5=down&time=123443291&key=xmskwirrrr3

I've done several tests with strtok with no success... I'm a beginner C programmer.

This is what I want to eventually arrive at:

config[0][0] = var1
config[0][1] = red

config[1][0] = var2
config[1][1] = green

config[2][0] = var3
config[2][1] = up
...

I don't even know if I'm going about this the right way, but the name/value pairs in the HTTP response need to update some vars on this remote hardware... To update its configuration by updating vars with the newly received ones. If the names+values could get into arrays, or failing that even just set the local variable NAME to the value of VALUE... that would work.

This is the C code I have tried so far: http://tpcg.io/qqvuLW

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="HTTP/1.1 200 OK\n\
Date: Sun, 10 Oct 2010 23:26:07 GMT\n\
Server: Apache/2.2.8 (Ubuntu) mod_ssl/2.2.8 OpenSSL/0.9.8g\n\
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT\n\
ETag: \"45b6-834-49130cc1182c0\"\n\
Accept-Ranges: bytes\n\
Content-Length: 12\n\
Connection: close\n\
Content-Type: text/html\n\
\n\
#var1=red&var2=green&var3=up&var5=down&time=123443291&key=xmskwirrrr3";
  char * pch;
  //printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str,"#"); //i had the idea of splitting the headers off by using a # at the beginning of the name/value response
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, "&");
    //if the names+values could get into arrays, or failing that even just set the local variable NAME to the value of VALUE... that would work.
  }
  return 0;
}

Note that you should not use backslash-newline to continue strings. You should use string concatenation: `char data[] = "line1\n" "line2\n" …;` where the strings separated by white space (not commas, and you'd use a newline and leading blanks or tabs to separate the strings; that can't be formatted into a comment) will be combined into one bigger string. It allows you to indent the second and subsequent lines of data, which improves the readability of the code. — Jonathan Leffler, Sep 16 '19 at 02:53
Please include your code in the question (as text, either enclosed in two sets of triple back-quotes or indented by four spaces using the **`{}`** button above the edit box). We need to be able to rely on seeing your code in 10 years time, and it's debatable whether your off-site link will still work then. I've done it for you this time. Do it yourself when you ask the question in future. — Jonathan Leffler, Sep 16 '19 at 02:56
How can you write code to allocate space for the names and the values? Do you need a structure, or do you use two arrays? (That depends in part on what you've learned so far — if you know how to use structures, do so.) You can consider using `strcspn()` and maybe `strspn()` (or maybe `strpbrk()`) to process the `name=value` found by `strtok()`. Note that `strtok()` is destructive (it writes nulls into your source string) and doesn't let you know which delimiter it found, and it can't be used nested in loops. You could consider POSIX `strtok_r()` or Windows `strtok_s()` if you want that. — Jonathan Leffler, Sep 16 '19 at 03:01
@Nina — good question, except that they'd be 2D arrays of pointers or 3D arrays of characters because each of `config[0][0]` and `config[0][1]` needs to be able to hold a string (so `config[0][0][0] == 'v'`, and `config[0][0][1] == 'a'`, for example). — Jonathan Leffler, Sep 16 '19 at 03:07
you can strtok with more delimiters at once `strtok(str, "&=")` — Juraj, Sep 16 '19 at 05:58
Are you interested in the contents of the HTTP header fields? Please post full details — user3629249, Sep 16 '19 at 21:40
Why are you writing an example of the HTTP message (BTW: an incorrect example) into your code? You don't need ANY of that text — user3629249, Sep 16 '19 at 21:43
A couple key details about a HTTP message: 1) the lines in a HTTP header are separated via `\r\n` NOT `\n` 2) the separation between the header and the body is the string: `\r\n\r\n` 3) The body of the message ends with the string: "\r\n\r\n" — user3629249, Sep 16 '19 at 21:44
One more critical detail. The body does not start with a `#` — user3629249, Sep 16 '19 at 22:05
If not interested in the HTTP header lines, then the first parsing statement would be: index = strstr( HTTP-message, "\r\n\r\n" ); which will yield a pointer to the place in the HTTP message where the header fields end. Add 4 bytes to that pointer to reach the beginning of the body. Thereafter, in a loop, strtok() for `=` then strtok() for `&` until strtok() returns NULL. — user3629249, Sep 16 '19 at 22:08
Please read [HTTP Message Format](https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages) — user3629249, Sep 16 '19 at 22:15
Please read [HTTP details](https://httpwg.org/specs/rfc7231.html) — user3629249, Sep 16 '19 at 22:37
@user3629249 ... thank you for the guidance. The HTTP headers are included in my question because those headers and body are received by the device in one raw string of data. I was showing the data I am working on. Since I control the server I can start the body with a "#" however parsing with strstr for the first \r\n\r\n does make more sense though. — Vince K, Oct 06 '19 at 01:11
Thank you @JerryJeremiah your linked code gets my proposed name value pair string data into the C variables where I can access them. If your comment was listed an answer here, I'd accept it. — Vince K, Oct 06 '19 at 01:25
@VinceK I'll do that now. You never want to post an answer unless you are sure it does what the original question was asking for. So I always post answers as comments first just to be sure they do what is needed. — Jerry Jeremiah, Oct 07 '19 at 01:41

Jerry Jeremiah · Accepted Answer · 2019-10-08T01:01:20.007

Possible approaches

There are basically two different approaches to storing the data:

1. You can store a pointer into the original strtok'ed string
2. You can strcpy the string from the original strtok'ed string into the array

Which one to use depends on the lifetime of the original strtok'ed string: if the data in the array needs to outlive the string, then you can't store pointers. But if pointers can be stored, you save memory by not having to allocate space for the strings twice.

So in my example of how to do it, I will store the pointers. If that's not what you are after, you can allocate the array with an additional dimension and use strcpy to save the actual values.
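For the copying variant, a minimal sketch could look like the following. The sizes MAX_PAIRS and MAX_LEN and the helper name store_pair are assumptions made for illustration; they are not part of the original code and should be sized for the actual device. Over-long strings are silently truncated here, which may or may not be acceptable.

```c
#include <string.h>

#define MAX_PAIRS 20   /* assumed upper bound on the number of pairs */
#define MAX_LEN   32   /* assumed upper bound on a name or value, incl. '\0' */

/* config[i][0] holds the name and config[i][1] the value of pair i */
static char config[MAX_PAIRS][2][MAX_LEN];

/* Hypothetical helper: copy one name/value pair into slot idx,
   truncating strings that do not fit.
   Returns 0 on success, -1 if idx is out of range. */
static int store_pair(int idx, const char *name, const char *value)
{
    if (idx < 0 || idx >= MAX_PAIRS)
        return -1;
    strncpy(config[idx][0], name, MAX_LEN - 1);
    config[idx][0][MAX_LEN - 1] = '\0';   /* strncpy may not terminate */
    strncpy(config[idx][1], value, MAX_LEN - 1);
    config[idx][1][MAX_LEN - 1] = '\0';
    return 0;
}
```

Because the strings are copied, the receive buffer can be reused for the next response while config keeps its contents. The cost is MAX_PAIRS * 2 * MAX_LEN bytes of RAM, which matters on a small microcontroller.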

How to understand HTTP messages

In an HTTP message all the lines in the message except the body must end with a carriage return and linefeed pair. See https://stackoverflow.com/a/27966412/2193968 and https://stackoverflow.com/a/5757349/2193968

If you want to hard code the message in your program, you might think you can just type in one long string with embedded end-of-line characters, like this:

char * string = "GET / HTTP/1.1\r\
Host: hostname\r\
\r\
body";

And it might even appear to work, if you are lucky. But a backslash at the end of a line only joins the two source lines; the line break itself never becomes part of the string, so each line above ends in a bare \r with no \n. Even if a line break did end up in the string, which character it is would depend on your editor and your OS (Windows uses something different from Linux), so it isn't good to rely on that. Besides, it messes up the syntax highlighting and confuses everybody who looks at your code. The language has anticipated this: adjacent string literals are automatically concatenated:

char * string = "GET / HTTP/1.1\r\n"
                "Host: hostname\r\n"
                "\r\n"
                "body";

So, to find the actual data in the body, the body doesn't need to contain special characters like # because we can just look for the first blank line in the message which is designed to separate the header from the body. The way to recognise that division is to find a carriage return and line feed pair that is immediately followed by another one. It needs to be the first such sequence in the message because the body might contain that character sequence as well.
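As a small sketch of that search, the helper below returns a pointer to the body, or NULL when the separator is missing. The name find_body is hypothetical; strstr is the standard library call and always finds the first occurrence, which is the one we want.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first byte of the body,
   or NULL if the header/body separator (CR LF CR LF) is not present. */
static char *find_body(char *message)
{
    char *sep = strstr(message, "\r\n\r\n");   /* first blank line */
    if (sep == NULL)
        return NULL;
    return sep + 4;   /* skip the four separator bytes */
}
```

Checking for NULL matters on real hardware: a truncated or malformed response would otherwise make the pointer arithmetic start from a null pointer.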

Now, the body might contain URL encoded form data, and that may always be true for the message in your question, but it isn't true in general. You need to check the Content-Type header for the actual type of the body. I will assume that doesn't need to be checked in my example, but it is something you should watch for. The other important one is the Content-Length header: it contains the number of bytes of data in the body (not including the header and the blank line). For URL encoded form data like yours it probably doesn't matter, but for any other Content-Type it probably does. The correct Content-Type for URL encoded form data is application/x-www-form-urlencoded. See https://stackoverflow.com/a/14551320/2193968
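If you do want to read Content-Length, one possible sketch is shown below. The function name content_length is hypothetical, and the code assumes the header name is spelled exactly "Content-Length"; real header names are case-insensitive, so a robust version would compare without regard to case.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value of the Content-Length header,
   or -1 if it is missing or malformed. Only the header block (everything
   before the first blank line) is searched. */
static long content_length(const char *message)
{
    const char *name = "Content-Length:";
    const char *end_of_headers = strstr(message, "\r\n\r\n");
    const char *p = message;

    if (end_of_headers == NULL)
        return -1;

    /* walk the header block one line at a time */
    while (p < end_of_headers) {
        if (strncmp(p, name, strlen(name)) == 0) {
            char *stop;
            long n = strtol(p + strlen(name), &stop, 10);
            if (stop == p + strlen(name) || n < 0)
                return -1;              /* no digits, or a negative value */
            return n;
        }
        /* p < end_of_headers, so a CR LF always follows and this is not NULL */
        p = strstr(p, "\r\n");
        p += 2;
    }
    return -1;
}
```

The returned count can then be compared with the number of bytes actually received after the blank line to detect a truncated body.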

How to parse URL encoded form HTTP bodies

Once we have a pointer to the body (found by looking for the first blank line in the message), we can strtok the string. URL encoded form data has keys and values separated by = and key-value pairs (KVPs) separated by &. We could just call strtok with a delimiter string of "=&", but if a value contains an equal sign, our determination of whether we are reading a key or a value gets out of sync with the data. So it is better to look for the specific character we expect next and stay in sync, always knowing where we are.

So in my example I alternate the delimiter based on whether I am storing a key or a value.
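One caveat with any strtok-based approach is that strtok skips empty tokens, so a body such as a=&b=2 (an empty value) would still lose sync. As an alternative sketch, not the method used in the code below, the pairs can be split with strchr instead. The function name split_pairs and its parameters are hypothetical, and no percent-decoding of the keys or values is attempted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "k1=v1&k2=v2..." in place.
   pairs[i][0] points at the key and pairs[i][1] at the value of pair i.
   Returns the number of pairs stored. The body is modified, as with strtok. */
static int split_pairs(char *body, char *pairs[][2], int max_pairs)
{
    int count = 0;
    char *p = body;

    while (p != NULL && *p != '\0' && count < max_pairs) {
        char *amp = strchr(p, '&');     /* end of this pair, if any */
        if (amp != NULL)
            *amp = '\0';

        char *eq = strchr(p, '=');      /* first '=' separates key and value */
        if (eq != NULL) {
            *eq = '\0';
            pairs[count][0] = p;
            pairs[count][1] = eq + 1;   /* may be an empty string */
            count++;
        }
        /* a piece without '=' is skipped */

        p = (amp != NULL) ? amp + 1 : NULL;
    }
    return count;
}
```

Because only the first = in each piece is treated as the separator, a value like x=y survives intact, and an empty value is stored as an empty string rather than being skipped.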

The code

You can try it here: https://www.onlinegdb.com/Symtd53LB

Hopefully the comments in the code are helpful enough. If not, leave a comment and I will add some detail.

#include <stdio.h>
#include <string.h>

int main ()
{
  // in valid HTTP, each line of the header and the blank
  // line must end with a carriage return and linefeed
  char str[] =  "HTTP/1.1 200 OK\r\n"
                "Date: Sun, 10 Oct 2010 23:26:07 GMT\r\n"
                "Server: Apache/2.2.8 (Ubuntu) mod_ssl/2.2.8 OpenSSL/0.9.8g\r\n"
                "Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT\r\n"
                "ETag: \"45b6-834-49130cc1182c0\"\r\n"
                "Accept-Ranges: bytes\r\n"
                "Content-Length: 12\r\n"
                "Connection: close\r\n"
                "Content-Type: text/html\r\n"
                "\r\n"
                "var1=red&var2=green&var3=up&var5=down&time=123443291&key=xmskwirrrr3";

  // so, we want to find the blank line and then move to the next non-blank line
  // if there is no blank line, there is no body to parse
  char *blank = strstr(str,"\r\n\r\n");
  if (blank == NULL)
  {
    printf("\nNo body found\n");
    return 1;
  }
  char *data = blank+strlen("\r\n\r\n");
  printf("\nBody = %s\n", data);

  // we will use an array of 1000 pairs of pointers
  // so we can only handle 1000 KVP max...
  char *pointers[1000][2] = {0}; // initialise all of them to NULL

  // we need to keep track of the KVP number
  // KVP starts at 0 and might go to 999
  int KVP = 0;

  // and a flag that indicates whether it is the key or value we are parsing
  // key when key_value=0, value when key_value=1
  #define KVP_KEY 0
  #define KVP_VALUE 1
  int key_value = KVP_KEY; // the key should be first

  // first strtok the first value
  // we need to do this because the first time strtok needs different parameters
  pointers[KVP][key_value] = strtok(data,"=");

  // loop through splitting it up and saving the pointers
  // no error checking has been done...
  while(KVP < 1000 && pointers[KVP][key_value] != NULL)
  {
    // ok, we need to update KVP and key_value
    if (key_value == KVP_VALUE)
    {
      KVP++;
      key_value = KVP_KEY;
      if (KVP == 1000) break; // array is full, stop before writing past the end
      pointers[KVP][key_value] = strtok(NULL,"=");
    }
    else // if (key_value == KVP_KEY)
    {
      key_value = KVP_VALUE;
      pointers[KVP][key_value] = strtok(NULL,"&");
    }
  }

  // now it should all be parsed
  // let's print it all out
  for(int i=0; i<KVP; i++)
  {
    printf("\nkey[%d] = %s\nvalue[%d] = %s\n",
        i, pointers[i][KVP_KEY],
        i, pointers[i][KVP_VALUE]);
  }

  return 0;
}
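C cannot look up a variable by its name at run time, so "setting the variable NAME to VALUE" usually means comparing each key against the names you know about. The sketch below shows one way to do that; struct settings, its fields, and apply_pair are all hypothetical placeholders for whatever the device actually stores.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical device settings; the field names are placeholders. */
struct settings {
    char var1[16];
    char var2[16];
    long time;
};

/* Copy src into a fixed-size field, always terminating the result. */
static void copy_field(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* Apply one key/value pair to the settings.
   Returns 1 if the key was recognised, 0 otherwise. */
static int apply_pair(struct settings *s, const char *key, const char *value)
{
    if (strcmp(key, "var1") == 0) {
        copy_field(s->var1, sizeof s->var1, value);
        return 1;
    }
    if (strcmp(key, "var2") == 0) {
        copy_field(s->var2, sizeof s->var2, value);
        return 1;
    }
    if (strcmp(key, "time") == 0) {
        s->time = strtol(value, NULL, 10);   /* numeric setting */
        return 1;
    }
    return 0;   /* unknown keys are ignored */
}
```

Calling apply_pair(&s, pointers[i][KVP_KEY], pointers[i][KVP_VALUE]) inside the printing loop above would update the settings from the parsed pairs. Since the values are copied into the struct, they remain valid after the receive buffer is overwritten.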

score -1 · Answer 2 · answered Sep 16 '19 at 04:27

After tokenizing with #, you have to tokenize with = first, and then with &.

Here is the code:

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="HTTP/1.1 200 OK\n\
  Date: Sun, 10 Oct 2010 23:26:07 GMT\n\
  Server: Apache/2.2.8 (Ubuntu) mod_ssl/2.2.8 OpenSSL/0.9.8g\n\
  Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT\n\
  ETag: \"45b6-834-49130cc1182c0\"\n\
  Accept-Ranges: bytes\n\
  Content-Length: 12\n\
  Connection: close\n\
  Content-Type: text/html\n\
  \n\
  #var1=red&var2=green&var3=up&var5=down&time=123443291&key=xmskwirrrr3";
  char * pch;
  int pairs_count = 0;
  int max_pairs = 20;
  int max_var = 50;
  char config[max_pairs][2][max_var];
  //printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str,"#"); //i had the idea of splitting the headers off by using a # at the beginning of the name/value response
  while (pch != NULL && pairs_count < max_pairs) // stop before config is full
  {
    pch = strtok(NULL, "=");
    if (pch == NULL) break;
    strcpy(config[pairs_count][0], pch);

    pch = strtok(NULL, "&");
    if (pch == NULL) break;
    strcpy(config[pairs_count][1], pch);

    pairs_count++;
    //if the names+values could get into arrays, or failing that even just set the local variable NAME to the value of VALUE... that would work.
  }
  for(int i=0;i<pairs_count;i++) {
      printf("%s = %s\n", config[i][0], config[i][1]);
  }
  return 0;
}

output:

var1 = red
var2 = green
var3 = up
var5 = down
time = 123443291
key = xmskwirrrr3

this algorithm will fail if there are no name+value pairs in the body of the HTTP message and the OP has a mis-informed idea about the format of a HTTP message. You should NOT encourage the OPs mis-information — user3629249, Sep 16 '19 at 22:14
@user3629249 there are no standards nor requirements for the format of a HTTP message. It can be whatever customized data or file is needed by the client. Perhaps the client device is requesting a set of name and value pairs. If the response includes a Content-Type header I can see how the message should conform to the format of the Content-Type. In my question I had an example Content-Type of text/html which wasn't consistent with my example payload, but I will surely change that to Content-Type: VinceCustomNVP or even omit it since my hardware will likely not even inspect the headers. — Vince K, Oct 06 '19 at 01:21
There is, actually, a very strict definition of a HTTP web page. Fortunately, most browsers are VERY lenient about enforcing those rules — user3629249, Oct 06 '19 at 01:47
