I want to parse the actual payload from the output of AT commands.

For instance: in the example below, I'd want to read only "2021/11/16,11:12:14-32,0"

AT+QLTS=1                          // command
+QLTS: "2021/11/16,11:12:14-32,0"  // response

OK

In the following case, I'd need to only read 12345678.

AT+CIMI     // command
12345678   // example response

So the point is: not all commands use the same output format. We can assume the response is stored in a char array.

I have GetAtCmdRsp() already implemented which stores the response in a char array.

void GetPayload()
{
  char rsp[100] = {0};
  GetAtCmdRsp("AT+QLTS=1", rsp);
  // rsp now contains +QLTS: "2021/11/16,11:12:14-32,0"
  // now, I need to parse "2021/11/16,11:12:14-32,0" out of the response
  
  memset(rsp, 0, sizeof(rsp));

  GetAtCmdRsp("AT+CIMI", rsp);
  // rsp now contains 12345678   
  // no need to do additional parsing since the output already contains the value I need
}

I was thinking of doing char *start = strstr(rsp, ":") + 1; to get the start of the payload, but some responses contain only the payload, as is the case with AT+CIMI.

Could a regex be a good way to detect the +<COMMAND>: pattern in a string?

Roberto Caboni
xyf
  • I would suggest abstracting `GetAtCmdRsp` further to be command specific. Or at least pass in an abstract enum/define instead of a fixed string and let the implementation generate the string and the correct response parsing specific to each command. – kaylum Nov 16 '21 at 22:17
  • If you don't want to add command awareness to your implementation then an alternative would be to use heuristics - e.g. if response contains `:` then parse accordingly else return whole string. But that may be less robust and end up having to handle many different cases anyway. – kaylum Nov 16 '21 at 22:20
  • Fiddling with the standard library to do, often complex, state machines is a headache, (with backtracking, I understand?) Use [re2c](https://re2c.org/manual/manual_c.html), '@' parser. – Neil Nov 16 '21 at 22:21
  • @kaylum I can't just parse based on `:` alone though as mentioned. The payload could also contain `:` – xyf Nov 16 '21 at 22:23
  • Yes that's why I said "end up having to handle many different cases anyway". The heuristic would need to take care of all the different cases. That was only provided as a second alternative - my first suggestion would be the preferred one IMHO. – kaylum Nov 16 '21 at 22:26
  • `GetAtCmdRsp(AT_CMD_QLTS, rsp); GetAtCmdRsp(AT_CMD_CIMI, rsp);` for example where the `AT_CMD` values are `enum`s or `#define`s – kaylum Nov 16 '21 at 22:50

1 Answer

In order to parse AT command responses a good starting point is understanding all the possible formats they can have.
So, rather than implementing a command specific routine, I would discriminate commands by "type of response":

  1. Commands with no payload in their answers, for example

     AT
     OK
    
  2. Commands with no header in their answers, such as

     AT+CIMI
     12345678
    
     OK
    
  3. Commands with a single header in their answers

     AT+QLTS=1
     +QLTS: "2021/11/16,11:12:14-32,0"
    
     OK
    
  4. Commands with multi-line responses.
    Every line could be of the "single header" type, as with +CGDCONT:

    AT+CGDCONT?
    +CGDCONT: 1,"IP","epc.tmobile.com","0.0.0.0",0,0
    +CGDCONT: 2,"IP","isp.cingular","0.0.0.0",0,0
    +CGDCONT: 3,"IP","","0.0.0.0",0,0
    
    OK
    

    Or we could even have mixed types, as with +CMGL:

     AT+CMGL="ALL"
    
     +CMGL: 1,"REC READ","+XXXXXXXXXX","","21/11/25,10:20:00+00"
     Good morning! How are you?
    
     +CMGL: 2,"REC READ","+XXXXXXXXXX","","21/11/25,10:33:33+00"
     I'll come a little late. See you. Bruce Wayne
    
     OK
    

    (Note that the response may also contain "empty" lines, that is, bare \r\n sequences.)

At the moment I cannot think of any other scenario.
With these categories in place, you can define an enum like

typedef enum
{
    AT_RESPONSE_TYPE_NO_RESPONSE,
    AT_RESPONSE_TYPE_NO_HEADER,
    AT_RESPONSE_TYPE_SINGLE_HEADER,
    AT_RESPONSE_TYPE_MULTILINE,
    AT_RESPONSE_TYPE_MAX
} AtResponseType; // the type name is just an example

and pass it to your GetAtCmdRsp( ) function so that it can parse the response accordingly. Whether you implement the differentiation inside that function, after it, or in an external function is your choice.
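As a rough illustration of how such an enum could drive the parsing, here is a minimal sketch. ExtractPayload and the type name AtResponseType are hypothetical (they are not part of any modem SDK), and the sketch assumes that `line` is a single response line with the command echo and the final result code already removed.

```c
#include <stddef.h>
#include <string.h>

/* Same categories as above; the type name is an assumption. */
typedef enum
{
    AT_RESPONSE_TYPE_NO_RESPONSE,
    AT_RESPONSE_TYPE_NO_HEADER,
    AT_RESPONSE_TYPE_SINGLE_HEADER,
    AT_RESPONSE_TYPE_MULTILINE,
    AT_RESPONSE_TYPE_MAX
} AtResponseType;

/* Hypothetical helper: returns a pointer to the payload inside `line`,
 * or NULL when the category carries no single-line payload or the
 * expected header is missing. `header` is only read for SINGLE_HEADER. */
const char *ExtractPayload(AtResponseType type, const char *header, const char *line)
{
    size_t header_len;

    switch (type)
    {
    case AT_RESPONSE_TYPE_NO_HEADER:
        return line;                     /* e.g. AT+CIMI: the whole line is payload */

    case AT_RESPONSE_TYPE_SINGLE_HEADER:
        header_len = strlen(header);     /* e.g. "+QLTS: " */
        if (strncmp(line, header, header_len) != 0)
            return NULL;                 /* line does not start with the header */
        return line + header_len;

    case AT_RESPONSE_TYPE_NO_RESPONSE:   /* nothing to extract */
    case AT_RESPONSE_TYPE_MULTILINE:     /* must be split into lines first */
    default:
        return NULL;
    }
}
```

With this shape, the caller states the expected category for each command, for example ExtractPayload(AT_RESPONSE_TYPE_SINGLE_HEADER, "+QLTS: ", rsp) for AT+QLTS=1 and ExtractPayload(AT_RESPONSE_TYPE_NO_HEADER, NULL, rsp) for AT+CIMI. Note that the quotes around the +QLTS value remain part of the returned payload.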


A solution without explicit categorization

Once all the scenarios that might occur are clear, you can think about a general algorithm that works for all of them:

  1. Get the full response resp after the command echo and before the closing OK or ERROR. Make sure that the trailing \r\n\r\nOK is removed (or \r\nERROR, or \r\nNO CARRIER, or whatever the terminating message of the response might be).
    Also make sure to remove the command echo

  2. If strlen( resp ) == 0 we belong to the NO_RESPONSE category, and the job is done

  3. If the response contains \r\ns in it, we have a MULTILINE answer. So, tokenize it and place every line into an array element resp_arr[i]. Make sure to remove trailing \r\n

  4. For every line in the response (for every resp_arr[i] element), search for the +<CMD>: header at the start of the line (not just for :, which might appear in the payload as well!). Something like this:

     const char *header = "+YOURCMD: ";
     size_t header_len = strlen( header );
     char *payload;
    
     // strncmp() returns 0 when the line starts with the header
     if( strncmp( resp_cur_line, header, header_len ) != 0 )
     {
         // We are in "NO_HEADER" case
         payload = resp_cur_line;
     }
     else
     {
         // We are in "HEADER" case
         payload = resp_cur_line + header_len;
     }
    

    Now the payload pointer points to the actual payload.

    Note that, for a MULTILINE answer, once the lines have been split into array elements, each loop iteration also handles mixed scenarios like the +CMGL one correctly: lines carrying the header are distinguished from those containing data (and from the empty lines, of course). For a deeper analysis of +CMGL response parsing, have a look at [this answer](https://stackoverflow.com/a/36873777/23118).
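The four steps above can be sketched roughly as follows. SplitResponse and StripHeader are hypothetical helper names, the list of final result codes is only a sample, and the sketch assumes the whole raw response (optionally including the echo) sits in one writable, NUL-terminated buffer. It relies on strtok( ), which modifies the buffer and is not reentrant.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (steps 1-3): splits `resp` in place into lines,
 * dropping the command echo (when `echo` is not NULL), empty lines, and
 * everything from the final result code on. Returns the number of lines
 * stored in `lines`; 0 means the NO_RESPONSE case. */
size_t SplitResponse(char *resp, const char *echo, char *lines[], size_t max_lines)
{
    static const char *const finals[] = { "OK", "ERROR", "NO CARRIER" };
    size_t count = 0;
    char *line;

    /* strtok() treats runs of \r and \n as one separator, so empty lines vanish. */
    for (line = strtok(resp, "\r\n"); line != NULL; line = strtok(NULL, "\r\n"))
    {
        size_t i;
        int is_final = 0;

        if (count == 0 && echo != NULL && strcmp(line, echo) == 0)
            continue;                    /* skip the command echo */

        for (i = 0; i < sizeof finals / sizeof finals[0]; i++)
            if (strcmp(line, finals[i]) == 0)
                is_final = 1;
        if (is_final || count == max_lines)
            break;                       /* end of payload, or array full */

        lines[count++] = line;
    }
    return count;
}

/* Hypothetical helper (step 4): returns the payload of one line, skipping
 * `header` (e.g. "+CMGL: ") when the line starts with it. */
const char *StripHeader(const char *line, const char *header)
{
    size_t header_len = strlen(header);

    return (strncmp(line, header, header_len) == 0) ? line + header_len : line;
}
```

A caller would run SplitResponse( ) once and then call StripHeader( ) on every returned line. One known weakness of this sketch: a data line that consists only of OK (for example an SMS body) is mistaken for the final result code, so a command-aware parser is still the safer choice for +CMGL.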

Roberto Caboni
  • 1
    Regarding `At the moment I cannot think about any other scenario`- there is one more: multiple responses that each span multiple lines, for instance `+CMGL: ,,,[],[][,,][`. As such the names `AT_RESPONSE_TYPE_HEADER` and `AT_RESPONSE_TYPE_MULTILINE` probably would be better as `AT_RESPONSE_TYPE_HEADER_SINGLE` and `AT_RESPONSE_TYPE_HEADER_MULTIPLE` with additional `AT_RESPONSE_TYPE_HEADER_MULTIPLE_MULTILINE` for the CMGL use case. – hlovdal Nov 27 '21 at 23:54
  • 1
    Although it is possible to use less granular division, see [this answer](https://stackoverflow.com/a/36873777/23118) for an example where I just use CMGL_NONE, CMGL_PREFIX and CMGL_DATA (for `AT+CMGL` specifically though). – hlovdal Nov 27 '21 at 23:55
  • @hlovdal thanks for your suggestion. I actually forgot `+CMGL` :). I renamed "HEADER" case to "SINGLE_HEADER", like you suggested, but I kept "MULTILINE" as a single scenario. In fact, with the purpose of removing the header, "multiline mixed case" is fortunately covered by my pseudo-code, as every line will be parsed accordingly. ;) The answer you linked is more specific for `+CMGL` and I linked it. – Roberto Caboni Nov 28 '21 at 12:40
  • @hlovdal BTW: all our wise suggestions will work only until some annoying guy will send SMSs starting with `+CMGL: `. :) – Roberto Caboni Nov 28 '21 at 12:42