lwip board cannot maintain connection to another lwip board

Question

I have a strange problem. For some time I've been trying to replace a small protocol converter that I have (basically a two-way serial-to-Ethernet bridge, with a master and a slave) with something that has more features.

Backstory

After a lot of reverse engineering I worked out how the device works, and I have been trying to replicate it. I can successfully connect my board to the original device: I've tried the original as master with my board as slave, and vice versa, and everything works perfectly. It's actually better, since at higher speeds there are no more packet losses (connecting 2 original ones would cause packet losses).

However, when I connect one of my devices as master and another one of my devices as slave, both running the exact same piece of code, it works for 2 or 3 exchanges and then it stops. SOMETIMES, after some minutes, it will try again 2 or 3 more times.

How the tests were made

I connected a Modbus master and a Modbus slave (modbustools, two different instances). Both the master and the slave are serial Modbus RTU;
I configure one of my devices as master and connect it to the serial port, so that it receives the serial Modbus traffic and forwards it over the protocol to the device connected to it;
I configure my slave so that it connects via its serial port to the Modbus slave. It works by creating a socket and connecting to the master's IP. It then waits for a master transmission via Ethernet, sends it via serial to the Modbus slave (modbustools), receives the response and sends it to its master, which in turn sends it to the Modbus master (modbustools).

It's a bit confusing, but that's how it works: my master waits for a socket connection and then the communication between them starts, because that is how the old ones work.

I've now written an echo client to test the connection. My code connects to a server (my master), receives a packet and replies with the same packet that it received. When I try this with my 2 boards it doesn't work. It's more of the same: 2 or 3 exchanges and then it stops. When I connect it to the original device, it keeps running without a hitch.

Sources

Here is my TCP master (server actually) initialization:

void initClient() {
    if (tcp_modbus == NULL) {
        tcp_modbus = tcp_new();
        previousPort = port;
        tcp_bind(tcp_modbus, IP_ADDR_ANY, port);
        tcp_sent(tcp_modbus, sent);
        tcp_poll(tcp_modbus, poll, 2);
        tcp_setprio(tcp_modbus, 128);
        tcp_err(tcp_modbus, error);
        tcp_modbus = tcp_listen(tcp_modbus);
        tcp_modbus->so_options |= SOF_KEEPALIVE; // enable keep-alive
        tcp_modbus->keep_intvl = 1000; // sends keep-alive every second
        tcp_accept(tcp_modbus, acceptmodbus);
        isListening = true;
    }
}

// called by lwIP when a client connects to the listening pcb
static err_t acceptmodbus(void *arg, struct tcp_pcb *pcb, err_t err) {
    tcp_arg(pcb, pcb);
    /* Set up the various callback functions */
    tcp_recv(pcb, modbusrcv);
    tcp_err(pcb, error);

    tcp_accepted(pcb);

    gb_ClientHasConnected = true;
    return ERR_OK;
}

//receives the packet, puts it in an array "ptransparent->data"
//states which PCB to use in order to reply and the length that was received
static err_t modbusrcv(void *arg, struct tcp_pcb *pcb, struct pbuf *p, err_t err) {
    if (p == NULL) {
        return ERR_OK;
    } else if (err != ERR_OK) {
        return err;
    }

    tcp_recved(pcb, p->len);

    memcpy(ptransparent->data, p->payload, p->len);
    ptransparent->pcb = pcb;
    ptransparent->len = p->len;
    return ERR_OK;
}
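
One detail in the receive callback above that may be worth checking: p->len is only the length of the first pbuf, while lwIP can hand the callback a chain whose total size is p->tot_len. The raw API also expects the callback to call pbuf_free(p) once it has consumed the data and returns ERR_OK. The sketch below shows the chain walk on a simplified stand-in structure so that it compiles without lwIP; struct pbuf_sketch and copy_chain are hypothetical names, not lwIP APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for lwIP's struct pbuf (hypothetical; only the
 * fields used here). With lwIP itself, include "lwip/pbuf.h" instead. */
struct pbuf_sketch {
    struct pbuf_sketch *next; /* next segment in the chain, or NULL */
    void *payload;            /* data held by this segment */
    uint16_t tot_len;         /* bytes in this segment plus all following ones */
    uint16_t len;             /* bytes in this segment only */
};

/* Copy the whole chain into dst, stopping at dst_size.
 * Returns the number of bytes copied. */
static size_t copy_chain(const struct pbuf_sketch *p, uint8_t *dst, size_t dst_size)
{
    size_t copied = 0;

    assert(dst != NULL);
    for (; p != NULL && copied < dst_size; p = p->next) {
        size_t n = p->len;
        if (n > dst_size - copied) {
            n = dst_size - copied; /* truncate instead of overflowing dst */
        }
        memcpy(dst + copied, p->payload, n);
        copied += n;
    }
    return copied;
}
```

In the real callback, lwIP's own pbuf_copy_partial(p, dst, len, 0) does the same job; following it with tcp_recved(pcb, p->tot_len) and pbuf_free(p) would be the usual pattern. Whether this is related to the stall described here is an assumption, not something the code shown confirms.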

The serial reception is basically this: detect one byte received, start a timeout, and when the timeout ends send whatever was received via a TCP socket that was already connected to the server. It then receives the packet via the acceptmodbus function and sends it via the serial port.
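
The framing rule described above (timeout armed by the first byte, everything collected until it expires is one packet) can be sketched as a small state holder. This is only an illustration under assumptions: struct framer, framer_on_byte and framer_poll are hypothetical names, time is a free-running millisecond counter supplied by the caller, and the 256-byte limit is taken from the converter's stated maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_MAX 256u  /* assumed maximum packet size of the converter */

struct framer {
    uint8_t  buf[FRAME_MAX];
    size_t   len;          /* bytes collected so far */
    uint32_t deadline_ms;  /* when the current packet is considered complete */
    bool     active;       /* true once the first byte has arrived */
};

/* Call for every received serial byte. The timeout starts at the first byte. */
static void framer_on_byte(struct framer *f, uint8_t byte,
                           uint32_t now_ms, uint32_t timeout_ms)
{
    assert(f != NULL);
    if (!f->active) {
        f->active = true;
        f->len = 0;
        f->deadline_ms = now_ms + timeout_ms;
    }
    if (f->len < FRAME_MAX) {
        f->buf[f->len++] = byte; /* bytes beyond FRAME_MAX are dropped */
    }
}

/* Call periodically. Returns the packet length once the timeout has expired,
 * otherwise 0. The packet is then in f->buf and the framer is re-armed. */
static size_t framer_poll(struct framer *f, uint32_t now_ms)
{
    assert(f != NULL);
    /* Signed difference keeps the comparison valid across counter wrap-around. */
    if (f->active && (int32_t)(now_ms - f->deadline_ms) >= 0) {
        f->active = false;
        return f->len;
    }
    return 0;
}
```

A non-zero return from framer_poll is the point at which the collected bytes would be handed to the TCP side.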

This is my client's (slave) code:

void init_slave() {
    if (tcp_client == NULL) {
        tcp_client = tcp_new();

        tcp_bind(tcp_client, IP_ADDR_ANY, 0);
        tcp_arg(tcp_client, NULL);
        tcp_recv(tcp_client, modbusrcv);
        tcp_sent(tcp_client, sent);
        tcp_client->so_options |= SOF_KEEPALIVE; // enable keep-alive
        tcp_client->keep_intvl = 100; // sends keep-alive every 100 milliseconds
        tcp_err(tcp_client, error);

        err_t ret = tcp_connect(tcp_client, &addr, portCnt, connected);
    }
}

The rest of the code is identical. The only thing that changes is the flow of operation:

Connect to the server
Wait for a packet
Send it via serial
Wait for the response timeout (same timeout as the server, it just starts counting in a different way: the server starts after receiving one byte, the client after it has sent something via the serial port)
Get the response and send it to the server
Observation:

No error is detected in the communication. After some testing it doesn't seem to be the number of exchanges that causes the hang. It happens after some time. In my opinion this sounds like a disconnection problem or timeout error, but no disconnection occurs and no more packets are received. When I stop debugging and check the sockets nothing out of the ordinary is detected.

Marcos G. · Answer 1 · 2019-06-26T07:30:27.140

If I understood your question the right way, you have a computer with two serial ports, each running a Modbus client and server instance. From each of these ends, you then go to your STM32 boards that receive data on their serial ports and forward to TCP on an Ethernet network connecting them to each other.

It's not easy to say, but based on the symptoms you describe it certainly looks like you are having one or several timeout issues, likely on the serial side. I think it won't be easy to help you pinpoint exactly what is wrong with your code without testing it, and certainly not if you can't show a complete, functional piece of it.

But what you can improve a lot is the way you debug on the end sides. You can try replacing modbustools with something that gives you more details.

The easiest way to get additional debugging info is to use pymodbus: you just need to install the library with pip and use the client and server provided with the examples. The only modification you need is to switch them to the serial interface by commenting and uncommenting a couple of lines. This will give you very useful details for debugging.

If you have a C development environment on your computer better go for libmodbus. This library has a fantastic set of unit tests. Again, you just have to edit the code to set the name of your serial ports and run server and client.

Lastly, I don't know to what extent this might be useful for you but you might want to take a look at SerialPCAP. With this tool, you can tap on an RS-485 bus and see all queries and responses running on it. I imagine you have RS-232, which is point-to-point and will not work with three devices on the bus. If so, you can try port forwarding.

EDIT: Reading your question more carefully I find this sentence particularly troublesome:

...detect one byte received, start timeout, when timeout ends send whatever was received via a TCP socket that was already connected to the server...

Why would you need to introduce this artificial delay? In Modbus RTU, you have very well defined frames that you can identify by the minimum silent interval of 3.5 character times between them. Is that what you mean by timeout?
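
For reference, the 3.5-character interval can be computed from the baud rate. The sketch below assumes the usual 11-bit RTU character and the fixed 1750 µs value that the Modbus serial line guide recommends above 19200 baud; modbus_t35_us is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* A Modbus RTU character is 11 bits on the wire (start, 8 data, parity/stop). */
#define MODBUS_BITS_PER_CHAR 11u

/* Inter-frame silent interval (t3.5) in microseconds for a given baud rate. */
static uint32_t modbus_t35_us(uint32_t baud)
{
    assert(baud > 0);
    if (baud > 19200u) {
        return 1750u; /* fixed value recommended above 19200 baud */
    }
    /* 3.5 chars * 11 bits * 1e6 us/s / baud, kept in integer arithmetic */
    return (35u * MODBUS_BITS_PER_CHAR * 100000u) / baud;
}
```

At 9600 baud this gives roughly 4 ms, so a serial timeout much longer than that only adds latency when the traffic is Modbus RTU.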

Unrelated, but I've also remembered there is a serial forwarder example included with pymodbus that might somehow help you (maybe you can use it to emulate one of your boards?).

edited Jun 26 '19 at 07:30

answered Jun 26 '19 at 06:54

Marcos G.

Thank you for your response. I also thought serial timeout, but nothing of the sort ... I tested it by writing an echo client that responds immediately and it works communicating with the old device and with the new one it works for a while and then stops. I have also tried those .. pymodbus .. libmodbus not yet I'll get on it and try testing it. I also already tapped on the RS485 bus and I can see all the data flowing correctly ... it's the ethernet communication that fails – morcillo Jun 26 '19 at 13:28
And the artificial delay is because this is not a modbus converter it is a transparent converter .. it should be able to receive any protocol up to 256 bytes and pass it forward. I know ... but that's what the device does .. I tested it with modbus because it's a simple protocol and there are many tools – morcillo Jun 26 '19 at 13:28
I also tried laying all my system out here because it may be necessary or someone would like it. But from what I can see, it seems that my entire problem lies in the fact that my device fails to communicate with another device that runs the exact same code .. but works when communicating with other devices, implying that I may be missing something ... but lwip does not detect any errors (doesn't call the error callback) – morcillo Jun 26 '19 at 13:32
Maybe a lwip bug? Otherwise I don't see how it can fail in the very task TCP was intended for... If you have the time you can take a look at pymodbus' forwarder. Are you running on an RT OS or bare metal on the micro? – Marcos G. Jun 26 '19 at 13:51
Another thought: could you run all TCP traffic through a third computer and use Wireshark there to see what's going on? – Marcos G. Jun 26 '19 at 14:13
I'm running FreeRTOS on my board. What bugs me is that it works connected to old devices but when I connect 2 new ones it fails. Also I have already tried doing that .. I basically wrote an ethernet forwarder for my devices and they work ... all packets flow correctly ... the only difference is that I connect my boards to the PC instead of connecting them to each other via a switch ... I have written a lot of code to see what happens and I can clearly see that the slave stops receiving and the master stops sending .. like the socket was killed ... but it works in the old configuration – morcillo Jun 27 '19 at 13:52
I'll try to change the initialization code for them ... maybe try a few different lwip configurations to see if I get anything differently – morcillo Jun 27 '19 at 13:53
Hi ... sorry it took me a long time to respond, I just noticed something weird happening. Basically I do a tcp_write followed by a tcpip_callback_with_block, where I have a callback that basically does tcp_output ... it works without any problems with the old equipment but I just noticed that it returns an ERR_MEM when I do tcp_write between new equipment ... apparently it runs out of memory and does not send the response (the slave) .. I'll edit this condition into the question – morcillo Jul 02 '19 at 17:00
Not exactly ... the exact same code works when other things connect to it ... what I noticed is that, as I've stated before .. I do a tcp_write followed by a tcpip_callback_with_block that calls tcp_output .... when connected to the old boards everything is fine ... but when I connect to my new boards tcp_write keeps filling up and I've checked and tcp_output is simply not flushing the buffer .. it is not sending, like the pcb was actually disconnected ... but I don't get any error callbacks and the connection is always established – morcillo Jul 03 '19 at 12:24
I see... And do the original devices have [keepalive](https://en.m.wikipedia.org/wiki/Keepalive) implemented? Maybe that's preventing the connection from dropping. – Marcos G. Jul 03 '19 at 16:01
I have tried the exact same code with keep alive and without it .. the original boards I have no idea ... but I tested them with keep alive .. I'll check this next – morcillo Jul 04 '19 at 12:34
