
The initial question and the older discussion are here: http://pastebin.com/GzsHhBs3

EDIT/append3:

VTIME seems to work correctly:

While opening the serial port, I set

serial_opts.c_cc[VTIME]=60; //6 seconds

The reading code is then: http://pastebin.com/W0vPGDBm

I have implemented time measurement for the timeout, and the code retries the read() from the serial port until MAX_RETRIES (=5) is reached.
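The retry-and-measure logic can be sketched roughly as follows. This is an illustration under assumptions, not the linked code: the names `timeval_diff` and `read_with_retries` are hypothetical, a read() that returns 0 is treated as a timeout, and the counter here tracks consecutive timeouts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_RETRIES 5

/* Elapsed time end - start, normalized so that 0 <= tv_usec < 1000000. */
static struct timeval timeval_diff(struct timeval start, struct timeval end)
{
    struct timeval d;
    d.tv_sec  = end.tv_sec  - start.tv_sec;
    d.tv_usec = end.tv_usec - start.tv_usec;
    if (d.tv_usec < 0) {
        d.tv_sec  -= 1;
        d.tv_usec += 1000000;
    }
    return d;
}

/* Read up to `want` bytes from fd into buf. A read() returning 0 counts
 * as a timeout; the loop gives up after MAX_RETRIES consecutive timeouts.
 * If last_wait is not NULL it receives the duration of the last timed-out
 * read(). Returns the number of bytes stored, or -1 on a read() error. */
static ssize_t read_with_retries(int fd, unsigned char *buf, size_t want,
                                 struct timeval *last_wait)
{
    size_t total = 0;
    int timeouts = 0;

    assert(buf != NULL);
    while (total < want && timeouts < MAX_RETRIES) {
        struct timeval t0, t1;
        ssize_t n;

        gettimeofday(&t0, NULL);
        n = read(fd, buf + total, want - total);
        gettimeofday(&t1, NULL);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: try again */
            return -1;              /* real error: report it to the caller */
        }
        if (n == 0) {               /* nothing arrived before the timeout */
            if (last_wait != NULL)
                *last_wait = timeval_diff(t0, t1);
            timeouts++;
            continue;
        }
        total += (size_t)n;
        timeouts = 0;               /* data arrived: restart the count */
    }
    return (ssize_t)total;
}
```

Resetting the counter after a successful read is a design choice of this sketch; counting timeouts in aggregate would only require dropping that line.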

The timeout seems to work correctly. The debug output below shows the last 2 successful read() operations and the bytes that were read:

SERIAL: DATA read 11 bytes and a total of 12262 .
SERIAL: serDataBuf[12262]=   0x32
SERIAL: serDataBuf[12263]=   0x30
SERIAL: serDataBuf[12264]=   0x32
SERIAL: serDataBuf[12265]=   0x30
SERIAL: serDataBuf[12266]=   0x32
SERIAL: serDataBuf[12267]=   0x30
SERIAL: serDataBuf[12268]=   0x32
SERIAL: serDataBuf[12269]=   0x30
SERIAL: serDataBuf[12270]=   0x32
SERIAL: serDataBuf[12271]=   0x30
SERIAL: serDataBuf[12272]=   0x32
SERIAL: DATA read 5 bytes and a total of 12273 .
SERIAL: serDataBuf[12273]=   0x30
SERIAL: serDataBuf[12274]=   0x32
SERIAL: serDataBuf[12275]=   0x30
SERIAL: serDataBuf[12276]=   0x32
SERIAL: serDataBuf[12277]=   0x30
SERIAL: time diff is tv_sec=5 , tv_usec=996447
SERIAL: No DATA have been read. Timeout @ byte 12278, timeout counter 0.
SERIAL: time diff is tv_sec=5 , tv_usec=999983
SERIAL: No DATA have been read. Timeout @ byte 12278, timeout counter 1.
SERIAL: time diff is tv_sec=5 , tv_usec=999973
SERIAL: No DATA have been read. Timeout @ byte 12278, timeout counter 2.
SERIAL: time diff is tv_sec=5 , tv_usec=999961
SERIAL: No DATA have been read. Timeout @ byte 12278, timeout counter 3.
SERIAL: time diff is tv_sec=5 , tv_usec=999974
SERIAL: No DATA have been read. Timeout @ byte 12278, timeout counter 4.
SERIAL: time diff is tv_sec=5 , tv_usec=999960
SERIAL: No DATA have been read. Timeout @ byte 12278, timeout counter 5.
SERIAL: time diff is tv_sec=5 , tv_usec=999982
SERIAL: No DATA have been read. Timeout @ byte 12278, timeout counter 6.

Note that the last byte received is a valid ASCII character (0x30, i.e. '0'). It also looks like after 6 s per retry × 6 retries = 36 s, I still haven't received any data.

I would start looking at the sender-side code, but the code I am porting here (from a kernel 2.4 embedded system to a kernel 3.0.35 embedded system) used to work, so it must be something on the receiving side.

nass
  • Who is transmitting? Who is receiving? *"The problem is that somewhere during transmission the select() on the serial port, times out"* So the timeout occurs on the sender's end? You are following a bad coding example. The [Posix guide](http://www.cmrr.umn.edu/~strupp/serial.html#5_1_3) is better. I have read that the Linux receive buffer is 8K bytes, but have never had overrun issues and never bothered to check the source code. The syscall overhead of reading one byte at a time could cause buffer overrun. Easily solved with flow control. – sawdust Jul 03 '13 at 18:49
  • @sawdust hi there. well the time outs occur while receiving. I am only using `select()` there. I'll look into your link and get back to you. – nass Jul 04 '13 at 08:38
  • @sawdust I tested reading from the serial port following the Posix guide inits. I also replaced my `reading 1 byte at a time` with a `while loop` that reads the remaining bytes that it expects, until the remaining bytes become 0. It reads 58 bytes in the beginning (out of 138k) and then it reads byte by byte (c_cc[VMIN]=1) until 12297. Then it times out again. I added a `sleep(10)` statement just to allow chars to arrive. Now it reads 4095, 4095, 4088 for a total of 12278 bytes, still very far from 138 kBytes... argh! – nass Jul 04 '13 at 13:05
  • @sawdust. Software flow control doesn't help either... – nass Jul 04 '13 at 13:14
  • Software flow control should only be used in canonical mode. You show init code for non-canonical mode. Your only choice for flow control is therefore HW. Your descriptions are confusing because they are ambiguous or conflicting. You need to be more concise and post your code. – sawdust Jul 04 '13 at 18:11
  • @sawdust I didn't know software flow control is only used in canonical mode. I'll update the code with what I'm using. – nass Jul 05 '13 at 09:09
  • Your code is inconsistently checking the return codes from syscalls. Sometimes you do test the return code; sometimes you don't. But you never test for errors, and that is bad coding. In your code an error return would be silently ignored, so program behavior could be unpredictable. Return codes should always be checked! Read the `man` pages for each syscall. The "skip comment line" code (has no error check) and acquiring the mutex (where is it released?) both look suspicious. How are these two routines used? What is the big picture? – sawdust Jul 05 '13 at 20:01
  • Take a look at this [answer](http://stackoverflow.com/questions/17279957/unable-to-read-from-serial-device-after-unplugging-and-replugging-connector/17333175#17333175). If you get rid of `select()` in your code, then you should remove the `O_NDELAY` that you added to the `open()`. – sawdust Jul 05 '13 at 20:02
  • @sawdust. Right, new code: removed mutexes (weren't really necessary), removed select() (I am fiddling around with c_cc[VMIN] at the moment). The open_com_device() fn doesn't have all the necessary error checking, but it has been working - I have verified that the settings are updated as expected with `stty`. Also I appended (I should have done that earlier) some debug output. – nass Jul 09 '13 at 15:53
  • Might be time to do some sanity checking. Before each `read()`, call `gettimeofday()` and save the timestamp. When your program gets back zero bytes and thinks timeout, make another call to `gettimeofday()` and compare timestamps. (Read the [glibc manual](http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_19.html#SEC315) on how to compare two `struct timeval`s). Maybe the `read()` is not actually waiting for the VTIME. Try allowing for a few "timeouts" before the program aborts. – sawdust Jul 11 '13 at 01:00
  • @sawdust. Timing of read() seems to work as expected. I also ran the code on the PC instead of the embedded device, just to do a read(). Well, it halts at the very same byte?! Now this is odd. – nass Jul 12 '13 at 14:27
  • *"...seems to work as expected"* - Please do not provide an executive summary if you expect meaningful assistance. (1) What was the delta time from the `read()` call to the zero return? (2) Did you perform retrying the read a few times before finally declaring timeout? – sawdust Jul 12 '13 at 23:29
  • @sawdust. I think I have already answered both your questions. (1) For VTIME = 6 seconds and VMIN = 1 byte, the delta time is 5.99 seconds. This is expected. (2) Each timeout increments a counter and retries the read() until another timeout occurs. After an aggregate of 6 retries and 36 seconds, the timeout counter reaches a ceiling value and the readComData fn exits without having acquired any more data. – nass Jul 15 '13 at 10:51

1 Answer


In a desperate attempt at my office, we modified the host application (running on Windows, built with MS Visual Studio as a .NET application). We wrote a small C++ serial port control application to bypass .NET entirely, and voilà! Without altering the code on the embedded system side, I can now read the full data I expect on the embedded device.

I am not going to blame .NET, since the problem only started showing up after I ported the embedded system's code from an older ARM9TDMI (ARMv4T) device running Linux with kernel 2.4 to the newer Freescale Cortex-A9 i.MX6Q SabreLite.

However, I will note that, once we moved away from the .NET code, the C++ serial port control that we wrote worked with both the old device and the new one.

So the code above should work for reading from the serial port in Linux.

I wonder if there is something wrong with the serial port driver in the Boundary Devices Linux kernel 3.0.35 used by the Yocto Project (which I am running on the board right now). If somebody knows anything about serial port problems with the SabreLite, please share. Thanks!

nass