
How do I truncate a CSV log file that another process is writing to through stdout redirection, without triggering `_csv.Error: line contains NULL byte`?

I have one process running `rtlamr > log/readings.txt`, which redirects radio signal data to readings.txt. I don't think it matters what is writing to the file; any long-running process with redirected output will do.

I have a file watcher using watchdog (a Python file watcher) on that file, which triggers a function whenever the file changes. The function reads the file and updates a database.

Then I try to truncate readings.txt so that it doesn't grow infinitely (or back it up).

file = open(dir_path+'/log/readings.txt', "w")
file.truncate()
file.close()

This corrupts readings.txt and generates the error (the start of the file contains garbage characters).

I tried moving the file instead of truncating it, in the hopes that rtlamr will recreate a fresh file, but that only has the effect of stopping the pipe.

EDIT: I noticed that the file's charset changes from us-ascii to binary, but truncating the file with `file = open(dir_path+'/log/readings.log', "w",encoding="us-ascii")` makes no difference.

ivan_pozdeev
metalaureate
  • What's the `watchdog` that you are referring to? Doesn't look like https://linux.die.net/man/8/watchdog – ivan_pozdeev Feb 01 '20 at 23:41
  • From which end does the code that you gave run? Is the file still open by the other process at this moment? – ivan_pozdeev Feb 01 '20 at 23:44
  • It's https://pypi.org/project/watchdog/. The code I gave runs when the watchdog triggers an event saying that `readings.txt` has been changed by `rtlamr`. `rtlamr` is a black-box utility that appends an update every 3 seconds. I don't know much about the underlying pipe file management behavior. – metalaureate Feb 01 '20 at 23:49
  • Do you really need the data to be persistently stored? A FIFO file (or some other implementation of a buffered pipe) looks like a better fit. – ivan_pozdeev Feb 01 '20 at 23:53
  • I don't need the data to be stored in the log file, but I do need to persist it in a database, and writing to a log file using a pipe is the only way I can get data out of rtlamr. I am a newbie, kinda making it up as I go along, and don't know the better patterns. – metalaureate Feb 01 '20 at 23:55

1 Answer


If you truncate a file [1] while another process has it open in `w` mode, that process will continue to write at its old offset, making the file sparse. Everything below that offset will then read back as zeros, which is where the NUL bytes at the start of the file come from.

As per "x11 - Concurrent writing to a log file from many processes" (Unix & Linux Stack Exchange) and "Can two Unix processes simultaneous write to different positions in a single file?", each process that has a file open keeps its own offset into it, and an ftruncate() by someone else doesn't change that offset.
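A minimal sketch of that effect, assuming a POSIX-like system. The temporary path and the sample rows are made up for illustration, and the `writer` descriptor only stands in for what `>` redirection hands to the producer:

```python
import os
import tempfile


def demo_truncate_without_append():
    """Return the file contents after truncating under a non-append writer."""
    path = os.path.join(tempfile.mkdtemp(), "readings.txt")

    # Writer side: roughly what the shell's `>` does (truncate, no O_APPEND).
    writer = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(writer, b"first,reading\n")  # the writer's offset is now 14

    # Truncate through a second, independent handle, as in the question.
    with open(path, "w"):
        pass

    # The writer's offset was not reset, so this lands at offset 14
    # and the gap in front of it is filled with NUL bytes.
    os.write(writer, b"second,reading\n")
    os.close(writer)

    with open(path, "rb") as f:
        return f.read()


sparse_data = demo_truncate_without_append()
print(sparse_data)
```

The 14 leading bytes of the result are `\x00`, which is what the csv module then rejects.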

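If you want the other process to react to truncation, it needs to have the file open in `a` (append) mode, so that every write goes to the current end of the file. With shell redirection, that means using `>>` instead of `>`.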
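A minimal sketch of the append-mode behaviour, assuming a POSIX-like system. The temporary path and the sample rows are made up for illustration, and the `writer` descriptor only stands in for what the producer gets when its output is opened for appending:

```python
import os
import tempfile


def demo_truncate_with_append():
    """Return the file contents after truncating under an append-mode writer."""
    path = os.path.join(tempfile.mkdtemp(), "readings.txt")

    # Writer side: O_APPEND makes every write go to the current end of file.
    writer = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.write(writer, b"first,reading\n")

    # Truncate through a second, independent handle.
    with open(path, "w"):
        pass

    # The file is now empty, so "end of file" is offset 0: no gap, no NULs.
    os.write(writer, b"second,reading\n")
    os.close(writer)

    with open(path, "rb") as f:
        return f.read()


append_data = demo_truncate_with_append()
print(append_data)
```

This removes the NUL bytes only; it does nothing about data written between the read and the truncation.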


Your approach has fundamental flaws, too. For example, it is not atomic: sooner or later you will truncate the file after the producer has added data but before you have read it, and that data will be lost.

Consider using a dedicated data-buffering utility instead, such as `buffer` or `pv`, as per "Add a big buffer to a pipe between two commands".
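If the consumer is a Python script anyway, another way to avoid the intermediate file is to let the script start the producer and read its stdout through a pipe. This is a different technique from `buffer` or `pv`, shown only as a sketch: `consume` and `handle_line` are hypothetical names, and the stand-in producer below is a small Python one-liner, not rtlamr.

```python
import subprocess
import sys


def consume(cmd, handle_line):
    """Run cmd and pass each line of its stdout to handle_line as it arrives."""
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, text=True, bufsize=1
    ) as proc:
        for line in proc.stdout:
            handle_line(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set.
    return proc.returncode


# Stand-in producer (an assumption for the demo); replace it with the
# real command line of the long-running process.
producer = [sys.executable, "-c", "print('a,1'); print('b,2')"]

rows = []
exit_code = consume(producer, rows.append)
print(rows, exit_code)
```

In real use, `handle_line` would be the function that parses a row and writes it to the database, so no line can be lost between a read and a truncation.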


[1] Which is superfluous, because open(mode='w') already does that. Either truncate or reopen; there is no need to do both.

ivan_pozdeev