
I'm trying to use Python to solve a text-replacement problem. I have a file in little-endian UTF-16 format, and I want to replace the IP address in it. First I read the file line by line, then I replace the target string, and finally I write the new string back to the file. But when multiple threads operate on this file, it becomes garbled. Here is my code:

```python
import re
import codecs
import time
import thread
import fcntl

ip = "10.200.0.1"
searchText = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

def replaceFileText(fileName,searchText,replaceText,encoding):
    lines = []
    with codecs.open(fileName,"r",encoding) as file:
        fcntl.flock(file,fcntl.LOCK_EX)
        for line in file:
            lines.append(re.sub(searchText,replaceText,line))
        fcntl.flock(file,fcntl.LOCK_UN)

    with codecs.open(fileName,"w",encoding) as file:
        fcntl.flock(file,fcntl.LOCK_EX)
        for line in lines:
            file.write(line)
        fcntl.flock(file,fcntl.LOCK_UN)

def start():
    replaceFileText("rdpzhitong.rdp",searchText,ip,"utf-16-le")
    thread.exit_thread()

def test(number):
    for n in range(number):
        thread.start_new_thread(start,())
        time.sleep(1)

test(20)
```

I can't understand why the file gets garbled. I used `fcntl.flock` to keep the reads and writes in sequence, so where is the problem?

liuan

2 Answers


It's garbled because an fcntl lock is owned by a process, not by a thread, so a process cannot use fcntl to serialize its own access. See this answer, for example.

You'll need to use a threading construct such as `threading.Lock` instead.
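A minimal sketch of that suggestion, assuming Python 3 (it should also run on Python 2.6+): one module-level `threading.Lock` guards the entire read-modify-write cycle, and `threading.Thread` plus `join()` replace the low-level `thread` module. The names `file_lock`, `replace_file_text` and `run_threads` are illustrative stand-ins, not from the question. Holding one lock across both the read and the write matters: in the original code nothing is locked between the two `with` blocks, so another thread can slip in there, and the `"w"` open truncates the file before `flock` is even called.

```python
import io
import re
import threading

IP = "10.200.0.1"
SEARCH_TEXT = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# One lock shared by every thread in this process (illustrative name).
# It guards the whole read-modify-write cycle, not the read and write separately.
file_lock = threading.Lock()

def replace_file_text(file_name, search_text, replace_text, encoding):
    with file_lock:
        # newline="" keeps the file's original line endings untouched.
        with io.open(file_name, "r", encoding=encoding, newline="") as f:
            lines = [re.sub(search_text, replace_text, line) for line in f]
        # "w" truncates the file; safe here only because no other thread
        # can be reading it while we hold file_lock.
        with io.open(file_name, "w", encoding=encoding, newline="") as f:
            f.writelines(lines)
        # Both files are closed (and flushed) before file_lock is released.

def run_threads(number, file_name="rdpzhitong.rdp"):
    """Stand-in for the question's test(): start threads, then wait for all."""
    threads = [threading.Thread(target=replace_file_text,
                                args=(file_name, SEARCH_TEXT, IP, "utf-16-le"))
               for _ in range(number)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Joining the threads, instead of sleeping between starts, also guarantees the program does not exit while a write is still in progress.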

pilcrow
  • +1 That is to say, fcntl locks are at process level, shared by all threads in the process, while Lock is at thread level, independent of other threads. Are these Python lock implementations independent of the OS? That is, do they behave the same on any OS (Windows, Unix-like, etc.)? – lulyon Jul 20 '13 at 06:07
  • @lulyon, yes, more or less. `Lock` is a Python threading construct and should work on any platform that supports Python. `fcntl` is a UNIX-ism, [not available in Python on Windows](http://stackoverflow.com/q/1422368/132382). – pilcrow Jul 20 '13 at 13:05

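I imagine it's garbled because you lock the file only after you open it. Opening with `"w"` already truncates the file at that point, before the lock is taken, and in this situation the seek position might be wrong.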
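The truncation and seek-position issue can be made concrete. A sketch of a safer pattern, assuming the same UTF-16-LE file, opens it once in `"r+b"` mode, reads and rewrites it through that single handle, and truncates only after the new data is written, so the file is never emptied before its contents have been read. It still needs a lock held around each call (for example the `threading.Lock` from the other answer); the function name `rewrite_in_place` is hypothetical.

```python
import re

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def rewrite_in_place(file_name, replace_text, encoding="utf-16-le"):
    """Read, edit and rewrite a file through one handle (hypothetical helper)."""
    with open(file_name, "r+b") as f:
        text = f.read().decode(encoding)
        new_text = IP_PATTERN.sub(replace_text, text)
        f.seek(0)                           # rewind before overwriting
        f.write(new_text.encode(encoding))
        f.truncate()                        # drop leftover bytes if the text shrank
```

Working in binary mode and encoding/decoding explicitly keeps `seek()` and `truncate()` operating on plain byte offsets.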

BTW, threading in Python is not very useful in this context (look up the Python GIL problem). To maximize performance in a task like this, I suggest using the multiprocessing module and restructuring the logic around queues/pipes: worker processes analyze the data, and the main process is responsible for I/O on the input and output files.
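A rough sketch of that architecture, with hypothetical names `replace_ips` and `process_file`: the main process is the only one that reads and writes files, and a `multiprocessing.Pool` (which manages the queues to its worker processes internally) does the regex work. For a file as small as an `.rdp` this is likely overkill, since process startup will probably dominate; it pays off only for large inputs or heavier per-line work.

```python
import functools
import io
import multiprocessing
import re

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def replace_ips(replace_text, line):
    # Worker side: pure text processing, no file access.
    return IP_PATTERN.sub(replace_text, line)

def process_file(in_name, out_name, replace_text, encoding="utf-16-le", workers=2):
    # Main process: the only place that touches the input and output files.
    with io.open(in_name, "r", encoding=encoding, newline="") as f:
        lines = f.readlines()
    worker = functools.partial(replace_ips, replace_text)  # picklable callable
    pool = multiprocessing.Pool(processes=workers)
    try:
        # imap sends chunks of lines to the workers and yields the
        # results back in their original order.
        results = list(pool.imap(worker, lines, chunksize=256))
    finally:
        pool.close()
        pool.join()
    with io.open(out_name, "w", encoding=encoding, newline="") as f:
        f.writelines(results)
```

Call `process_file(...)` from under an `if __name__ == "__main__":` guard; multiprocessing needs it on platforms that start workers by spawning a fresh interpreter instead of forking.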

Paolo Casciello