Hi, I have data stored in chunks across n threads. The file is 102 KB, so I lock the shared resource (the file), write the first chunk, and then release the lock. But when the next chunk arrives from the second thread, instead of continuing from where the file left off, it writes the chunk on top of the previous one.

So the 102 KB file ends up as 51 KB when two threads each hold a 51 KB chunk.

Here is the relevant piece of code:

for th in threads:
    th.join()

for th in threads:
    lock.acquire()
    with open(fileName, 'w+') as fh:
        fh.write(th.data)
    lock.release()

I am even using mode w+, but instead of appending it still overwrites.

Update

def main(url=None, splitBy=2):
    start_time = time.time()
    if not url:
        print "Please Enter some url to begin download."
        return

    fileName = url.split('/')[-1]
    sizeInBytes = requests.head(url, headers={'Accept-Encoding': 'identity'}).headers.get('content-length', None)
    # if os.path.exists(fileName):
    #   if int(sizeInBytes) == os.path.getsize(fileName):
    #       raise SystemExit("File already exists.")

    print "%s bytes to download." % sizeInBytes
    if not sizeInBytes:
        print "Size cannot be determined."
        return
    threads = []
    lock = threading.Lock()

    byteRanges = buildRange(int(sizeInBytes), splitBy)
    for idx in range(splitBy):
        bufTh = SplitBufferThread(url, byteRanges[idx])
        bufTh.daemon = True
        bufTh.start()
        threads.append(bufTh)
    print "--- %s seconds ---" % str(time.time() - start_time)


    for i, th in enumerate(threads):
        th.join()
        lock.acquire()


        with open(fileName, 'a') as fh:
            fh.write(th.data)
            if i == len(threads) - 1:
                fh.seek(0, 0)
                fh.flush()
        lock.release()

Update 2

I have removed the extra threads list entirely; just calling join() does the magic. But how does a thread wait for one chunk to finish writing? Is it that using `with` waits for one thread.data to be written, and only then does the next one start appending?

def main(url=None, splitBy=6):
    if not url:
        print "Please Enter some url to begin download."
        return

    fileName = url.split('/')[-1]
    sizeInBytes = requests.head(url, headers={'Accept-Encoding': 'identity'}).headers.get('content-length', None)
    if os.path.exists(fileName):
        if int(sizeInBytes) == os.path.getsize(fileName):
            ask = raw_input('[YES]')
            if not ask or ask.lower() in ['y', 'yes']:
                os.remove(fileName)
            else:
                raise SystemExit("File already exists.")

    start_time = time.time()
    print "%s bytes to download." % sizeInBytes
    if not sizeInBytes:
        print "Size cannot be determined."
        return

    byteRanges = buildRange(int(sizeInBytes), splitBy)
    for idx in range(splitBy):
        bufTh = SplitBufferThread(url, byteRanges[idx])
        bufTh.daemon = True
        bufTh.start()
        with open(fileName, 'a+') as fh:
            bufTh.join()
            fh.write(bufTh.data)

    print "--- %s seconds ---" % str(time.time() - start_time)


    print "Finished Writing file %s" % fileName
Ciasto piekarz
  • Try changing the mode to "a" instead of "w+". "w+" does not append to the file, but overwrites it. See http://stackoverflow.com/questions/16208206/confused-by-python-file-mode-w – Colin Atkinson Jul 06 '14 at 16:48
  • Sorry, my first mistake was using `w+`, but even when I try `a` it keeps appending to the file, so I tried calling `fh.seek(0, 0)`, but that didn't work; it keeps appending bytes. – Ciasto piekarz Jul 06 '14 at 17:08
  • Wait, what are you trying to achieve? Do you want it to append to the file or overwrite it? – Colin Atkinson Jul 06 '14 at 17:18
  • Append. But once all the chunks are appended I want to set the pointer back to the beginning, so that restarting the download of the same file does not append but starts from the beginning. – Ciasto piekarz Jul 06 '14 at 17:20
  • @ColinAtkinson I have posted an update, but I want to completely bypass storing chunks in a list. Instead I want to use the first loop, where I start the threads, to write the file to disk as soon as each thread finishes its chunk. – Ciasto piekarz Jul 06 '14 at 17:23
  • So, you seek(0,0) and wonder why the file keeps getting overwritten? Notice that in your for loop you are in a single thread, so there is no reason to obtain a lock or open the file multiple times. Just open the file before the for loop and write the results of each thread. – tdelaney Jul 06 '14 at 17:32
  • I removed the lock. After downloading the file, if I re-download it, the file only gets appended to. – Ciasto piekarz Jul 06 '14 at 17:39
  • Setting the file pointer position at the end is pointless; it only persists as long as you have that file open. The easiest way to do what you are requesting is to clear the file when it is first opened. This will remove the old data, and you will then be free to write your new content to it. – Colin Atkinson Jul 06 '14 at 17:51
  • So because I am using `with`, is it pointless to call `fh.seek` after `fh.write` is done? Would it make sense if I did not use `with`? However, calling `os.remove` when the file already exists on disk does make sense. – Ciasto piekarz Jul 06 '14 at 18:04
  • The position of the file pointer does not persist between sessions. So, once the program exits, regardless of how you open the file, the pointer will be reset when the file is opened again. You can use `os.remove`, but my personal recommendation would be to open the file as "w" and close it, which will empty it. Then, like you did in the update, open it with "a" and write to it. – Colin Atkinson Jul 06 '14 at 18:21
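
A minimal sketch of what the comments above suggest: start all the threads, then open the file once in a truncating mode and write each thread's data in order from the main thread. `ChunkThread` and `download_to` are hypothetical names used only for illustration; `ChunkThread` stands in for the question's `SplitBufferThread` (whose code is not shown) and simply copies a payload into `self.data` instead of downloading anything.

```python
import os
import tempfile
import threading


class ChunkThread(threading.Thread):
    """Hypothetical stand-in for SplitBufferThread: fills self.data with bytes."""

    def __init__(self, payload):
        threading.Thread.__init__(self)
        self.payload = payload
        self.data = b""

    def run(self):
        # A real thread would download its byte range here.
        self.data = self.payload


def download_to(path, payloads):
    threads = [ChunkThread(p) for p in payloads]
    for th in threads:
        th.start()  # all chunks are fetched concurrently

    # "wb" empties any previous copy exactly once. Only the main thread
    # writes, so no lock is needed.
    with open(path, "wb") as fh:
        for th in threads:
            th.join()          # block until this thread's chunk is ready
            fh.write(th.data)  # chunks land in the file in range order
    return os.path.getsize(path)


path = os.path.join(tempfile.mkdtemp(), "out.bin")
first = download_to(path, [b"a" * 51, b"b" * 51])
second = download_to(path, [b"a" * 51, b"b" * 51])  # re-download, same size
print(first, second)
```

Because the file is truncated when it is opened, running the download again replaces the old contents instead of growing the file, and no `seek` or `os.remove` is required. It is `join()` that makes the main thread wait for each chunk; the `with` block only guarantees the file is closed afterwards.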

1 Answer

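"w+": Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.

So every time the file is opened with "w+" the previous chunk is discarded. Try "a+" instead, which opens the file for appending.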
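
The difference can be checked with a small sketch that reopens a file once per chunk, as the loop in the question does. The helper names `write_chunks` and `demo` and the 51-character chunks are made up for illustration; they stand in for the two 51 KB chunks.

```python
import os
import tempfile


def write_chunks(path, chunks, mode):
    """Reopen `path` with `mode` for every chunk, like the question's loop."""
    for chunk in chunks:
        with open(path, mode) as fh:
            fh.write(chunk)


def demo():
    tmp_dir = tempfile.mkdtemp()
    chunks = ["A" * 51, "B" * 51]  # stand-ins for two 51 KB chunks

    sizes = {}
    for mode in ("w+", "a+"):
        path = os.path.join(tmp_dir, "demo_" + mode[0] + ".bin")
        write_chunks(path, chunks, mode)
        sizes[mode] = os.path.getsize(path)
    return sizes


sizes = demo()
print(sizes)  # "w+" keeps only the last chunk (51), "a+" keeps both (102)
```

With "w+" each `open` truncates the file, so only the last chunk survives. With "a+" every write goes to the end of the file, which also means a second run keeps appending unless the old file is emptied or removed first.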

Emanuele Paolini