
I have a large XML file, ~30 MB.

Every now and then I need to update some of the values. I am using the ElementTree module to modify the XML. Currently I fetch the entire file, update it and upload it again, so there is ~60 MB of data transfer every time. Is there a way to update the file remotely? I am using the following code to update the file:

import xml.etree.ElementTree as ET

tree = ET.parse("feed.xml")
root = tree.getroot()

# New quantity for each SKU that should be updated
updates = {"RUSSE20924": 2, "PSJAI22443": 3}

for child in root:
    # findtext() returns "" instead of raising when Product_Code is missing
    sku = child.findtext("Product_Code", default="").strip()
    if sku in updates:
        print("found", sku)
        child.find("Quantity").text = str(updates[sku])
        child.set("updated", "yes")

tree.write("feed.xml")
nish
  • Why don't you execute the script directly on the server? – enrico.bacis Aug 09 '14 at 08:02
  • If you cannot run the script on the server, maybe you could split the file and update only the portion with your key? A kind of sharding? – dmitry_romanov Aug 09 '14 at 08:05
  • @enrico.bacis Yes, that is an option. But I would like to call scripts from where my app is hosted; the other server is just an FTP server. – nish Aug 09 '14 at 08:06
  • @dmitry_romanov: To split, I think I'll have to fetch first. Any other way of splitting? – nish Aug 09 '14 at 08:08
  • @nish He means split it and upload the pieces, then fetch only the part you have to change. If you don't already know the file content (if you did, you could simply modify it locally and re-upload it) and the server cannot execute code, it's pretty hard to make substitutions without downloading. – enrico.bacis Aug 09 '14 at 08:08
  • How do you access the remote file? Using which protocol and operating system? – Ferdinand Beyer Aug 09 '14 at 08:15
  • @enrico.bacis But except for exact same-length replacements, every byte after the change will be modified. – user2864740 Aug 09 '14 at 08:15
  • I would consider using a network filesystem; that may be sufficiently "good", and it will be much simpler where supported. – user2864740 Aug 09 '14 at 08:16



Modifying a file directly via FTP without uploading the entire thing is not possible except when appending to a file.

The reason is that there are only three commands in FTP that actually modify a file (Source):

  • APPE: Appends to a file
  • STOR: Uploads a file
  • STOU: Creates a new file on the server with a unique name

What you could do

Track changes

Cache the remote file locally and use the MDTM command to check whether it has changed on the server; download it again only when it has.

Pros:

  • Will halve the required data transfer in many cases.
  • Hardly requires any change to existing code.
  • Almost zero overhead.

Cons:

  • Other clients still have to download the whole file every time something changes
    (no change from the current situation)
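
A minimal sketch of the caching idea using Python's standard ftplib module. The function names and the `.mdtm` sidecar file that stores the last seen timestamp are hypothetical, and the sketch assumes the server supports MDTM (an FTP extension standardized in RFC 3659, not every server implements it). Recording the new timestamp right after your own upload is what avoids re-downloading a file you just wrote.

```python
import os


def fetch_if_changed(ftp, remote_name, local_path):
    """Download remote_name to local_path only if the server copy changed.

    `ftp` is a connected ftplib.FTP instance. The last seen MDTM value is
    cached in a hypothetical "<local_path>.mdtm" sidecar file.
    """
    stamp_path = local_path + ".mdtm"
    # An MDTM reply looks like "213 20140809080200"; keep only the timestamp.
    remote_stamp = ftp.sendcmd("MDTM " + remote_name).split()[1]

    if os.path.exists(local_path) and os.path.exists(stamp_path):
        with open(stamp_path) as f:
            if f.read().strip() == remote_stamp:
                return False  # local cache is current, nothing downloaded

    with open(local_path, "wb") as f:
        ftp.retrbinary("RETR " + remote_name, f.write)
    with open(stamp_path, "w") as f:
        f.write(remote_stamp)
    return True


def upload_and_record(ftp, remote_name, local_path):
    """Upload the edited file and remember the server's new timestamp."""
    with open(local_path, "rb") as f:
        ftp.storbinary("STOR " + remote_name, f)
    # Without this, the next fetch would re-download our own upload.
    remote_stamp = ftp.sendcmd("MDTM " + remote_name).split()[1]
    with open(local_path + ".mdtm", "w") as f:
        f.write(remote_stamp)


# Usage (host and credentials are placeholders):
# from ftplib import FTP
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     fetch_if_changed(ftp, "feed.xml", "feed.xml")
#     ...  # edit feed.xml locally with ElementTree
#     upload_and_record(ftp, "feed.xml", "feed.xml")
```

Because MDTM has one-second resolution, two uploads within the same second could go unnoticed; for an occasional update job that is usually acceptable.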

Split up into several files

Split up your XML into several files. (One per product code?)
This way you only have to download the data that you actually need.

Pros:

  • Less data to transfer
  • Allows all scripts that access the data to only download what they need
  • Combinable with suggestion #1

Cons:

  • All existing code has to be adapted
  • Additional overhead when downloading or updating all the data
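
A rough sketch of the splitting step with ElementTree, assuming (as in the question's code) that each top-level element has a `Product_Code` and a `Quantity` child and that the product codes are safe to use as file names. `split_feed` and `update_quantity` are hypothetical helpers. After the one-time split, an update only needs to download and re-upload one small file.

```python
import os
import xml.etree.ElementTree as ET


def split_feed(feed_path, out_dir):
    """Write each top-level product element to <out_dir>/<Product_Code>.xml."""
    os.makedirs(out_dir, exist_ok=True)
    root = ET.parse(feed_path).getroot()
    written = []
    for product in root:
        sku = product.findtext("Product_Code", default="").strip()
        if not sku:
            continue  # skip entries without a product code
        # Assumes the SKU contains no characters that are invalid in file names.
        path = os.path.join(out_dir, sku + ".xml")
        ET.ElementTree(product).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written


def update_quantity(product_path, qty):
    """Edit a single per-product file; only this file has to be re-uploaded."""
    tree = ET.parse(product_path)
    product = tree.getroot()
    product.find("Quantity").text = str(qty)
    product.set("updated", "yes")
    tree.write(product_path, encoding="utf-8", xml_declaration=True)
```

Scripts that need the complete feed would then have to download all the small files and combine them, which is the overhead mentioned in the cons above.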

Switch to a delta-sync protocol

If the storage server supports it, switching to a delta synchronization protocol like rsync would help a lot, because such protocols only transmit the changes (with little overhead).

Pros:

  • Less data transfer
  • Requires little change to existing code

Cons:

  • Might not be available
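
To illustrate why delta synchronization saves bandwidth, here is a simplified fixed-block scheme in Python: the receiver hashes the blocks of the copy it already has, and the sender transmits only the blocks whose hashes differ. This is not rsync's actual algorithm; rsync additionally uses a rolling checksum so that blocks shifted by an insertion or deletion are still matched, which this sketch does not handle. All function names are hypothetical.

```python
import hashlib


def block_digests(data, block_size=4096):
    """Receiver side: hash fixed-size blocks of the copy it already has."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old_digests, new_data, block_size=4096):
    """Sender side: return (index, bytes) for each block that differs."""
    delta = []
    for n, i in enumerate(range(0, len(new_data), block_size)):
        block = new_data[i:i + block_size]
        if n >= len(old_digests) or hashlib.md5(block).digest() != old_digests[n]:
            delta.append((n, block))
    return delta


def apply_delta(old_data, delta, new_length, block_size=4096):
    """Receiver side: rebuild the new file from the old copy plus the delta."""
    blocks = [old_data[i:i + block_size] for i in range(0, len(old_data), block_size)]
    for n, block in delta:
        while len(blocks) <= n:
            blocks.append(b"")
        blocks[n] = block
    # Truncate in case the new file is shorter than the old one.
    return b"".join(blocks)[:new_length]
```

For a 30 MB feed where only a few `<Quantity>` values change in place, only the handful of blocks containing those values would be sent.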

Do it remotely

You already pointed out that this isn't an option for you, but it would still be the best solution.

What won't help

Switch to a network filesystem

As somebody in the comments already pointed out, switching to a network filesystem (like NFS or CIFS/SMB) would not really help, because you cannot change part of a file in place unless the new data has exactly the same length; otherwise everything after the change has to be rewritten.
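
The same-length restriction can be seen with a plain local file opened in `r+b` mode, which is roughly what a network filesystem gives you: overwriting bytes at an offset works, but a replacement of a different length would shift everything after it. `overwrite_in_place` is a hypothetical helper; it reads the whole file only to locate the offset, which a real client would have to know some other way.

```python
def overwrite_in_place(path, old, new):
    """Replace the first occurrence of `old` with `new` without rewriting the file.

    Only valid when both byte strings have the same length; a longer or
    shorter replacement would require rewriting every byte after it.
    """
    if len(old) != len(new):
        raise ValueError("in-place edit requires old and new to have equal length")
    with open(path, "r+b") as f:
        data = f.read()  # demo only: used to find the offset of `old`
        offset = data.find(old)
        if offset < 0:
            return False
        f.seek(offset)
        f.write(new)  # overwrites exactly len(new) bytes, nothing shifts
    return True
```

Changing a quantity from 2 to 3 fits this pattern, but changing it from 9 to 10 does not.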

What to do

Unless you can do delta synchronization, I'd suggest implementing some caching on the client side first and, if that doesn't help enough, splitting up your files.

ntninja
  • Note: I also had another idea but later realized it's garbage, so I'll only link to it in the comments: http://pastebin.com/rFczuBVv – ntninja Aug 09 '14 at 13:41