
How can I merge files in Swift / iOS? `FileManager` can move and copy items, but I've seen nothing about merging files. I'd like to have something like:

FileManager.default.merge(files: [URL], to location: URL) throws

Files can potentially be big, so I'd rather avoid having to pass their data in memory.

Edit: here is my own in-memory merge:

let data = NSMutableData()
files.forEach({ partLocation in
  guard let partData = NSData(contentsOf: partLocation) else { return }
  data.append(partData as Data)
  do {
    try FileManager.default.removeItem(at: partLocation)
  } catch {
    print("error \(error)")
  }
})
data.write(to: destination, atomically: true)
Guig
    There's always the option of cooking your own. Read from each file in succession while writing to the destination. – Alexander Jan 15 '17 at 01:50
    But is there a thing like "appending to the destination"? If possible I'd like to avoid reading the entire file and then writing to limit memory usage – Guig Jan 15 '17 at 01:58
    That's why you have `FileHandle.readData(ofLength: Int)`. You can make a loop that reads something like 4kb chunks, and writes them to the output. Making the buffer larger would increase performance by minimizing kernel context switching overhead, but would use more memory – Alexander Jan 15 '17 at 02:07
    Thanks. And what can I use to append a chunk of data to an already written file? – Guig Jan 15 '17 at 02:12
    `FileHandle.write(Data)` :) – Alexander Jan 15 '17 at 02:22
  • Get the two files' data and append them. – aircraft Jan 15 '17 at 02:57

2 Answers


Here is my own solution (thanks @Alexander for the guidance):

extension FileManager {
  /// Streams each part file into `destination` in chunks of `chunkSize`
  /// bytes, so only one chunk is held in memory at a time.
  func merge(files: [URL], to destination: URL, chunkSize: Int = 1_000_000) throws {
    // createFile(atPath:) returns a Bool; it does not throw
    createFile(atPath: destination.path, contents: nil, attributes: nil)
    let writer = try FileHandle(forWritingTo: destination)
    defer { writer.closeFile() }
    for partLocation in files {
      let reader = try FileHandle(forReadingFrom: partLocation)
      // defer runs at the end of each loop iteration, so handles
      // are closed even if a later read or open throws
      defer { reader.closeFile() }
      var data = reader.readData(ofLength: chunkSize)
      while !data.isEmpty {
        writer.write(data)
        data = reader.readData(ofLength: chunkSize)
      }
    }
  }
}
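For completeness, a small usage sketch. The part-file names and contents are illustrative, and the extension is repeated so the snippet compiles on its own:

```swift
import Foundation

// The extension from the answer, repeated so this snippet runs standalone.
extension FileManager {
    func merge(files: [URL], to destination: URL, chunkSize: Int = 1_000_000) throws {
        createFile(atPath: destination.path, contents: nil, attributes: nil)
        let writer = try FileHandle(forWritingTo: destination)
        defer { writer.closeFile() }
        for part in files {
            let reader = try FileHandle(forReadingFrom: part)
            defer { reader.closeFile() }
            var data = reader.readData(ofLength: chunkSize)
            while !data.isEmpty {
                writer.write(data)
                data = reader.readData(ofLength: chunkSize)
            }
        }
    }
}

// Write three small part files (names are illustrative), then merge them.
let tmp = FileManager.default.temporaryDirectory
let parts = (0..<3).map { tmp.appendingPathComponent("part\($0).txt") }
for (i, url) in parts.enumerated() {
    try Data("chunk \(i)\n".utf8).write(to: url)
}
let merged = tmp.appendingPathComponent("merged.txt")
try FileManager.default.merge(files: parts, to: merged)
print(try String(contentsOf: merged, encoding: .utf8), terminator: "")
// prints "chunk 0", "chunk 1", "chunk 2" on separate lines
```

A small `chunkSize` is fine for a demo; in practice something in the range of a few hundred KB to a few MB trades memory against syscall overhead, as discussed in the comments above.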
Guig
  • Merging works fine; any idea how to handle memory issues while merging large files? – sujay Feb 07 '18 at 07:38
  • `chunkSize` controls how much memory is used – Guig Feb 07 '18 at 22:51
  • And how can we clear it when merging a large file? – sujay Feb 08 '18 at 12:26
  • What do you mean, clear it? `data` reads a part of size `chunkSize` into memory, writes it to disk, and then gets freed – Guig Feb 14 '18 at 00:34
  • Yes; while reading from the file and writing, memory consumption was going very high. I used `InputStream`/`OutputStream` to overcome that issue – sujay Feb 15 '18 at 10:35
func merge(files: [URL], to destination: URL, chunkSize: Int = 100_000_000) {
  for partLocation in files {
    // Create a stream that reads from the part file
    guard let stream = InputStream(url: partLocation) else { continue }
    // Begin reading
    stream.open()
    let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: chunkSize)
    while stream.hasBytesAvailable {
      let read = stream.read(buffer, maxLength: chunkSize)
      guard read > 0 else { break }
      var writeData = Data()
      writeData.append(buffer, count: read)
      // Open the destination in append mode and write the chunk
      if let outputStream = OutputStream(url: destination, append: true) {
        outputStream.open()
        writeData.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) in
          _ = outputStream.write(bytes.bindMemory(to: UInt8.self).baseAddress!,
                                 maxLength: writeData.count)
        }
        outputStream.close()
        writeData.removeAll()
      }
    }
    stream.close()
    buffer.deallocate()
  }
}
sujay
  • I'm curious why you chose to use `writeData.withUnsafeBytes` here instead of simply `outputStream.write(buffer, maxLength: writeData.count)`? It seems like the extra write to Data() isn't required? – lepolt Mar 20 '19 at 10:01
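A sketch of the variant the comment suggests: writing the raw buffer straight to the output stream skips the intermediate `Data` copy, and the output stream can also be opened once instead of per chunk. The function name is illustrative:

```swift
import Foundation

// Variant suggested in the comment above: write the buffer directly to the
// output stream, with no intermediate Data copy. Name is illustrative.
func mergeDirect(files: [URL], to destination: URL, chunkSize: Int = 1_000_000) {
    // Open the destination once, in append mode
    guard let output = OutputStream(url: destination, append: true) else { return }
    output.open()
    defer { output.close() }
    let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: chunkSize)
    defer { buffer.deallocate() }
    for part in files {
        guard let input = InputStream(url: part) else { continue }
        input.open()
        defer { input.close() }  // runs at the end of each iteration
        while input.hasBytesAvailable {
            let read = input.read(buffer, maxLength: chunkSize)
            guard read > 0 else { break }
            output.write(buffer, maxLength: read)  // raw buffer, no Data round-trip
        }
    }
}
```

Because the destination is opened with `append: true`, the caller should make sure it does not already exist (or delete it first) before merging.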