
I'm finishing up my app by running it through Instruments as well as stressing it with large amounts of data. The Instruments tests go fine, but the stress test is where I'm having issues. Without getting into too much detail, I'm giving my app increasing amounts of Core Data events from which it needs to extrapolate data, make graphs, and present locations on an MKMapView instance. I started small and increased to 56000 events, which it handled fine without any leaks or memory warnings (and I was quite proud of it for handling it all).

My app implements the Dropbox API to allow uploading and downloading of templates and data for sync purposes. Files uploaded from my app are converted from Core Data to an NSDictionary, then to NSData. I create a temporary folder for the data, then upload that file to Dropbox, which works fine… normally. If I try to upload my data file with 56000 events, it crashes. I've logged it and watched as the data is converted. It reaches the last event with no issues, but when it's supposed to start uploading to Dropbox, the app crashes and I cannot for the life of me figure out why. I see memory warnings pop up in my log. Typically, it will go Level=1, Level=2, Level=1, Level=2, then crash, which confuses me as it never reaches Level=3.

The majority of the information I've found is in my edit at the bottom. Below is some relevant code:

- (void)uploadSurveys:(NSDictionary *)dict {
    NSArray *templateArray = [dict objectForKey:@"templates"];
    NSArray *dataArray = [dict objectForKey:@"data"];
    NSString *filename;
    NSLog(@"upload called");
    if ([templateArray count] || [dataArray count]) {
        if ([templateArray count]) {
            // irrelevant code;
        }
        if ([dataArray count]) {
            SurveyData *survey;
            for (int i = 0; i < [dataArray count]; i++) {
                BOOL matchExists = NO;
                // ...... code to make sure no file exists in dropbox folder and creates new version if necessary;

                dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
                    NSData *data = [self convertSurvey:survey];
                    dispatch_async(dispatch_get_main_queue(), ^{
                        [self uploadData:data withFilename:filename];
                        NSLog(@"converted and uploading");
                    });
                });
            }
        }
    }
}

`[self convertSurvey:survey]` simply converts my Core Data object to `NSData`.
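For context, `convertSurvey:` presumably looks roughly like this, reconstructed from the edit at the bottom of the question. Only the `NSKeyedArchiver` call is quoted from that edit; the method signature and dictionary-building step are placeholders:

```objc
// Sketch only: reconstructed from the question's description.
// The dictionary-building step is a placeholder.
- (NSData *)convertSurvey:(SurveyData *)survey {
    NSMutableDictionary *d = [NSMutableDictionary dictionary];
    // ... convert the survey's Core Data objects into arrays and add them to d ...

    // The entire archive is built in memory in one shot:
    NSData *data = [NSKeyedArchiver archivedDataWithRootObject:d];
    return data;
}
```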

- (void)uploadData:(NSData *)data withFilename:(NSString *)filename {
    NSFileManager *manager = [NSFileManager defaultManager];
    NSString *pathComponent = [NSString stringWithFormat:@"tempData.%@", filename];
    NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:pathComponent];
    if ([manager createFileAtPath:path contents:data attributes:nil]) {
        [self.restClient uploadFile:filename toPath:[NSString stringWithFormat:@"/%@", currentSearch] fromPath:path];
        NSLog(@"uploading data");
    }
}

Any help would be much appreciated, and I thoroughly thank you in advance. I'm just trying to figure out whether I'm taking the wrong approach for large files or whether it's simply not allowed. If I have to split the files, that is fine, but I'd prefer to know what is preventing my app from performing this action before I try to make a workaround. Thank you again.

UPDATE: As this issue is now the only hindrance to the release of my application, I'm adding a bounty to this question in hopes of getting a solution or workaround. It will be up for a week, after which I will most likely just split up the files as they upload to ensure that this apparent size limit is not reached. That approach is not ideal, which is why a better solution is very welcome, but it is my backup plan if the bounty fails to bring in something more convenient.

EDIT: It appears that NSTemporaryDirectory plays no part in this at all. Here is the new situation. As you can see in the code above, `NSData *data = [self convertSurvey:survey];` is called on a secondary thread (which isn't the issue). I had been logging the objects created and knew that they reached the last one, but never thought to check whether the NSData object was actually returned. It turns out it isn't. In short, I convert all my Core Data objects into arrays and place them into a dictionary (only for the relevant survey/data to be converted). This does work, and the dictionary is created. Then I create an NSData object using `NSData *data = [NSKeyedArchiver archivedDataWithRootObject:d];`, where d is my dictionary. Directly after that, I call `return data;` to set the value for `NSData *data = [self convertSurvey:survey];`. This being the case, it appears that NSData or NSKeyedArchiver is at fault here. According to the Apple documentation:

Using 32-bit Cocoa, the size of the data is subject to a theoretical 2GB limit (in practice, because memory will be used by other objects this limit will be smaller); using 64-bit Cocoa, the size of the data is subject to a theoretical limit of about 8EB (in practice, the limit should not be a factor).

I have checked the file sizes in small increments to see where the failure occurs. I have successfully gotten 48.2MB of data through, but not 51.5MB, which leads me to believe that the issue occurs around 50MB, well below the theoretical limit for NSData (unless there is a discrepancy between iOS and OS X in that respect).

Hopefully this new information will help to solve this problem.

justin
  • It's not clear from your description exactly when it crashes. Does it crash during the `createFileAtPath:contents:attributes:` call, or does that succeed and it crashes during `uploadFile:toPath:fromPath:`? If the latter, what does that code look like? – Anomie Aug 10 '11 at 20:22
  • @Anomie, I'm sorry for not being more clear. It crashes during/at the end of `createFileAtPath:contents:attributes:`, before `uploadFile:toPath:fromPath:` is called, but only after a certain size has been reached. Anything before then converts and uploads without any issues. Hopefully that clears up the question more for you. If not, just say so and I will do my best to give you the information needed – justin Aug 10 '11 at 21:47
  • @Anomie, I take that back. It crashes at a separate point in the code. I will update the question to include what I've found out. – justin Aug 10 '11 at 22:31

2 Answers


The 2 GB limit for NSData is completely theoretical on iOS: even the iPhone 4 has only 512 MB of RAM, and iOS (unlike Mac OS X) cannot swap, so if your physical RAM is full, you crash (or your app is terminated before that happens).

The 50 MB NSData object alone is already very large and it's not the only object you have in memory – given that you convert the data from Core Data to a dictionary representation and then to NSData, you probably consume at least twice as much memory (likely more). The system and other apps also need RAM, so you're probably reaching a limit.

Try running your app in Instruments to see how much memory you actually consume.

To reduce your peak memory usage, you have a couple of options that largely depend on your data model:

  • As Jason Foreman suggested in his answer, try to avoid having your whole file in memory at once. Using NSFileHandle, you can write chunks of data to a file without needing to have the whole data in memory at once. Of course, this requires that you prepare your data accordingly, so that it can be split into chunks. A higher-level approach might be to serialize your data into an XML format that you could write out as a stream. If your data format is very simple, something like CSV might also work.

  • Don't use NSData for uploading to Dropbox. Write your data to a file instead (see above) and point the Dropbox SDK to that file. The Dropbox SDK makes it pretty easy to do so (DBRestClient has an uploadFile:toPath:fromPath: method).

  • If your data model makes it difficult to take a streaming approach, try to segment the data into more manageable parts. You could then use your old method of serializing dictionaries, just with multiple files.

  • Be careful with Core Data's memory usage. Try to re-fault objects using refreshObject:mergeChanges: if possible to break cyclic references within your data (see the Core Data Programming Guide for details).

  • Avoid accumulating autoreleased objects in long-running loops: create a separate NSAutoreleasePool and drain it in each iteration of your loop.
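The file-handle and autorelease-pool suggestions above could be combined into something like this. This is a sketch only; `chunkCount` and `-xmlStringForChunk:` are hypothetical placeholders for however you slice your model into serializable pieces:

```objc
// Write the serialized data to disk chunk by chunk, draining an
// autorelease pool every iteration so temporary objects are freed.
NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"survey.xml"];
[[NSFileManager defaultManager] createFileAtPath:path contents:nil attributes:nil];
NSFileHandle *handle = [NSFileHandle fileHandleForWritingAtPath:path];
for (NSUInteger i = 0; i < chunkCount; i++) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSString *chunk = [self xmlStringForChunk:i]; // hypothetical helper
    [handle writeData:[chunk dataUsingEncoding:NSUTF8StringEncoding]];
    [pool drain]; // autoreleased objects from this iteration are freed here
}
[handle closeFile];
```

Once the whole file is on disk, you can hand the path to the Dropbox SDK's `uploadFile:toPath:fromPath:`; at no point does the full payload live in RAM.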

omz
  • I believe you hit it on the dot. I profiled the app and watched its memory usage during the upload process. At the instant of creating the `NSData` object, the total real memory rocketed to 289MB (352MB for the virtual memory), much higher than the 15MB it initially starts with. That high amount, I'm sure, is exhausting the physical RAM (this is on an iPad 2 if that makes any difference). With this new information, do you feel this is definitely the case? And if so, would the best solution just be to split the files up? And how would I guarantee the memory has been freed before the next step? – justin Aug 11 '11 at 15:35
  • I've added a couple of suggestions to my answer. – omz Aug 11 '11 at 18:54
  • This is just beautiful, haha. Thank you very much. I believe this is going to be the approach I will take. My only question is would using a file handle or streaming XML/CSV be the best approach? Or is it the same thing? And would you happen to know of any tutorials I could follow to base my code off of? This is one area where I know I'm a tad out of my league, otherwise I would just get right into it. Thank you again. I'm going to go ahead and mark this as the correct solution and award you the bounty, too. But any advice on how to go about the streaming would be greatly appreciated – justin Aug 11 '11 at 19:11
  • 1
    You can write out XML or CSV with an `NSFileHandle` (convert the `NSStrings` you would write to `NSData` with `dataUsingEncoding:` first). Whether CSV is an option depends on your model, rule of thumb: If you could express your data in a single Excel sheet, CSV is the easiest way to go. For nested data structures, you would typically use XML, using nested loops that iterate over your data, appending XML tags to your output stream. There are also libraries that can help you with that, but stream-based XML libraries are usually pretty low-level and you don't want a DOM-based library here. – omz Aug 11 '11 at 19:23
  • Btw, when you use `dataUsingEncoding:` and similar methods in a loop, you might want to allocate an extra `NSAutoreleasePool` within that loop, otherwise you'll get the same effect as before because the default autorelease pool wouldn't be drained until your method returns, with the result that you effectively still have all your data in memory... – omz Aug 11 '11 at 19:27
  • That sounds like a good plan. My model does have some nested objects, so it seems that XML will be the option I should take. I will look into creating an XML stream with `NSFileHandle` and, I'm assuming, upload them in chunks. I will also make sure to add an internal `NSAutoreleasePool` to release memory as the stream is uploaded. I truly do appreciate all the help. This was the only part of my application I just couldn't figure out, so this is a life-saver. If I happen to run into any confusion while coding this in, would you mind if I referred to you? I don't want to be a hassle – justin Aug 11 '11 at 21:18
  • Sure, I can't promise that I'll be able to help, but I'll try. – omz Aug 11 '11 at 21:56
  • I certainly appreciate it. Hopefully it won't come to that, and I certainly don't expect anything out of you if it does. It's just nice to know that there's potential help if things go wrong – justin Aug 11 '11 at 22:05
  • Apparently moving to a chat doesn't do what I thought (setting up a private discussion through the inbox) in case you got an email or alert about that. Anyway, I was really hoping it wouldn't come to this, but I'm stuck. I've been looking at NSFileHandle and trying to get the basics of what's going on. I've set up a simple experiment to create a path, assign it to a handle, then write data to it. This works fine. It's the uploading to DB that is getting me. Let's say I have 10 strings that I write to a file. On string 2, I begin uploading that file to DB, while continuing to append data – justin Aug 12 '11 at 19:41
  • Even though the file holds all the data when it's done, the file uploaded is only those first two strings. I've tried increasing the amount to much larger amount in case the strings were uploaded faster than I wrote to the file. But I get the same result. Which leads me to wonder if I'm doing something stupid here. It's my understanding that I should start uploading at a certain point so I can append data as I go, deleting the old chunks of the file while creating new chunks. Does this sound ok or am I doing something horribly wrong? Again, if this is too much hassle, don't feel obligated – justin Aug 12 '11 at 19:45
  • You need to write the whole file to disk before you start uploading. Don't try to truncate the file while you're writing to it and simultaneously uploading it to Dropbox – that will end in a mess, I can promise you that. The point is not to keep the file as small as possible, but to keep that data out of RAM. You usually have enough flash memory to write relatively big files, but RAM is very scarce. – omz Aug 12 '11 at 19:58
  • Oh, I see. Writing the file doesn't charge usage to the internal RAM, just the memory of the device itself. That certainly makes life a lot easier. Again, I greatly thank you for the assistance. – justin Aug 12 '11 at 20:20

A way to work around this type of memory pressure is to build your APIs using streams, both for writing your converted data to a file on disk and also for uploading the data to a web service.

During conversion you can use an NSOutputStream to write chunks of data to the file, which avoids keeping a large chunk of data in memory at one time. Then, since NSMutableURLRequest can accept an NSInputStream for the body (via setHTTPBodyStream:) instead of an NSData, you can create an NSInputStream that reads your file back from disk and upload it.

Using streams in this way will ensure you never have 50+ MB of data loaded and should avoid the memory warnings you are seeing.
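A minimal sketch of the two halves of this approach, assuming `path`, `chunks` (an array of NSData pieces), and `url` are placeholders you would supply:

```objc
// Writing: stream each chunk to disk instead of building one big NSData.
NSOutputStream *out = [NSOutputStream outputStreamToFileAtPath:path append:NO];
[out open];
for (NSData *chunk in chunks) {
    [out write:(const uint8_t *)[chunk bytes] maxLength:[chunk length]];
}
[out close];

// Uploading: hand the request a stream so the body is read from disk lazily
// rather than being held in memory.
NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
[request setHTTPMethod:@"PUT"];
[request setHTTPBodyStream:[NSInputStream inputStreamWithFileAtPath:path]];
```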

Jason Foreman
  • Will this approach work using the Dropbox API? I believe this approach would be ideal for a custom URL request, but I'm a bit unfamiliar with the networking side of programming. Is this an approach for a custom URL request, or is it a generalized approach that can actually be used with the [API provided by Dropbox](https://www.dropbox.com/developers/docs)? – justin Aug 11 '11 at 18:13
  • It seems that the Dropbox SDK for iOS does this type of streaming internally, so I imagine that your particular problem comes during the generation of the large NSData blob that represents your converted data. Converting in chunks and using an NSOutputStream to write that to a file should solve that problem for you. – Jason Foreman Aug 11 '11 at 18:53
  • Thank you very much for the advice. Between your suggestion and that from omz, I believe I should be all set. I think I'm going to try omz's suggestion, but you have certainly helped a lot. I truly appreciate it – justin Aug 11 '11 at 19:08