Blocks, Operations, and Retain Cycles

There's been some great discussions in the iOS community lately about the pitfalls of Objective-C and things to watch out for while developing iOS apps.  One of our projects at Mutual Mobile recently encountered a very difficult to diagnose issue that I wanted to describe so that hopefully others can avoid this happening to them.  The issue involved leaking images which resulted in a memory pressure warning and subsequent crash.  We knew what the symptoms were and how to reproduce it, but the root cause of the leak was extremely hard to pin down.  It used to be the case that most memory management issues were the result of programmer oversight.  But as you will see below, this issue would be extremely easy to overlook which is why I felt it was important to share a detailed explanation of it here.  
 
You can't start off a conversation about memory management anymore without mentioning Automatic Reference Counting (ARC).  Apple knew that managing memory is a big deal and that a lot of people in the iOS community were doing it wrong.  At WWDC in 2011, they mentioned that around 90% of the crashes on the app store were due to memory management issues.  ARC was their attempt to combat that.  Essentially ARC synthesizes calls to retain and release for you so that you the developer can focus on actually building your app.  It seemed too good to be true at the time, but I think I can safely say now that ARC has been a huge success.  It's now extremely easy for us to manage memory in Objective-C because, well, we really don't have to do anything.  As long as we follow the simple conventions the compiler takes care of the rest.  It really does work quite well.
 
There are a few gotchas with ARC though.  For starters, ARC doesn't really apply to C references (though it can help with converting from C to Objective-C, but that's another topic).  You do have to deal with C in iOS projects, because several libraries like CoreGraphics have a C interface.  Instead of method calls to retain and release, C libraries include retain and release functions, such as CGContextRelease(context c), to manage their memory.  ARC does not synthesize those for you, so you still have to call them.  CoreGraphics can be a tough framework to use, so my first thought was that the issue would be somewhere in this layer.  But it didn't take long to discover that this wasn't the case.  All it took was reading the code.  All of the calls to retain and release were balanced out correctly where the images were created.  Here's an example below taken from the method that we thought may have been at fault:
 
    CGContextRef context = CGBitmapContextCreate(NULL, target_w, target_h, 80, rgb, bmi);
    
    CGColorSpaceRelease(rgb);
    
    UIImage *pdfImage = nil;
    
    if (context != NULL) {
        CGContextDrawPDFPage(context, page);
        
        CGImageRef imageRef = CGBitmapContextCreateImage(context);
        CGContextRelease(context);
 
        pdfImage = [UIImage imageWithCGImage:imageRef scale:screenScale orientation:UIImageOrientationUp];
        CGImageRelease(imageRef);
    } else {
       CGContextRelease(context);
    }
 
So as you can see, at the top of the method the context is being created, so by convention it has a retain count of +1.  Then the context is used to draw an pdf into it, which is then copied out to an image so that it can be returned by the method.  The image ref also has a retain count of +1, because that's how the creation convention works.  It is then the developer's responsibility to release those references, which you can see is done in this example.  So far so good.  We knew from instruments that the image context was being leaked, but we still didn't know how or why.  But since we could tell both from our own inspection, and reading Apple's sample code for this exact use case, that it was being done correct, we went back to the drawing board.

The other main gotcha that remains with ARC is the retain cycle.  A retain cycle can take a few forms, but it typically means that object A retains object B, and object B retains object A, but nothing else retains object A or B.  They are still holding on to each other, so they never get dealloc'd, but nothing else is holding on to them so they both leak.  A lot of the time this can happen with a block, where the block retains the thing that created the block, and never gets released, so that the thing that created the block never gets released either.  That's clearly a problem.  It's also a problem that is tough to solve.  The static analyzer isn't well equipped to point these out to you, and Instruments isn't super effective at nailing them down either.  It will tell you something is leaking, but that's about it.  You do get a stack trace for the leak (or cycle if it detects one), but it's still up to you to pin down exactly what is causing the cycle.  You have to have a really strong understanding of how this stuff works and then do the detective work to parse through the code to find out what the cause could be.

In this case the team did understand retain cycles and did a lot to prevent them.  The view controller in question was responsible for showing all of the images that we are dealing with here.  That view controller creates dozens of blocks to perform image conversion operations from a PDF to an image.  The concern initially was that perhaps one of these blocks was holding onto the view controller, which holds the view hierarchy...and so maybe the view hierarchy isn't ever getting released.  There were indeed a few places where a block could have been "capturing" the view controller, and we took steps to prevent those.  Here's how you typically address that issue, by changing any reference to "self" to be a weak reference:

    __weak typeof(self) weakSelf = self;
    
    void (^ loadThumbnailBlock)(NSIndexPath *indexPath) = ^ (NSIndexPath *indexPath) {
        Page *page = [weakSelf.fetchedResultsController objectAtIndexPath:indexPath];
        [weakSelf loadThumbnailForPage:page forIndexPath:indexPath];
    };
    
So we eventually proved that the view controller was always being released, dealloc'd, and that all of it's views were going away.  Still the problem persisted, so that was still not the root cause.  The view controller was also handling the memory pressure warning correctly.  Whenever the app received the warning that it was using too much memory, it would remove the images it wasn't displaying from memory, as well as unload any views it wasn't using.  This was only a small fraction of the memory that had already been leaked though, so this only delayed the inevitable.
 
Finally, we nailed it down though.  We knew that the image contexts were being leaked both from checking instruments and because that's the only thing that could eat up that much memory.  The images are being rendered in blocks that are kicked off by the view controller.  We knew now that the view controller wasn't being held on to, because it was being dealloc'd, and so thus it also couldn't be holding onto the blocks that were rendering the images.  So something else must be holding onto those blocks...

Enter NSOperationQueue and NSBlockOperation.  NSOperationQueue and NSBlockOperation are built around Grand Central Dispatch to provide conveniences such as the ability to "cancel" an operation.  We were using this convenience to allow the app to cancel image conversion blocks if you closed the view controller before they finished.  Makes sense right?  Well, it was actually this optimization that was killing us in terms of memory leaks.  Take a look at the block of code below:

    __block NSBlockOperation *operation = [[NSBlockOperation allocinit];
    __weak typeof(self)weakSelf = self;
 
    MMVoidBlock thumbnailOperationBlock = ^ {
        if (!operation.isCancelled) {
            workerBlock();
        }
        
        [weakSelf.thumbnailOperationList removeObjectForKey:key];
    };
 
    [operation addExecutionBlock:thumbnailOperationBlock];

Notice any problems?  In this case we are building a thumbnail operation block that in turn calls a worker block.  The workerBlock() is actually what goes off and renders the PDF into a graphics context, converts that into an image, and saves the image.  But take a look at what else it's doing.  The block has a reference to the operation.  That sounds fine, until you look at the last line of that snippet.  The block, which holds a strong reference to the operation, is then being added to the same operation.  That addExecutionBlock method is going to retain the block, so now we have ourselves a retain cycle.  The block is holding onto the operation, so the operation won't be released.  But when the operation finishes the queue is going to release the operation, so now the operation has leaked because we don't have anything that holds a reference to it.  But the operation also has a reference to the block, so the block is never going to get released.  And finally, the block has a reference to the image and graphics contexts, which will now never be released either.  All because that silly block captured the operation it was added to.

Now the plot thickens even further.  The way you would typically solve this problem is by declaring the variable you want to use in the block as being weak by using the __weak specifier.  The example above inside the view controller illustrates that method.  But in this case, we can't do that because it will fail with ARC.  If you were to change __block operation to __weak operation in the example above, the operation would be released immediately.  So the performance optimizations of ARC bite us badly here too, because the operation will be nil in the last line of this function, which will cause the app to not work at all.  The compiler knows this and will actually warn you not to use __weak there.  In this case, what the compiler is telling you to do is actually the wrong thing to do, which is why this problem is so hard to solve.  By being a good sport and doing what the compiler tells you, you are lulled into a false sense of security that you will not have a retain cycle.
 
Here is the actual solution below.
 
    __block NSBlockOperation *operation = [[NSBlockOperation allocinit];
    __weak typeof(self)weakSelf = self;
    __weak typeof(operation)weakOp = operation;
 
    MMVoidBlock thumbnailOperationBlock = ^ {
        if (!weakOp.isCancelled) {
            workerBlock();
        }
        
        [weakSelf.thumbnailOperationList removeObjectForKey:key];
    };

    [operation addExecutionBlock:thumbnailOperationBlock];
 
What we had to do is just make the operation referenced by the block into a weak reference.  That way, the block isn't holding onto the operation, it just has a weak reference to it.  It's worth pointing out that pre-ARC, __block was sufficient to prevent a retain cycle.  But with ARC, __block no longer carries the same meaning and __weak must be used to prevent a retain cycle.  When the operation is finished and gets released by the operation queue, it will go away.  That will then release the block, which releases the image and graphics contexts just like we see it's supposed to in the sample above.  That resulted in dramatically improved memory performance and completely normal memory behavior.  Literally just that 2 line fix there.
 
It is worth pointing out that, in some ways, this represents a possible bug (or questionable behavior) on the part of NSBlockOperation.  What appears to be happening is that when you add an execution block, it retains it.  So far so good.  But when the block executes, it does not release the block, which would have actually broken the cycle.  Instead, the block won't be dropped until the operation is dealloc'd because, for whatever reason, NSBlockOperation does not release its associated execution blocks until dealloc gets called.  What the reason for that is, I do not know, but that's just the icing on the cake for what is already a tremendously ridiculous issue.
 
There is always a risk when dealing with scarce resources and expensive operations if you are not careful about how you use them.  Retain cycles are preventable, but it takes a lot of thought to consider how they may be caused by the code we write.  This situation represents a perfect storm in terms of memory issues on iOS.  It's easy for retain cycles to go unnoticed.  This one became so critical because we were leaking images, not strings and numbers.  Leaking images will cause an app to get killed due to memory pressure, where as strings and numbers probably never will.  Dealing with retain cycles can be extremely painful, and we should be especially careful to avoid them when dealing with expensive objects like Images and other graphical assets.  In this case, even with extreme care it was still nearly impossible to avoid a cycle.  This was a really tough problem to solve and I hope you all will remember it and avoid it in the future.