Mass Storage

Being a photographer has always meant that you need a lot of storage. I still have boxes and boxes of photos and negatives sitting in a room in my parents' house. But in the digital age it means you need hard drive space, and lots of it. And it's been the case for the entire digital age too. Even when a large image size was 3MP, hard drives were barely large enough to hold a complete library. But it's getting to the point now where camera technology is outpacing storage technology. The RAW files that my Canon 7D produces are close to 30MB, which has resulted in my storage system being overwhelmed. Several months ago I started the search for alternatives and here is what I learned.

My current storage system is almost as simple as you can get. I use two 2TB Hitachi Deskstar drives in a RAID-1 (mirror) format. The mirror RAID gives me slightly improved read performance, but it also gives me redundancy. It's important to note that RAID is NOT a backup. The other part of my system is a rotating off-site backup strategy. Once a month I clone the drive to a copy I take to an off-site location. Any new system I consider will have to support a similar backup strategy.

There are a few advantages to this system. The first is that it's simple. It takes less than a minute to open Disk Utility and turn two drives into a RAID mirror. Literally anyone can do it. The second is performance. For disk performance, SATA is still king. You can get upwards of 140 MB/s of read/write access to good hard disk drives over SATA. Compare that to 20 MB/s for USB 2.0, 60 MB/s for FW800, and 75 MB/s for iSCSI. It really is no contest, and with huge RAW files that performance really does matter. And finally, there's redundancy. A mirror RAID can tolerate the failure of a single drive and it will still be both accessible and performant without that drive.
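
To put those throughput numbers in perspective, here is a rough Python sketch that estimates how long it would take to read a full 2TB volume at each rate. The interface figures are the ones quoted above; the function name and the decimal definition of a terabyte (1 TB = 1,000,000 MB) are my own assumptions, and real transfers rarely sustain the peak rate.

```python
# Rough transfer-time estimate using the throughput figures quoted above.
# Assumption: 1 TB = 1,000,000 MB (decimal units) and a sustained, constant rate.

RATES_MB_PER_S = {
    "SATA": 140,
    "iSCSI": 75,
    "FW800": 60,
    "USB 2.0": 20,
}


def hours_to_transfer(size_mb: float, rate_mb_per_s: float) -> float:
    """Return the hours needed to move size_mb megabytes at a constant rate."""
    return size_mb / rate_mb_per_s / 3600


if __name__ == "__main__":
    library_mb = 2 * 1_000_000  # a full 2TB volume
    for name, rate in RATES_MB_PER_S.items():
        print(f"{name:8s} {hours_to_transfer(library_mb, rate):5.1f} hours")
```

Under these assumptions SATA gets through the whole volume in about 4 hours, while USB 2.0 needs more than a full day.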

There are disadvantages too. For starters, your volume capacity is limited to the size of the largest hard drive available. When I bought my Mac Pro two years ago, that was 2TB. There's also no easy way to upgrade the size of the array without rebuilding it, which means it's not very future-proof. Finally, mirror RAID provides no benefit at all for write performance.

So as my RAID volume filled up I researched the following options for increasing my storage capacity and set out to make a decision before I ran out of space.


1) Drobo
The storage robot has been hailed as the holy grail of storage for years. Drobo aims to take the complexity out of managing RAID systems by essentially eliminating the management part. It's as simple as plugging drives into the box and watching your available storage grow. It's not all magic of course. Drobo is basically a modified version of RAID-5, a parity striping RAID format where information necessary to reconstruct data on a disk is stored across the rest of the disks. That means Drobo can tolerate the failure of 1, or possibly even 2 drives (at the cost of total capacity).
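
The parity idea is easier to see in code. This is a minimal Python sketch of plain RAID-5 style XOR parity, not Drobo's own scheme; the block contents and helper names are made up for illustration.

```python
# Minimal sketch of XOR parity as used by standard RAID-5.
# Assumption: this is generic RAID-5 parity, not Drobo's own implementation.
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Rebuild the one missing block from the surviving data and parity blocks."""
    return xor_blocks(surviving)


if __name__ == "__main__":
    data = [b"RAW1", b"RAW2", b"RAW3"]   # one block per data disk
    parity = xor_blocks(data)            # stored on a fourth disk

    # Pretend the disk holding data[1] died.
    recovered = rebuild_missing([data[0], data[2], parity])
    print(recovered == data[1])
```

Losing any single block leaves enough information to recompute it, which is why one drive failure is survivable. Surviving two simultaneous failures takes a second, independent parity block, and that is where the extra capacity cost comes from.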

It's not all warm and fuzzy though. Drobo benchmarks are very poor. Even iSCSI and eSATA Drobos max out under 70 MB/s, and others are closer to 15-20 MB/s. There's also some risk in terms of device failure. If a single hard disk fails you are fairly well protected, but if the Drobo itself fails then you must buy a new one to see your data again. Not only that, but you need a Drobo with the same firmware version as your previous one. In 4-5 years, that may be hard to find. For these reasons it's pretty difficult for me to consider Drobo for primary storage. It is definitely a good solution for backup though, albeit an expensive one at $600+ without drives.

Cost: High
Performance: Poor
Simplicity: Great


2) RAID-5 NAS/eSATA/iSCSI Enclosure
These enclosures are available from tons of companies. They are typically 5 hard drive bays coupled with an Intel Atom processor which powers a Linux software RAID system. The big benefit from these systems is really the networking they give you. For a large home or small office they provide a very cost-effective way to set up a file server. The boxes cost anywhere from $400-$1000 depending on interfaces, quality, and features.

There are still problems though. It turns out that software RAID-5 isn't going to win any speed awards. These systems tend to be faster than Drobos, but still slower than single, striped, or mirrored drives. They also don't have a very good reputation for quality. There are lots of stories out there on the Internet about failures, especially about drive failures since some of them aren't well ventilated. Then there's the non-starter: the write hole problem. This is an issue on lower-end RAID systems that can result in undetected loss of data. Couple that with having to manage RAID software and this was the first system to get crossed off my list.
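
For anyone unfamiliar with the write hole, here is a toy Python illustration. It assumes plain XOR parity across two data disks and one parity disk, and simulates losing power between the data write and the parity write. The stripe layout and function names are hypothetical, and real arrays are more involved.

```python
# Toy illustration of the RAID-5 write hole.
# Assumption: a single stripe with two data blocks and one XOR parity block.


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_after_interrupted_write(d0_old: bytes, d1: bytes, d0_new: bytes) -> bytes:
    """Return what a rebuild of d1 yields if parity was never updated for d0_new."""
    stale_parity = xor(d0_old, d1)   # parity written while the stripe was consistent
    # d0 is overwritten with d0_new, but power is lost before parity is rewritten.
    # Later the disk holding d1 fails and is rebuilt from d0_new and the stale parity.
    return xor(d0_new, stale_parity)


if __name__ == "__main__":
    original_d1 = b"BBBB"
    rebuilt_d1 = rebuild_after_interrupted_write(b"AAAA", original_d1, b"ZZZZ")
    # The array reports a successful rebuild, yet the block no longer matches.
    print(rebuilt_d1 == original_d1)
```

Nothing in the stripe itself flags the mismatch, which is what makes the loss undetected.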

Cost: Medium
Performance: Poor
Simplicity: Average


3) DAS/SAS (Hardware RAID)
A direct attached storage system composed of SAS (Serial Attached SCSI) or SATA drives offers best-of-breed performance. A DAS/SAS system can easily break 600 MB/s, a speed only recently achieved by SSDs. The key here is that at the heart of this system lies a hardware RAID controller operating on the very fast PCI Express bus of a Mac Pro. One of the advantages to having a Mac Pro is that you can actually consider a system like this. In addition to incredible performance, that hardware RAID controller also makes it possible to solve the write hole problem! I spent several months researching this system and I narrowed it down to a few components:

Areca 1880-series RAID card
Sans Digital SAS Expander Enclosure

But all that performance and reliability comes at a price. Good hardware RAID cards are over $700, and SAS enclosures are at least $1,000 (no drives). But there's more. Hardware RAID systems also require enterprise class drives, which are usually twice as expensive as normal ones. You can use non-enterprise drives, but timing issues mean that there's a huge risk of ruining your data. If you buy into this system you really have to go all-in on the best drives you can get.

Hardware RAID is also damned complicated to set up. It took every bit of my nerd IQ and CS background to understand the basics here, and I still feel like a novice after months of research. I learned a lesson last year when my Linux media server started to sputter out: the last thing I want to be doing is scouring low level system forums while the safety of my data hangs in the balance. Despite my enthusiasm at the high performance and expansion that this system offers, the complexity involved in setting up and maintaining it coupled with high costs ultimately made me very skeptical of investing in it.

Cost: Very High
Performance: Excellent
Simplicity: Poor


Backup
The idea with all of these systems is that they combine multiple drives into a single huge volume. This gives you the flexibility to store literally everything on one device, but also to simply add more drives when you need more space later. This sounds wonderful of course, but it's a huge challenge for backup solutions. RAID isn't backup, remember? Once the space of the volume exceeds that of a large single disk drive, your options for backup start to evaporate. Once we get into this realm of storage, Time Machine is no longer an option, so we typically turn to cloning and online backup.

I use Backblaze for online backup and I am very happy with it. But I still view it as a cataclysmic disaster recovery tool, not a primary backup. The largest option for hard drive recovery from Backblaze is 1TB, which is half my photos, and it would take literally months to download my photos. A reliable local backup is still required. But with the above three solutions my only option for local backup is a second RAID enclosure to clone to. That effectively adds $1000+ to any of the options above for an extra Drobo or RAID-5 enclosure with drives to use just for backup. The third option is the only good one, and that puts the total cost of that system at nearly $4,000. That's just too hard to justify.
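
The "months to download" claim is easy to sanity-check. Here is a back-of-the-envelope Python sketch; the 2TB library size follows from the numbers above, but the link speeds are hypothetical examples I picked for illustration, not measurements of my connection or of Backblaze.

```python
# Back-of-the-envelope download time for restoring a photo library from online backup.
# Assumptions: 1 TB = 10**12 bytes, and the link speeds below are hypothetical
# examples of sustained throughput, not measurements.


def days_to_download(size_tb: float, link_mbit_per_s: float) -> float:
    """Return the days needed to download size_tb terabytes at a constant rate."""
    total_bits = size_tb * 10**12 * 8
    seconds = total_bits / (link_mbit_per_s * 10**6)
    return seconds / 86_400


if __name__ == "__main__":
    library_tb = 2  # roughly the size of the photo library described above
    for mbit in (1.5, 3, 10):
        print(f"{mbit:>4} Mbit/s: {days_to_download(library_tb, mbit):6.1f} days")
```

At a sustained 1.5 to 3 Mbit/s a 2TB restore lands somewhere between two and four months, and even at 10 Mbit/s it is still well over two weeks of nonstop downloading.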

So cue the drumroll...my new mass storage system is:


4) A larger version of my current system.
I decided to keep my same system, but to rebuild my mirror RAID array with larger drives. So this week I bought two 4TB Hitachi Deskstar drives, built a new RAID-1 mirror out of them, and copied all of my data over. A big reason for this was cost. Those two 4TB drives were a tad under $600. And backup isn't an issue either, since I only need a third bare drive to use for my backup.

But the best reason to keep using this system is simplicity. From start to finish it took less than 14 hours to completely transition from the old RAID to the new one. Most of that was SuperDuper copying the data over. The rest was fixing Aperture master file references. iTunes and most other apps reference files using a path with the drive name, so since I used the same drive name none of those apps were perturbed. But Aperture is too smart (or too stupid) for this because it associates files with the unique identifier of the drive. So even though all my masters were there, Aperture couldn't see them. Fortunately there's a tool in Aperture to fix this (see: locate masters), and this was the only issue I saw with the switchover.

Hopefully this system will buy me another 2-3 years of storage. The bet I am essentially making is that there will be a better solution to this problem in 2-3 years. That will likely be near the end of life for my Mac Pro, and if Apple is moving away from towers then perhaps Thunderbolt mass storage will be cost-effective by that time. Or maybe there will be 8TB HDDs then, and I'll just upgrade to two of those :)

Finally, I want to give a special thanks to macperformanceguide.com for all of the wonderful articles about Mac performance and photography workflows that helped inform my decision.