Enterprise storage is currently broken. IT departments big and small spend far too much money on ineffective storage solutions which only partially fulfil the company's storage needs, all the while having to justify massive expenditures to a business whose needs still aren't met. Essentially, IT departments everywhere have become storage sales reps: sure, we work with vendors who provide us with quotes, but when the rubber meets the road it is the IT department which is selling the storage to the business. If you have ever been in this position, as I have, and literally felt sick to your stomach because you knew it was all wrong, then this article is for you. This article does not provide an enterprise storage cure-all. What it does provide is a look at a solution which genuinely revolutionises the entire enterprise storage model, and which gives you the flexibility, features, and densities that make it far superior to almost any modern enterprise storage solution. It is also pretty darn cheap.
If you have read any of my articles you will know that I generally write very specific pieces which address the needs of a particular niche (read: my needs – which, surprisingly enough, coincide with many others' needs as well). That is not the case with this series. These articles are really a plea to get you to think about why storage is purchased and provisioned the way it currently is. Here is how I plan to break the series out.
- Part One – A basic overview of ZFS and its features.
- Part Two – The big differences between Solaris and Linux, and the basics of what you need to know.
- Part Three – Setting up your first test ZFS machine.
Once we have gone through these I will also do a series of articles on how to perform specific ZFS tasks, such as enabling compression, de-duplication, and encryption. But this three-part series is simply to get you to acknowledge the problem which none of us want to acknowledge.
Also, to be fair: I am not a fan of Solaris, and I am not a fan of Oracle. I am, however, a fan of ZFS. If ZFS were available on Linux or Windows this article would be about using it there, but it is what it is, and regardless of the shortcomings (usability) of Solaris, getting ZFS is worth the inconvenience.
What is ZFS?
ZFS, which originally stood for Zettabyte File System but has since evolved into a standalone trademark, is at a very basic level a file system and a volume manager. To be honest, though, it cannot be defined that simply: ZFS is your disk storage, end-to-end. With ZFS you can take advantage of very advanced features: copy-on-write, snapshots, clones, de-duplication, encryption, caching, and end-to-end checksums.
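To give a taste of how little ceremony is involved, here is a minimal sketch of creating a pool and a file system. It assumes a Solaris machine with ZFS; the pool name `tank` and the device names are placeholders you would substitute with your own.

```shell
# Create a pool named "tank" from two mirrored disks
# (device names are illustrative -- use your own).
zpool create tank mirror c1t0d0 c1t1d0

# Create a file system inside the pool. It is created and
# mounted in one step: no format, no mkfs, no fstab entry.
zfs create tank/data

# Show the new file system and its available space.
zfs list tank/data
```

Two commands and the storage is live; that is the entire provisioning workflow.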
ZFS solves silent data corruption. Enough said? Probably not. Silent data corruption has been a largely ignored problem in the IT community as a whole. With small, slow disks, it statistically takes longer to write the amount of data at which silent corruption is expected to occur than the realistic life of the drive. But drives today are neither small nor slow, and RAID setups can make them very, very fast, so the amount of data written can cross that threshold in a relatively short time. ZFS can detect silent data corruption and, with the redundancy built into your pool, can self-heal the file system from the redundant disks. ZFS does this by calculating checksums end-to-end. This is the premise ZFS was built on, and at this point all indications are that it was successful. Ultimately the best resource I have found on this feature has been here.
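You can exercise this checksum verification on demand. A sketch, assuming the `tank` pool from a ZFS system (pool name is a placeholder):

```shell
# Walk every block in the pool and verify its checksum; any bad
# block found on one side of a mirror (or reconstructable from
# parity) is silently repaired from the redundant copies.
zpool scrub tank

# Report pool health, including any checksum errors that were
# detected and repaired during the scrub.
zpool status -v tank
```

Running a scrub on a schedule is the usual way to catch silent corruption before a disk failure removes the redundancy you would need to repair it.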
Integrated Volume Management
Part of ZFS is a pooled storage model which makes managing multiple disks simple. Pools can be created on files, partitions, or whole-disk devices; ZFS prefers exclusive access to the disk hardware, so whole-disk devices are the best choice. Pools can be created with the required level of redundancy (mirror; single-parity RAID-Z1, analogous to RAID 5; double-parity RAID-Z2, analogous to RAID 6; or triple-parity RAID-Z3) and can include hot spares, plus solid-state disks for the ZIL (write log) or the L2ARC (read cache).
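The redundancy level and hot spares are all expressed in the `zpool create` line. A sketch, with placeholder device names, assuming six data disks and one spare:

```shell
# A double-parity (raidz2) pool of six whole disks, able to
# survive any two simultaneous disk failures, plus a hot spare
# that ZFS will pull in automatically when a disk faults.
zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
    spare c2t6d0

# Show the resulting layout: the raidz2 vdev and the spare.
zpool status tank
```

Swapping `raidz2` for `mirror`, `raidz1`, or `raidz3` is the only change needed to select a different redundancy level.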
ZFS achieves great performance using a copy-on-write transactional model. Another "fact" that ZFS has overturned is that software RAID 5 is slow; in reality ZFS can be, and in most cases is, faster than traditional hardware RAID 5. The biggest reason for this is variable stripe width, which makes every write a full-stripe write. With RAID 5, a partial-stripe change requires reading the existing stripe, recalculating the parity, and rewriting both the data and the parity. With RAID-Z1 (the ZFS equivalent of RAID 5), the size of the change determines the width of the stripe, so every write is a complete stripe and no read-modify-write cycle is needed. The fixed stripe width of RAID 5 is also the culprit behind the RAID 5 write hole.
In addition to variable stripe width, there are a number of things you can use to speed up disk access within the system. Every ZFS pool has a ZFS Intent Log, or ZIL, which by default lives on the same disks as the pool (read: same speed). You can add a faster device (read: solid-state) as a dedicated log so that synchronous writes are committed to stable storage as fast as possible; those writes are then flushed to the main pool as I/O permits. The most common sources of synchronous writes are iSCSI and NFS. On the read side, ZFS caches your most commonly read blocks in the Adaptive Replacement Cache (ARC), which uses essentially all of the free memory in the system, so extra RAM buys you very fast reads. The next level is the Level 2 ARC (L2ARC), a fast disk (read: solid-state) which can hold even more cached data than memory. For a relatively small amount of money you could build a machine with 64GB of RAM and a 200GB SSD for L2ARC and end up with roughly 250GB of read cache, which will dramatically speed up reads from the zpool. You can also use a ZIL device and an L2ARC in combination; you don't have to pick one or the other (though they need to be separate devices, or at a minimum separate slices).
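Both devices can be added to an existing pool at any time. A sketch against the placeholder `tank` pool, with two illustrative SSD device names:

```shell
# Add one SSD as a dedicated ZIL (log) device so synchronous
# writes commit at SSD speed, and a second SSD as L2ARC (cache)
# to extend the read cache beyond RAM. They must be separate
# devices (or at a minimum separate slices).
zpool add tank log c3t0d0
zpool add tank cache c3t1d0

# Verify the layout: log and cache devices are listed in their
# own sections of the pool configuration.
zpool status tank
```

Because these are additions rather than rebuilds, you can start with plain disks and bolt on the SSDs later, once you know where the workload actually hurts.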
ZFS is Scalable Beyond Any Other File System
ZFS is a 128-bit file system, so it can address far more than other file systems.
- A zpool can be up to 256 zettabytes in size.
- Each zpool can contain up to 2^64 physical disks.
- Each system can have up to 2^64 zpools.
- Each zpool can contain up to 2^64 file systems.
- A single file can be up to 16 exabytes.
As you can clearly see, this is far more scalable than current-generation file systems. These are theoretical limits and may never be reached, but it was only a couple of years ago that a 40GB hard drive was the standard in a home PC. With 3TB drives now on the market, who knows how long it will be before that is the norm, and then before even that seems unreasonably small (much as most of us would now look at a 40GB drive with disdain).
De-duplication
De-duplication has been a buzzword around the storage community for quite a few years, and frankly it is not, in most cases, a reason to buy storage. Any storage product which has included de-duplication has also been prohibitively expensive. ZFS includes it. It is, however, expensive in one way: it requires resources. If you are planning on using de-duplication you will want a large ARC and, if applicable, a large L2ARC, because de-duplicated data will perform very poorly if your de-duplication tables are stored outside the ARC or L2ARC. So budget for as much RAM as you can afford, and plan on having some SSDs to augment performance should your data grow. Bottom line: de-duplication is awesome, if properly implemented.
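Turning it on is a one-line property change. A sketch, using the placeholder `tank/data` file system:

```shell
# Enable de-duplication on one file system. It applies to data
# written from this point forward, not retroactively to existing
# blocks.
zfs set dedup=on tank/data

# The pool-wide DEDUP column reports the ratio of logical data
# to physical blocks actually stored.
zpool list tank
```

Because the property is per file system, you can restrict dedup to the datasets where duplicate blocks are actually likely (VM images, backups) and spare the rest of the pool the RAM cost.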
Encryption
ZFS will also allow you to encrypt the blocks that your data lives on. Frankly I don't understand enough about ZFS encryption to explain why you should use it; but if you have a need for encryption, you know it, and if you can get everything I have described above with encryption on top, it ought to be worth a little investigation.
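For the curious, a sketch of what it looks like on Solaris 11, where encryption is chosen when the file system is created (the `tank/secret` name is a placeholder):

```shell
# Create an encrypted file system; encryption cannot be turned
# on for an existing file system after the fact. By default you
# are prompted for a passphrase to wrap the key.
zfs create -o encryption=on tank/secret

# Confirm the property took effect.
zfs get encryption tank/secret
```

Everything else (snapshots, sends, compression) works on the encrypted dataset the same as on any other.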
Snapshots
Really, snapshots are snapshots; compared to most enterprise snapshot technologies I don't see a large difference. The key thing to note is that if your current snapshot solution is not copy-on-write (the utility rsnapshot, for example), then ZFS snapshots are far superior. The other cool feature is that you can destroy a snapshot which later snapshots depend on, and the data it references is simply merged into the later snapshot (since the later snapshot requires that data to be present). So, for example, if you had a few snapshots (A, B, and C) and you deleted B, you aren't really deleting it: its data is merged into C, since C is dependent on B.
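The whole snapshot lifecycle fits in a handful of commands. A sketch on the placeholder `tank/data` file system:

```shell
# Take two point-in-time snapshots. Each is instant and
# initially consumes no extra space (copy-on-write).
zfs snapshot tank/data@A
zfs snapshot tank/data@B

# Destroying the earlier snapshot does not lose blocks the later
# one still needs -- they stay referenced by B; only blocks
# unique to A are freed.
zfs destroy tank/data@A

# Roll the live file system back to the most recent snapshot.
zfs rollback tank/data@B
```

Snapshots are also the unit of replication: `zfs send`/`zfs receive` ship them between pools or machines, which is where they pull ahead of rsync-style tools.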
I could go on and on about why ZFS is great, but ultimately you don’t care. By now you are already identifying a machine to play with…
Stay tuned for the next article, which will discuss the major differences and pain points of Solaris that you need to know before you start playing with it, to prevent frustration and, ultimately, failure.