Below All the Turtles

Month: June 2013

If you haven’t already read Mark’s introduction to Joyent’s new Manta service, you need to.  There are plenty of exciting elements of this service, from basics like strong consistency to the lovingly crafted data processing abstractions that allow you to bring compute to your data.  As with any large system, though, Manta’s visible interface is only a small fraction of the whole; I’d like to offer a few thoughts on the technology at the bottom of the Manta stack, the area where I’ve contributed most.

Prevailing wisdom holds that computers are a commodity, that there is no value to be found or created downstack.  For the past decade or more, the more aggressive pundits have extended this belief into the operating system and even further upstack.  As usual, they’re wrong (being wrong is, after all, the basic function of the pundit).  The creation of another layer of abstraction is one of the few fundamental tools available to software engineers, but as powerful as this tool is, the universe cannot in fact be turtles all the way down.  Every stack has a bottom and at the bottom of the stack (with apologies to particle physicists) lies hardware.  Entwined with that hardware is its evil and seemingly omnipresent companion, firmware.  Together these components provide the foundation on which all software is built.  While the ideal foundation doesn’t exist today, even within the rigid confines of today’s heavily commoditised market there is enough variety available to build foundations that are better or worse for their purpose.

Not all servers are created equal.  The basic function of hardware is to provide physical resources to applications, access to which is managed by the operating system.  As such, the most basic way in which hardware configurations differ is in the balance of resources they provide.  But hardware components also have architectural differences: choices made by their designers about the division of labour between hardware and firmware, or between firmware and operating system.  Different components present different abstractions to layers above, and larger-scale hardware choices support or hinder systemic architectural objectives.

Shortly after I came to Joyent in early 2012, I began working on a plan to augment and replace our existing server fleet.  As we began discussing Manta, it became clear that the project would require servers with a different balance than we needed in our public cloud.  Like the Sun Storage 7000 systems I worked on at Fishworks, many of the systems at the heart of Manta would be storing user data.  But there’s a catch!  These same systems also execute user code; hence, bringing compute to data.  What balance among CPU cycles, DRAM size, storage capacity, and storage performance would be required by such an application?  With any new model come the unknowns, and this was, and still is, a key unknown.  It will be your ideas, your use cases, that ultimately dictate how the Joyent Manta fleet is assembled.  Our best guess, based on commodity economics and our experience, is embodied in the Mantis Shrimp, a 4U server capable of storing some 73 TiB of user data (soon to be nearer 100 with the introduction of 4 TB disk drives) and sharing nearly all of its remaining components with the systems comprising our public cloud infrastructure and the more conventional components of the Manta service.  By standardising on components across our fleets, we reduce operational costs and engineering effort; at the same time, we have the flexibility to tune system balance across a broad spectrum while remaining well within the industry-wide price/performance “sweet spots”.  Joyent provides unmatched transparency in our server and component selection: you can read the same BOMs and specifications our manufacturing partners work from in our repository on GitHub, you can purchase these certified systems for your own SmartDataCenter-based private or hybrid clouds, and you can use basic OS tools to inspect the machine on which your software is running, whether in Manta or in the public cloud.  Manta may seem magical, but the systems at the bottom of the stack are no mystery.
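The Mantis Shrimp capacity figures quoted above can be sanity-checked with simple proportional scaling.  The sketch below is only an illustration: it assumes the 73 TiB figure corresponds to 3 TB drives and that usable capacity scales linearly with drive size.  Neither assumption is stated in this post, and the function name is hypothetical.

```python
def scaled_capacity_tib(current_tib: float,
                        current_drive_tb: float,
                        new_drive_tb: float) -> float:
    """Scale usable capacity linearly with per-drive size.

    Assumes the drive count and pool layout stay the same, so usable
    capacity is proportional to the size of each drive.
    """
    if current_drive_tb <= 0 or new_drive_tb <= 0:
        raise ValueError("drive sizes must be positive")
    return current_tib * new_drive_tb / current_drive_tb


# ASSUMPTION: the current drives are 3 TB; the post does not say so.
ASSUMED_CURRENT_DRIVE_TB = 3.0

estimate = scaled_capacity_tib(73.0, ASSUMED_CURRENT_DRIVE_TB, 4.0)
print(f"{estimate:.1f} TiB")  # prints "97.3 TiB"
```

Under those assumptions the estimate comes out a little under 100 TiB, which is consistent with "nearer 100".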

As we learn from our Manta customers, we’ll be adjusting our fleet to match demand for storage and compute; a critical part of our own big data strategy is understanding the utilisation of our infrastructure and adapting to customer needs.  One of the great things about working for a systems company is being able to create and use tools at every level of the stack to collect the raw data that drives quality decisions.  Without technology like DTrace and Cloud Analytics, our view of resource consumption would be woefully inadequate to this task; this kind of innovation is technologically impossible to accomplish without downstack support.  More than once I’ve wondered how anyone can build software without the observability and debugging tools SmartOS offers.

The second set of important decisions is architectural.  Storage architectures run the gamut from heavily centralised, vertically scaled SANs to fully decentralised systems built entirely around local storage.  With the lessons from our Fishworks experience in mind, we’ve chosen the latter for both our cloud management stack and Manta.  Every object in Manta is stored by default on two ZFS storage pools, each local to the server on which it is accessed.  There are no SANs, no NAS heads, and no hardware RAID controllers.  ZFS, while not visible to Manta consumers, nevertheless provides both the proven reliability essential to any storage product and the detailed observability required to diagnose and repair faults and assess future resource needs.  This architecture is not a differentiator for Manta users, but it enables us to make Manta faster, cheaper, and more reliable than it would otherwise be and, crucially to our strategy for bringing compute to data, to do so without requiring us to erasure-encode individual objects across the fleet.  While Redmond has finally joined the durable pooled storage party, much of the world is still hamstrung by expensive SANs, opaque and unsafe hardware RAID controllers, or unreliable local storage.  We’ve been working with ZFS so long that we often take it for granted, but it’s tough to overstate how bad the alternatives are, and we’re very thankful to be deploying Manta atop ZFS and basic SAS HBA technology.
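Storing whole copies of each object rather than erasure-coded fragments has a simple capacity cost, which can be sketched as follows.  This is a back-of-the-envelope illustration, not a description of how Manta accounts for space: the function name and the ten-server fleet are hypothetical, and the calculation ignores metadata, free-space headroom, and any redundancy inside each pool.

```python
def logical_capacity_tib(usable_tib_per_server: float,
                         servers: int,
                         copies: int = 2) -> float:
    """Logical object capacity of a fleet when every object is stored
    as `copies` whole copies, each on a different server's pool."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if servers < copies:
        raise ValueError("need at least as many servers as copies")
    return usable_tib_per_server * servers / copies


# Hypothetical fleet: 10 storage servers at 73 TiB each, default 2 copies.
print(logical_capacity_tib(73.0, 10))  # prints 365.0
```

The trade-off is that each copy is a complete object on a single server, so a compute job can run next to the data without first reassembling it from fragments spread across the fleet.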

So, for the second time in my career, I find myself at the bottom of the stack, focused on technologies that are at once utterly essential and entirely invisible to the end user.  As big a game-changer as Manta’s interface is, it cannot exist without a solid foundation beneath.  No one in the industry has a better foundation than Joyent; Manta offers an example of what becomes possible when that foundation not only supports but actively aids upstack software.  We hope you find it as exciting as we do.
