You may not have noticed, but we’re in the midst of another massive platform shift in enterprise computing. We can debate chicken or egg, but I believe this most recent transformation is being driven primarily by the requirements placed on modern applications; requirements born of the on-demand, always-on computing paradigm that cloud and mobile ushered in. Simply put, applications need to be scalable, available and performant enough to reach millions, if not billions, of connected devices and end-users. Infrastructure must mirror these requirements in kind.
Historically, systems design has ebbed and flowed between periods of aggregation (centralized) and disaggregation (distributed) of compute resources. The most recent evolution, from client/server to virtualized cloud infrastructure, was driven largely by a desire to contain costs and consolidate IT around standards (the x86 instruction set, Windows and Linux), form factors (first blade servers, then VMs) and physical locations (the emergence of sprawling datacenters and giant cloud vendors). Now we’re seeing the pendulum swing back. Why?
A strong first principle is the notion that infrastructure is beholden to the application. Today, many applications are being built as large-scale distributed systems, composed of dozens (or even thousands) of services running across many physical and virtual machines, often across multiple datacenters. In this paradigm, virtualization – which really dealt with the problem of low physical server utilization – doesn’t make much sense. In a highly distributed, service-oriented world, VMs come with too much overhead (read more on this here). Instead of slicing and dicing compute, network and storage, the better solution is to aggregate all machines and present them to the application as a pool of programmable resources, with hardware-agnostic software that manages isolation, resource allocation, scheduling, orchestration, etc. In this world, the datacenter becomes one giant warehouse computer controlled by a software brain.
However, the fact of the matter is that building, deploying and maintaining distributed applications is a highly technical feat. It requires a rethinking of the way applications treat and interact with other applications, databases, storage and network. Moreover, it requires a new toolkit that is central to solving the coordination and orchestration challenges of running systems that span across multiple machines, datacenters and time zones. To help understand what’s taking place, let’s deconstruct this new stack and, along the way, define some other key terms. Note that this is in no way a static, absolute taxonomy, but rather a simplified way to understand the layers that make up today’s application stack.
Layer 1: Physical Infrastructure – The actual servers, switches, routers and storage arrays that occupy the datacenter. This area has long been dominated by legacy OEMs (EMC, Cisco, HP, IBM, Dell), who are now giving way to low-cost ‘whitebox’ ODMs.
Layer 2: Virtualized Infrastructure – Emulated physical compute, network and storage resources that are the basis for cloud-based architectures. The enabling technology here is the hypervisor, which sits on top of bare-metal infrastructure and creates virtual clones of the server (or switch or storage array), each complete with a full OS, memory management, device drivers, daemons, etc.
Layer 3: Operating System – The host or guest OS that sits atop a virtual or physical host. The rise of Linux has been a key catalyst for the commoditization of the OS and of physical infrastructure, decoupling applications from hardware. Microsoft, with Windows Server, is still a dominant player in the traditional enterprise.
Layer 4: Container Engine – This is where it starts to get interesting, so let’s spend a little more time here. Linux containers offer a form of operating system-level virtualization, where the kernel allows for multiple isolated user-space instances. More simply, if hypervisor-based virtualization abstracted physical resources to create multiple server clones, each with its own OS, memory, etc., the virtualization enabled by containers is a higher-level abstraction of the OS. This provides the necessary degree of isolation and resource utilization to run multiple applications on a single kernel.
The beauty of containers lies in the idea of “code once, run anywhere.” A container holds the application logic and all of its dependencies, running as an isolated process. It ultimately doesn’t matter what’s inside the container (files, frameworks, dependencies); it will execute the same way in any environment – from laptop, to testing, to production across any cloud – at least theoretically. This enables application portability, which, in turn, commoditizes cloud infrastructure altogether.
Docker has become synonymous with containerization by making Linux Containers (LXC) user-friendly. The important thing to note is that container technology is made up of two fundamental components: the runtime and the container image format. The runtime is effectively a high-level API that runs processes and manages isolation. The image format is a specification for a standard, composable unit for containers. In recent months we’ve seen several container runtimes and specs come to market, which has caused a stir. I’m sure we’ll continue to see more.
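To make the runtime/image split concrete, here is a minimal sketch using the Docker SDK for Python (the `docker` package; it assumes a local Docker daemon is running and pulls a stock `python:3` image purely for illustration). The image is the portable, composable unit; the engine’s runtime is what executes it as an isolated process.

```python
# Minimal sketch of the two pieces of container technology via the Docker SDK
# for Python: the image (the composable unit defined by the image format) and
# the runtime that executes it as an isolated process.
# Assumes the `docker` package is installed and a local Docker daemon is running.
import docker

client = docker.from_env()  # connect to the local Docker engine

# Pull a stock image; the image bundles the app plus all of its dependencies.
client.images.pull("python:3")

# Run the image as an isolated process. Because everything the process needs
# ships inside the image, the same call behaves identically on a laptop, a
# test box or a production VM.
output = client.containers.run(
    "python:3",
    ["python", "-c", "print('hello from inside a container')"],
    remove=True,  # clean up the container after it exits
)
print(output.decode().strip())
```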
Layer 5: Scheduling & Service Discovery – Tools that solve and automate the coordination challenges of breaking up and running applications across multiple nodes and datacenters. Schedulers interface with the resources of the cluster and are responsible for providing a consistent way to intelligently place tasks based on those resources. Service discovery tools manage how processes and services in a cluster find and talk to one another. This area is largely greenfield, but the ecosystem has coalesced around a few well-known projects like Mesos, etcd and ZooKeeper.
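For a flavor of what service discovery actually involves, here is a hedged sketch against etcd’s v2 HTTP keys API (it assumes an etcd node reachable at localhost:2379 and the `requests` library; the service and key names are invented). An instance registers itself under a well-known key with a TTL it must keep refreshing, and clients discover live instances by listing that key space:

```python
# Sketch of service registration/discovery against etcd's v2 keys API.
# Assumes an etcd node at localhost:2379; names and addresses are illustrative.
import requests

ETCD = "http://127.0.0.1:2379/v2/keys"

def register(service, instance_id, address, ttl=30):
    """Announce this instance under /services/<service>/<instance_id>.
    The TTL means the entry expires unless the instance keeps re-registering,
    so crashed nodes drop out of discovery automatically."""
    url = f"{ETCD}/services/{service}/{instance_id}"
    requests.put(url, data={"value": address, "ttl": ttl}).raise_for_status()

def discover(service):
    """Return the addresses of all currently registered instances of a service."""
    resp = requests.get(f"{ETCD}/services/{service}", params={"recursive": "true"})
    resp.raise_for_status()
    nodes = resp.json().get("node", {}).get("nodes", [])
    return [n["value"] for n in nodes]

register("web", "web-1", "10.0.0.11:8080")
print(discover("web"))  # e.g. ['10.0.0.11:8080']
```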
Layer 6: Orchestration & Management – Tools that automate the deployment, scaling and management of applications and infrastructure. This is what some refer to as the management plane. These tools enable devs, DevOps and sysadmins to maintain applications across clusters. This area is greenfield, as new solutions are being adapted to support containers running across distributed environments. Those who win here will reap the greatest rewards. In addition to purpose-built products, there are a number of companies creating application lifecycle management platforms optimized for containers, including Deis, Flynn, Terminal, Tutum and many others.
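Stripped to its essence, what these orchestration tools automate is a reconciliation loop: compare the declared, desired state of an application to what is actually running and act to converge the two. The toy sketch below is purely conceptual; `start_replica` and `stop_replica` are hypothetical stand-ins for real scheduler calls, not any product’s API.

```python
# Toy illustration of the desired-state reconciliation pattern that
# orchestration tools implement. start_replica/stop_replica stand in for
# real scheduler calls and are purely hypothetical.
desired = {"web": 3, "worker": 2}   # declared state: service -> replica count
running = {"web": 1}                # observed state of the cluster

def start_replica(service):
    running[service] = running.get(service, 0) + 1
    print(f"starting {service} replica -> {running[service]} running")

def stop_replica(service):
    running[service] -= 1
    print(f"stopping {service} replica -> {running[service]} running")

def reconcile():
    # Converge the observed state toward the desired state.
    for service, want in desired.items():
        have = running.get(service, 0)
        for _ in range(want - have):
            start_replica(service)
        for _ in range(have - want):
            stop_replica(service)

# A real orchestrator runs this loop continuously, reacting to node failures,
# deployments and scaling events.
reconcile()
```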
A few other helpful definitions:
Distributed System – A computing system consisting of a collection of autonomous nodes connected through a network, plus the software/middleware that enables the nodes to coordinate tasks and share the resources of the entire system. The principle of distributed computing has been around for decades, but only recently has it entered mainstream IT, as traditional software architecture has been pushed to its limits at Web scale. Perhaps the best-known example is Apache Hadoop, an open-source data storage and processing framework where jobs are split up and run across multiple commodity servers.
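To ground the Hadoop example, here is the canonical word-count job written as a Hadoop Streaming mapper and reducer (a sketch only; the file name and invocation are illustrative). The framework splits the input across machines, runs copies of the mapper on each split, sorts and groups by key, then feeds each group to a reducer:

```python
# wordcount.py: word-count sketch for Hadoop Streaming. Typically submitted via
# the hadoop-streaming jar with -mapper/-reducer pointing at this script.
import sys

def mapper():
    # Emit "<word>\t1" for every word in this node's slice of the input.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by key, so counts for a given word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    if sys.argv[1] == "map":
        mapper()
    else:
        reducer()
```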
Microservices – Microservice architecture is a way of designing software applications as sets of modular, self-contained, deployable services. Whereas historically applications were split into client-side, server-side/logic and database tiers, the idea with microservices is to develop each application as a suite of smaller, modular services, each running in its own process with a minimal amount of centralized management. Microservices architecture is appealing because it enables greater agility (entire applications don’t need to be taken down during change cycles), speed-to-market and code manageability.
(Source: http://martinfowler.com)
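As a toy illustration of the idea: each microservice is a small, self-contained process that owns one narrow capability, keeps its own data and exposes it over the network (typically HTTP) rather than sharing code or a database with the rest of the application. The sketch below uses Flask as an assumed dependency, and the service and endpoint names are invented:

```python
# Minimal sketch of a single microservice: a self-contained process that owns
# one narrow capability and exposes it over HTTP. Names are illustrative;
# assumes Flask is installed.
from flask import Flask, jsonify

app = Flask(__name__)

# In-memory stand-in for this service's own datastore.
_inventory = {"sku-123": 7, "sku-456": 0}

@app.route("/inventory/<sku>")
def get_inventory(sku):
    """Other services call this over the network instead of sharing a database."""
    return jsonify(sku=sku, quantity=_inventory.get(sku, 0))

if __name__ == "__main__":
    # Each service runs in its own process (and, in practice, its own container)
    # and can be deployed, scaled and updated independently of the others.
    app.run(host="0.0.0.0", port=5000)
```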
The application stack is an ever-evolving, dynamic organism. Ultimately, whether it’s microservices, distributed systems or containers, the changes we’re seeing at both the code and infrastructure level are about one thing: delivering better, more scalable software faster and cheaper. As a result, today we find ourselves at the outset of what I’ll call the warehouse computing era, defined by cheap, commodity infrastructure presented to the application as a pool of dynamically programmable resources, with intelligent, hardware-agnostic software as the control plane operating at n-scale.
To me, though, this is still an incremental step. The world of IT I envision is one where code is written once and executed anywhere, automatically and at scale, irrespective of the underlying cloud, OS, container engine, orchestrator, scheduler, etc. In this world, ops ceases to be an IT function and becomes a product within, or even a feature of, the underlying stack. This world is several years away, and before we get there I can promise the tech stack is going to get a lot more convoluted before it radically simplifies. Either way, it’ll be fun to watch.
In a world where applications are delivered via the cloud and distributed across billions of Internet-connected end-points, we’ve seen barriers to entry, adoption and innovation compress by an order of magnitude or two, if not crushed altogether. Compound this with advances in application and data portability, and the implication for technology vendors competing in this global, all-you-can-eat software buffet is that customers’ switching costs are rapidly approaching zero. In this environment it’s all about the best product, with the fastest time-to-value and near-zero TCO. And it’s this second point – time-to-value (TtV) – that I want to dig into a bit, because it tends to be the one glossed over most often.
I’ll start with an anecdote …
A portfolio company of ours delivers a SaaS platform that competes with legacy, on-prem offerings from large infrastructure software vendors. In its early days the company had fallen into the enterprise sales trap: spending weeks, if not months, with individual customers doing bespoke integration and support work. About a year in, when we finally decided to open up the beta to everyone, sign-ups shot up, but activity in new accounts was effectively nil. What was going on?
Simply, customers didn’t know what to do with the software once it was in their hands. Spending months with large accounts did inform some fundamental product choices, but at the cost of self-service. Our product was feature-bloated, the on-boarding flow was clunky and the integration API was neglected and poorly documented.
In a move that, I believe, ultimately saved the company, we decided to create a dedicated on-boarding automation team within product. Sure enough, in the months that followed, usage spiked and the company was off to the races.
The takeaway is that the highest priority should be given to building software that just works, and that means focusing relentlessly on reducing, or eliminating altogether, the time investment required to fully deploy your solution in production. Ideally, you want customers to derive full value from your offering in mere minutes, if not seconds. To do so, treat on-boarding as a wholesale product within your offering and devote engineering resources to it. Find religion about optimizing TtV!
Below is by no means a complete list, but rather a few lessons I’ve taken away from my experience with our portfolio that SaaS companies should internalize in their product and go-to-market strategies to help optimize TtV:
Simplicity wins…be feature-complete, not feature-rich: This is a fairly obvious but subtle point that often evades even the most talented product teams: the defining characteristic of a simple (read: good) product is not the abundance of features but rather the relevance of those features to its users. This stands in stark contrast to the old paradigm of CIO-inspired products that were over-engineered and feature-bloated. The challenge in the new paradigm is that what might be relevant to one customer may be entirely worthless to another. The solution is focus, either in product or in market.
In my mind, the better option is almost always to narrow your product focus. Do one thing incredibly well. Tackle the single most acute – but universal – pain point and ingrain yourself in your customers’ workflow…then expand horizontally. Value needs to be delivered from day one, but features can be revealed over time. Ultimately, it’s about understanding your unique unit of value and exploiting it with your customers. Slack is the single best embodiment of the focus/simplicity paradigm. Others, take note.
Hack the on-boarding flow: It doesn’t matter how beautiful or utilitarian your product is if no one ever gets around to using it. Developers are a fickle bunch with seemingly infinite alternatives to your offering: paid tools, open-source projects and self-built hacks. You generally have one chance with them, and that chance lasts about 20 minutes (based on conversations with some of the least ADHD-ridden devs I know).
On-boarding should emphasize and reinforce the value prop that drove the user to your product in the first place. Sign-up should be frictionless and deployment should be self-service, to the point where the customer is up and running in minutes and, most importantly, getting value from your product moments after that. Avoid the empty-room problem at all costs – even if the data or insight you provide early on has less direct customer value, it’s better than a customer staring at a blank screen.
Suggestion: read New Relic’s early blog posts. The company was fanatical about delivering value to devs from their APM solution within 5 minutes of signing up.
Documentation, Documentation, Documentation: There’s nothing sexy or glamorous about documentation, but great docs can be a source of competitive differentiation. Look no further than Stripe, whose documentation is the stuff of legends (and a big reason why the company has grown as quickly as it has). Great documentation shows devs you care and it’s increasingly becoming table stakes, particularly if your product is technical or has an upfront integration burden. Given that, take the time to document from day 1 and don’t neglect those docs as your product offering expands.
An obvious corollary to the documentation point is API cleanliness. Your API is not an adjunct to your product; it is an extension of the core. Treat it as such.
Content, Content, Content: Embrace content – it’s your opportunity to connect with users. Content doesn’t just mean static blog posts; it includes webinars, tutorials, analyst publications and reference architectures. Leverage content to showcase the integrations, use cases and features of your product. DigitalOcean does this masterfully. Just go check out their blog.
Finally, make sure to quantify. Set a goal for TtV and benchmark against it. TtVs vary widely across product segments and end-markets, but study your comps and make sure you’re at least beating the pants off them.
An optimized TtV has positive ramifications throughout the organization, from freeing up support engineers to work on product to enabling a tightening of sales and marketing spend up and down the funnel. Ultimately, a short TtV drives all those other metrics folks seem to care so much about: MRR, LTV, CAC, churn, etc.