Jul 21, 2023

VAST Data Lives Up To Its Name With New VAST Data Platform

VAST DataPlatform

Do traditional storage architectures need to be updated? Nearly every enterprise storage system available today is fundamentally based on an architecture that is now decades old, and that is true of even the most modern all-flash storage systems. The industry continues to rely on a storage architecture that was invented to serve a compute landscape that looked very different from the one we live in today. The storage and data needs of a high-performance analytics or AI cluster are very different from those of a traditional monolithic application.

Traditional storage products focus, of course, on the business of storing data. A typical enterprise-class storage array might look at the bits flowing through the system to encrypt or decrypt data, but it doesn't attempt to interpret that data. There is no real "data intelligence" in a storage array. Interpreting and understanding the bits within a storage system is the job of a software stack that typically lives somewhere else.

The software stack for a modern analytics solution can become very complicated, involving many software components. VAST Data shared a slide, reproduced below, illustrating this complexity. Modern analytics and AI workloads could be vastly more efficient with a data store that assists in understanding the data.

Simplifying the AI Stack

VAST Data has been thinking about this problem since the company’s founding. VAST hasn’t been coy; it’s right there in the name: VAST Data. I remember the first time I spoke with Jeff Denworth, a VAST co-founder and the company's CMO, when he told me that it'd be a mistake to think of VAST Data as a storage company. We're building something bigger, he told me, something that will enable (or become) an actual thinking machine.

VAST has been slowly enabling increasing levels of data awareness within its products. This week the company unveiled more elements, while teasing a future one, that prove VAST Data is true to its word. VAST is a data company whose product is the VAST Data Platform.

The VAST Data Platform extends VAST's already-impressive universal storage capabilities with new data access and analytics features. The VAST Data Platform combines the VAST DataStore, VAST DataBase, and VAST DataSpace. The company also teased the VAST DataEngine, coming sometime next year. Let’s look at each of these elements.

The VAST DataStore is the product most associated with VAST Data. VAST can linearly scale its multi-protocol file services to exabytes of data using its unique disaggregated, shared-everything architecture. It delivers this using standard off-the-shelf server and storage hardware. The VAST DataStore supports the full range of features you want to see in an enterprise storage product, from data protection to security.

VAST DataStore

The first public proof of VAST’s ambitions to move up the data stack came earlier this year with its announcement of the VAST Data Catalog (and, implicitly, the VAST DataBase). The data catalog is a feature of VAST's Universal Storage that lets users tag unstructured data with user-defined metadata and collects that metadata in a queryable table for future analysis, essentially giving structure to unstructured data.

The VAST Data Catalog eliminates the need to perform inefficient operations that walk filesystems to build this data, allowing users and administrators to gain instant insights directly from the storage system. This is foundational to what VAST Data is delivering.

As VAST tells it, the VAST DataBase combines an exabyte-scale namespace for natural data types such as images, video, LIDAR, genomes, and other rich, real-world data with a tabular database to hold the catalog of metadata associated with that data. This metadata includes user-defined tags.
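To make the catalog idea concrete, here is a minimal sketch of tagging files and then querying the tags rather than walking a filesystem. It uses Python's built-in sqlite3 module purely as a stand-in. It is not VAST's API, and the table layout, the function names (build_catalog, find_by_tag), and the sample paths are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (path, size in bytes, user-defined tags).
SAMPLE = [
    ("/scans/a.lidar", 1200, {"project": "survey", "sensor": "lidar"}),
    ("/video/b.mp4", 5400, {"project": "survey", "sensor": "camera"}),
    ("/genomes/c.fastq", 900, {"project": "trial"}),
]


def build_catalog(entries):
    """Load file entries and their tags into an in-memory SQLite catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalog (path TEXT PRIMARY KEY, size_bytes INTEGER)")
    conn.execute("CREATE TABLE tags (path TEXT, key TEXT, value TEXT)")
    for path, size_bytes, tags in entries:
        conn.execute("INSERT INTO catalog VALUES (?, ?)", (path, size_bytes))
        conn.executemany(
            "INSERT INTO tags VALUES (?, ?, ?)",
            [(path, key, value) for key, value in tags.items()],
        )
    return conn


def find_by_tag(conn, key, value):
    """Return the paths carrying a given tag, answered from the table alone."""
    rows = conn.execute(
        "SELECT c.path FROM catalog c JOIN tags t ON t.path = c.path "
        "WHERE t.key = ? AND t.value = ? ORDER BY c.path",
        (key, value),
    )
    return [row[0] for row in rows]


conn = build_catalog(SAMPLE)
print(find_by_tag(conn, "project", "survey"))
```

The point of the sketch is only that a question like "which files belong to this project?" becomes a table lookup, with no directory traversal involved.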

The VAST DataBase provides easy integration with nearly all the most popular data wrangling and query interfaces, including Apache Spark, Parquet, Databricks, RAPIDS, and Vertica (among others). VAST also has its own SQL-based query language, and a feature-rich API, for those who want to get even closer to the data. This is a nice set of enablers.

VAST DataBase Connectivity

VAST has performed significant tuning to match the needs of data analytics with the sometimes-different needs of storing and managing data. When designing the VAST DataBase, the company built its own database engine instead of leveraging an existing open-source product.

This has paid off. VAST's approach to storing columnar data has enabled the VAST DataBase to achieve remarkable levels of query filtration, reducing the number of records a query engine must sift through.
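This article doesn't go into how VAST achieves that filtration, so what follows is only a generic illustration of the idea, not VAST's implementation. It shows one common technique in columnar engines: keeping minimum and maximum values per chunk of a column so that whole chunks can be skipped when they cannot match a filter. The names Chunk, make_chunk, and scan_with_pruning are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One chunk of a single column, with precomputed min/max statistics."""
    values: tuple
    lo: int
    hi: int


def make_chunk(values):
    """Build a chunk and record its value range once, at write time."""
    values = tuple(values)
    return Chunk(values, min(values), max(values))


def scan_with_pruning(chunks, lower, upper):
    """Find values in [lower, upper]; return (matches, rows_examined)."""
    matches = []
    rows_examined = 0
    for chunk in chunks:
        # Skip the chunk entirely if its range cannot overlap the filter.
        if chunk.hi < lower or chunk.lo > upper:
            continue
        rows_examined += len(chunk.values)
        matches.extend(v for v in chunk.values if lower <= v <= upper)
    return matches, rows_examined


chunks = [make_chunk(range(start, start + 100)) for start in (0, 100, 200)]
matches, rows_examined = scan_with_pruning(chunks, 150, 160)
print(len(matches), "matches after examining", rows_examined, "of 300 rows")
```

In this toy case two of the three chunks are never read. The finer-grained and more selective the statistics, the fewer rows the query engine has to touch.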

VAST illustrated the power of its VAST DataBase, comparing the same Trino query on the VAST DataBase and on a Fast S3 datastore. The query against Fast S3 returned 580M rows of data in just over 40 seconds, while the VAST DataBase returned just 2,000 rows in only 1.84 seconds. Those are stunning results. I'm eager to see if real-world performance is as impressive.
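For a sense of scale, the ratios implied by those vendor-supplied figures can be worked out directly. The only assumption here is that "just over 40 seconds" is treated as exactly 40, which makes the speedup a lower bound.

```python
# Figures quoted from VAST's comparison; 40.0 is a floor for "just over 40 seconds".
FAST_S3_ROWS = 580_000_000
FAST_S3_SECONDS = 40.0
VAST_ROWS = 2_000
VAST_SECONDS = 1.84


def comparison_ratios():
    """Return (row reduction factor, elapsed-time speedup factor)."""
    row_reduction = FAST_S3_ROWS / VAST_ROWS
    speedup = FAST_S3_SECONDS / VAST_SECONDS
    return row_reduction, speedup


row_reduction, speedup = comparison_ratios()
print(f"rows returned: {row_reduction:,.0f}x fewer")   # 290,000x fewer
print(f"elapsed time: at least {speedup:.1f}x faster")  # at least 21.7x faster
```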

Query Performance

One of the critical challenges of managing data across a distributed architecture is managing locks on the data. Lock management can make or break a distributed system. This is something that every distributed file system vendor is relentlessly focused on. VAST is no different, solving the distributed data problem with its new VAST DataSpace architecture.

The VAST DataSpace does two things well: it implements an element-level (e.g., file, object, table) locking scheme, and it contains a unique cache integrity architecture that ensures read consistency without sacrificing performance on hot data.

As VAST Data describes it, reads can achieve peak performance while writes maintain consistency. This happens because, as a write happens, all globally cached copies of that element are removed. At the same time, any references are directed to the cluster holding the lock. There's not enough space here to do justice to what VAST has delivered, so I encourage the curious reader to walk through VAST Data’s description of the technology.
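The write behavior described above can be modeled with a toy example. This is a sketch of the behavior as I've summarized it, not VAST's design: the ToyDataSpace class, its method names, and the cluster names are hypothetical, lock contention is ignored, and re-caching a value after a redirected read is my own assumption.

```python
class ToyDataSpace:
    """Toy model: per-element write locks plus global cache invalidation."""

    def __init__(self, cluster_names):
        self.caches = {name: {} for name in cluster_names}  # one cache per cluster
        self.lock_holder = {}  # element -> cluster holding its write lock
        self.store = {}        # element -> current value at the lock holder

    def write(self, cluster, element, value):
        """Take the element's lock, drop every cached copy, store the value."""
        self.lock_holder[element] = cluster  # no contention handling in this sketch
        for cache in self.caches.values():
            cache.pop(element, None)  # remove all globally cached copies
        self.store[element] = value

    def read(self, cluster, element):
        """Return (value, cluster that served the read)."""
        cache = self.caches[cluster]
        if element in cache:
            return cache[element], cluster  # hot data served locally
        holder = self.lock_holder[element]  # otherwise go to the lock holder
        value = self.store[element]
        cache[element] = value  # assumption: cache the value for later reads
        return value, holder


ds = ToyDataSpace(["new-york", "london"])
ds.write("new-york", "/data/report.csv", "v1")
print(ds.read("london", "/data/report.csv"))  # served by new-york
print(ds.read("london", "/data/report.csv"))  # served from london's cache
ds.write("new-york", "/data/report.csv", "v2")
print(ds.read("london", "/data/report.csv"))  # stale copy gone, new-york again
```

The locking is per element, so a write to one file in this model leaves cached copies of every other file untouched.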

The end result of all this technology is that the VAST Data Platform enables global access from the edge to the cloud using unified file and object semantics, as well as table APIs. VAST does this without sacrificing performance.

I talked to a technology reporter just after VAST's launch event for the VAST Data Platform. He asked me whether the VAST announcement will disrupt the legacy storage vendors. I don't think it will. There's a traditional approach to storage that's deeply embedded within nearly every tier-one storage company, and there's little motivation to change that. Seeing the world through the same lens as VAST Data requires a specific kind of vision, different from one that most legacy OEMs possess.

The reality is that traditional storage-focused offerings work just fine for most of today's enterprise workloads. Where classic storage breaks down is along the edges, where extreme performance and scalability live, and where globally distributed data may be important. This is also where the future of enterprise IT might live, where real-time analytics and data-hungry AI clusters are more prominent.

VAST Data may be ahead of the technology curve, but that's OK. VAST enables a specific future, but it allows its users to adapt at their own speed. VAST isn't charging a premium for any of its features. Customers can deploy traditional multi-protocol storage features and, as needs evolve, begin to take advantage of what the VAST DataBase and VAST DataSpace offer. As analytics and AI of all varieties permeate the enterprise, VAST is ready to take care of those workloads.

Those in the digital transformation business like to talk about how an enterprise's data impacts its competitiveness. Central to the task of making data a competitive differentiator is making data queryable. That’s a complex challenge, or at least it was until VAST Data unveiled its VAST Data Platform this week. VAST Data simplifies the understanding of an enterprise’s data. It does this while offering some of the most performant and feature-rich storage technology available. That’s powerful.

Disclosure: Steve McDowell is an industry analyst, and NAND Research an industry analyst firm, that engages in, or has engaged in, research, analysis, and advisory services with many technology companies, which may include those mentioned in this article. Mr. McDowell does not hold any equity positions with any company mentioned in this article.
