New Google Cloud Storage Buckets File System

 


File-oriented, data-intensive applications are among the fastest-growing workloads on Cloud Storage. However, these workloads often expect folder semantics that the “flat” layout of existing buckets does not serve well.

Google Cloud has introduced a new bucket creation option for Cloud Storage called hierarchical namespace (HNS), which offers optimizations for folder structure, resources, and operations. HNS, which is now in preview, can improve the consistency, performance, and manageability of your Cloud Storage buckets.

Why bucket structure is important

All objects in existing Cloud Storage buckets are stored in a flat namespace, a single logical tier with no hierarchy. Although “/” prefixes are used to simulate folders in the UI and CLI, these folders are not backed by Cloud Storage resources and cannot be accessed directly via an API.

Applications like Hadoop/Spark analytics and AI/ML workloads that depend on file-oriented semantics may experience problems with consistency and performance as a result. A hierarchical namespace arranges the bucket into a “tree”-like structure with folders that can hold other folders and objects, much like a conventional file system.

Suppose you wish to change the path of a folder in order to “move” it. That action is typically quick and atomic in a traditional file system, which means that if it goes well, all of the contents in the folder will have their paths renamed, or if it goes wrong, all of the contents will retain their original path name.

By contrast, in an existing Cloud Storage bucket, every object beneath the simulated folder must be copied and deleted one at a time. This is slow and inefficient if your folder holds hundreds or even thousands of objects.

It is also not atomic: if something goes wrong in the middle of the operation, your bucket can be left in a half-completed state, with the folder in two locations and only some of the objects moved. For data-intensive applications that routinely rename hundreds or thousands of large directories programmatically, this can be a serious problem.
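The failure mode described above can be illustrated with a small in-memory model. This is only a sketch: the dict stands in for a flat bucket, and `rename_prefix_flat`, `CopyFailed`, and the `fail_after` parameter are hypothetical names made up for the illustration, not part of any Cloud Storage API.

```python
class CopyFailed(Exception):
    """Raised by the simulation to model a failure in the middle of a move."""


def rename_prefix_flat(bucket, old_prefix, new_prefix, fail_after=None):
    """Simulate "moving" a folder in a flat namespace.

    `bucket` is a dict mapping full object names to contents. Every object
    under `old_prefix` is copied to its new name and then deleted, one at a
    time. `fail_after` (a hypothetical knob) aborts after that many objects.
    """
    moved = 0
    # sorted() builds the full list first, so mutating the dict below is safe.
    for name in sorted(n for n in bucket if n.startswith(old_prefix)):
        if fail_after is not None and moved >= fail_after:
            raise CopyFailed(f"failed after moving {moved} objects")
        new_name = new_prefix + name[len(old_prefix):]
        bucket[new_name] = bucket[name]  # copy to the new name
        del bucket[name]                 # delete the original
        moved += 1
    return moved


bucket = {
    "logs/a.txt": b"1",
    "logs/b.txt": b"2",
    "logs/c.txt": b"3",
    "other/x.txt": b"4",
}

try:
    rename_prefix_flat(bucket, "logs/", "archive/", fail_after=2)
except CopyFailed:
    pass

# The "folder" now exists in two places: two objects moved, one left behind.
print(sorted(bucket))
```

The cost of the loop grows with the number of objects under the prefix, and nothing rolls back the objects that were already moved when the failure occurs.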

In a bucket with a hierarchical namespace, folders are storage resources backed by an API, and a new operation called “Rename Folder” recursively renames a folder and its contents as a metadata-only operation. Compared to existing flat Cloud Storage buckets, this makes the process fast and atomic, improving consistency and performance for folder-related tasks.
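To see why a rename can be metadata-only in a tree, consider the conceptual model below. It is a sketch of the general idea, not a description of how Cloud Storage implements HNS; the `Folder` class and the helper functions are hypothetical names used only for illustration.

```python
class Folder:
    """A folder node holding child folders and objects by their short names."""

    def __init__(self):
        self.folders = {}
        self.objects = {}


def find_folder(root, path):
    """Walk from the root to the folder at `path` (for example "data/raw")."""
    node = root
    for part in [p for p in path.split("/") if p]:
        node = node.folders[part]
    return node


def rename_folder(root, old_path, new_path):
    """Re-link one folder node under a new parent and name."""
    old_path, new_path = old_path.strip("/"), new_path.strip("/")
    if new_path == old_path or new_path.startswith(old_path + "/"):
        raise ValueError("cannot move a folder onto or into itself")
    old_parent_path, _, old_name = old_path.rpartition("/")
    new_parent_path, _, new_name = new_path.rpartition("/")
    old_parent = find_folder(root, old_parent_path)
    new_parent = find_folder(root, new_parent_path)
    if old_name not in old_parent.folders:
        raise KeyError(old_path)
    if new_name in new_parent.folders:
        raise FileExistsError(new_path)
    # Validation is complete, so the rename is this single re-link. No object
    # under the folder is copied or touched, however many there are.
    new_parent.folders[new_name] = old_parent.folders.pop(old_name)


def list_objects(node, prefix=""):
    """Return full object paths beneath `node`: objects first, then subfolders."""
    paths = [prefix + name for name in sorted(node.objects)]
    for name in sorted(node.folders):
        paths += list_objects(node.folders[name], prefix + name + "/")
    return paths


root = Folder()
logs = Folder()
logs.objects = {"a.txt": b"1", "b.txt": b"2"}
logs.folders["2024"] = Folder()
logs.folders["2024"].objects["c.txt"] = b"3"
root.folders["logs"] = logs

rename_folder(root, "logs", "archive")
print(list_objects(root))
```

Because every object path is derived from its position in the tree, changing one link changes all of the paths beneath it at once, which is what makes the operation both fast and all-or-nothing in this model.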

Advantages of hierarchical namespace buckets

Enhanced performance: Thanks to their optimized storage layout, HNS buckets deliver higher initial queries per second (QPS). According to the Cloud Storage request rate guidelines, existing buckets start with an initial capacity of roughly 1,000 object write QPS and 5,000 object read QPS. By delivering up to 8 times higher initial QPS for object read/write operations, HNS buckets let your data-intensive workloads scale faster.
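As a rough worked example of those numbers, the snippet below multiplies the initial flat-bucket figures by the stated factor. It assumes “up to 8 times” means an upper bound of 8x on the initial rate; the constant and function names are made up for the illustration, and actual rates depend on the workload.

```python
FLAT_INITIAL_WRITE_QPS = 1000  # initial object write QPS for existing buckets
FLAT_INITIAL_READ_QPS = 5000   # initial object read QPS for existing buckets
HNS_MAX_MULTIPLIER = 8         # "up to 8 times", read here as an upper bound


def hns_initial_qps_upper_bound(flat_qps, multiplier=HNS_MAX_MULTIPLIER):
    """Return the best-case initial QPS for an HNS bucket under that reading."""
    return flat_qps * multiplier


print("write:", hns_initial_qps_upper_bound(FLAT_INITIAL_WRITE_QPS))
print("read:", hns_initial_qps_upper_bound(FLAT_INITIAL_READ_QPS))
```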

File-oriented enhancements: HNS offers several new APIs aimed at applications that are best served by file-oriented storage, such as AI/ML workloads or Hadoop ecosystem infrastructure.


For these workloads, the following modifications enhance performance, resilience, and convenience:

A new folder resource with its own management API (“CreateFolder”/“DeleteFolder”/“GetFolder”), serving as a container for objects and other folders.

A new “RenameFolder” API that atomically changes the path of a folder and all of its subfolders and objects.

A new “ListFolders” API that lists every folder within the bucket or beneath a designated folder. In a flat bucket, by contrast, listing prefixes as folders requires numerous “ListObjects” API calls to list every object at every level of the hierarchy.

The ability to attach existing managed folders to a folder for fine-grained IAM access control. When a folder is renamed, the managed folder moves with it, so the IAM permissions follow as well.
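The cost of listing folders in a flat bucket can be sketched as follows. The function below is a hypothetical helper, not a Cloud Storage API: it takes object names as a flat listing would return them and derives the simulated folders, which means it has to look at every object name in the bucket.

```python
def folders_from_object_names(object_names):
    """Derive every simulated folder path from a flat list of object names."""
    folders = set()
    for name in object_names:
        parts = name.split("/")[:-1]  # drop the final path component
        # Each leading run of components is one simulated folder.
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return sorted(folders)


names = [
    "logs/2024/01/a.txt",
    "logs/2024/02/b.txt",
    "models/v1/weights.bin",
    "readme.txt",
]

for folder in folders_from_object_names(names):
    print(folder)
```

With folder resources, the folder list is stored directly, so it does not need to be reconstructed from object names in this way.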

Platform support: HNS buckets support the existing Cloud Storage object APIs and most Cloud Storage capabilities. HNS buckets are also integrated with Cloud Storage FUSE, which gives clients file system-like access to buckets, and with the Cloud Storage connector for Hadoop/Spark workloads (including Dataproc services).

Because HNS can only be enabled at bucket creation, you can use Storage Transfer Service to move data into an HNS bucket and get these benefits for objects that are already stored in Cloud Storage.

Key use cases

When running applications that require file system-like hierarchy and semantics, you should consider enabling hierarchical namespace for your bucket. Examples include:

Hadoop-based processing, such as Hadoop, Spark, and Hive workloads, conventionally expects a file system storage structure and time-based file partitioning. HNS integrates with the Cloud Storage connector for Hadoop applications to offer improved throughput and atomic folder renaming for many data processing pipelines.

File-oriented workload processing, such as high performance computing or batch analytics, is frequently organized into folders that hold a large number of files. HNS helps with folder management and makes folder renames quick and easy.

AI and ML processing tools such as PyTorch, TensorFlow, Pandas, and JAX frequently require file-like semantics. For use cases like ML model iteration, using HNS together with Cloud Storage FUSE for client-level file system access can improve performance and reliability.

Things to consider

Although hierarchical namespace in Cloud Storage has several advantages for some applications, you should weigh the trade-offs for your environment. Hierarchical namespace must be enabled when the bucket is created, and it does not support certain Cloud Storage capabilities, including object versioning, bucket lock, retention lock, and object ACLs.

There are a few important points to keep in mind, particularly during the HNS public preview. Until it reaches GA, the capability is intended for non-production workloads, and it does not currently support the Autoclass or soft delete features. You can read more about the advantages of soft delete and how to disable it on your buckets in the Cloud Storage documentation. For now, HNS can be accessed only through the Cloud Storage connector for Hadoop/Spark workloads, Cloud Storage FUSE, supported client libraries, and the CLI. UI support is planned for later in the preview.

During the public preview, HNS carries no additional fees. At GA, Cloud Storage buckets with hierarchical namespace enabled, as well as folder-related operations, will incur additional fees.

You can start using HNS right away by creating a Cloud Storage bucket with hierarchical namespace enabled and exploring its features through the supported interfaces mentioned above. See the HNS documentation page for more details.
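As one way to explore the folder APIs from a client library, the sketch below renames a folder with the Python Storage Control client. It assumes the `google-cloud-storage-control` package is installed, application default credentials are configured, and the bucket was created with hierarchical namespace enabled. The helper names and the bucket and folder names in the usage note are placeholders, and because HNS is in preview the API surface may change.

```python
def folder_resource_name(bucket_name, folder_path):
    """Build a folder resource name; "_" stands in for the project ID."""
    return f"projects/_/buckets/{bucket_name}/folders/{folder_path}"


def rename_folder(bucket_name, source_folder, destination_folder, timeout=60):
    """Rename a folder in an HNS bucket and wait for the operation to finish."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import storage_control_v2

    client = storage_control_v2.StorageControlClient()
    request = storage_control_v2.RenameFolderRequest(
        name=folder_resource_name(bucket_name, source_folder),
        destination_folder_id=destination_folder,
    )
    # RenameFolder returns a long-running operation; result() blocks until it
    # completes or the timeout (in seconds) expires.
    operation = client.rename_folder(request=request)
    return operation.result(timeout=timeout)
```

A call might look like `rename_folder("my-hns-bucket", "logs/2024", "archive/2024")`, where both the bucket and the folder paths are hypothetical.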
