Data Center Storage: SSD vs HDD


 


The Disney animated classic “Finding Nemo” gave us the well-known phrase “fish are friends, not food.” SSD suppliers have often regarded HDDs as food rather than friends, and for laptops that view proved accurate. In data center storage, however, SSDs and HDDs have plenty of opportunities to be “friends.”

Cloud computing, artificial intelligence, big data analytics, and other data-intensive applications are putting ever more pressure on data centers. These applications demand high performance, scalability, reliability, and cost-effectiveness from the storage infrastructure. The speed and capacity requirements of these workloads exceed what conventional hard disk drives (HDDs) can deliver on their own, yet solid state drives (SSDs) remain too costly to replace HDDs entirely. How can data centers get past this obstacle and have the best of both worlds?

One way forward is to exploit the complementary strengths of SSD and HDD technologies. The concepts of performance tiering and data caching are well established in enterprise storage systems: writes have long been aggregated in fast, nonvolatile write buffers before being striped to slower storage media, and data tiering, which trades higher cost for higher performance, is standard in most large storage systems.

Tiers used to be defined by HDDs with different RPMs and disk sizes; today, SSDs serve the high-performance tiers. For example, IDC reports that in CY2023 the number of bits shipped in hybrid storage arrays (HDD + SSD) for OEM enterprise storage systems was more than double that of HDD-only or all-flash arrays (AFA), although AFA had the highest year-over-year growth rate.

Performance scaling in data center storage

Data throughput divided by capacity, usually expressed as MB/s per TB, is the most important performance metric for storage systems. As storage devices grow in capacity, their bandwidth in MB/s must scale with them; if it does not, overall system performance suffers. The required ratio depends on the workload and the hardware architecture of the system: for common large data center tasks it ranges from roughly 2.5 MB/s per TB for large BLOB object stores, to about 5 for big data analytics, to about 20 for GPU clusters training AI models.
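
To make that arithmetic concrete, the short Python sketch below computes the sustained bandwidth a single drive must deliver for a given capacity and MB/s-per-TB target. The workload names and function are chosen here for illustration; only the ratio figures come from the text above.

```python
# Minimal sketch: required sustained bandwidth per drive for a given
# workload ratio (MB/s per TB) and drive capacity.

WORKLOAD_MBPS_PER_TB = {          # ratios quoted in the article
    "blob_object_store": 2.5,
    "big_data_analytics": 5.0,
    "ai_training_gpu_cluster": 20.0,
}

def required_drive_bandwidth(capacity_tb: float, workload: str) -> float:
    """Return the MB/s a single drive must sustain to hold the ratio."""
    return capacity_tb * WORKLOAD_MBPS_PER_TB[workload]

if __name__ == "__main__":
    for cap in (20, 30, 40):
        mbps = required_drive_bandwidth(cap, "big_data_analytics")
        print(f"{cap} TB drive @ 5 MB/s/TB -> {mbps:.0f} MB/s sustained")
```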

The physics and mechanics of an HDD determine how much bandwidth it can deliver. Restricting data access to very large (≥ 8 MB) sequential chunks can boost the drive's overall throughput, because data transfer time is then maximized relative to the time spent seeking the recording head and reaching the data destination. Even so, the gains are capped by the HDD's maximum sequential bandwidth.
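
As a rough illustration of why chunk size matters, the sketch below models effective throughput with two assumed parameters, a 250 MB/s sequential rate and 12 ms of average seek plus rotational overhead per access; these numbers are placeholders, not measurements of any particular drive.

```python
# Simple model of effective HDD throughput versus access chunk size.
# Shows how throughput rises toward the sequential limit as chunks grow.

SEQUENTIAL_MBPS = 250.0     # assumed maximum sequential bandwidth, MB/s
ACCESS_OVERHEAD_S = 0.012   # assumed average seek + rotational latency, seconds

def effective_throughput(chunk_mb: float) -> float:
    """MB/s delivered when every access moves one chunk of `chunk_mb` MB."""
    transfer_s = chunk_mb / SEQUENTIAL_MBPS
    return chunk_mb / (ACCESS_OVERHEAD_S + transfer_s)

if __name__ == "__main__":
    for chunk in (0.064, 1, 8, 64):
        print(f"{chunk:>7} MB chunks -> {effective_throughput(chunk):6.1f} MB/s")
```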

In most cases, write buffering is needed at the input to aggregate variously sized host writes into large chunks, and data caching is needed for host reads that are bandwidth- or latency-constrained. An alternative strategy for HDDs is to overprovision capacity, leaving unused “dark capacity”; sustaining performance this way drives up the cost of HDDs, servers, infrastructure, and data center power.
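
A minimal sketch of the write-aggregation idea follows; the WriteAggregator class, its 8 MB flush threshold, and the stubbed device call are hypothetical stand-ins for whatever buffering layer a real system uses.

```python
# Illustrative sketch: accumulate variously sized host writes in a fast
# buffer and flush them to the HDD only as one large sequential chunk.

class WriteAggregator:
    def __init__(self, flush_threshold_mb: float = 8.0):
        self.flush_threshold = flush_threshold_mb * 1024 * 1024
        self.pending: list[bytes] = []
        self.pending_bytes = 0

    def write(self, payload: bytes) -> None:
        """Buffer a host write; flush once enough data has accumulated."""
        self.pending.append(payload)
        self.pending_bytes += len(payload)
        if self.pending_bytes >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Emit one large sequential write to the backing HDD (stubbed here)."""
        chunk = b"".join(self.pending)
        # hdd.write_sequential(chunk)  # placeholder for the real device call
        print(f"flushing {len(chunk) / 1e6:.1f} MB as one sequential write")
        self.pending.clear()
        self.pending_bytes = 0
```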

SSD and HDD Interactions

Although SSDs cost more, they perform far better and are the best medium for these caches, bridging the gap between the HDD storage subsystem and the needs of host applications. To achieve substantial increases in HDD throughput, data centers must also deploy intelligent data management software and algorithms that can dynamically and autonomously place data across tiers and caches according to access patterns, business priorities, and data characteristics.
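
The sketch below shows one very simplified placement policy of this kind, keyed on access frequency and recency. The thresholds and the ObjectStats/choose_tier names are assumptions for illustration, not any vendor's actual algorithm.

```python
# Toy tier-placement policy: route data between an SSD tier and an HDD tier
# based on observed access frequency and recency.

import time
from dataclasses import dataclass, field

@dataclass
class ObjectStats:
    accesses_per_day: float = 0.0
    last_access: float = field(default_factory=time.time)

def choose_tier(stats: ObjectStats,
                hot_threshold: float = 10.0,     # assumed accesses/day cutoff
                cold_age_days: float = 30.0) -> str:
    """Return 'ssd' for hot, recently used data, 'hdd' otherwise."""
    age_days = (time.time() - stats.last_access) / 86400
    if stats.accesses_per_day >= hot_threshold and age_days < cold_age_days:
        return "ssd"
    return "hdd"

if __name__ == "__main__":
    hot = ObjectStats(accesses_per_day=50.0)
    cold = ObjectStats(accesses_per_day=0.1,
                       last_access=time.time() - 90 * 86400)
    print(choose_tier(hot), choose_tier(cold))   # -> ssd hdd
```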

Not every data center has the technical sophistication needed to fine-tune and optimize that management software. Compared with both hybrid and HDD-only storage arrays, the main factor driving the rapid growth of all-flash arrays is ease of deployment: a single layer of storage can satisfy all performance requirements.

For HDD capacities up to 20 TB, these caching and tiering techniques have been effective at maximizing HDD throughput, although they are not suitable for the most demanding applications. Meeting workload performance requirements will become more challenging as HDD capacities grow to 30 or 40 TB. Consider a 40 TB HDD serving a big data analytics application that requires 5 MB/s per TB.

That HDD would need to deliver 200 MB/s. Even in the most modern data centers, it is very difficult to guarantee that a 3.5″ 7200 RPM HDD can sustain that speed continuously from the outermost to the innermost regions of the disk. As a result, HDD storage systems will depend even more on the performance of SSDs, and the relative size and management complexity of the SSD caches in those systems will have to grow.
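
The back-of-the-envelope check below restates that concern with assumed per-zone bandwidth figures for a 3.5″ 7200 RPM drive; the zone numbers are illustrative only, not measured data.

```python
# Can a high-capacity HDD sustain the 200 MB/s that a 40 TB drive needs at
# 5 MB/s per TB, given that transfer rate falls from outer to inner tracks?

REQUIRED_MBPS = 40 * 5.0   # 40 TB at 5 MB/s per TB

# Assumed sequential bandwidth by radial zone (illustrative values).
ZONE_MBPS = {"outer": 280.0, "middle": 220.0, "inner": 150.0}

def zones_meeting_requirement(required: float) -> list[str]:
    """List the zones whose sequential bandwidth meets the requirement."""
    return [zone for zone, mbps in ZONE_MBPS.items() if mbps >= required]

if __name__ == "__main__":
    ok = zones_meeting_requirement(REQUIRED_MBPS)
    print(f"need {REQUIRED_MBPS:.0f} MB/s; zones that keep up: {ok}")
    # With these assumed numbers the inner zones fall short, so SSD caching
    # or tiering must absorb the difference.
```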

Keeping costs and performance in check

By treating SSDs and HDDs as “friends,” data centers can meet the varied and dynamic demands of data-intensive applications while keeping storage costs in balance. As HDD capacities grow, that relationship will evolve over time.

Solid state drives:

  • Give latency-sensitive applications and hot data fast access,
  • Can serve as a tier or cache to boost the rate at which data is delivered from HDDs.

Whereas HDDs:

  • Provide more economical storage ($/TB),
  • Have better retention characteristics for cold data uses, such as backup or archive data.

Longer-term Prospects

Even though high-capacity HDDs will remain the most economical storage option for the majority of data center applications, their capacity growth will eventually reach a limit. In due course, SSD-only storage could become less expensive than HDD-based options for all but cold storage uses.

New Source: Data Center Storage
