Case Study
H2O.ai
How H2O.ai Optimized Storage Costs While Scaling AI on Amazon EKS
H2O.ai Cuts EBS Costs by 60% While Scaling AI Workloads on Amazon EKS
- Over 60% reduction in EBS costs within 30 days.
- Thousands of EBS volumes managed dynamically across 50+ AWS accounts.
- Zero operational overhead, zero impact on cluster stability.
Company
H2O.ai is a leading AI and machine learning platform, trusted by more than 20,000 organizations worldwide, including over half of the Fortune 500. The company enables both customer-facing deployments and internal research, operating a highly automated, multi-cloud, zero-trust infrastructure designed for speed, security, and scale.
Infrastructure
At the core of H2O.ai’s platform are large-scale AI and ML workloads running on Kubernetes, primarily Amazon EKS with Bottlerocket OS. Persistent data is stored on Amazon EBS, supporting fast-growing, highly dynamic workloads across dozens of AWS accounts.
“We run tens of EKS clusters on Bottlerocket across multiple AWS accounts, and rolling out Datafy through our existing IaC workflows was trivial. It seamlessly auto-scales persistent volumes with no impact on applications or cluster operations.
In just a few steps, we reduced our EBS costs by over 60 percent while maintaining the availability and stability we expect in production.”
H2O.ai – Ophir Zahavi, Director, Cloud Engineering
The Challenge
As H2O.ai scaled its AI platform, storage became a critical bottleneck.
The company managed more than 100 dynamic EKS clusters spread across roughly 50 AWS accounts, backed by thousands of EBS volumes. Spikes in customer demand drove storage growth, alongside the needs of internal training jobs, experiments, and new feature deployments.
Native EBS limitations made it difficult to react to these changes in real time. EBS offers no policy-based automation for adjusting volume capacity and performance as demand shifts, and a volume cannot be shrunk once it has been enlarged, so the team had to significantly overprovision storage to avoid outages. As a result, average EBS utilization dropped to 17 percent, making EBS the single largest source of wasted cloud spend.
At the same time, H2O.ai needed a solution that would not cause downtime, introduce operational risk, or demand significant changes to applications or cluster operations.
The Solution
H2O.ai adopted Datafy EBS Auto-Scaler to fully automate and optimize its EC2 storage layer and overcome native AWS limitations in scaling EBS.
Datafy EBS Auto-Scaler integrated easily with H2O.ai's existing Infrastructure as Code (IaC) workflows. The solution operates as an autonomous storage layer that continuously adjusts EBS volume capacity and performance in real time, based on actual workload demand.
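To make "adjusting capacity based on actual workload demand" concrete, here is a minimal sketch of a demand-based sizing rule for a single volume. It is an illustration under stated assumptions, not Datafy's algorithm: the function name `target_size_gib` and the 80% target utilization are hypothetical. Note that a plain EBS volume modification can only grow a volume, so acting on a smaller target requires more than resizing in place.

```python
import math


def target_size_gib(used_gib: float, target_utilization: float = 0.8) -> int:
    """Return a volume size (GiB) at which `used_gib` fills roughly
    `target_utilization` of the volume.

    Hypothetical policy for illustration only; not Datafy's implementation.
    """
    if used_gib < 0:
        raise ValueError("used_gib must be non-negative")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Round up so the volume is never fuller than the target,
    # and never suggest a size below 1 GiB.
    return max(1, math.ceil(used_gib / target_utilization))


# A 1,000 GiB volume holding 170 GiB is 17% utilized.
provisioned, used = 1000, 170
print(f"current utilization: {used / provisioned:.0%}")
print(f"suggested size: {target_size_gib(used)} GiB")
```

Under these assumptions the example prints a current utilization of 17% and a suggested size of 213 GiB; the gap between the two sizes is the kind of overprovisioning a static approach leaves in place.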
Key aspects of the deployment included:
- Native Integration: Compatible with Kubernetes and existing CI/CD toolsets.
- Zero-Touch Implementation: Enabled full deployment without modifying applications, underlying operating systems, or core cluster configurations.
- Automated Storage Management: Facilitated dynamic EBS scaling up and down with zero downtime.
- Hardened Security Alignment: Maintained strict adherence to the Bottlerocket security posture.
- Streamlined Data Protection: Leveraged CSI and Velero to ensure seamless integration with snapshot and backup processes.
The Results
With Datafy in place, H2O.ai transformed how it manages storage across its AI infrastructure.
H2O.ai's storage utilization increased by 429%, turning a previously overprovisioned architecture into a lean, efficiently used storage estate.
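The case study reports a 17 percent average utilization baseline and a 429% increase, but it does not state the resulting utilization. The short calculation below derives it under two possible readings of "increased by 429%"; neither resulting figure is reported by H2O.ai or Datafy, and which reading is intended is an assumption.

```python
baseline_utilization = 0.17  # average EBS utilization before Datafy (reported)
reported_increase = 4.29     # "increased by 429%" (reported)

# Reading A (assumed): a 429% relative increase over the baseline.
relative = baseline_utilization * (1 + reported_increase)
# Reading B (assumed): utilization multiplied by 4.29.
multiplied = baseline_utilization * reported_increase

print(f"relative-increase reading: {relative:.0%}")
print(f"multiplier reading: {multiplied:.0%}")
```

The first reading implies average utilization of roughly 90 percent; the second, roughly 73 percent.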
The impact was immediate and measurable:
- Over 60% reduction in EBS costs within 30 days.
- $1M+ in annual savings.
- Thousands of EBS volumes managed dynamically across 50+ AWS accounts.
- Zero operational overhead, zero impact on cluster stability.
Datafy EBS Auto-Scaler is now an important part of H2O.ai's infrastructure, allowing its engineering teams to focus on AI innovation rather than infrastructure constraints.
Conclusion
By adopting Datafy, H2O.ai eliminated storage inefficiencies that were limiting scale and driving unnecessary cloud costs. The result is a fully autonomous, highly utilized storage layer that keeps pace with the demands of modern AI workloads while significantly reducing spend.
For organizations running large-scale, dynamic EC2 or Kubernetes environments on Amazon EBS, H2O.ai's experience with Datafy shows that the right storage tooling can turn storage from a scaling constraint and a source of waste into an enabler of growth.