H2O.ai Cuts EBS Costs by 60% While Scaling AI Workloads on Amazon EKS
Customer Overview
H2O.ai is a leading AI and machine learning platform, trusted by more than 20,000 organizations worldwide, including over half of the Fortune 500. The company enables both customer-facing deployments and internal research, operating a highly automated, multi-cloud, zero-trust infrastructure designed for speed, security, and scale.
At the core of H2O.ai’s platform are large-scale AI and ML workloads running on Kubernetes, primarily Amazon EKS with Bottlerocket OS. Persistent data is stored on Amazon EBS, supporting fast-growing, highly dynamic workloads across dozens of AWS accounts.
The Challenge
As H2O.ai scaled its AI platform, storage became a critical bottleneck.
The company operated more than 100 dynamic EKS clusters across roughly 50 AWS accounts, backed by thousands of EBS volumes. Spikes in customer demand drove storage requirements up at the same time as internal training jobs, experiments, and new feature deployments.
Native EBS limitations made it difficult to react to these changes in real time. Capacity and performance could not be adjusted dynamically through policy-based automation, so the team had to overprovision storage significantly to avoid outages. As a result, average EBS utilization fell to 17 percent, leaving roughly 83 percent of provisioned capacity unused and making EBS the single largest source of wasted cloud spend.
At the same time, H2O.ai needed a solution that would not require downtime, introduce operational risk, or demand significant changes to applications or cluster operations.
The Solution
H2O.ai adopted Datafy EBS Auto-Scaler to fully automate and optimize its EC2 storage layer and overcome native AWS limitations in scaling EBS.
Datafy EBS Auto-Scaler integrated with H2O.ai’s existing Infrastructure as Code workflows, which use Terraform and Helm. The solution operates as an autonomous storage layer that continuously adjusts EBS volume capacity and performance in real time, based on actual workload demand.
Key aspects of the deployment included:
- Native Integration: Worked with Kubernetes and existing CI/CD toolsets.
- Zero-Touch Implementation: Deployed fully without modifying applications, underlying operating systems, or core cluster configurations.
- Automated Storage Management: Scaled EBS volumes up and down dynamically with zero downtime.
- Hardened Security Alignment: Maintained strict adherence to the Bottlerocket security posture.
- Streamlined Data Protection: Used CSI and Velero to integrate with existing snapshot and backup processes.
All critical scaling decisions are executed locally and autonomously, ensuring high availability and reliability even at large scale.
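To make the idea of demand-based sizing concrete, here is a minimal sketch of a threshold policy that recommends a volume size from observed usage. It is an illustration only and not Datafy's algorithm: the `SizingPolicy` class, the `recommend_size_gib` function, and every threshold value are hypothetical, and the sketch deliberately leaves out how a resize would actually be carried out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SizingPolicy:
    """Hypothetical policy; all values are illustrative assumptions."""
    target_utilization: float = 0.75  # utilization aimed for after a resize
    grow_above: float = 0.85          # grow when utilization exceeds this
    shrink_below: float = 0.50        # shrink when utilization drops below this
    min_size_gib: int = 1             # never recommend less than this


def recommend_size_gib(
    used_gib: float,
    provisioned_gib: int,
    policy: SizingPolicy = SizingPolicy(),
) -> int:
    """Return a recommended provisioned size in GiB for one volume."""
    if provisioned_gib <= 0:
        raise ValueError("provisioned_gib must be positive")

    utilization = used_gib / provisioned_gib

    # Inside the band: leave the volume alone to avoid resize churn.
    if policy.shrink_below <= utilization <= policy.grow_above:
        return provisioned_gib

    # Outside the band: size the volume so usage lands on the target.
    target = math.ceil(used_gib / policy.target_utilization)
    return max(policy.min_size_gib, target)


if __name__ == "__main__":
    # A volume at the 17 percent utilization described in this case study.
    print(recommend_size_gib(used_gib=170, provisioned_gib=1000))
    # A volume that is running hot.
    print(recommend_size_gib(used_gib=900, provisioned_gib=1000))
```

The gap between the grow and shrink thresholds is a design choice in this sketch: it keeps a volume whose usage hovers near one threshold from being resized repeatedly.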
The Results
With Datafy in place, H2O.ai transformed how it manages storage across its AI infrastructure.
Storage utilization increased by 429%, turning a heavily overprovisioned storage estate into a right-sized one.
The impact was immediate and measurable:
- Over 60% reduction in EBS costs within 30 days.
- $1M+ in annual savings.
- Thousands of EBS volumes managed dynamically across 50+ AWS accounts.
- No added operational overhead and no impact on cluster stability.
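The figures above can be cross-checked with some quick arithmetic. The sketch below rests on two assumptions that the case study does not state: that the 429% increase is measured against the 17 percent baseline utilization, and that "over 60%" and "$1M+" are treated as exactly 60% and $1,000,000. The implied spend numbers are therefore illustrative, not reported values.

```python
# Back-of-the-envelope check of the quoted figures.

# Assumption: the 429% increase is relative to the 17% baseline utilization.
baseline_utilization = 0.17
relative_increase = 4.29  # "increased by 429%"
implied_utilization = baseline_utilization * (1 + relative_increase)

# Assumption: treat "over 60%" and "$1M+" as exact values for illustration.
cost_reduction = 0.60
annual_savings_usd = 1_000_000
implied_prior_spend_usd = annual_savings_usd / cost_reduction
implied_new_spend_usd = implied_prior_spend_usd - annual_savings_usd

print(f"Implied utilization after optimization: {implied_utilization:.1%}")
print(f"Implied annual EBS spend before: ${implied_prior_spend_usd:,.0f}")
print(f"Implied annual EBS spend after:  ${implied_new_spend_usd:,.0f}")
```

Under these assumptions, utilization lands at roughly 90 percent, which is consistent with moving from heavy overprovisioning to volumes sized close to actual demand.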
Datafy EBS Auto-Scaler is now a core part of H2O.ai’s platform infrastructure, allowing its engineering teams to focus on AI innovation rather than infrastructure constraints.
Customer Quote
“We have highly dynamic Bottlerocket EKS clusters, spanning across approximately 50 accounts. Datafy allows us to easily deploy, activate, and auto-scale persistent volumes using Terraform and Helm, in a way that is completely seamless to our applications and to any cluster operation. We have saved over 60 percent of our EBS budget in a few simple steps, without jeopardizing cluster availability or stability.”
Senior Manager, Cloud Engineering, H2O.ai
Conclusion
By adopting Datafy, H2O.ai eliminated storage inefficiencies that were limiting scale and driving unnecessary cloud costs. The result is a fully autonomous, highly utilized storage layer that keeps pace with the demands of modern AI workloads while significantly reducing spend.
For organizations running large-scale, dynamic EC2 or Kubernetes environments on Amazon EBS, H2O.ai’s experience with Datafy shows that automated storage management can turn storage from a cost bottleneck into an enabler of scale.