Region: Global
Industry: Cloud Computing
Company Size: 10,748 (2024)
Optimizing Compute Costs and Performance with Smart Scaler

Finvi is a leading financial technology company known for its innovative revenue recovery solutions that serve clients across healthcare, accounts receivable management, and financial services. Its flagship application, a robust revenue acceleration platform, supports thousands of users and handles dynamic workloads across time zones and customer segments.

Challenges

Finvi faced three key challenges:

  • Resource Over-Provisioning: To accommodate peak usage periods, Finvi often over-provisioned compute resources, resulting in underutilization during off-peak times and increased operational expenses.
  • Manual Scaling Limitations: The existing manual scaling processes were reactive and lacked the agility to respond promptly to sudden traffic spikes, affecting application performance and user experience.
  • Cost Management: Finvi's cloud infrastructure costs continued to rise, prompting the need for a more intelligent, automated approach to scaling that could preserve performance while reducing waste.

The Solution: Smart Scaler by Avesha

To address these challenges, Finvi implemented Smart Scaler, Avesha's AI-augmented predictive autoscaling solution. Smart Scaler integrates with existing cloud environments, providing intelligent scaling for both application and infrastructure resources.

Implementation Steps

  1. Integration with Monitoring Tools: Smart Scaler was configured to work alongside Finvi's existing Application Performance Monitoring (APM) tools, Prometheus and Datadog, to gather real-time metrics on application performance and traffic patterns.
  2. Predictive Autoscaling Configuration: Utilizing machine learning algorithms, Smart Scaler analyzed historical data to forecast traffic demand accurately. This predictive capability enabled proactive scaling of compute resources, ensuring optimal performance during peak periods and cost savings during low-demand intervals.
  3. Dynamic Resource Allocation: The solution automatically adjusted both pods and nodes within the Kubernetes cluster, aligning resource allocation with anticipated workloads. This dynamic approach minimized manual intervention and reduced the risk of over-provisioning (a minimal sketch of this predict-then-scale loop follows the list).
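
For illustration only, the sketch below follows the shape of these three steps: pull a short request-rate history from Prometheus, extrapolate the next interval's load, and resize a Kubernetes Deployment to match. Smart Scaler's actual models and APIs are proprietary; the Prometheus endpoint, metric name, per-pod capacity, and workload names here are all assumptions.

```python
# Minimal predict-then-scale sketch (not Smart Scaler's implementation).
# Assumptions: a reachable Prometheus endpoint, an http_requests_total
# counter, and a hypothetical "revenue-api" Deployment at ~50 RPS/pod.
import math

import requests                          # pip install requests
from kubernetes import client, config    # pip install kubernetes

PROM_URL = "http://prometheus.monitoring:9090"        # assumed endpoint
RPS_PER_POD = 50                  # assumed capacity from load testing
DEPLOYMENT, NAMESPACE = "revenue-api", "production"   # hypothetical names

def recent_rps(window="1h", step="5m"):
    """Fetch a short history of cluster-wide request rate (step 1)."""
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={
        # PromQL subquery: 5m request rate, sampled every 5m over the window
        "query": f"sum(rate(http_requests_total[5m]))[{window}:{step}]",
    })
    resp.raise_for_status()
    series = resp.json()["data"]["result"][0]["values"]
    return [float(value) for _, value in series]

def forecast_next(samples):
    """Naive stand-in for Smart Scaler's ML forecast (step 2):
    linear extrapolation from the last two samples."""
    if len(samples) < 2:
        return samples[-1]
    return max(0.0, 2 * samples[-1] - samples[-2])

def scale_to(replicas):
    """Resize the Deployment ahead of the predicted load (step 3)."""
    config.load_incluster_config()   # use load_kube_config() off-cluster
    client.AppsV1Api().patch_namespaced_deployment_scale(
        DEPLOYMENT, NAMESPACE, {"spec": {"replicas": replicas}})

if __name__ == "__main__":
    predicted = forecast_next(recent_rps())
    scale_to(max(1, math.ceil(predicted / RPS_PER_POD)))
```

In practice the node half of step 3 would be handled alongside this, with a cluster autoscaler provisioning or retiring nodes as the pod count changes.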

Results

  • Cost Reduction: By implementing Smart Scaler, Finvi achieved up to a 70% reduction in cloud infrastructure costs. The intelligent scaling mechanism ensured that resources were utilized efficiently, eliminating unnecessary expenditure on idle resources.
  • Enhanced Performance: The predictive autoscaling feature allowed the application to maintain consistent performance, even during unexpected traffic surges. Users experienced improved response times, leading to higher satisfaction and retention rates.
  • Operational Efficiency: Automating the scaling process reduced the operational burden on IT teams, allowing them to focus on strategic initiatives rather than routine infrastructure management tasks.

[Figure: Traffic load vs. pod count, HPA-based autoscaling compared with Smart Scaler]

Explanation: The chart compares traditional HPA-based autoscaling with Smart Scaler. HPA requires high utilization thresholds to trigger action, which often leads to over-provisioning; Smart Scaler (the green shaded area on the right) maintains a tight correlation between traffic load and pod count in real time, ensuring just-in-time resource scaling and preventing waste. A toy comparison of the two policies follows.
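
To make the contrast concrete, here is a toy comparison of the two policies: a reactive, HPA-style rule that resizes only after observed utilization has drifted, versus a predictive rule that sizes the fleet for forecast load. The 70% target and 50 RPS-per-pod figures are illustrative assumptions, not Finvi's configuration.

```python
# Toy contrast: reactive (HPA-style) vs. predictive scaling policies.
# Target utilization and per-pod capacity are illustrative assumptions.
import math

RPS_PER_POD = 50   # assumed per-pod capacity

def reactive_replicas(current, observed_util_pct, target_pct=70):
    """HPA-style: desired = ceil(current * observed / target).
    It reacts only after utilization has already moved, so spikes are
    absorbed late and the target must be kept conservatively low."""
    return max(1, math.ceil(current * observed_util_pct / target_pct))

def predictive_replicas(forecast_rps):
    """Predictive: size the fleet for load before it arrives, letting
    pod count track traffic tightly without headroom padding."""
    return max(1, math.ceil(forecast_rps / RPS_PER_POD))
```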

[Figure: Application response times, HPA compared with Smart Scaler]

Explanation: With Smart Scaler, Finvi experienced a dramatic drop in response times. Unlike HPA, which reacts to load with a delay, Smart Scaler predicts load and provisions resources ahead of time, delivering a smoother, faster user experience.

[Figure: CPU and memory utilization before and after Smart Scaler]

Explanation: Finvi saw a 5x improvement in CPU utilization, maximizing compute resource efficiency. For this CPU-intensive workload, Smart Scaler also recommended a 50% reduction in memory requests, achieving a 2x gain in memory utilization and further reducing costs.
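
The memory result is simple bin-packing arithmetic: halving each pod's memory request doubles how many pods fit per node when memory is the binding constraint. The node size and request values below are hypothetical, chosen only to show the mechanism.

```python
# Hypothetical bin-packing arithmetic behind the memory finding above.
NODE_MEM_GIB = 16                        # assumed allocatable node memory
pods_before = int(NODE_MEM_GIB / 1.0)    # 1 GiB request   -> 16 pods/node
pods_after  = int(NODE_MEM_GIB / 0.5)    # 0.5 GiB request -> 32 pods/node
print(pods_before, pods_after)           # 16 32: the 2x utilization gain
```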

Conclusion

Smart Scaler transformed Finvi's approach to cloud infrastructure management. Through intelligent, predictive autoscaling and real-time optimization, Finvi:

  • Reduced cloud spend by up to 70%
  • Increased resource utilization across CPU and memory
  • Maintained exceptional performance under variable traffic
  • Freed engineering teams from infrastructure tuning

This case demonstrates how AI-driven automation with Smart Scaler can modernize infrastructure operations for any enterprise-scale application.
