Infrastructure · Pricing · FinOps · GPUs · Compute · Deployment

Cloud GPU Pricing Shifts in Q1 2026

The hyperscaler price war is on — and spot markets are the real story

6 min read
THE SIGNAL
  • AWS dropped on-demand pricing for p5.48xlarge (8x H100 SXM) by 18%, from $98.32/hr to $80.62/hr, effective January 15. GCP and Azure followed with 15-20% reductions on equivalent instances within two weeks.
  • Spot pricing on H100 instances is now 50-65% below on-demand across all three hyperscalers, but availability windows have narrowed to 2-4 hours in peak regions (us-east-1, europe-west4) before preemption.
  • 1-year reserved (committed use) pricing has settled at 35-40% off on-demand for H100 instances — the best risk-adjusted deal in GPU compute right now.
  • Tier-2 providers (CoreWeave, Lambda, Crusoe) are pricing H100 on-demand at $2.49-2.85/GPU/hr, roughly a quarter to a third of hyperscaler on-demand per-GPU rates and still below their spot medians, but with weaker SLAs and no multi-region failover.
  • The A100 80GB generation is entering clearance pricing territory: on-demand rates down 30-40% year-over-year, making it the budget option for inference workloads that do not require H100 memory bandwidth.

What Happened

The Q1 2026 GPU pricing shakeout was inevitable. H100 supply finally caught up with demand after eighteen months of allocation constraints. TSMC’s expanded CoWoS packaging capacity, combined with NVIDIA shipping over 500,000 H100 units in Q4 2025, means the hyperscalers are no longer supply-constrained on current-generation hardware. When supply normalizes, prices fall — and in January, they fell hard.

AWS moved first on January 15, cutting p5.48xlarge on-demand rates by 18%. The move was not altruistic. Google Cloud had been quietly offering aggressive committed-use discounts to large accounts since November 2025, pulling workloads off AWS. Within ten days, GCP formalized the cuts across its a3-highgpu-8g (H100) instance family, and Azure followed with reductions on the ND H100 v5 series. The net result: on-demand H100 pricing converged to roughly $9.80-10.10 per GPU-hour across all three providers, down from $10.50-12.30 in Q4 2025.

But the on-demand headline numbers are not where the real action is. Spot markets and reserved commitments are where the economics get interesting — and where most teams are leaving money on the table.

COST

The gap between on-demand and optimized procurement (reserved + spot mix) is now $3.50-4.80 per GPU-hour on H100 instances. For a team running 64 GPUs continuously, that is roughly $160,000-225,000 per month in avoidable spend. Procurement strategy is no longer a finance concern — it is an engineering architecture decision.
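
A quick back-of-envelope makes that figure concrete; the sketch below just multiplies the per-GPU-hour gap by fleet size and hours per month (the 64-GPU fleet is illustrative):

# Back-of-envelope: avoidable monthly spend from procurement optimization.
# The $/GPU-hr gap comes from the figures above; fleet size is an example.
HOURS_PER_MONTH = 730            # 8,760 hours/year / 12
gpu_count = 64                   # continuously running fleet
gap_low, gap_high = 3.50, 4.80   # on-demand vs reserved+spot mix, $/GPU-hr

low = gpu_count * gap_low * HOURS_PER_MONTH
high = gpu_count * gap_high * HOURS_PER_MONTH
print(f"Avoidable spend: ${low:,.0f}-${high:,.0f}/month")
# -> Avoidable spend: $163,520-$224,256/month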

Spot Markets: High Reward, Higher Variance

Spot pricing has always been the cheapest way to access GPU compute, but Q1 2026 introduced a new dynamic: volatility. As more teams adopt spot-aware orchestration frameworks (SkyPilot, Kubernetes Karpenter with GPU-aware provisioning), the competition for spot capacity has intensified. Prices now swing 30-40% within a single day in popular regions.

The numbers are compelling when you can get them. AWS p5.48xlarge spot instances have traded as low as $28.50/hr (65% off the new on-demand rate) during off-peak windows, and the weekly median sits around $33-36/hr (55-59% off). GCP and Azure spot markets show similar patterns, though with less liquidity and faster preemption — average instance lifetime before reclamation is 2.1 hours on GCP versus 3.4 hours on AWS for H100 instances.
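
If you would rather track these medians yourself than rely on a weekly index, AWS exposes the raw data through the EC2 API. A minimal sketch using boto3; the region and lookback window are illustrative choices:

import statistics
from datetime import datetime, timedelta, timezone

import boto3

# Pull 7 days of p5.48xlarge spot price history in one region and
# compute the median -- a rough proxy for what you would actually pay.
ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_spot_price_history(
    InstanceTypes=["p5.48xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
)
prices = [float(p["SpotPrice"]) for p in resp["SpotPriceHistory"]]
print(f"samples={len(prices)}  median=${statistics.median(prices):.2f}/hr")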

The practical implication: spot is excellent for fault-tolerant workloads — fine-tuning with checkpoint-resume, batch inference, offline evaluation suites — but it is not a viable primary strategy for real-time serving. Teams that built their entire inference stack on spot during the 2024-2025 scarcity (when spot prices were paradoxically high and stable) are now getting hit with frequent preemptions as the market normalizes.

BUILDER BREAKDOWN

Pricing Comparison: H100 Instances Across Providers (February 2026)

| Provider  | Instance Type    | GPUs        | On-Demand ($/hr) | 1yr Reserved ($/hr) | Spot Median ($/hr) |
| --------- | ---------------- | ----------- | ---------------- | ------------------- | ------------------ |
| AWS       | p5.48xlarge      | 8x H100 SXM | $80.62           | $52.40 (35% off)    | $33.86 (58% off)   |
| GCP       | a3-highgpu-8g    | 8x H100     | $78.17           | $47.68 (39% off)    | $31.27 (60% off)   |
| Azure     | ND H100 v5       | 8x H100     | $80.88           | $51.77 (36% off)    | $35.49 (56% off)   |
| CoreWeave | H100-80GB-SXM    | 8x H100     | $22.76*          | $16.56 (27% off)    | N/A                |
| Lambda    | gpu_8x_h100_sxm5 | 8x H100     | $19.92*          | $15.14 (24% off)    | N/A                |

* CoreWeave and Lambda node prices are per-GPU rates multiplied by 8 for comparison: $2.85/GPU/hr and $2.49/GPU/hr respectively.

Optimizing Your Procurement Mix

Baseline load (60-70% of capacity): Reserved instances. Identify your steady-state GPU utilization over the past 90 days. That floor is your reserved commitment target. At 35-40% off on-demand, a 1-year commitment breaks even whenever the workload would otherwise run on-demand for roughly 7-8 months of the year; anything beyond that is pure savings. GCP currently offers the deepest reserved discount (39%) on a3-highgpu-8g.
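
A sketch of that sizing exercise, assuming you can export hourly GPU-in-use counts for the trailing 90 days from your monitoring stack (the quantile and data source are yours to adapt):

def reserved_target(hourly_gpus_in_use: list[int],
                    coverage_quantile: float = 0.5) -> int:
    """Size a 1-year reserved commitment from trailing utilization.

    hourly_gpus_in_use: one sample per hour over the last ~90 days.
    coverage_quantile: 0.5 = P50, the level you exceed half the time.
    """
    s = sorted(hourly_gpus_in_use)
    return s[int(coverage_quantile * (len(s) - 1))]

# Breakeven intuition: at 35-40% off you pay 60-65% of the on-demand
# rate for the full year, so the commitment wins whenever the workload
# would otherwise run on-demand for more than ~7-8 months of that year.
for discount in (0.35, 0.40):
    print(f"{discount:.0%} off -> breakeven at {(1 - discount) * 12:.1f} months")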

Burst and batch (15-25% of capacity): Spot instances. Configure your orchestrator for spot capacity with automatic failover; Karpenter handles this within a cluster, and SkyPilot can extend it across regions. Key settings for Karpenter:

# Karpenter NodePool for H100 spot capacity (Karpenter v1 API)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-spot-h100
spec:
  template:
    spec:
      requirements:
        # Spot-only: keep on-demand fallback in a separate NodePool
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["p5.48xlarge"]
      # In the v1 API, expireAfter lives under template.spec (it moved
      # out of the disruption block); 4h rotation bounds the blast
      # radius of any single spot reclamation wave
      expireAfter: 4h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu-spot
  disruption:
    consolidationPolicy: WhenEmpty
  limits:
    cpu: "768"
    nvidia.com/gpu: "64"   # cap the pool at 8 nodes x 8 GPUs

Emergency headroom (10-20% of capacity): On-demand. Keep a small on-demand allocation for traffic spikes and spot preemption recovery. This is your insurance policy. Over-provisioning here by 5% is cheaper than the latency spike from scrambling for capacity during a preemption cascade.

A100 as budget tier. If your models fit in 80GB and you do not need the H100’s higher memory bandwidth (3.35 TB/s vs 2.0 TB/s), A100 instances are now priced at $4.30-5.10/GPU/hr on-demand, roughly half the H100 rate. For inference on models under 30B parameters, the cost-per-token difference is marginal.
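
A rough way to sanity-check that cost-per-token claim for your own stack is to divide the hourly rate by measured throughput. The throughput figures below are hypothetical placeholders, not benchmarks; substitute your own numbers:

# Cost per million output tokens = hourly rate / (tokens/sec * 3600) * 1e6.
# Throughputs here are HYPOTHETICAL placeholders -- measure your own stack.
def cost_per_mtok(gpu_hourly_usd: float, tokens_per_sec: float) -> float:
    return gpu_hourly_usd / (tokens_per_sec * 3600) * 1e6

a100 = cost_per_mtok(gpu_hourly_usd=4.70, tokens_per_sec=1500)   # assumed
h100 = cost_per_mtok(gpu_hourly_usd=10.00, tokens_per_sec=2800)  # assumed
print(f"A100: ${a100:.2f}/Mtok   H100: ${h100:.2f}/Mtok")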

ECONOMIC LAYER

Winners and Losers

Winners:

  • Mid-scale inference operators running 100-500 GPUs. The 15-20% on-demand cut plus optimized procurement can reduce their annual GPU bill by $1-4M without any migration or architectural changes. This is pure operational leverage.
  • Spot-native ML platforms (Anyscale, Modal, SkyPilot users). The widening gap between spot and on-demand rewards teams that invested in preemption-tolerant infrastructure. Their effective cost basis just dropped to $4.00-4.50/GPU/hr.
  • A100 holdouts. Teams that delayed the H100 migration are now validated. A100 clearance pricing makes the 80GB SKU the best price-performance option for inference workloads under 30B parameters. No migration needed.

Losers:

  • Teams locked into H100 reserved instances from mid-2025. Those contracts were priced at Q3 2025 rates — 20-30% above current market. Depending on the provider, there may be limited options to renegotiate. Azure allows reserved instance exchanges; AWS does not for Savings Plans already purchased.
  • Tier-2 providers competing purely on price. CoreWeave and Lambda’s value proposition was “H100s cheaper than hyperscalers.” The hyperscaler price cuts, and the widening spot discounts in particular, have narrowed that gap sharply. Their remaining edge is availability and bare-metal access, not cost.
  • On-demand-only teams. Any organization still running production GPU workloads entirely on on-demand pricing is now paying a 35-60% premium over an optimized mix. At scale, this is the difference between a viable unit economics model and one that does not close.

“GPU pricing is no longer a supply story — it is a procurement strategy story. The difference between the best and worst buyer of the same H100 hour is now 3x. That gap is wider than the performance difference between an A100 and an H100.”

RISK

Spot preemption rates on H100 instances have increased 2.4x since November 2025. If your spot-based training runs do not checkpoint at least every 20 minutes, you are statistically likely to lose work within a single session. Set checkpoint_interval aggressively and test your resume-from-checkpoint path under real preemption conditions — not just simulated failures.
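
A minimal shape for that resume path, assuming a PyTorch-style loop and a node-termination handler that forwards the spot interruption warning as SIGTERM; the paths and interval are illustrative:

import signal
import time

import torch

CHECKPOINT_PATH = "/mnt/shared/ckpt.pt"   # illustrative; use durable storage
CHECKPOINT_EVERY_SECS = 15 * 60           # tighter than the 20-minute floor

preempted = False

def _on_sigterm(signum, frame):
    # Spot interruption typically surfaces as SIGTERM when a
    # node-termination handler is running; flag it and save state
    # at the next step rather than dying mid-update.
    global preempted
    preempted = True

signal.signal(signal.SIGTERM, _on_sigterm)

def train(model, optimizer, data_loader, start_step: int = 0):
    last_ckpt = time.monotonic()
    for step, batch in enumerate(data_loader, start=start_step):
        loss = model(batch).mean()          # stand-in for your real step
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if preempted or time.monotonic() - last_ckpt >= CHECKPOINT_EVERY_SECS:
            torch.save({"step": step,
                        "model": model.state_dict(),
                        "optim": optimizer.state_dict()}, CHECKPOINT_PATH)
            last_ckpt = time.monotonic()
            if preempted:
                return  # exit cleanly; orchestrator resumes from checkpoint
    # (data-order resume for the loader is elided for brevity)

The part to test is the resume, not the save: kill a live run and confirm the restarted job loads the checkpoint and continues from the recorded step.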

INSIGHT

Watch for GCP’s new Flex CUD (Committed Use Discount) option, launched in January 2026. Unlike standard 1-year or 3-year commitments, Flex CUDs allow monthly re-allocation across GPU instance families. The discount is shallower (20-25% vs 35-40%), but the flexibility to shift between A100 and H100 — or between regions — makes it a strong hedge if you are uncertain about your workload mix over the next year.

WHAT I WOULD DO

Recommendations by Role

CTO: Audit your current GPU procurement mix this week. If more than 30% of your GPU spend is on-demand, you are overpaying by six figures annually. Set a target of 60% reserved, 25% spot, 15% on-demand. Assign your infra lead to model the 1-year reserved commitment against your trailing 90-day utilization baseline. For hyperscaler selection, GCP currently offers the best reserved pricing on H100; AWS has the deepest and most liquid spot market.

Founder: GPU cost should not be a black box in your financial model. Ask your technical team for the blended effective rate per GPU-hour — not just the list price. If your team is paying the on-demand rate because “reserved is complicated,” that is an engineering leadership gap, not a technical constraint. The procurement optimization described here requires zero code changes to your application — it is purely an infrastructure configuration exercise.

Infra Lead: Implement a three-tier procurement strategy this quarter. First, lock in 1-year reserved instances covering your P50 utilization (the level you exceed 50% of the time). Second, deploy Karpenter or SkyPilot for spot-based batch and training workloads, with failover across at least three availability zones and, where feasible, across regions. Third, maintain a small on-demand pool sized to handle your P95 burst minus your reserved capacity. Run a cost simulation against the last 90 days of actual usage to validate the mix before committing; a sketch follows below. If you are on AWS, evaluate whether Savings Plans (compute-flexible) or Reserved Instances (instance-specific) better match your workload patterns — Savings Plans offer more flexibility but slightly shallower discounts.
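
A skeleton for that 90-day simulation, assuming an hourly node-demand trace exported from your monitoring stack. The rates are the February table figures per 8-GPU node; the spot fill ratio is an assumption to calibrate against your own preemption data:

def blended_cost(hourly_demand: list[int],
                 reserved_nodes: int,
                 od_rate: float = 80.62,    # AWS p5.48xlarge on-demand
                 rsv_rate: float = 52.40,   # 1-yr reserved
                 spot_rate: float = 33.86,  # weekly spot median
                 spot_fill: float = 0.85    # assumed spot availability
                 ) -> float:
    """Total spend over a demand trace for a reserved+spot+on-demand mix."""
    total = 0.0
    for need in hourly_demand:
        total += reserved_nodes * rsv_rate   # committed: paid even when idle
        burst = max(0, need - reserved_nodes)
        spot = burst * spot_fill             # what spot actually covers
        total += spot * spot_rate + (burst - spot) * od_rate
    return total

# Sweep commitment levels against the trace to find the cheapest mix
# before signing anything:
#   for r in range(0, 12):
#       print(r, blended_cost(trace, reserved_nodes=r))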

SOURCES & NOTES
  1. “Amazon EC2 P5 Instance Pricing Update — January 2026,” AWS Pricing Page, aws.amazon.com/ec2/pricing/on-demand
  2. “GPU Instance Committed Use Discounts and Flex CUDs,” Google Cloud Blog, cloud.google.com/blog/products/compute (January 2026)
  3. “Azure ND H100 v5 Series Pricing Adjustments — Q1 2026,” Microsoft Azure Pricing, azure.microsoft.com/pricing/details/virtual-machines
  4. “Spot Instance Preemption Rates and Availability Trends,” The Cloud GPU Price Index, Vantage.sh (February 2026)
  5. “H100 Supply Normalization and Market Impact,” SemiAnalysis GPU Market Report, semianalysis.com (January 2026)
