AI’s Hidden Costs and Carbon Footprint in Healthcare

When a mid-sized hospital rolled out an LLM-powered discharge summary tool last year, clinicians rejoiced until the next cloud bill arrived. What started as a modest pilot of 10 patients per day suddenly doubled, tripled… and then the API charges hit the finance team like a surprise second wave. It’s a familiar story in healthcare systems today: LLMs promise workflow automation but bring with them volatile token-based pricing and a growing energy and carbon profile that hospital boards increasingly demand be accounted for. One recent paper noted that “running these AI models continuously is expensive, raising the financial obstacle to widespread application”[1].

The High Cost of Prediction

Token-based billing charges for both the prompt and the response. For example, OpenAI’s GPT-4 Turbo (128K) charged about $10.00 per 1M input tokens in mid-2024[2] (GPT-3.5 Turbo 16K was ~$1.50 per 1M). In concrete terms, if a patient note averages 1,000 tokens, then 1,000 notes cost about $10 on input alone. Add the model’s output tokens, and frequent LLM use easily runs into hundreds or thousands of dollars per month. Scale that to dozens of departments and millions of tokens per year, and you’re into six or seven figures in annual API fees. One healthcare AI analysis projected $115K–$4.6M per year in pass-through costs for GPT-based inference at enterprise health-system scale[3]. In short, a pilot that seems cheap at small volume can explode into a major line item as usage grows.
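To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The input price is the mid-2024 list price quoted above; the output price and the per-note output length are illustrative assumptions, not figures from the cited sources:

```python
# Back-of-the-envelope token-spend estimate for an LLM note-processing tool.
# Input price is the mid-2024 GPT-4 Turbo figure quoted in the text; the
# output price and per-note output length are illustrative assumptions.

INPUT_PRICE_PER_M = 10.00   # USD per 1M input tokens (GPT-4 Turbo, mid-2024)
OUTPUT_PRICE_PER_M = 30.00  # USD per 1M output tokens (assumed)

def monthly_api_cost(notes_per_day: int,
                     input_tokens_per_note: int = 1_000,
                     output_tokens_per_note: int = 400,
                     days: int = 30) -> float:
    """Estimate monthly API spend in USD for a steady note-processing workload."""
    notes = notes_per_day * days
    input_cost = notes * input_tokens_per_note / 1e6 * INPUT_PRICE_PER_M
    output_cost = notes * output_tokens_per_note / 1e6 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost

# A 10-note/day pilot vs. a hypothetical 5,000-note/day enterprise rollout:
for volume in (10, 5_000):
    print(f"{volume:>5} notes/day -> ${monthly_api_cost(volume):,.0f}/month")
```

The pilot lands in single-digit dollars per month; the enterprise scenario lands in the thousands, which is exactly the nonlinear jump the paragraph above describes.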

Operational complexity makes costs hard to predict. Queries vary in length and frequency, and model failures can multiply calls. Mount Sinai’s study found that smartly grouping clinical tasks could cut API calls by up to 17×, potentially saving “millions of dollars per year”[4] for a large system. But without careful planning, CIOs cannot know in advance how many tokens (and dollars) a larger rollout will consume. One clinician recalled using GPT-4 for nightly reports and then facing an unforeseen five-digit monthly bill. This is a departure from more predictable IT expenses (fixed servers and fixed licenses); cloud LLMs behave more like a metered black box whose bill can spike wildly.
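The Mount Sinai study’s exact strategy is not reproduced here, but the basic mechanics of grouping tasks to cut API calls can be sketched. The group size and the resulting 10× reduction below are illustrative assumptions, not the study’s 17× figure:

```python
# Illustrative sketch of grouping clinical tasks into shared prompts so that
# N tasks cost far fewer API calls. The group size of 10 is a hypothetical
# stand-in, not the batching strategy from the Mount Sinai study.

tasks = [f"Summarize note {i}" for i in range(100)]

# Naive approach: one API call per task -> 100 calls.
naive_calls = len(tasks)

# Grouped approach: bundle 10 tasks per prompt -> 10 calls. Shared
# instructions and context are sent once per group instead of once per task.
GROUP_SIZE = 10
grouped_prompts = [
    "Answer each task separately:\n" + "\n".join(tasks[i:i + GROUP_SIZE])
    for i in range(0, len(tasks), GROUP_SIZE)
]
print(naive_calls, "calls vs", len(grouped_prompts), "calls")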

Calculating the Carbon

The second emerging challenge is energy and ESG pressure. Healthcare organizations are among the nation’s largest consumers of energy, and sustainability has now crossed into IT purchasing[5]. Hospital boards and ESG committees are asking, “What is the carbon footprint of our AI initiatives?” Recent UK reporting on LLMs in healthcare warned that they “could lead to significant resource use,” noting that a single LLM request uses about 0.43 Wh of electricity (a small fraction of a smartphone battery charge) and roughly 20 mL of data-center freshwater[6]. At aggregate scale, ChatGPT’s daily emissions have been compared to those of 400–800 US homes[7].

In healthcare, models tend to be larger (broader scope) and typically run on-premises or in a corporate cloud, so the footprint can be bigger still. At the same time, ESG demands are tightening: hospital networks must now report energy and carbon alongside clinical KPIs[8]. Energy consumption and carbon footprint are “board-level issues,” and some health systems already publish audited ESG reports on renewables and emissions[5]. Deploying an LLM without considering its power consumption and emissions is therefore no longer just an IT challenge; it is an ESG risk. And LLM inference (answering prompts) dominates the lifecycle footprint: different analyses estimate that inference accounts for up to 90% of an AI system’s lifetime energy, since training is a one-time event while inference runs continuously.
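As a rough illustration, the per-request energy figure cited above can be turned into an annual footprint estimate. The grid carbon intensity and the query volume below are assumptions for the sketch, not measured values:

```python
# Rough inference-footprint estimate using the ~0.43 Wh/request figure cited
# above. Grid carbon intensity and query volume are illustrative assumptions.

WH_PER_REQUEST = 0.43        # Wh per LLM request (figure from the UK reporting)
GRID_KG_CO2_PER_KWH = 0.4    # assumed grid carbon intensity (kg CO2e per kWh)

def annual_footprint(requests_per_day: int) -> tuple[float, float]:
    """Return (kWh/year, kg CO2e/year) for a steady LLM query load."""
    kwh = requests_per_day * 365 * WH_PER_REQUEST / 1_000
    return kwh, kwh * GRID_KG_CO2_PER_KWH

kwh, co2 = annual_footprint(requests_per_day=50_000)
print(f"{kwh:,.0f} kWh/year, {co2:,.0f} kg CO2e/year")
```

Even at a hypothetical 50,000 queries a day, the result is thousands of kWh per year from inference alone, which is why it belongs in ESG reporting.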

In short, CIOs now worry that an LLM query storm can threaten company sustainability goals as much as it threatens the budget.

Infratailors: Planning AI Infrastructure Wisely

To address both pain points, hospitals need better planning tools. Infratailors is an ML-based “AI infrastructure solution architect.” It takes in a healthcare organization’s LLM usage logs, planned use cases, and on-prem/cloud configurations, then simulates alternative scenarios to give CIOs early notice of costs and carbon. Among other things, it can:

  • Forecast token spend: Extrapolate overall API calls and token spend across different rollout scenarios.
  • Model energy & CO₂: Estimate electricity usage and emissions by mapping chosen LLMs onto hardware and data-center carbon intensity.
  • Right-size hardware: Recommend the most economical configuration. Should inference run on high-end on-prem GPU nodes, or on smaller cloud instances (say, on Azure)? Infratailors balances throughput and latency needs against energy consumption to propose an ideal setup.
  • Ongoing reporting: Produce budget-forecast dashboards and sustainability-metrics reports. When approving an LLM project, CIOs get a “bill preview” and a carbon estimate upfront, so surprises (and blame) are avoided. (A rough sketch of this kind of scenario comparison follows the list.)
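Infratailors’ internal models are not described in this post, so the following is only a hedged sketch of the kind of cost-and-carbon scenario comparison such a tool performs. The `Scenario` class, the two scenarios, and every figure in them are assumptions for illustration, not Infratailors’ actual API or data:

```python
# Hypothetical sketch of the scenario comparison described above. All names
# and numbers are assumed for illustration, not Infratailors' actual model.
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    requests_per_day: int
    usd_per_request: float   # blended token cost per request (assumed)
    wh_per_request: float    # energy per request on the chosen hardware (assumed)

    def annual_cost(self) -> float:
        return self.requests_per_day * 365 * self.usd_per_request

    def annual_kg_co2(self, kg_per_kwh: float = 0.4) -> float:
        return self.requests_per_day * 365 * self.wh_per_request / 1_000 * kg_per_kwh

for s in (Scenario("cloud API", 20_000, 0.010, 0.43),
          Scenario("on-prem GPU", 20_000, 0.004, 0.60)):
    print(f"{s.name:>11}: ${s.annual_cost():,.0f}/yr, "
          f"{s.annual_kg_co2():,.0f} kg CO2e/yr")
```

Fed with an organization’s real logs and measured per-request figures, the same comparison becomes the “bill preview” and carbon estimate described above.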

By putting a number upfront on what was once unknown, Infratailors turns “why did it cost so much?” into “this is exactly what we budgeted.” It aligns AI rollouts with both financial planning and ESG targets.

Scaling AI with Confidence

Hospitals can scale LLM programs confidently using solutions like Infratailors. They know that a small pilot costing a few thousand dollars will not balloon into a cost-intensive program overnight, and they can show leadership how each AI deployment ties to sustainability goals. In practice, this means consistent yearly budgets for AI projects and precise carbon reports that satisfy board-level ESG concerns. AI’s promise for patient care need not be accompanied by budget shock or hidden environmental expense. Better planning ultimately means AI-powered innovation, from faster charting to smart alerts, can ripple through a health system with a precise understanding of what it will cost in dollars and carbon. That certainty makes state-of-the-art AI tools not just exciting, but genuinely feasible for healthcare’s future.

References

Recent figures for healthcare LLM deployments and expenses provide concrete numbers. GPT-4 Turbo, for example, was priced at about $10 per 1M input tokens in mid-2024[2], and one research paper estimated $0.1–4.6M/year in API fees at scale[3]. Separate research covers the environmental dimension: an LLM query consumes ≈0.43 Wh (a small fraction of a smartphone battery charge), and ChatGPT’s daily emissions are equivalent to hundreds of homes[6][7].

These statistics explain why sustainability and finance teams are now scrutinizing AI infrastructure, and how forecast-based planning (as offered by Infratailors) can tame the costs and emissions of clinical AI.

[1] https://www.mountsinai.org/about/newsroom/2024/study-identifies-strategy-for-ai-cost-efficiency-in-health-care-settings

[2] https://www.researchgate.net/publication/385928072_A_strategy_for_cost-effective_large_language_model_use_at_health_system-scale

[3] https://www.nature.com/articles/s41746-025-01971-x

[4] https://www.thehealthcareexecutive.net/blog/esg-reporting-hospital-leadership-2025

[5] https://www.reading.ac.uk/news/2024/Research-News/Limit-hospital-emissions-by-using-short-AI-prompts

Nohayla Azmi

Research Engineer, SNT

Nohayla Azmi graduated with a dual degree in Electromechanical Engineering and Intelligent Systems & Robotics, and further completed a specialized master’s degree in Digital Project Management. She is a Research and Development Specialist at the Interdisciplinary Centre for Security, Reliability and Trust (SnT) at the University of Luxembourg, where she works on machine learning for systems. Her work investigates the future of AI systems, unlocking sustainability and efficiency through a deeper understanding and management of their costs. Nohayla has several years of experience in machine learning and data science, having worked in the R&D departments of the heavy industry, finance, and consulting sectors.

