Token billing charges for both the prompt and the response. For example, OpenAI's GPT-4 Turbo (128K) charged about $10.00 per 1M input tokens in mid-2024[2] (GPT-3.5 Turbo-16K was ~$1.50 per 1M). In concrete terms, if a patient note averages 1,000 tokens, then 1,000 notes cost about $10 on input alone. Add the model's output tokens, and frequent LLM use easily runs into hundreds or thousands of dollars per month. Scale that to dozens of departments and millions of tokens per year, and you're into six or seven figures in annual API fees. One healthcare AI analysis projected $115K–$4.6M per year in pass-through costs for GPT-based inference at enterprise health-system scale[3]. In short, a pilot that seems cheap at small volume can explode into a major line item as usage grows.
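The arithmetic above can be captured in a back-of-envelope calculator. This is a minimal sketch using the mid-2024 example rates cited in the text; the output price is an illustrative assumption, not a quoted figure.

```python
# Back-of-envelope API cost from per-token billing.
# Prices are the mid-2024 examples cited above; output rate is an assumption.
def monthly_cost_usd(notes_per_month, tokens_per_note,
                     output_tokens_per_note=0,
                     input_price_per_m=10.00,    # GPT-4 Turbo input, mid-2024
                     output_price_per_m=30.00):  # illustrative assumption
    input_tokens = notes_per_month * tokens_per_note
    output_tokens = notes_per_month * output_tokens_per_note
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# 1,000 notes x 1,000 tokens -> $10 on input alone, as in the text.
print(monthly_cost_usd(1_000, 1_000))  # 10.0
```

Multiplying this by departments and months is exactly how a cheap pilot turns into a six-figure line item.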
Operational complexity makes costs hard to predict. Queries vary in length and frequency, and model failures can multiply calls. Mount Sinai's study found that smartly grouping clinical tasks could cut API calls by up to 17×, potentially saving "millions of dollars per year"[4] for a large system. But without careful planning, CIOs cannot know in advance how many tokens (and dollars) a wider rollout will consume. One clinician recalled running GPT-4 for nightly reports and being blindsided by an unforeseen five-figure monthly bill. This is a departure from more predictable IT expenses (fixed servers, fixed licenses); cloud LLMs feel more like a metered black box whose bill spikes wildly.
The second emerging challenge is energy and ESG stress. Healthcare companies are among the nation's largest consumers of energy, and sustainability has now crossed into IT buying[5]. Hospital boards and ESG committees are asking, "What is the carbon footprint of our AI initiatives?" Recent UK reporting on LLMs in healthcare warned that they "could lead to significant resource use," noting that one LLM request uses as much electricity as powering a smartphone 11 times (about 0.43 Wh) and even ~20 mL of data-center freshwater[6]. At scale, ChatGPT's daily emissions have been compared to those of 400–800 US homes[7].
In healthcare, models will be bigger (broader scope) and typically run on-premises or in a corporate cloud, so the footprint can actually be larger. Meanwhile, ESG demands are tightening: hospital networks must now report energy and carbon alongside clinical KPIs[8]. Energy consumption and carbon footprint are "board-level issues," and some health systems have published audited ESG reports on renewables and emissions[5]. Deploying an LLM without considering its power consumption and emissions is therefore no longer just an IT challenge; it is an ESG risk. And LLM inference (prompt answering), not training, dominates the lifecycle footprint: different analyses estimate that inference accounts for up to 90% of an AI system's lifetime energy, since training is a one-time event while inference runs continually.
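The per-query figures cited above are enough to sketch an aggregate footprint. A minimal estimator, assuming the ~0.43 Wh and ~20 mL per-request numbers from the UK reporting; the query volume is a hypothetical example:

```python
# Aggregate resource footprint from the per-query figures cited above:
# ~0.43 Wh of electricity and ~20 mL of freshwater per LLM request.
WH_PER_QUERY = 0.43
ML_WATER_PER_QUERY = 20.0

def footprint(queries_per_day, days=365):
    q = queries_per_day * days
    return {
        "kwh": q * WH_PER_QUERY / 1000,          # Wh -> kWh
        "litres_water": q * ML_WATER_PER_QUERY / 1000,  # mL -> L
    }

# e.g. a hypothetical 50,000 queries/day across a health system for a year:
print(footprint(50_000))
```

Even at modest per-query numbers, system-wide volumes yield thousands of kWh and hundreds of thousands of litres per year, which is why these figures now land on ESG dashboards.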
In short, CIOs now worry that an LLM query storm could undermine company sustainability goals as much as it threatens the budget.
To address both pain points, hospitals need better planning tools. Infratailors is an ML-based "AI infrastructure solution architect." It takes in a healthcare organization's LLM usage logs, planned use cases, and on-prem/cloud configurations, then simulates alternative scenarios to give CIOs early notice of costs and carbon before a rollout.
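The kind of what-if simulation described above can be sketched in a few lines. This is not Infratailors' actual API; the scenario fields, blended price, and energy-intensity rate are all assumptions for illustration.

```python
# Hypothetical sketch of a cost-and-carbon what-if simulation.
# Not Infratailors' real API; all fields and rates are assumptions.
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    monthly_tokens: int       # projected from usage logs
    price_per_m_usd: float    # blended input/output rate (assumption)
    wh_per_1k_tokens: float   # energy intensity (assumption)

def simulate(s: Scenario):
    cost = s.monthly_tokens * s.price_per_m_usd / 1_000_000
    kwh = s.monthly_tokens / 1_000 * s.wh_per_1k_tokens / 1_000
    return {"scenario": s.name,
            "usd_per_month": round(cost, 2),
            "kwh_per_month": round(kwh, 1)}

for s in [Scenario("pilot", 5_000_000, 10.0, 0.5),
          Scenario("enterprise rollout", 500_000_000, 10.0, 0.5)]:
    print(simulate(s))
```

Running pilot and rollout scenarios side by side is what turns a surprise metered bill into a line item a CIO can budget and report against.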
By putting a number upfront on what was once unknown, Infratailors turns "why did it cost so much?" into "this is exactly what we budgeted." It aligns AI rollouts with both financial planning and ESG targets.
With solutions like Infratailors, hospitals can scale LLM programs confidently. They know that a small pilot costing a few thousand dollars will not balloon into a runaway program overnight, and they can show leadership how each AI deployment ties to sustainability goals. In practice, this means consistent yearly budgets for AI projects and precise carbon reports to satisfy board-level ESG concerns. The AI promise for patient care need not come with budget shock or hidden environmental expense. Better planning ultimately means AI-powered innovation, from accelerated charting to smart alerts, can ripple through a health system with a precise understanding of what it will cost in dollars and carbon. That certainty makes state-of-the-art AI tools not just exciting, but genuinely feasible for healthcare's future.
Recent figures for healthcare LLM deployments and expenses ground these concerns in concrete numbers. GPT-4 Turbo, for example, was $10 per 1M tokens in mid-2024[2], and one research paper estimated $0.1–4.6M per year in API fees at scale[3]. Separate research quantifies the environmental dimension: an LLM query uses ≈0.43 Wh (on the order of smartphone battery charges), and ChatGPT's daily emissions are equivalent to those of hundreds of homes[6][7].
These statistics explain why sustainability and finance units are now scrutinizing AI infrastructure, and how forecast-driven planning (as offered by Infratailors) can tame clinical AI's costs and emissions.
[2] https://www.researchgate.net/publication/385928072_A_strategy_for_cost-effective_large_language_model_use_at_health_system-scale
[3] https://www.nature.com/articles/s41746-025-01971-x
[4] https://www.thehealthcareexecutive.net/blog/esg-reporting-hospital-leadership-2025
[5] https://www.reading.ac.uk/news/2024/Research-News/Limit-hospital-emissions-by-using-short-AI-prompts
Research Engineer, SNT
Nohayla Ajmi graduated with a dual degree in Electromechanical Engineering and Intelligent Systems & Robotics, and further completed a specialized master’s degree in Digital Project Management. She is a Research and Development Specialist at the Interdisciplinary Centre for Security, Reliability and Trust (SnT) at the University of Luxembourg, where she works on machine learning for systems. Her work investigates the future of AI systems, unlocking sustainability and efficiency through an increased understanding and management of their costs. Nohayla has several years of experience in machine learning and data science, having worked in the R&D departments of heavy industry, finance, and consulting sectors.