
India's inference needs are a ticking time bomb for our account deficit..
This essay was first published on Twitter/X.
I have been meaning to write about this piece for a long time but was rudely forced to when I saw my monthly Claude Max subscription renewal email. In my limited existence on the internet (~15 years) I have not paid more than $10 per month for any software until Claude/ChatGPT came into my life.

I also genuinely dislike "thesis" essays but I was compelled to write about this one because i don't think anyone else is thinking how big of a problem this is about to become; definitely not in the top echelons of our government which probably has bigger fish to fry (not in cooking oil)..
With this week's address to the nation, PM highlighted about our growing CAD (Current Account Deficit) that prompted him to advise our fellow citizens from 'traveling abroad', 'cut down office travel' and 'don't buy gold" etc. How is any of this related to inference you may ask?
Ever since I started spending time on IT services especially around how AI was going to radically disrupt the margin structure of these companies, it has been hard not to think about Daniel Gross's 2024 AGI trades essay where he equates India's then $250B IT exports to GPT-4 tokens (which at that point did not even reason). The GPT 5.5 tokens of today have become far more valuable...

Petrodollar has been the de-facto currency combination for sometime. Oil and Gold imports historically and continues to widen our current account deficit. It weakens the rupee when prices spike. Oil makes macroeconomists, finance ministers and central bankers nervous for a reason. (the ongoing war has exposed structural weakness in our economy to make matters worse)
But the AI economy is creating a new kind of import dependency that increasingly looks like the new petrodollar - the token-dollar
If India’s services economy starts depending on foreign model tokens to deliver software, analytics, consulting, BPM, customer support, compliance, finance ops and coding work, then we are creating a new dollar-denominated input for India's most important export engine.
India does not need a world-class domestic inference player because it sounds patriotic or that sovereign sounds cool but because the rupee cannot afford a future where every unit of AI-enabled work is rented in dollars.
India’s macro cushion has been services (for now..)
India’s external account has a simple structural problem. We import a lot of important goods essential to our economy. Oil, gold, electronics, machinery, semiconductors, defence equipment, industrial inputs. These imports create constant dollar demand.
The thing that saves us has always been services; specifically ITeS.
India closed FY26 with services exports at $418.3B against services imports that produced a net services trade surplus of $213.9B . That surplus is the only reason the rupee did not break much earlier. It pays for our oil, our gold, our electronics, our machinery. The Finance Ministry's own April review admits the services surplus offsets 64.2% of the merchandise trade deficit. The entire external account balances on the back of services exports, and services exports balance on the back of Indian developers, analysts, support staff, and engineers selling time to foreign clients at a margin.
This macro bargain India has lived with for years is getting radically disrupted as we speak..
The lazy AI future for India converts services exports into token imports
Today, an Indian IT company earns dollars from a global client and pays most of its cost base in rupees.
Tomorrow, the same company may earn dollars from a global client, but pay a foreign model company for inference every time an AI agent reasons, codes, drafts, checks, searches, summarises, classifies or responds. The previous era's arbitrage will not hold anymore in an increasingly tokenomic world..
The risk is not that Indian companies will use foreign AI tools. They already are and will increase spends by the looks of the recent MoUs signed with OpenAI and Anthropic. The risk is that the scarce input in services delivery shifts from Indian labour to foreign inference.
Once that happens, Indian IT becomes a reseller of someone else’s intelligence.
Tokens behave like imported intermediate goods
A token is easy to dismiss because it feels abstract unlike the barrels of crude oil that unload at Jamnagar.
But in the external account, abstraction does not matter since a recurring dollar payment is still rupees outflowing.
Now map that onto India’s services industry. A $42B annual token bill at 10% of foreign inference spend would not look like oil on the trade ledger. But it would behave like a major external-account leak.
The rupee feedback loop
This becomes uglier when you add currency depreciation.
Foreign inference is priced in dollars. Indian companies often earn, spend or report in rupees. If the rupee weakens, every dollar-priced token becomes more expensive in local currency terms.
That creates a vicious loop:
More AI adoption increases foreign inference demand. Foreign inference demand increases dollar outflow. Dollar outflow adds pressure to the external account. A weaker rupee makes foreign inference more expensive. Higher inference cost compresses Indian margins.*
This is why domestic inference is not just a sovereign vanity project but it almost acts like an FX hedge for every tech company and startup in India.

GPU DCs are the oil refineries of the future
Strategically, GPUs are the closest thing the AI economy has to productive energy assets because they turn electricity, chips, models and software into cognition/intelligence in the form of tokens. And now raw tokens these days increasingly produce and do the work as well.
A country that lacks compute will rent intelligence from countries and companies that have it. At least in the case of oil, you had to have divine blessings to find oil within your borders but for producing tokens you only need Jensen's blessing and a PO (hopefully Lisa Su comes through as well..)
We are rapidly heading towards a very real future where India will be forced to import all of its tokens because it failed to build a reliable domestic inference market that is not dollar denominated.
Procurement alone will not solve this
The government has moved in the right direction. Under the IndiaAI Mission, more than 38,000 GPUs have been onboarded for a common compute facility, with access being made available to startups, academia and public institutions. GPUs under the mission have also been described as available at ₹65 per hour.
But a GPU cloud without a strong inference layer becomes another government portal. Useful for pilots and headlines, weak for production use-cases, invisible to developers, irrelevant to enterprises.
How would a public Indian inference platform look like? It should have
- Rupee pricing.
- SoTA open-weight models served in the most token efficient manner (vLLM, SGLang, TensorRT etc..)
- Public latency and throughput benchmarks updated realtime
- Private deployment options.
- Fast model refresh cycles.
- Indian language performance.
- Cost per solved workflow, not just cost per token.
The benchmark should be simple: can an Indian developer switch from a foreign API to a domestic endpoint without rewriting the application, sacrificing reliability or waiting three-six months for the model catalog to catch up?
If the answer is no, we do NOT yet have an inference platform or strategy.
We may have missed the frontier SoTA LLM pre-training game this time
India should be honest about where it stands.
Training a frontier model against OpenAI, Google, Anthropic, Meta and China’s strongest labs requires compute, talent, data, infrastructure, evaluation depth, distribution and billions of dollars of risk capital.
But the immediate macro battle is inference because most enterprise workflows do not need the smartest model on earth. They need the cheapest model that solves the task reliably.
And inference is genuinely a money spinner as long as the GPUs are not on your balance sheet. Considering state of the art inference is a hardware-software co-design being as close to the metal here helps and builds structural moats to your business.

A back-office claims workflow does not always need the latest Opus/5.5. A compliance agent does not always need maximum frontier reasoning. A customer support summariser does not always need the most expensive model available.
A strong open-weight model, routed-cached-fine-tuned to custom workflows, with real world evals and served locally at lower cost can win a large share of Indian enterprise workloads.
That is where India still has a window. China realized this about 2 years back with LLMs and 15 years back with cloud computing infrastructure.

Chinese tech majors have realized they cannot stay silent and cede control to the west with model training - the recent H200 purchases and their focus on Huawei's homegrown NPUs also show how seriously they treat this impending import tax. Our top leaders are busy pontificating how we should stop pre-training and learn to love the use-cases without any real inference capacity to show for..
The US already understands the inference layer deeply and are becoming inference research labs..
Together (Tri Dao their Chief Scientist is the brain behind Flash Attention), Fireworks (RL post-training with Cursor as anchor customer), Baseten, DeepInfra, Modal and now with vLLM (Inferact) and SGLang (Radixark) both raising 100s of millions of dollars in funding are building high-performance inference layers around open-weight models - defacto becoming inference neo-labs.
They update model catalogs quickly. They compete on latency, cost and developer experience. NVIDIA itself has highlighted companies like Baseten, DeepInfra, Fireworks and Together for reducing token costs through optimized inference on Blackwell systems.

Not a single Indian company operates at this scale even when the world has traditionally recognized our ability to deliver on IT services.
As a result we still lack the default domestic inference platform that a serious developer, bank, SaaS company, IT vendor or a government department can trust as the first port of call.
While we have the likes of Simplismart, Neysa and Yotta - one quick look at their platforms shows how far behind we are across all the different parameters.


The Indian incumbent problem
India’s large GPU and cloud players cannot keep treating inference like a catalogue page. If the world is moving from Llama to Qwen to DeepSeek to Kimi to whatever ships next month (Xiaomi now has a frontier open weights model!!), Indian inference providers cannot update model catalogs at PSU procurement speed.
Domestic inference should be held to the same standard as foreign inference.
- Better, where India-specific workloads matter.
- Cheaper, where rupee economics matter.
- Private, where regulation matters.
- Current, where model performance matters.
This is a policy problem and a startup opportunity
The good news is that this is buildable and we need not boil the ocean.
We already know Anthropic, Google and OpenAI all of them consider India as their second largest market for AI spends - the appetite is clearly there. ElevenLabs is on track to make $100M in annual revenues in India purely selling voice inference.
While the IndiaAI mission got started on the right track somewhere the rubber failed to hit the road - because we thought procuring GPUs alone was enough.
Subsidised compute should reward actual production usage, not for logo collection.

The question is who captures the margin in the end (apart from Jensen). If foreign model companies capture the token margin, foreign clouds capture the infrastructure margin, and Indian companies keep the integration margin, then AI eats the only advantage we have (labour arbitrage) making its external dependency worse.
If India builds domestic inference, the story changes since all of the imported GPUs become domestic infrastructure - Some dollar token spend becomes rupee spend and some margin stays with Indian infra companies.
Some AI capability becomes available to Indian startups and enterprises without constant FX exposure. Some services exports become more defensible.
Closing
India’s next oil import style bill may not only come from tankers docking at Jamnagar.
It WILL definitely come from millions of API calls made by Indian service companies and startups that call themselves AI-native while renting intelligence in dollars.
For decades, India imported energy and exported software.
But this is avoidable. It requires India to stop confusing GPU procurement with inference capacity, stop confusing patriotic sovereign branding that is NOT production-grade infrastructure, and stop pretending the AI bill is somebody else's problem. The bill is already arriving. Mine was $138 last month that was severely subsidized. Yours will be not be soon in the near future.
Domestic inference is now macroeconomic infrastructure. We need to ramp up capacity on war footing like the rupee depends on it