Discussion about this post

Elad Verbin

This is a great discussion! I think you left out one crucial point, which actually makes your argument much stronger (and improves the "compute gradient" overall). The biggest question about the high-scale AI workloads of the future is not just "where does it run" but "what does it run". Some tasks require big models, some only need small models, some need small-but-specialized models, and so on.

Right now we do have AI model orchestrators dealing with this issue, but they're not very good at it. As time goes on, and the capabilities of top-end AI models grow stronger, we'll see a much stronger "capabilities gradient". So when you take an inventory of a professional's usage, it won't be "10,000 queries"; it will be "9,000 small-model cheap queries, 900 larger-model queries, and 100 huge-model queries". Many of the 9,000 will run on the edge, but the 100 huge-model queries will run server-side. Regardless, the electricity cost won't simply be 10,000 times the cost of a single query, but something much more nuanced, with a lot of efficiency gained from matching each query to the smallest model that can handle it.
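To make the arithmetic concrete, here is a minimal sketch in Python. The per-query energy figures are made-up placeholders (I don't have real measurements), and the tier names are just labels for this illustration; only the 9,000 / 900 / 100 split comes from the paragraph above.

```python
# Hypothetical per-query energy in milliwatt-hours (mWh).
# These are illustrative placeholders, NOT measured values.
ENERGY_MWH = {"small": 10, "large": 300, "huge": 3000}

# Query mix for one professional, taken from the example above.
QUERY_MIX = {"small": 9000, "large": 900, "huge": 100}


def total_energy_mwh(mix, energy):
    """Sum energy over tiers: count in each tier times that tier's per-query cost."""
    return sum(count * energy[tier] for tier, count in mix.items())


# Energy when each query goes to the appropriate tier.
mixed_mwh = total_energy_mwh(QUERY_MIX, ENERGY_MWH)

# Naive estimate: every query priced as if it hit the huge model.
total_queries = sum(QUERY_MIX.values())
uniform_mwh = total_queries * ENERGY_MWH["huge"]

print(f"mixed:   {mixed_mwh / 1000:.0f} Wh")
print(f"uniform: {uniform_mwh / 1000:.0f} Wh")
print(f"ratio:   {mixed_mwh / uniform_mwh:.3f}")
```

With these placeholder numbers the tiered workload comes to roughly 2% of the "10,000 times one big query" estimate. The exact ratio depends entirely on the assumed per-tier costs, which is the point: the total is driven by the mix, not by the raw query count.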

The details are still uncertain. I personally think the future will be bottlenecked by electricity production. I think AI will end up consuming a very large share of our electrical output, and that electrical buildout is crucial. But analyzing the numbers and the growth rate is very hard. In other words: at some future year X, the more electricity you produce, the more value you can create, with an almost linear relationship between the two. But I don't know the value of X. I could see X being 2030, but I could also see it being 2035 or 2040. I think it'll definitely happen by 2040. (And given the timelines of infrastructure projects, I guess even X=2040 means "we should start the buildout immediately and aggressively".)

SamKan
7d (edited)

Interesting post! I think I am too dense to see the explicit connection between the two figures. What is the relationship between “fleet retrain, phone asr, etc” and the “workload type” in the table?
