State of the Future! Part Deux
Scale-pilled and ready to go for Season 2: AI 2030: Scale, Deploy, & Secure
I went away for a bit and now I’m back. Back to take over the globe, now break bread. Scale-pilled and ready to go. I’ve read all the things and extrapolated a straight line into the future.
I went away to focus on semiconductor investing and speak to people doing interesting things. Hopefully fund them to build the future maybe? That was always one option. You know what I learned? There is no *semiconductor* investing. It’s not a thing. Nor is AI investing. You can’t think properly about the future of semiconductors or technological deployment at all without thinking about mining, energy, batteries, chip manufacturing, geopolitics, regulation, shall I continue? Pick a lane they said.
There are no lanes. Scratch that, wrong metaphor. It’s all interconnected gears. Gears all the way down in the great game of overthinking venture capital. So here I am again, cap in hand, 12 months late, telling you it’s actually all about energy. Mr Market has already figured out SMRs. Cheers Microsoft, Amazon, and Google. But everyone is still sleeping on Deep Geothermal. But wait. I have more. Much more.
Today, I bring to you three visions of AI to 2035: Scale, Deploy, and Secure. AI is the whole ball game. It’s mining, it’s energy, it’s semiconductors, it’s batteries, and, yes, it’s crypto.
It’s still early.
It’s on the record. Sam, Ilya, Demis, and Dario are on the podcasts. They are saying it loudly into the microphone. Dario of Machines of Loving Grace fame is talking about 20% annual GDP growth rates, lifting billions out of poverty, and curing most diseases in the next 5-10 years. Leopold wrote the roadmap down in Situational Awareness. Even if the U.S. Government can’t or won’t do a Manhattan Project 2.0: This Time It’s Serious, the wheels are in motion. The next 5 years are bought and paid for. Land has been bought. Power contracts secured. Nuclear electricity contracted. GPUs pre-ordered. HBM and CoWoS capacity secured. This thing ain’t stopping. You can think there will be a bust, but this infrastructure is happening. It’s fibre cabling, except for compute this time. New scaling laws around inference-time compute rather than raw training compute will change the balance between training and inference, and between data center and edge, but they don’t change the fact: we’ve embarked on the largest capital allocation project in human history.
Listen carefully. The people closest to the action tell us scaling works. Scaling has plenty of headroom for data center training. And as of September, we’ve likely just begun a new scaling path with inference-time compute via o1. For training, just plug in more GPUs, add more data and parameters, and off we go to 5th-level models and maybe gen-6. Sure, there’s lots of work on synthetic data generation, efficient data sampling, post-training optimization, and so on, but we have the contours of where we are headed. But it’s been nearly two years and we haven’t got 5th-level models. Nvidia fell 7% in a day. McKinsey thinks AI might be overhyped. It’s so over?
It was never over, and we never had to come back. The cost for 2 million tokens has come down 240x in two years. Claude Haiku, GPT-4o mini, and Gemini Nano are on smartphones. Attention-based transformers can be natively multimodal, inputting text, voice, image, and video and outputting any combo you want. Practical agents are within touching distance. o1 has fired the starting gun on better-than-human reasoning. Demis says embodiment might be as simple as adding another modality. You don’t have to believe that attention-based transformers and diffusion models will get us to AGI. You have to believe that the richest companies in history, venture capitalists, and increasingly governments have the incentive to scale AI.
It’s happening, but people aren’t updating their priors fast enough. With situational awareness, you see three opportunities: scaling, deploying, and securing AI. First, scale. By 2026, we'll see gen-5 GPT, Claude, Gemini, and Llama, with gen-6 models likely in 2028 requiring $100bn+ and 10 GW of power. The push to 7th-gen models could take us towards $1 trillion of cold hard cash and 100 GW. We will create and transport many more electrons to data centers and try not to waste them. Second, deploy. For systems to be pervasive, we will massively reduce token costs, making intelligence too cheap to meter at $0.0001 per million tokens. Finally, secure. For society to accept AI, we must protect privacy, offer fair and unbiased models, and have open-access AI infrastructure. These are our problems. I bring solutions.
Scale: Pathway to AGI (Part I)
1. How do we increase power into data centers?
2. How do we make data centers consume less power?
3. How do we make servers consume less power?
Today we cover Part I, next week Part II, and guess what I will cover the week after? Correct, Part III. And if you stick around, I’ll go into detail on each of the nine problem statements, offering analysis of each potential solution. If you are good, I’ll do a value chain analysis too…
Scale: Pathway to AGI
The scaling hypothesis worked. It’s a foot race. A foot race with Sam, Demis, and Dario carrying billions in their pockets. And now Ilya too, with the classic $1bn-at-$5bn seed round. Labs are investing genuinely unprecedented capex in scaling their models. This isn’t speculative. Capex has been budgeted for. Chips have been pre-ordered. Three Mile Island is being restarted, and Microsoft has agreed a 20-year contract for it to power their data centers. By 2026, we'll see 5th-level GPT, Claude, Gemini, and Llama, with 6th-level models following in 2028/29. As Leopold outlined, this needs $100 billion training clusters by 2028 and something like $1 trillion for 7th-level clusters by 2030. Big numbers. But it’s also likely Elon Musk will be worth $1 trillion by 2030. So I dunno, it’s all relative in the fight for the future. At these scales, algorithms and hardware matter, but it’s a power game: securing enough of it, delivering it into data centers, and using it efficiently. While GPT-4 used about 10 MW, 5th-level models may need on the order of 1 GW - equivalent to a large nuclear reactor. For 2030+ clusters we might be staring down the barrel of 100 GW, far exceeding current data center capacities. Scaling to this level mainly requires increasing power supply, but in parallel we should also reduce power consumption and improve system efficiency. I used to joke that a future AI fund is a rare earth mining fund. But for now investing in AI is basically an energy fund. If you are a ClimateTech fund reading this, I strongly suggest repositioning as “AI and Climate”. You’re welcome.
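To get a feel for why it’s a power game, here’s a back-of-envelope on what those clusters cost just to keep the lights on. The cluster power levels are the ones above; the flat-out utilization and the $0.08/kWh industrial rate are my assumptions, not reported figures.

```python
# Back-of-envelope: yearly electricity bill for training clusters at
# the power levels in the text. The $0.08/kWh rate and 100%
# utilization are illustrative assumptions.

HOURS_PER_YEAR = 8760
PRICE_PER_KWH = 0.08  # assumed industrial rate, $/kWh

def annual_energy_cost(power_mw: float) -> float:
    """Utility cost of running a cluster at full power for a year."""
    kwh = power_mw * 1_000 * HOURS_PER_YEAR
    return kwh * PRICE_PER_KWH

for label, mw in [("GPT-4-era cluster (~10 MW)", 10),
                  ("5th-level cluster (~1 GW)", 1_000),
                  ("2030+ cluster (~100 GW)", 100_000)]:
    print(f"{label}: ~${annual_energy_cost(mw) / 1e9:.2f}bn/year")
```

At 100 GW the electricity alone runs to roughly $70bn a year under these assumptions, before a single GPU is bought.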
1. Increase power into data centers
Hyperscale data centers typically consume between 100-250 MW of power. A 1 GW (1,000 MW) AI training cluster is 4-10x the power consumption of today's largest data centers. We need to increase the amount of power into data centers. A lot.
1.1. On-site Power Generation
Right, task number one: we need more energy. We can talk about efficiency all we want, but cooling and optimization can only get you so far. We’ve got 250 MW facilities; we need 4x that. The best way would be to generate more of it right there next to the servers. This is on-site power generation. It gets around the problem of needing to upgrade the grid, which we will get to; you can build reliable, carbon-friendly capacity; and if you sell it right, you can contribute to national energy independence without having to rely on Russia and others. If globalization taught us anything, it is that globally interconnected supply chains are a bad thing and we are all the poorer for it.
Buy: Small modular reactors (SMRs), Fuel cell systems, Concentrated solar power with thermal storage
Watchlist: Microturbines for combined heat and power, Enhanced Geothermal Systems (EGS) (Political bottleneck, HUGE potential), Nuclear fusion (2035+) (long time readers know, I have opinions. Short-term bearish, long-term bullish, but by 2035 what price nuclear/solar/wind?, cost-competitive much?)
1.2. Grid Upgrades and Ultra-High Voltage (UHV) Networks
Generating on-site power takes time; in theory a better way would be to upgrade existing energy grids to push more power through to data centers. This leads us to grid upgrades and ultra-high voltage (UHV) networks. Grids are constantly being upgraded, but it’s never really been a major concern for the tech industry. Sure, the hyperscalers run their own data centers and there are wizards who manage them. But generally your 200 MW data center power budget was fine to run your Intel Xeon servers at 1,500 W absolute peak. Now the problem moves outside the data center to the pipes outside. Before, your problem was technical; now, your problem is political. Call the lobbyists. Anyway, one doesn’t just upgrade the grid and build UHV networks. Getting more power through the cables is a challenge. These data centers are also often far from traditional power sources, necessitating efficient long-distance power transmission. And we aren’t just building a new coal plant here: new data centers will need to incorporate renewables, which grids across the world haven’t yet fully onboarded.
Buy: HVDC (High Voltage Direct Current) transmission systems, Grid-scale energy storage, Superconducting power cables
Watchlist: Flexible AC Transmission Systems (FACTS), Advanced power flow control devices, Dynamic Line Rating (DLR) technologies, Wide Area Monitoring Systems (WAMS)
1.3. Power Delivery
Okay, so we’ve got more power coming into the data center, either because we’ve built an SMR next to it or because we’ve miraculously upgraded the grid around it. Now we have a problem. Our pipes to the rack are only designed to carry 7-10 kW, but the new Nvidia behemoth wants 20-30 kW. Higher power densities also amplify the importance of efficient power conversion and distribution to manage operational costs and environmental impact. And AI clusters need uninterrupted power for training runs, so we need super reliable power delivery systems. These AI servers will probably get even larger, so we don’t want to rip out these new power delivery systems in three years when we need to run the 6th-generation models. Call the wizards.
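A quick sketch of why the rack power jump bites: at a fixed voltage, current scales linearly with power (I = P / V), and resistive loss scales with the square of current. This is simplified single-conductor DC math, and the voltages are illustrative, not a spec.

```python
# Current needed to feed a rack at two illustrative voltages.
# Simplified I = P / V; real three-phase AC math differs, but the
# scaling argument is the same.

def current_amps(power_kw: float, voltage_v: float) -> float:
    """Current required to deliver power_kw at voltage_v."""
    return power_kw * 1_000 / voltage_v

for rack_kw in (10, 30):
    print(f"{rack_kw} kW rack: "
          f"{current_amps(rack_kw, 415):.0f} A at 415 V, "
          f"{current_amps(rack_kw, 48):.0f} A on a 48 V DC bus")
```

Tripling rack power triples the current through every busbar and connector, which is why the watchlist below includes busway systems with higher current capacity.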
Buy: High-efficiency, high-density power supplies, DC power distribution at rack level, Software-defined power systems
Watchlist: Modular UPS systems, Intelligent Power Distribution Units (PDUs), Advanced busway systems with higher current capacity, Integrated rack-level energy storage, Thermoelectric cooling systems for targeted heat removal
2. Make data centers more efficient
Heat is your enemy. Heat and the von Neumann bottleneck. Traditional data centers operate at a Power Usage Effectiveness (PUE) of around 1.6, meaning 0.6 W of overhead for every watt of computing, or roughly 37% of total energy going to non-computing purposes, primarily cooling. For the new AI factories, this inefficiency is magnified by higher power densities.
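PUE arithmetic is easy to get backwards, so here is the two-line version. The 1.1 figure is roughly what hyperscalers report for their best facilities, but treat it as indicative.

```python
# PUE = total facility power / IT equipment power. At PUE 1.6,
# overhead is 0.6 W per IT watt, i.e. 37.5% of the total bill goes
# to cooling and other non-compute loads.

def overhead_fraction(pue: float) -> float:
    """Share of total facility energy spent on non-IT loads."""
    return (pue - 1) / pue

print(f"PUE 1.6 -> {overhead_fraction(1.6):.1%} overhead")
print(f"PUE 1.1 -> {overhead_fraction(1.1):.1%} overhead")
```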
2.1. Advanced Cooling Technologies
Data center cooling is already a massive challenge, if not the primary one. Now we want to bring 10-100 GW into these facilities. Thermal management architect is going to be the next Solidity developer. Heat densities exceeding 50 kW per rack will push to their limits cooling systems that already account for roughly 40% of a data center's energy consumption. The imperative for energy efficiency is clear: every watt saved in cooling translates to significant operational cost reductions and improved environmental performance. For instance, improving cooling efficiency by 20% in a 10 MW data center could save approximately $1 million annually in energy costs. Moreover, effective thermal management is crucial for maintaining optimal AI hardware performance, as even brief thermal excursions can trigger throttling, potentially reducing compute speed by 30-50%.
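That $1 million figure roughly checks out, given an assumption about the electricity price (the 40% cooling share is from the text; the $0.12/kWh rate is mine):

```python
# Rough check of "20% better cooling in a 10 MW facility saves ~$1M
# a year". Price per kWh is an assumption.

IT_LOAD_MW = 10
COOLING_SHARE = 0.4      # cooling share of consumption, from the text
IMPROVEMENT = 0.2
PRICE_PER_KWH = 0.12     # assumed commercial rate, $/kWh
HOURS_PER_YEAR = 8760

saved_mw = IT_LOAD_MW * COOLING_SHARE * IMPROVEMENT          # 0.8 MW
saved_usd = saved_mw * 1_000 * HOURS_PER_YEAR * PRICE_PER_KWH
print(f"~${saved_usd / 1e6:.2f}M/year saved")
```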
Buy: Direct liquid cooling systems, Two-phase immersion cooling, Direct-to-chip liquid cooling
Watchlist: Electrochemical Additive Manufacturing (ECAM), Rear-door heat exchangers, Evaporative cooling towers, Phase change materials for thermal energy storage, Thermoelectric cooling for targeted heat removal, Micro-channel liquid cooling, Nanofluids for enhanced heat transfer, Magnetic refrigeration systems
2.2. Data Center Networking
Beyond heat, the other big way a data center loses energy is moving data around, rack-to-rack and server-to-server. I’ll discuss chip-to-chip solutions later. Networking was always an inefficiency, but frontier models require training across multiple racks and across servers within racks, and those connections need to be high-bandwidth, low-latency, and low-power. The energy cost and time delay of moving data between compute nodes can be a significant bottleneck. The primary driver will continue to be higher-bandwidth, lower-latency connections, but if power consumption isn’t addressed, networking could become one of the biggest energy loss components in the data center.
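The networking energy budget is just (energy per bit) times (bits moved). The aggregate traffic figure and the pJ/bit numbers below are illustrative assumptions, not vendor specs, but they show why per-bit energy is the metric that matters:

```python
# Steady-state network power = bits/s x joules/bit. Traffic volume
# and pJ/bit figures are illustrative assumptions.

def network_power_mw(traffic_pbps: float, pj_per_bit: float) -> float:
    """Power (MW) to sustain traffic_pbps petabits/s of traffic."""
    bits_per_s = traffic_pbps * 1e15
    return bits_per_s * pj_per_bit * 1e-12 / 1e6

for tech, pj in [("retimed electrical links (~15 pJ/bit)", 15),
                 ("co-packaged optics (~3 pJ/bit)", 3)]:
    print(f"{tech}: {network_power_mw(100, pj):.1f} MW at 100 Pb/s")
```

And per-bit energy compounds: each bit may traverse several switch hops, so the fabric total is a multiple of these figures.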
Buy: Silicon photonics-based AOCs, Co-packaged optics (CPO), Wireless inter-rack communication
Watchlist: Polymer Optical Fibers (POF), Software-defined networking (SDN), Coherent optics, Passive optical circuit switching
2.3. Data Centre Energy Optimization
The final opportunity worth discussing is optimizing data center energy consumption. Rather like software-defined networking above and other optimization problems, there are huge performance and efficiency gains to be had as data centers get larger and more complex. The integration of new cooling, networking, and computing technologies into existing facilities likely offers huge optimization opportunities. This is, in a way, an application of the very frontier AI we are looking to scale and deploy: the same models we hope to train in these giant facilities can be turned back around to optimize the data centers they are trained in. It’s a riff on the recursive self-improvement motif, except the AI isn’t improving itself but rather improving its house. We will talk about recursive self-improvement later in the context of AI improving the design of its own hardware.
Buy: Reinforcement learning for HVAC control, AI workload scheduling and resource allocation, AI-powered dynamic voltage and frequency scaling
Watchlist: Computer vision for thermal mapping and hotspot detection, Federated learning for multi-site energy optimization
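To make the workload-scheduling idea on the Buy list concrete, here’s a toy version: slide a deferrable training job into the cheapest contiguous price window. Production systems use forecasting and reinforcement learning; the hourly prices here are made up.

```python
# Toy energy-aware scheduler: pick the cheapest contiguous window of
# hours for a deferrable job. Prices are illustrative.

def cheapest_window(prices: list[float], hours: int) -> int:
    """Start hour of the cheapest contiguous run of `hours` hours."""
    costs = [sum(prices[i:i + hours])
             for i in range(len(prices) - hours + 1)]
    return costs.index(min(costs))

hourly_price = [0.12, 0.11, 0.08, 0.07, 0.07, 0.09, 0.13, 0.15]  # $/kWh
start = cheapest_window(hourly_price, 3)
print(f"schedule the 3-hour job starting at hour {start}")  # hour 2
```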
3. Make server chips more efficient
So we’ve managed to get 10 GW, even 100 GW, into data centers, and we’ve sorted the data center with better cooling and networking tech and optimized it all with the same AI we are building. Ideally we wouldn’t then waste that energy on unoptimized computers.
3.1. Data Centre Processors
Frontier models are really big. And they are really all linear algebra; more precisely, they are mostly multiplying matrices. What if we made logic chips that just do that? Well, we already have optimized AI chips. Nvidia GPUs are no longer general-purpose chips that also go in gaming PCs; the chip and server have been optimized to run lots of matrix multiplications really fast. Google's TPUs, now in their fourth generation, have shown consistent improvements in computation efficiency, with TPU v4 pods delivering more than 1 exaflop of compute. Every generation from Nvidia, Google, and AMD improves performance per watt. We already have Groq, Cerebras, Tenstorrent, and SambaNova all competing with Nvidia with “more” optimized designs. Others like D-Matrix and Rain are using digital in-memory designs to reduce latency and power consumption. Etched.ai is going all in on transformer ASICs. But with your run-of-the-mill 5x better performance or power consumption, you are ngmi on margins versus Nvidia's scale and CUDA moat. So I’m looking for the 10x or 100x leap in performance that makes the extra cost and hassle of switching worth it. Honestly, at data center scale, your analog or neuromorphic chips will get crushed on performance. I’m looking at photonic chips, and, while we are at it, let’s go all in and aim for the Landauer limit: make computations reversible, or compute with time instead. Incremental gains are not taking us to the places we want to go. Everything’s on the table to build our new God.
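How dominant is matrix multiplication? A common rule of thumb (an approximation, not a spec) is that a dense transformer forward pass costs about 2 FLOPs per parameter per generated token, nearly all of it inside matmuls. Parameter counts below are illustrative:

```python
# Rule of thumb: ~2 FLOPs per parameter per token for a dense
# transformer forward pass, almost all in matrix multiplies.

def gflops_per_token(params: float) -> float:
    """Approximate forward-pass compute per generated token."""
    return 2 * params / 1e9

for name, p in [("70B-parameter model", 70e9),
                ("1T-parameter model", 1e12)]:
    print(f"{name}: ~{gflops_per_token(p):,.0f} GFLOPs per token")
```

Because one operation dominates so completely, a chip that only does matmuls well can shed most of the generality, and the power, that a GPU carries around.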
Buy: Photonic chips, Reversible computing, Temporal computing
Watchlist: 3D stacking, In-memory computing, Neuromorphic architectures, Cryogenic computing, Spin-based computing
3.2. High Density Memory
“DRAM doesn’t scale anymore. In the glory days, memory bit density doubled every 18 months – outpacing even logic. That translates to just over 100x density increase every decade. But in this last decade, scaling has slowed so much that density has increased just 2x”. So says Dylan Patel. While logic chips improve dramatically in density and cost per transistor function, DRAM improvements have been minor, and increased bandwidth has come from expensive packaging, not scaling. Memory is unlikely to be a long-term physical bottleneck to AI data center scaling; the question is one of economic viability. The DRAM roadmap hints at brutal trade-offs in cost and power to achieve the throughput required for trillion+ parameter models. Today, High Bandwidth Memory (HBM) is the solution for almost every AI accelerator. It prioritizes bandwidth and power efficiency but is very expensive, at 3x the price of DDR5 per GB. HBM3e can deliver 36 GB capacity and about 1.2 TB/s of bandwidth per stack, and HBM is the only game in town for data center AI accelerators. Other DRAM varieties like DDR5, LPDDR5X, and GDDR6X target different cost, performance, and power requirements. Some companies are combining the high performance and high cost of HBM with the lower performance and lower cost of LPDDR. This is all fine, but the truth is HBM is a hack: a packaging solution to increase density that works around DRAM's inherent bandwidth and power problems.
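Why bandwidth is the whole game: generating one token streams every weight through the chip, so per-token decode latency is floored at model bytes divided by memory bandwidth. The model size and bandwidth figures below are illustrative, and this ignores batching and multi-chip sharding:

```python
# Memory-bandwidth lower bound on decode latency: each token must
# stream all weights, so latency >= model_bytes / bandwidth.
# Model size and bandwidths are illustrative.

def min_ms_per_token(model_gb: float, bandwidth_gbs: float) -> float:
    """Bandwidth-bound floor on per-token decode latency, in ms."""
    return model_gb / bandwidth_gbs * 1_000

MODEL_GB = 140  # roughly a 70B-parameter model in fp16
for mem, gbs in [("HBM3e-class (~1,200 GB/s)", 1_200),
                 ("DDR5 server (~300 GB/s)", 300)]:
    print(f"{mem}: >= {min_ms_per_token(MODEL_GB, gbs):.0f} ms/token")
```

This floor is why accelerators pay 3x per GB for HBM, and why batching exists: the same weight stream can serve many requests at once.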
Buy: Compute-in-memory (CIM), Ferroelectric RAM (FeRAM), Magnetoresistive RAM (MRAM)
Watchlist: 3D DRAM stacking, Hybrid Memory Systems in AI accelerators
3.3. Advanced Interconnect Technology
Logic and memory are fine if you can process everything on a single chip. Doing a Graphcore, as it’s known. But your Claudes and your ChatGPTs are too large to fit on a single GPU: they contain hundreds of billions of parameters, requiring far more memory (often 350 GB or more) than the 80 GB of VRAM typically available on top-tier GPUs. Plus, the computational load of processing these models demands far more than a single chip can provide, necessitating multiple GPUs. To handle both the storage and the computation, these models are distributed across many GPUs, using techniques like model parallelism to share the workload. The Nvidia DGX isn’t winning because it has the fastest processor or memory; it is a world-leading package. And key to the package, arguably the biggest moat Nvidia has, is NVLink, its GPU interconnect. Unlike traditional PCIe switches, which have limited bandwidth, NVLink enables high-speed direct interconnection between GPUs within the server, offering 3x more bandwidth at 112 Gbps per lane compared to PCIe Gen5 lanes. This enables tightly coupled multi-chip modules that can function as a single, more powerful logical GPU. UALink, a joint effort between the hyperscalers and AMD, Intel, Broadcom, Cisco, and others, should eventually commoditize NVLink, but scaling the 100 GW cluster will be about faster, higher-bandwidth interconnects. Turning it up to 11. Interesting solutions to explore are silicon photonics, advanced packaging with through-silicon vias (TSVs), and chiplet architectures with advanced interconnect fabrics. As with photonic processors for AI accelerators, we turn our attention to photons instead of electrons again. Photons just move faster. We already move data over long distances with light through fiber optic cables; surely we should be able to move light over short distances, too?
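Two quick numbers behind the model parallelism story: how many GPUs you need just to hold the weights, and how long activations take to cross the interconnect. The model size and VRAM match the figures above; the link speeds are illustrative, not quoted specs:

```python
# Minimum GPU count to fit the weights, and per-GB transfer time
# across two interconnect classes. Link speeds are illustrative.

import math

def gpus_to_hold(model_gb: float, vram_gb: float = 80) -> int:
    """Minimum GPUs to fit the weights alone (ignores activations)."""
    return math.ceil(model_gb / vram_gb)

def transfer_ms(payload_gb: float, link_gb_per_s: float) -> float:
    """Time to move payload_gb across a link, in milliseconds."""
    return payload_gb / link_gb_per_s * 1_000

print(gpus_to_hold(350))  # 350 GB of weights on 80 GB GPUs -> 5
for link, bw in [("NVLink-class (~900 GB/s)", 900),
                 ("PCIe Gen5 x16 (~64 GB/s)", 64)]:
    print(f"{link}: {transfer_ms(1, bw):.2f} ms per GB of activations")
```

The gap between the two links is the moat in miniature: on the slower fabric, the GPUs spend their time waiting for each other instead of multiplying matrices.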
Buy: Silicon photonics, Optical interposers, Chiplet packaging
Watchlist: Graphene-based interconnects, Wireless chip-to-chip communication, Spintronic interconnects
Disclaimer: Obviously “buy” doesn’t literally mean buy the stock. There is no stock in this case. By “buy” I mean: get in touch and we will have a little talk.
Next week, I’ll take on what it will take to make intelligence too cheap to meter.
“Deploy: Intelligence too cheap to meter: Let's assume we’ve built these AI factories and are feeding 10-100 GW to our new God in the sky. It will all be for nought if it costs a fortune to speak to it. The labs have done remarkably well on the cost front already, with the cost for 2 million tokens (input+output) decreasing from $180 to $0.75 in 2 years. 240x cheaper is some serious numbers. But it’s not enough, just like we will need unprecedented power generation, distribution and delivery, we are going to need to throw everything at reducing costs. I’m saying $0.0001 per million tokens. It’s not a useful number, it’s just a very low number. But basically we are talking “too cheap to meter”. Only when it’s this cheap can it be integrated into all devices and products seamlessly. Gods don’t have usage caps.”
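Out of curiosity, the trend in that quote, extrapolated as a straight line in the spirit of the intro. This is pure extrapolation, not a forecast:

```python
# Implied rate of the $180 -> $0.75 (per 2M tokens) price drop over
# two years, and how long the same trend takes to reach the
# "too cheap to meter" target from the text.

import math

start_price, end_price, years = 180.0, 0.75, 2.0
annual_factor = (start_price / end_price) ** (1 / years)  # ~15.5x/year

target = 0.0001 * 2  # $0.0001 per 1M tokens -> $0.0002 per 2M
years_to_target = math.log(end_price / target) / math.log(annual_factor)
print(f"~{annual_factor:.1f}x cheaper per year; "
      f"~{years_to_target:.1f} more years at this rate")
```

At the historical rate, the target is roughly three more years out, which is exactly why “it’s not a useful number, it’s just a very low number” is the right framing.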
Deploy: Intelligence too cheap to meter (Part 2)
4. Reduce semiconductor manufacturing costs
4.1. ML-EDA
4.2. Chiplets
4.3. Manufacturing Inspection Tools
5. Optimize power consumption on edge devices
5.1. Edge AI Chips
5.2. Edge AI Power Management
5.3. Mobile Batteries
6. Run efficient local models
6.1. Model Compression
6.2. Efficient Architectures
6.3. Adaptive Inference
Share with your friends. They can also prepare for the future.
Long fire insurance