
AI Chips, ComputeRAM and the Future of Data Movement: A Conversation with Manu, Founder of Synthara

Stop! Moving Data

“I’m tryna lead a nation, to leave to my little’ man’s. The scales was lopsided, I’m just restoring order”

Hello friends, colleagues and enemies. Apologies for the delay, especially for my paying subscribers. I have failed to add as much value as I promised. And for that, I can only apologise. I mean, I’ve been somewhat distracted, I thought a nice little bolthole in Nuuk over January would 10x my productivity. Little bit of Claude Code and a little bit of solitude…

Anyway, I’ve managed to get myself on the last chopper out of Saigon and here I am back on the semiconductor horse.

The headline? Moving data around is expensive, so let’s not? Standard story: don’t move data to the datacentre, because you pay in cash and time to ship data around the world. But the same is true at a lower level of abstraction: the chip itself.

I’ve written about this before, but I think it’s still underpriced: raw processing power is not the performance bottleneck. GPU arithmetic units spend most of their time idle, stalled while waiting for weights to be fetched from HBM. The memory wall.
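To make the memory wall concrete, here’s a back-of-envelope roofline check. The peak-FLOPS and bandwidth figures are illustrative H100-class assumptions I’ve plugged in, not verified vendor specs, but the shape of the result holds: at small batch sizes a matmul moves nearly as many bytes as it does useful FLOPs, and the arithmetic units starve.

```python
# Roofline sketch: is a GEMM compute-bound or memory-bound?
# The two constants below are illustrative H100-class assumptions.
PEAK_FLOPS = 989e12   # dense BF16 FLOP/s (assumed)
HBM_BW = 3.35e12      # HBM bytes/s (assumed)

def arithmetic_intensity(m, n, k, bytes_per_el=2):
    """FLOPs per byte moved for an m*k @ k*n matmul (read A and B, write C once)."""
    flops = 2 * m * n * k
    traffic = bytes_per_el * (m * k + k * n + m * n)
    return flops / traffic

# Intensity needed before the ALUs, not the memory, become the bottleneck.
ridge = PEAK_FLOPS / HBM_BW

for batch in (1, 16, 512):
    ai = arithmetic_intensity(batch, 4096, 4096)
    bound = "compute-bound" if ai >= ridge else "memory-bound (ALUs stall)"
    print(f"batch={batch:4d}: {ai:7.1f} FLOP/byte vs ridge {ridge:.0f} -> {bound}")
```

At batch size 1 (classic single-stream inference) the intensity is about 1 FLOP per byte against a ridge point in the hundreds, which is exactly why weight fetches, not maths, dominate.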

The whole Nvidia and Groq $20 billion deal is about the same thing. Groq’s entire value prop is to bypass HBM altogether: their LPUs pack 230MB of on-die SRAM delivering 80 TB/s of bandwidth, roughly 1 OOM higher than what you get from HBM.

You may have heard of Cerebras, the other big AI chip company. Well they worried about the memory wall so bad, their WSE-3 chip stuffs 44GB of SRAM directly onto the wafer with 21 PB/s of memory bandwidth, 7,000x what you get from a single GPU’s HBM stack. The architecture exists to eliminate the memory wall. And OpenAI just signed a deal to deploy 750MW of Cerebras capacity, which tells you something. SRAM is orders of magnitude faster than DRAM but far less dense, so Cerebras compensates by using an entire silicon wafer as a single chip. Bold move Cotton.

And finally, Etched, the hotshot new AI chip company, just closed a $500 million round at a $5 billion valuation to build a chip that literally only runs one algorithm (the transformer) (Hummingbird at it again, what up). Their bet is related: hard-wire the matrix multiplication patterns specific to transformer inference, strip out all the general-purpose overhead, and voilà, dramatically reduced memory traffic per token.

And there’s another one imma discuss today: something called “in-memory compute”. This is where you basically say, why don’t we just literally run operations on memory cells? So instead of having a bit of logic and a little bit of memory on different parts of the chip, why not, like, just have one bit of memory, and do operations in there?

Well, because, it’s very hard to do.

True compute-in-memory means making the memory cells themselves do the maths. A resistive RAM crossbar array, memory that stores data by changing its electrical resistance, can encode neural network weights as resistance values, and when you apply voltages as inputs, Ohm’s law naturally computes the matrix multiplication as current flows through. The physics does the work. No separate ALU required. In practice, it’s a nightmare to manufacture, calibrate, and scale across process nodes. The industry has been chasing this for years with limited success.
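A toy simulation of that physics-does-the-work idea, assuming an idealised crossbar with no noise, wire resistance, or device drift (which is exactly where the real manufacturing nightmares live):

```python
# Idealised resistive-crossbar matrix-vector multiply.
# Weights are stored as conductances G[i][j] (siemens); inputs are applied
# as voltages V[j]. Ohm's law (I = G*V) plus Kirchhoff's current law summing
# along each line yields the dot products as bit-line currents, no ALU needed.

def crossbar_mvm(G, V):
    """Current on each output line: I_i = sum_j G[i][j] * V[j]."""
    return [sum(g * v for g, v in zip(row, V)) for row in G]

# A 2x3 weight matrix encoded as conductances, driven by 3 input voltages.
G = [[1.0, 0.5, 0.0],
     [0.2, 0.3, 0.4]]
V = [1.0, 2.0, 3.0]

print(crossbar_mvm(G, V))  # [2.0, 2.0], identical to a digital matmul
```

The catch the paragraph above hints at: real devices have nonlinear, drifting, temperature-dependent conductances, so the clean `G * V` here has to be constantly calibrated in practice.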

There’s a less radical approach: near-memory compute. Here you keep the memory cells as memory cells, but you shove the compute logic as close to them as physically possible. You’re not eliminating the data shuffling, you’re just making it very, very short. Samsung’s HBM-PIM does this, embedding small processing units directly inside the memory stacks.

Synthara, the Swiss company I spoke to for this issue, sits somewhere in this territory. They call their product “Compute RAM” and sell it as IP to chip designers. They don’t touch the underlying bit cells or claim any analog magic. What they do is tightly couple digital compute logic to standard memory arrays and provide the software stack that makes it all work without breaking your existing toolchain. The efficiency gains, they claim around 100x for edge devices, come from drastically shortening data paths rather than from exploiting exotic physics.

Manu Nair, the founder, did his PhD in neuromorphic computing but has deliberately walked away from the analog approach. His argument: the industry doesn’t want analog, it’s hard to port across process nodes, and you can get most of the benefits with careful digital design anyway. Whether that’s pragmatism or cope is for you to decide. But the IP licensing model means Synthara could end up inside chips from NXP, Infineon, or even the AI inference startups, without having to bet the company on a single tape-out.

The interview gets into all of this. What did I learn?

  • Data movement is the meta-problem. Not compute, not memory, but the cost of shuttling bits between them. DeepSeek’s efficiency gains, Apple’s unified memory, the entire custom silicon explosion: all symptoms of the same constraint. Once you see it that way, the architectural choices across the industry start to make sense.

  • You don’t need analog magic to win. Manu did his PhD in neuromorphic computing. His claim: careful digital design with compute logic shoved right next to standard memory arrays gets you most of the efficiency gains without the manufacturing nightmares. The industry doesn’t want analog. It wants something that scales across process nodes. Yep.

  • Custom silicon is a graveyard. Cerebras, Groq, Tenstorrent: a decade in, sub-one-percent market share. The IP licensing model that everyone dismisses as “capping your upside” might actually be the only viable path for new entrants. Arm proved you can define an entire computing era without fabricating a single chip.


Lawrence: Hey Manu, briefly explain who you are and what you do.

Manu: My name is Manu. I’m the founder of Synthara, a Swiss semiconductor company. We are working on a product called Compute RAM, which is set to define the architecture for the next era of AI-capable, scalable, sustainable processors.

Lawrence: No one knows what Compute RAM is. Break it down as simply as you can.

Manu: The most interesting problem today in processing is how you deal with the extreme demands of AI compute. At the heart of that problem is how memory interacts with the part that actually does the compute. Typically, most new chip designs are focused on figuring out how to do that better. So if you think of what Apple’s doing, what Google’s doing, they’re all working on this.

Lawrence: There’s a memory cell and there’s a logic cell, right? They’re not connected and there’s a bus between them. In order to fetch weights, you have to go to memory and then back to the processor. A lot of the time is spent in fetching. Is that right?

Manu: Yeah, it’s really like logistics. If you have a warehouse sitting far outside the city, you need to shuttle goods to the heart of the town a billion times a second. That’s terribly inefficient. Architecture really deals with how you stagger things in different places in strategic ways so you can be more effective.

That’s actually not a bad analogy at all, and what we are offering is a solution that standardizes these decisions. Compute RAM helps our customers make these decisions effectively, and as they make these decisions, we also ensure that their software and system architectures don’t break. They’re able to transition to far more intense AI-rich environments in a way that retains everything they have built so far. You don’t want to reinvent the whole thing just because AI came in.

Lawrence: In context then, this idea of in-memory compute or near-memory compute. We’re talking about geography, literally the location of those cells on a computer chip. You put them either as close as possible so that the logistics road is shorter, or if you put compute within the memory, then there is no road to travel. Is that the right heuristic to use?

Manu: Yeah, exactly. That’s a very good heuristic. The other way to think about it is you fetch something, you use it as much as you can. There will always be a hierarchy of memories. There is no way on-chip memory will hold all of the internet’s data. So there is going to be something that holds all of that, but once you fetch it, you use it as much as you can before you discard it, and then you build different hierarchies.

That architecture is evolving at a very rapid rate. We think we have a solution that helps companies make that transition when you have to do ultra-low power, very quick, high-performance inferencing, even training potentially.

Lawrence: If I’m thinking about the stack of where you work, I like to think of it as starting with the transistor level, then the circuit level, then the cell which is lots of circuits put together, and upwards. Where exactly in that stack does Synthara’s solution operate?

Manu: We like to work at a level above the transistors. There are transistors, and transistors are assembled into what are called bit cells. Bit cells are the unit building block for a memory array. Then there are memory arrays, and these memory arrays are used in chips which also have processors and other things.

We start from just above the bit cell. We don’t design our own bit cell, but we may or may not design the memory array. We certainly include some compute around it. So we are changing that hierarchy a little bit. The customers themselves are designing their chips, so they’re not buying a new bit cell or anything.

A customer might say: I’m designing this new chip that I need for my wearable device, or I’m putting together this new AI inference chip, and I have this problem where I have some memory area, I have some compute, it’s too far apart, it’s costing me a lot of time and energy and area. Help me fix it. That’s where we step in.

Once we step in, we give them that solution. But we also deal at the software level. We say, look, now that you’ve put this in place, your software still needs to work. How exactly would your customers, who probably don’t know or don’t want to know what Compute RAM is, write their solutions to work on this? We actually provide that integration kit. We start from somewhere at the memory macro level, and then we work all the way up to provide solutions that ensure their customers are completely undisrupted. That’s key. That’s the platform. The platform is this new way of putting things together.

Lawrence: How does this fit into the new suite of chips people might be aware of? Groq being acquired or half-acquired by Nvidia. They have a different approach to AI inference. There’s Tenstorrent, there’s Etched, there’s Rebellions, all these companies offering AI chips. Help the audience understand how you fit into that.

Manu: Synthara’s stake is a computational memory solution. Today we are starting with an IP product, and all these companies you mentioned would be great customers for us. Some of them, companies like them, are people we are already working with. The thesis is all of these guys are looking to solve this problem amongst others to deal with efficient AI inference.

We help them get over that particular topic. Look, there is compute, there is memory, you diffuse it, and that’s usually some 70 to 80 percent of the chip area. We actually help make that quite a bit better, but then they still have to architect around it. If the use case is needing very quick response but not trying to do millions of inferences per second, that’s a very different architecture than one that says I don’t care how quickly a single token is processed, but I care about crossing thousands or tens of thousands of tokens per second. That’s a different architecture.

Our place is to enable all of them to solve the memory barrier issue. We give a solution that actually fits in both contexts. They both get to be better, but they can still differentiate at an architecture level depending on the use case they’re going after.

Lawrence: And it’s really agnostic to the hardware? The Groq architecture is very different. Tenstorrent is RISC-V. Groq is mainly SRAM. It doesn’t make a difference to what the memory is or how the architecture works?

Manu: There are some scenarios you can always come up with where it doesn’t fit. But the thing is, industry tends to standardize. The standards that are emerging are very much compatible with what we do, and that’s a conscious choice in our own design side too. If you think of microcontrollers, the ones that NXP or Infineon would produce, they tend to be similar. Likewise, GPU architectures available at Tenstorrent and even AMD chips have a certain set of architectural choices. Within that universe, we fit very well. Likewise, Groq has made choices in that universe that fit very well with us.

My claim is not that every possible chip in the world will be supported by Compute RAM no matter what they do. My claim is that the architectures that are emerging are really good candidates for using Compute RAM, and we usually complement what they do.

Lawrence: A lot of the thinking around how to offer customers something that Nvidia can’t normally falls back on the software stack, the so-called CUDA moat. The fact that every developer has locked into using CUDA. You mentioned earlier that you can integrate with CUDA. Is this just an API that redirects CUDA operations through your cell? How exactly is it compatible?

Manu: There are two ways to answer this. One is technical. The technical one is pretty much what you said. At the end of the day, all these operations are lowered into some kind of computational primitive. Depending on the abstraction, it could at the lowest level be just multiply-accumulate. It could be a dot product, it could be a matrix multiplication, it could be just a convolution call or a full decoder transformer layer. Depending on the API, you can integrate.

But I think the interesting thing is what are these APIs. We are not even inventing our own APIs. There are industry standards emerging with Nvidia and Microsoft and others participating, and we essentially hook into that ecosystem. Everything we do, at some point we expect to even contribute upstream. The thing we are doing here is to really help the community absorb Compute RAM into how they currently work. A PyTorch code or a TensorFlow code or whatever they put together should be lowerable into a Compute RAM-based system. It’s a problem that the industry has, and it’s not like we are solving it ourselves. We hook into that.
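The “lowering” Manu describes, from a framework-level op down to a hardware primitive, can be sketched like this. `mac` is a hypothetical stand-in for whatever multiply-accumulate primitive a Compute RAM block would actually expose; the real interface is not public in this interview.

```python
# Sketch of lowering: a framework-level matmul decomposed into the
# multiply-accumulate (MAC) primitive a compute-in-memory block might expose.

def mac(acc, a, b):
    """Hypothetical lowest-level hardware primitive: acc += a * b."""
    return acc + a * b

def matmul_lowered(A, B):
    """Matrix multiply expressed purely as repeated MAC calls."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i][j] = mac(C[i][j], A[i][p], B[p][j])
    return C

print(matmul_lowered([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

A compiler picks the coarsest primitive the target supports (full layer, matmul, dot product, or bare MAC) and emits the rest of the loop nest in software, which is why Synthara can hook in at several abstraction levels.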

Lawrence: But Compute RAM would be proprietary, and you’ll sell IP blocks like Arm. Whereas if I were the industry, wouldn’t I want an equivalent of a RISC-V for Compute RAM? Wouldn’t I want a compute-in-memory solution that I don’t have to pay for?

Manu: It’s a bit like you write C code and that can be compiled to RISC-V or Arm. Our customers will write the C code, and as long as they respect some rules that the C Foundation has put together, it’ll work. Ours is the same. At some point you enter the proprietary realm of compute. But until that step, you don’t need to be tied to us.

If there is yet another engine that does the same primitives that we support, you are perfectly free to use that. My claim is it won’t be as efficient in area or energy as we can be, but we don’t lock them in or prevent them from looking for alternatives.

Lawrence: You mentioned area efficiency. Let’s get into that because people might be wondering what the actual numbers are. I understand the fundamentals that you put the memory and the logic closer together so you can go faster and more efficiently. But what are we talking about here? What’s the system-level energy efficiency you’ll get? What is the die size reduction?

Manu: When I answer this, I always like to think of the next best alternative, because that’s the easiest way to make an apples-to-apples comparison. We have an extreme diversity of use cases. At one extreme are things like smart glasses and hearing devices. These are typically built on some kind of microcontroller-like platform. On these, we expect something like a hundred times or even greater improvement in energy efficiency.

Lawrence: Everyone loves big numbers, but what are we actually talking about? TOPS per watt? Could you give me the actual numbers?

Manu: What would consume perhaps some millijoules is reduced to microjoules, tens of microjoules, hundreds of microjoules. If it’s in wattage, you go from hundreds of milliwatts to sub-milliwatt, even potentially. In terms of inference time, let’s say you have a live audio stream and you’re doing some kind of complex noise cancellation. Now the battery life of the device can go from running for six or seven hours, which is typical, to perhaps a lot longer. A four times improvement in battery life is something that could happen. That’s the impact. It completely changes the product category and positioning as far as the customer is concerned in these use cases.
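The battery-life arithmetic here is easy to sanity-check with made-up numbers. Assume a hearable where the inference block dominates the power budget (both constants below are hypothetical, not Synthara figures); a ~100x more efficient inference block then buys roughly the 4x battery life Manu mentions:

```python
# Back-of-envelope battery-life check with hypothetical numbers.
BATTERY_MWH = 100.0    # assumed battery capacity, milliwatt-hours
BASE_LOAD_MW = 3.5     # assumed radio/codec/housekeeping draw, milliwatts

def battery_hours(inference_mw):
    """Runtime in hours given the inference block's average draw in mW."""
    return BATTERY_MWH / (BASE_LOAD_MW + inference_mw)

before = battery_hours(11.5)        # inference dominates: ~6.7 h, Manu's "six or seven hours"
after = battery_hours(11.5 / 100)   # a ~100x more efficient inference block
print(f"{before:.1f} h -> {after:.1f} h ({after / before:.1f}x)")
# 6.7 h -> 27.7 h (4.1x)
```

Note the limit: once inference stops dominating, further efficiency gains buy almost nothing, because the fixed base load (radio, sensors) caps the runtime.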

Lawrence: It’s intriguing because I imagine the next ten years will feature heterogeneous compute, where you might have a neuromorphic chip design for extreme low-latency decision making, or a photonic chip for latency or extreme throughput, various different configurations, analog chips, whatever it might be. A whole bunch of different designs all claim much better energy efficiency, a hundred times, a thousand times better. What you are saying is you are getting one or two orders of magnitude improvement without fundamentally changing the underlying chip design. Just update. You still get the same benefits without any change in the silicon?

Manu: Yes, but I think we are not as different from the others as your description might have it seem. Quantum computing we keep on the side, and photonic is mostly interconnect that we keep on the side. But all the analog and neuromorphic stuff and what we do are related. My PhD was on neuromorphic computing. I did all this spiking and analog stuff with resistive RAM and other things in my past life.

What we have done in Synthara is to distill all of those ideas and cast them in a format that is compatible with how the industry has so far operated, both from a process perspective as well as system architecture and software perspective. We have gotten rid of all the issues that I thought, and my co-founder thought, have restricted these technologies from being adopted at a large scale.

Some of these technologies are being adopted by companies for themselves. They create some flavor of in-memory compute or neuromorphic compute or analog compute, make a chip around it, try to go to market. Our take was: look, I can do that, but realistically, how many of those companies have a hope to succeed?

If you think about it, Nvidia, Intel, and AMD are probably the only ones who have a handful of products that they sell at scale. Almost every serious chip company has to sell so many variants and flavors that it actually is not viable to make a huge custom silicon product easily.

The market that we are targeting, the problem we’re solving with Compute RAM, is how does anybody who has this problem absorb this? For Synthara, the cleanest way to capture that huge value proposition is to provide a platform. Now it’s an IP product today, but we could do something else. There are ideas at play. But the core premise is this problem is persistent across different use cases, and these ideas from analog and in-memory compute are all relevant for them. How do we make it accessible for these companies who still want to keep everything they have done, because their customers at the end of the day buy it for that?

If you are buying a chip from NXP for an automotive use case, you expect all the automotive quality that NXP delivers, but you really just want it to be more energy efficient. Can I enable an NXP to do that? That is my pitch.

Lawrence: Not custom silicon. Your bet is slightly different. If we think about what’s happening in the industry right now, you have all the hyperscalers building their own custom silicon or their own AI inference chips, predominantly training chips and inference chips in Amazon’s case. All the hyperscalers plus say ten reasonably well-funded AI inference chip companies. Your claim is that’s all fine, but the truth is to be a successful chip company, you need to offer multiple chips, you need to be in generation five or six, and then you need to be offering multiple chips to serve different use cases. No chip company’s going to get there. So how do you get the same benefits without designing your own silicon?

Manu: At least how do we get to a stage where we can hope to get there in a sensible way? With 20 people sitting in Zurich and the stage we are in today, I am not going to make a custom silicon product. So the most effective way for me to get into the market and have that influence spread is this strategy, and that takes us there.

Now, we don’t want to make a new Intel server chip competition. That doesn’t make sense. It’s a use case that exists, and we need to look at new ways to get into the market. Our business model could evolve. But the core premise is that there is a transition happening in the industry. Compute and memory are coming together. There is a computational memory thesis that’s emerging, and we need to enable that. We can create a lot of value for the industry by doing it.

How do we monetize and at what stage is the question that is being answered by Compute RAM today. Today we are saying, look, there are customers making hearing devices desperate for energy efficiency. There are customers looking at data centers desperate for area and energy efficiency. Can we help them? Yes. And would they take the risk to adopt Compute RAM? Because it is a risk. It’s a small company sitting in Zurich. What if we change what we are doing?

The answer is yes, they are taking that risk because the reward is ridiculously large. And there are contractual and other ways to deal with the risk.

Lawrence: The alternate strategy is maybe the Fractile strategy, which is to do the silicon yourself, tape out the silicon, get it working, and then try and win a hyperscaler or win a customer with your own silicon and try and replace Nvidia, or now Groq, for some of these inference use cases. That strategy seems obviously much higher capex, much higher risk, but the payoff is greater. The actual business model of selling a product is more lucrative than selling IP. Is that a fair assessment of the choice you’ve made?

Manu: I don’t know if I agree with that. I agree on that side, so yes, if they get it to work, they can actually get a good payoff. No question there. But I am saying that payoff is accessible to us too. If a company is going to be acquired by Nvidia or whoever, it’s not like they’re buying it for the business. They’re buying it for the technology, the team, the concept, the architectural implications, all of it, which we have. Ours is a very clean, distilled thing that is good to go. It’s not corrupted by all the other weird decisions we had to make to go to market.

So that is still accessible. But in addition, I also have access to just being a clean IP business. Now I am also a product that could be used by NXP, could be used by Infineon, could be used by NSA, could be used by other AI competitors to the names you mentioned. Actually, some of our customers are looking at non-AI use cases. It’s DSP, it’s some performance DSP products. So we have a broader set of use cases to go after.

Our exit opportunities are to hyperscalers because they want to differentiate. Our exit possibilities are to IP providers like Synopsys, Cadence, and Arm. Our exit possibilities are to semiconductor companies like Analog Devices and TI who are also looking to solve these same problems.

We have been lucky that we are in Switzerland because we got this five to ten years to work on this problem in almost ideal conditions. We got a tremendous amount of grant and other funding. So we actually managed to spend all this time figuring out how to build this hard thing and assemble it. Why would a company whose main business is not to do in-memory compute spend the same amount of time and energy figuring all of this out? So we are there. If today Google says, okay, it’s too crazy to let these guys alone, they could buy us. Or they’ll say, no, it’s too expensive. But there are contractual ways to deal with the not-invented-here syndrome. We do tackle that in our contracts.

Lawrence: Let’s think a little bit about the market opportunity. You say you’ve been doing this for many years and you started in the neuromorphic space. That was your PhD. In Switzerland obviously, as Intel has the neuromorphic unit there. SynSense comes out of Switzerland as well. It’s always struck me that neuromorphic, analog in particular, those designs are best suited to DSP, extreme low-power use cases, which typically are at the edge, without battery ideally. So we could think of drones, but we could also think of glasses, watches, other use cases.

But actually I’ve seen in the last year to eighteen months almost all of those companies, Innatera being another one, probably SpiNNcloud, now focusing on the data center. Because the data center as a go-to-market has large buyers with much larger budgets and an urgent power consumption problem. How do you see those two markets? On the one hand the edge, which feels first-principles like it should be the appropriate market for what you’re building, versus where the demand is today in the data center.

Manu: There are two aspects at least that you mentioned that I probably have to respond to. One is the analog and neuromorphic being juxtaposed with what we do.

At the end of the day, it’s a chip design problem, right? The question is not if it’s analog, digital, neuromorphic, whatever. Most of the techniques that you see in neuromorphic have actually been done by chip designers in the past. The ideas are not radically new. It’s just maybe formulated and used in a certain way.

I would say it doesn’t need to be analog if it doesn’t need to be. Maybe I’ll say it this way: if it doesn’t need to be analog, the edge guys would prefer to not use analog. Even when I was at Analog Devices, there was a constant push against reducing the amount of analog on chip. You do more DSP, you reduce the analog. At some point, the argument was you still need something to talk to the world, but that’s the only thing that has to remain analog.

I don’t buy the argument that you need to be analog or neuromorphic to be energy efficient. That’s actually the whole reason behind the story of Synthara. I don’t think analog inherently makes anything more energy efficient. I think we have found ways to do quite well. I believe that our solution is perfectly well suited for the edge where people do analog and other things. And I think in some ways our strategy is much more efficient because even analog chips need to scale. Yes, they might be doing 65 or 45 nanometer processes, but they need to move. And analog is hard to take from one process to another. Analog is noisy. It’s very hard.

Lawrence: It’s not to say that analog or neuromorphic are better than what you are saying. More just that at an objective level, low power was originally an edge requirement, and actually the data center increasingly needs it too. As you and others move into the data center, every single chip design IP block is going after the same target. So if you are the TSMC of in-memory compute, you are getting calls from every single startup on earth. It makes you uber competitive.

Manu: Yeah, exactly. Just as you spoke, an idea struck me: the fact that analog has been sufficient is primarily an academic claim. An industry implementation that has validated that is not mainstream. I can’t think of one right now.

Lawrence: Is Mythic the only one that’s actually commercially deployed?

Manu: Yeah, but they’re completely rethinking what they’re doing, so I’m not sure what they’re doing now. The old approach did not particularly work with the flash memories and all that.

Coming to data centers, the need is clear now. The business has to follow where the demand is. Data centers need to be more energy efficient, tokens have to be cheaper. It’s a problem that has to be attacked and is being attacked from multiple angles, including software. Even if the chips were the same, just the effectiveness of the system architecture is dropping token cost at an exponential rate.

Coming to your question: yes, these guys should go there because there is money to be made. Will they actually make money is an interesting question. I think it would be very hard for the analog guys to get there just based on economics. Most of these data center chips are at this point essentially the size of a full reticle, and you want to stuff as much compute as you can into that reticle.

Lawrence: What is a reticle for those in the audience who might not have heard the word?

Manu: When you produce a chip, there is a manufacturing process, and the manufacturing process can draw chips at a certain size. It cannot get larger. There are some reasons for it to be in a certain shape. It’s physics, let’s just say geography. You have a certain number of chips, and you stuff as much compute as you can. The first-order requirement for any customer is: can I meet the demand? It doesn’t even matter how much it costs. I have people who want to do a million tokens per second. Can I serve that demand?

For that, you really need compute density. There is no way around it. Anything that improves compute density will be attractive. Now, once you have compute density, do I need a nuclear plant or can I run it on a regular power grid? So energy efficiency comes into play. It’s mostly a profit margin topic or maybe just a viability of the business topic.

These are the two key dimensions all guys operate on, and if we can solve that well, it’s great. An analog or neuromorphic approach that makes area efficiency lower is not great. But approaches where you say, look, I’m going to have a single bit cell that can somehow store eight bits of memory in a reliable way, that looks like it has some interesting possibility. I would mostly look at it from that perspective. When I look through those filters, some companies I find attractive, some I find a bit of a hype wagon.

Lawrence: Which do you find attractive without having to slag off the ones you don’t?

Manu: I really like what Cerebras is doing because it addresses that topic: there is so much demand for compute that it doesn’t matter how much you spend on the wafer, and if there is some yield issue, you can deal with it. Cerebras has a very interesting approach.

Lawrence: They do wafer-scale compute, where they literally use an entire silicon wafer as a single chip and fit as many cores onto that wafer as humanly possible. They’ve done it pretty successfully in terms of yield, as I understand.

Manu: Yeah. Cerebras is actually an example of a customer that would be great for us. We would actually help them stuff a lot more compute on those wafers should they use approaches that we have.

TPUs are interesting. I think they identified the use case quite early on. It’s not a startup, but considering that they’re entering a new market, I think TPU has some interesting ecosystem play that can actually turn out very well. I don’t want to say other names because then it becomes problematic.

Lawrence: What’s interesting is that any of them in theory could be a customer. Many of them may consider themselves competitors in some sense because they’re selling custom silicon. You’ve got this really interesting frenemy situation in that you can make their products better, but you can also make their competitors’ products better. That’s an interesting strategic tension for you.

Manu: I agree. Direct competitors will find us threatening, and that creates some interesting opportunities for us. But I also see that even within a company, not to say we work or don’t work with them, a company that makes both data center chips as well as glasses, the team using Compute RAM for the glasses would use it very differently than the team using it in the data center. For us, even serving a leader or challenger in a specific market is actually a perfectly valid strategy. It still allows us to grow pretty big.

You can always architect around your competition because you still have a whole layer of architecture that you can work on to optimize for different things. All data center chips are not the same. Data center chips designed to only run Llama all the time are a very different architecture than ones that don’t know what LLM they would be running or if it would be an LLM. That architecture is very different. I don’t even know if all companies are necessarily even direct competitors all the time. I think we kind of found a nice space. We don’t threaten most people in my view.

Lawrence: Ignoring the IP versus silicon dichotomy for now, what is it that the industry, even just observers to the semiconductor industry, are getting wrong about the shape of how this is going to develop? One that springs to mind is the idea that bigger and bigger data centers, more and more power, nuclear power stations as you mentioned, is how someone will win in the AI race in the next decade. And obviously the counterpoint is actually eventually things will move to the edge, but we’re not there yet. Are there other things that you think the industry is getting wrong?

Manu: I think the interesting thing about what’s happening today is that people are wrong in magnitude, not in the directionality of things. It is true that you need more and more compute and more and more power. But I don’t think huge amounts of power will be required because compute is expensive; compute costs are dropping pretty rapidly. It’s more that people are just going to be doing more things, so you need a lot more compute just to keep up with that demand.

The one part that I don’t have an opinion on is whether it is unsustainable. I can argue both ways because it’s a qualitative argument. You cannot really put numbers on it. Yes, you can argue that AI is mostly just going to be used for generating cats and dogs and just more Instagram video feeds, that it’s not really useful, it’s bad for the environment. But on the other hand, it could genuinely lead to quite a few efficiency improvements. If all of the cars in the world are self-driving and they’re constantly navigating, it can make transportation somewhat more efficient.

I think we are really talking in hyperbole here. But I can say industry as a whole will tend to optimize for profitability, and that is achieved by doing things more efficiently, not less efficiently. Therefore, I think just because of the incentive structure of the world, AI will end up becoming useful rather than harmful. It has to be. Otherwise, we’re just going to lose money.

Lawrence: So you said two things there. What I think is interesting in particular is this dichotomy: we have the GPUs and we just scale them up, and then we need to power them with ten-gigawatt nuclear power plants. Your note is yes, and we will need the ten gigawatts not just because the GPUs are inefficient, but rather because there’s going to be so much more computation. So the bet is you need both nuclear power plants and much more energy-efficient chips in order to serve the demand in a decade.

Manu: Yeah. That’s my expectation.

Lawrence: Very strong. Okay, to wrap this up, I think there’s a couple of interesting things I’ll take away. I think investors and just industry broadly underestimate the IP business model despite Arm. Arm is sort of an anomaly to many people. People tend not to like IP business models because you’re capping your revenue really, or your TAM. But I think what you said is a really interesting point, and you make it well: you can’t just make a new chip and beat the competition.

I mean, you could try, but not only is it extremely capex intensive, if you look at how long Cerebras has been going, how many years, what’s their percentage market share, what was Groq, what’s Tenstorrent, we’re talking sub one percent, right? And that might be a decade in. Your claim is that you can’t just do that. It’s not an option actually if you want a successful business. And in fact what you are doing with IP may be the only viable strategy for the market as it exists today.

Manu: Certainly for us. If I had the reputation of an Andrew Feldman or one of the other big shots, I could probably get 500 million and do it. But given where we are at, this is the most effective path for us.

Lawrence: But even then, right?

Manu: Yes. You can get it right once. You can fail. Once you kind of decide you’re going to put together this product, you cannot really test it in the market, and for you to be hugely successful, it somehow has to become the de facto option in that domain. That’s not easy or trivial.

It’s a risk-reward issue, but my take is broader. If you look at what drives value, what Arm is today, yes, it’s this big thing. But if you think of every transition in the industry, it has always been driven by one new creative concept. Qualcomm, I think, started off, and I could be getting some part of the history wrong, but a good part of their value proposition was the IP. IP is still a key part of that business model.

Lawrence: Synopsys is a huge part of the business.

Manu: Yeah. The story, if I remember it right, is they started doing this, then they said no, I need to go up the value chain because nobody else is able to understand this. They kind of kept stacking it up until they cornered the whole thing. Intel was also kind of IP-driven. There was nobody else making the chip. That’s why they decided to make their own stuff. That’s not exactly true, but I think you get the theme.

There is a change happening and there is an opportunity to drive that change. How you monetize it is almost a secondary thing. An investor in Synthara is investing because they think we can define this platform architecture meaningfully. Now, monetizing it has so many ways and mechanics. Yes, we start with IP, but we can do so many other things. There are things we are cooking that are more than just being an IP vendor. Also, IP is a spectrum: there is IP and there is IP. It’s really not, in my view, the right filter to look at Synthara through.

Lawrence: Maybe the final point would be about in-memory compute. When you say there are different platform shifts and various innovations in the industry that have enabled new products, is the bet here on in-memory compute? Is that the change you see coming?

If I’m to try and frame that to the average person on the street, well actually they would be the wrong person to ask. Let’s say the average policy maker that may want to know the future of computing. I don’t think they’re thinking in-memory compute. I don’t think that’s really a word in their vocabulary. Should they know it? Should that be what we’re talking about as the innovation?

Manu: No, I don’t think people should think about in-memory computing at all. Again, depends on the abstraction, but a customer of ours is thinking: I have this problem with compute and memory, and I need something that kind of breaks it. In-memory computing or computational memory or whatever you call it is essentially solving that piece of the puzzle. If you can solve it in different ways, that’s fine too.

Lawrence: But no one’s saying I have a problem with my compute and memory. They’re not saying that. They’re saying I have an energy problem. I want to reduce power consumption.

Manu: Sure. The CEO says I have to reduce power consumption. Then they go to their architect who says, okay wait, but my power problem is because I have this compute-memory architecture that we have been doing for a few decades now. I need to change that. And that transition is what the industry is going through now.

Even if you think of the Apple chips, the unified memory is evidence that people are thinking about memory and compute differently. The fact that there are so many custom silicon projects is primarily driven by this problem: I cannot just do that old way of Intel-style chips, I need to break it apart and rebuild it. It’s all compute and memory.

That is the big change today. This whole heterogeneous compute thing, nobody likes heterogeneous compute. It’s cheaper to do homogeneous compute. You’re doing heterogeneous because there is no other way around it. And most of it has to do with just shuttling data back and forth. Optics, the biggest case for it is data movement.

I would say it’s data movement that is the theme of this era of compute. And in addition to Compute RAM, you still have to deal with the problem of talking to off-chip memories and so on. The strategies, the caching.

Lawrence: Yeah, that’s data movement, shuttling data around. My mind immediately thought: there are all the different levels of abstraction. If your core principle for designing an entire new computing system was stop moving data around, you would do as much as possible at the sensor, as much as possible at the edge. You would shuttle it back to the data center as little as possible. You’d try and do as much from the earbuds to the phone before you would go all the way back to the data center.

Manu: Yeah.

Lawrence: You’d do that at all levels of abstraction, all the way down to the chip level, which is where you are operating. And then as much as possible, don’t go off-chip to DRAM. You want as much on-chip SRAM as possible. It’s turtles all the way down as they say.

Manu: Yeah, exactly. Data movement. That’s the whole thing. All the PhDs, if you really distill it down, even a good part of the AI work on the compute side, has to do with data movement. The thing that DeepSeek did was data movement optimization. Primarily, all of it is essentially to say they figured out a way to not move data around as much.

Lawrence: I wanted to end it, but I can’t leave that hanging. I don’t know what you mean. You’re going to have to explain what DeepSeek did exactly there for me.

Manu: The innovation around using old GPUs and how they managed to reduce the token cost. I don’t know if you remember, Nvidia shares tanked when DeepSeek made this announcement of far cheaper training.

Lawrence: Exactly. And I know it was cheaper, or in theory, they didn’t have the total cost of the training and the salaries and so on. But you say it was because of data movement. I don’t follow.

Manu: Yes, because the brute-force transformer model would not reuse a fetched weight as much as the DeepSeek strategy does. This whole mixture-of-experts thing makes it so that when you fetch something, you reuse it. Think about it like this. Let’s say it costs a hundred units to fetch something from outside the chip. Once you bring it on-chip, doing something with it costs you one unit of energy.

Now if you can fetch something and use it a thousand times internally, then that’s a hundred plus a thousand. That’s your cost. It’s only 1,100 units. But if you had to fetch that hundred-unit cost a thousand times, that’s now a hundred thousand unit cost. So you just reduce the cost of compute by about a hundred times.
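Manu’s arithmetic can be sketched in a few lines. The unit costs below are the illustrative ones from his example (an off-chip fetch costing roughly 100x an on-chip operation), not measured figures:

```python
# Toy energy model of fetch-vs-reuse, using Manu's illustrative
# unit costs: an off-chip fetch is ~100x an on-chip operation.
FETCH_COST = 100   # energy units per off-chip weight fetch
OP_COST = 1        # energy units per on-chip operation
N_OPS = 1000       # operations performed on the fetched weight

# Fetch once, then reuse the weight a thousand times on-chip.
reuse_cost = FETCH_COST + N_OPS * OP_COST        # 1,100 units

# Re-fetch the weight for every single operation.
refetch_cost = N_OPS * (FETCH_COST + OP_COST)    # 101,000 units

print(refetch_cost / reuse_cost)  # roughly a 100x gap
```

The exact ratio depends on the real fetch-to-op energy gap, but the shape of the argument is the point: reuse amortizes the fetch.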

A lot of innovation around how you do compute and architect your LLMs is being done primarily with this goal. And this is huge because these things are what enables edge computing pretty much as significantly as compute itself does. Being clever with software has always been the theme of computing.

Lawrence: The final point here: another way of thinking about this is the extent to which software and algorithmic improvements drive efficiency more than any hardware change does. We want to change the hardware as little as possible because we have a trillion-dollar machine churning out CMOS silicon at increasingly smaller feature sizes. The more we can do in software, the more programmable, updateable, and flexible it all is. Do as much as possible in software, which I guess is also what you’re doing. The vast majority of your day-to-day is the software side, so it maps.

Manu: Yeah, exactly. We spend a lot of energy trying to figure out everything in that story. We reduce data movement in general. Yes, we have Compute RAM, but our software deals with that problem too. We really like that Compute RAM, or whatever you call it, at the end of the day is optimizing data movement, and we are finding ways to enable our customers to benefit from that.

Lawrence: I’m not a marketer, and I don’t know what the front page of your website says, but “Stop Moving Data” feels like a nice tagline. It’ll certainly be the tagline for this conversation. I think that’s about right.

All right, thanks Manu. Appreciate your time today.


Debrief

So, what are we all thinking? There’s something genuinely useful about Manu’s framing. The idea that data movement is the unifying constraint across the entire stack, from DeepSeek’s mixture of experts down to the bit cell, is the kind of synthesis that makes you see familiar problems differently. DeepSeek’s cost breakthrough, Apple’s unified memory, the explosion of custom silicon projects: these are all symptoms of the same underlying constraint. “Stop moving data.”

The broader point is that computational memory is no longer an academic curiosity. Samsung has been shipping HBM-PIM since 2021. SK Hynix has GDDR6-AiM. Google’s TPU architecture already reflects near-memory thinking, and each generation pushes compute closer to the data. I’d expect the next wave of hyperscaler silicon, TPU v6, Amazon’s Trainium 3, whatever Microsoft is cooking, to incorporate some flavour of in-memory or near-memory compute as a core architectural feature. The economics are too compelling. When your GPU spends 70% of its cycles waiting for weights, you don’t need a PhD to see where the optimisation opportunity lies.
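That “waiting for weights” point is the roofline model in miniature. A minimal sketch, with illustrative peak-throughput and bandwidth numbers rather than any particular chip’s datasheet:

```python
def attainable_flops(intensity, peak_flops, bandwidth):
    """Roofline model: achieved FLOP/s is capped by either the
    compute roof or the memory roof (intensity * bandwidth)."""
    return min(peak_flops, intensity * bandwidth)

# Illustrative figures only, not a real part's spec sheet.
peak = 1000e12   # 1 PFLOP/s of peak arithmetic throughput
bw = 3e12        # 3 TB/s of off-chip (HBM) bandwidth

# Batch-1 LLM decoding: ~2 FLOPs per fp16 weight (2 bytes),
# i.e. roughly 1 FLOP per byte fetched.
decode_util = attainable_flops(1.0, peak, bw) / peak
print(decode_util)  # memory-bound: a fraction of a percent

# Batching reuses each fetched weight across many tokens, pushing
# intensity past the ridge point (peak / bw, ~333 FLOPs/byte here).
batch_util = attainable_flops(500.0, peak, bw) / peak
print(batch_util)   # compute-bound: full utilization
```

Whatever the real numbers, the structure of the problem is the same: below the ridge point, the arithmetic units idle while the memory system works flat out, and moving compute closer to the data is the lever.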

The question is who “captures that value.” The Nvidia-Groq deal suggests the incumbents are paying attention. But there’s also a path for IP players who can offer computational memory as a licensable block, the way Arm offered CPU cores. Synthara is betting on that path. For a 20-person team in Zurich, custom silicon was never really an option, and the graveyard of well-funded AI chip startups suggests the odds aren’t great even with capital.

Europe has been searching for an AI chip champion, and the most likely path is not trying to out-Nvidia Nvidia. It’s finding a different game. Arm proved that an IP licensing model can define an entire computing era from a standing start. The semiconductor industry is going through another architectural transition, and at least this is a game European companies have won before.
