🔮 E07: Decentralised AI
A top-3 large language model by the number of users will be a decentralised AI
This is a weekly newsletter about investing in deep tech. To receive State of the Future in your inbox each week, subscribe now:
Thanks for coming out tonight. You could've been anywhere in the world. But you're here with me. I appreciate that.
Congratulations to the two winners who won Jay-Z’s 2001 classic, The Blueprint, last week. Fewer quotes this week (not zero quotes, though). After more iteration on the format, I’ve moved to a shorter insight-type format. The previous summary style was a little too long, and there’s a website for people who want to go deeper. So for today: everything you need to know about one topic.
What you’ll learn:
Why are we talking about decentralised AI again? Hint: GPUs
Why might it be useful? Hint: Should the most powerful technology ever created be in the hands of one company?
Is it an investable market? Hint: Yes, sort of, maybe.
Is private and decentralised AI a dangerous pathway to misaligned AI? Hint: Yes.
🔮 Decentralised AI
Jason Allen’s AI-generated work, “Théâtre D’opéra Spatial,” took first place in the digital category at the Colorado State Fair. (Image via Jason Allen)
It’s 2015; Uptown Funk is on the radio. Fifty Shades of Grey is on at the cinema (apparently). And I’m at a Bitcoin Vietnam meet-up talking about the upcoming Ethereum launch. Over a Cà phê sữa đá, I ask: “Could you ‘plug in’ a machine learning algorithm to these smart contracts?” After a pause: “Sure, in theory, yes, but I’m not sure it would work today.” Fair enough. I’m just a politics graduate; what do I know? I grab a Banh Mi and jump on the bike.
Fast forward. It’s 2017 at Outlier Ventures; the future was ahead of us. We were bright young things trying to build a decentralised future. Are these things called crypto-equity or crypto-tokens? Why *exactly* are these things someone created yesterday worth billions? It doesn’t matter; move on, ZIRP, ZIRP, ZIRP. The “Attention Is All You Need” paper was months away from being published. And even when it was, I probably wouldn’t have come across it.
In this intoxicating maelstrom of excitement and ennui, I sit down and write about how AI and blockchains will “converge”. I also wrote about how IoT, 3D printing and all the things would converge. But, I mean, things were different back then. It was the everything bubble. Don’t judge me. Anyway, I wrote (heavily edited because my writing in 2017 is unreadable):
“As AI is deployed, the thirst for data will increase. Data is how machine learning models learn, especially deep learning algorithms. As models get larger, they need more training data. Blockchains could provide the largest repository of open-source validated records. As more digital assets like land registries, property records, and tickets are stored and traded, blockchains could become essentially the largest open-access database in the world. Blockchains, at least permissionless blockchains, reduce the cost of entry into the AI space, allowing start-ups to access vast amounts of data instantly.
The structure of blockchains can allow for individuals to own their data and sell or rent access to this data to AI services in a marketplace. Instead of providing data for free to companies, individuals can get paid. A blockchain-based data marketplace provides a way to share and monetize data that would otherwise be wasted or given away for free. New business models can be created so that data providers can rent their data for a specific experiment, or time period, or even based on outcomes.
The existential threat to AI companies is a shift in how users value their data. The majority of consumers will trade off privacy if the value of AI is an early disease diagnosis or personalised learning. But they do care about money, and the commercial model is what will drive blockchain-based AI and data marketplaces.”
At Outlier, we bet on this idea: Fetch.ai to build a framework for autonomous economic agents, and Ocean Protocol to build tools to create data marketplaces for AI. We were off to the races. The future was here.
I realize five years went by; I'm older. Memories smoulder, winters colder. But that same piano loops over and over and over. The idea of smashing crypto and AI together is now called “decentralised AI”. Ideas have a time and a place. Is 2023 the time and the place?
Let’s find out.
Let’s start by orientating ourselves:
“Centralised” computing refers to a system controlled and owned by a single entity. The architecture is typically client-server but can also be edge or peer-to-peer. To serve billions of people worldwide, computations are generally distributed; in a distributed computing model, tasks are spread across multiple nodes in a network to solve a problem concurrently. This model is the dominant computing paradigm today.
“Decentralised” computing is a system of computing that lacks a central authority. Each node operates autonomously, making decisions based on its own information and that of its peers, enhancing system resilience. Blockchain is a notable example, operating without central control and ensuring data consistency via consensus. This was the model behind P2P applications like Napster, and it is now the model behind Bitcoin and crypto.
Centralised and decentralised models can utilise privacy-enhancing technologies (PETs) to provide some data integrity or data confidentiality features. So-called PETs include federated learning, which trains models across multiple decentralised devices without sharing data; secure multi-party computation, which allows multiple parties to compute a function while keeping inputs private; and homomorphic encryption, which performs calculations on encrypted data without decryption. We can think of PETs as a privacy layer for centralised AI (also known as privacy-preserving machine learning (PPML)) and decentralised AI.
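To make the federated learning idea concrete, here is a minimal sketch in Python. Everything in it is illustrative (a toy least-squares model, three simulated nodes); the point is only that model weights travel, and raw data never does:

```python
import numpy as np

# Toy FedAvg loop: each node trains on its own private data and shares
# only model weights; a coordinator averages them.

def local_update(weights, X, y, lr=0.1):
    """One gradient step of mean-squared-error loss on a node's private data."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Three nodes, each holding a private dataset that never leaves the node.
nodes = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    nodes.append((X, y))

weights = np.zeros(2)
for _ in range(200):
    # Each node computes an update locally...
    local = [local_update(weights, X, y) for X, y in nodes]
    # ...and only the weights are shared and averaged.
    weights = np.mean(local, axis=0)

print(weights)  # converges towards [2, -1] without pooling any raw data
```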
Decentralised AI is an application of decentralised computing that distributes machine learning computations across multiple independent nodes rather than centralising them in a cloud owned and controlled by a single entity. Most AI systems today are deployed on centralised servers due to computational needs and data requirements. However, this raises data privacy concerns, scalability issues, and the risk of single points of failure. Decentralised AI is an approach to AI in which no single entity owns or controls the network or the machine learning models.
Is Decentralised AI overrated or underrated?
Underrated. In 2015, I was directionally right but got the timing wrong. I will double down: now is the right time. Because of sentiment plus some (a lot of?) criminal behaviour, crypto’s reputation is tarnished. But decentralised AI, at the very least, offers an alternative way to build AI systems. Even if you don’t think decentralised systems can compete with centralised ones, the approach is differentiated enough, and the opportunity size is large enough, to warrant venture capital.
Viability: how technically mature is the technology? (4)
Decentralised AI is at the R&D stage with few commercial products. Numerous decentralised computing networks are up and running, like Golem and iExec. These networks can run any type of computation, from rendering CGI images to running complex scientific simulations or processing large datasets. These networks “work” from a technical perspective, but the commercial proposition is relatively weak. Relative to cloud computing services from AWS or Azure, decentralised compute is often more expensive, particularly as the scale and frequency of tasks increase. In many cases, it delivers lower performance, higher latency, greater complexity and less reliability. Decentralised AI networks, optimised for specific algorithms and tasks, may offer a stronger proposition in terms of performance and cost, especially for task division, resource matching and validation, but will need to overcome latency, complexity and reliability issues.
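For intuition on why validation carries overhead, here is a toy sketch of the redundancy trick these networks commonly lean on (node names, prices and numbers are invented for illustration): send the same task to several untrusted nodes and accept the majority answer.

```python
import random
from collections import Counter

# Redundant execution with majority-vote validation: nobody trusts any
# single node, so the same task runs on k nodes and the majority wins.
# The k-fold overhead is the price of trustlessness.

DISHONEST = {2}          # node 2 returns garbage in this toy world
NODES = [0, 1, 2, 3, 4]

def run_on_node(node_id, task):
    # Stand-in for remote execution on an untrusted node.
    if node_id in DISHONEST:
        return "bogus-result"
    return f"result-of-{task}"

def run_with_redundancy(task, k=3):
    chosen = random.sample(NODES, k)            # naive resource matching
    results = [run_on_node(n, task) for n in chosen]
    answer, votes = Counter(results).most_common(1)[0]
    if votes <= k // 2:
        raise RuntimeError("no majority; re-run with more nodes")
    return answer  # in a real network, only majority nodes get paid

print(run_with_redundancy("matrix-multiply-batch-7"))
```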
Drivers: how powerful are adoption forces? (5)
We are talking about decentralised AI (again?) because of GPT-4, ChatGPT, and LLMs. There are three main drivers: the GPU crunch, the maturation of privacy-enhancing technologies, and the AI monopoly concern. On the GPU side, the unprecedented demand for LLMs has overwhelmed GPU supply, leading to shortages, rationing and higher costs. This GPU supply crunch leads to a search for more GPU power wherever it can be found. Privacy-enhancing technologies continue to mature, especially zero-knowledge proofs, multi-party computation and fully homomorphic encryption. Mature PETs mitigate data security and privacy concerns when outsourcing computation to random nodes in a network. Finally, and not to be underestimated, crypto has always been an anti-establishment story, and the underlying decentralised systems are a bulwark against state or corporate monopoly. With the breakthrough of AI and the concern that the technology is too powerful to be controlled by a single company, the conversation about decentralised AI has taken a more serious turn.
Novelty: how much better relative to alternatives? (3)
Decentralised AI competes against centralised AI in its many forms. Centralised AI can be a single server, a distributed computing cluster, or PPML. Many centralised AI designs and architectures exist, and all have a single entity that owns and controls the model. The decentralised AI value proposition is threefold: no single controller, efficient economic activity, and greater data access. I am assuming PETs will be used by both centralised and decentralised AI and so will not be a differentiating factor.
Centralised AI has an advantage in speed, bandwidth, latency, cost, and scalability. These advantages will be insurmountable in the short term, and we shouldn’t expect decentralised AI to outcompete OpenAI. But we can expect the voices saying that a single corporate entity cannot be allowed to control AI to get louder. This will likely result in the use of non-profit-type structures, but it will also be a tailwind for decentralised AI (and, speculatively, DAOs too). In the medium term, the advantages of decentralised AI will become more compelling. With blockchain-based data marketplaces and native payment rails for resources, decentralised AI may have a data advantage, as there is little public data left to scrape for centralised AI. More data will be available on blockchains, notably validated data, and decentralised AI networks will use cryptocurrency to buy and sell data, tools, models and anything else required to perform increasingly sophisticated ML tasks.
Diffusion: how easily will it be adopted? (3)
Performance and scale will drive the AI market. Any additional roadblocks to scale, like decentralisation and privacy, will be considered unnecessary by most developers. PPML has a fighting chance because the performance trade-off may not be too large relative to the benefit of accessing more training data. This will first be true of zero-knowledge proofs, especially with bespoke hardware acceleration. But decentralised AI, and especially decentralised and private AI, will be materially worse across most dimensions that matter in the short term. There will be some applications, most notably in crypto, where some customers will trade off performance for decentralisation and/or privacy. Adoption may be complicated by regulation, which is unlikely to be nuanced enough to differentiate between decentralised AI systems and crypto. In a more regulated environment, providers of decentralised AI networks may have to apply for licenses or some other suitably burdensome permit to operate.
Impact: how much value is created? (4)
The decentralised AI value proposition is 1) no single controller, 2) native economic rails, and 3) access to valuable proprietary data in the future. The size of the impact rests on the probability assigned to the value of these three points.
The high-impact scenario: 1) It becomes politically untenable for AI to be owned and controlled by a few companies, and politicians break up AI companies, to the benefit of decentralised AI protocols. 2) Decentralised AI has unique capabilities that centralised AI cannot copy because of native crypto rails, including faster and more programmable payments and more complex automation and workflows using smart contracts. And 3) blockchain-based data markets become a huge source of proprietary data, and LLMs continue to demand new real-world data sources.
The low-impact scenario: 1) Regulation and tax policies allow for a reasonable compromise on distributing gains. 2) CBDCs/regulated asset-backed stablecoins and incremental improvements in current financial rails reduce the benefits of using crypto rails. 3) Blockchain-based data markets don’t have proprietary data, and/or future LLMs predominantly use synthetic data as training data.
It’s a little from column A and a little from column B. I think synthetic data generation for future training data is a very plausible outcome, meaning access to proprietary data isn’t that important for LLM operators. But I believe crypto rails offer a significant advantage in digital resource allocation that outcompetes centralised AI solutions, at least for certain types of more complex automation-type tasks: tasks with multiple steps, for example, where paying for access or resources is a useful feature for successfully completing the task.
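A minimal sketch of what I mean by crypto rails for multi-step tasks. The Wallet class and resource names are invented for illustration; on-chain, this would be a smart contract escrowing funds per step:

```python
from dataclasses import dataclass

# An agent working through a multi-step task, paying for each resource
# as it goes. Programmable, per-step payment is the "crypto rails" claim.

@dataclass
class Wallet:
    balance: float

    def pay(self, amount: float, to: str) -> None:
        if amount > self.balance:
            raise RuntimeError(f"insufficient funds for {to}")
        self.balance -= amount  # a real rail would settle on-chain

# Each step of a hypothetical task names a resource and its asking price.
TASK_PLAN = [
    ("buy-dataset:weather-2023", 3.0),
    ("rent-gpu:1hr", 5.0),
    ("validate-output:3-nodes", 1.5),
]

def run_task(wallet: Wallet) -> None:
    for resource, price in TASK_PLAN:
        wallet.pay(price, to=resource)  # pay-per-step, no invoicing layer
        print(f"acquired {resource} for {price}; balance={wallet.balance}")

run_task(Wallet(balance=10.0))
```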
Timing: when will the market be suitable for risk capital? (2020-2025)
With LLMs, it’s hard to argue for waiting until 2025-2030. Who knows what the market will look like in another 18 months? Just as a function of LLM demand and the GPU crunch, I would argue now is the time for decentralised AI to get to market. This is a case of demand being here before the supply is ready, especially on the PET side. Nevertheless, a proposition that brings more GPUs to market will find a customer. I think the right product here is decentralised AI first, and then, over time, incorporating PETs to add data integrity and confidentiality features, ending up at a decentralised + private AI solution.
Open Questions
What is the raw computational overhead associated with decentralised orchestration? It will never be as efficient as a centralised solution, so what is the commercially viable trade-off that intersects with a large enough market? Is 10x slower/less powerful viable? Is 5x, or 2x?
Are “open-source” LLMs a good enough bulwark to monopoly control? Is the ability to fork a codebase and run your own model on your own hardware sufficient mitigation? (see Alpaca)
How many algorithms can a decentralised ML network run efficiently? Will it be like an ASIC in that the network is highly optimised to a specific algorithm to be competitive with centralised AI solutions? Or will a network be more general, like a GPU efficiently running matrix multiplication? The answer to this determines the market size.
How important is privacy? Our mental model at Lunar is that decentralised AI only works with sufficient privacy guarantees. It’s not apparent whether the answer is FHE, ZK-ML, MPC or some combination, but all the data can’t be processed by random nodes in the clear, can it?
Does adding privacy protocols make these networks commercially unviable? It’s hard enough to get decentralised AI to work in the first place at, say, 5x(!?) overhead. Add on some privacy-enhancing tech, and then it’s up to 10x(?!) overhead? Is 5x commercially viable and 10x not? (A toy calculation after this list makes the trade-off concrete.)
Something something something autonomous agents (substrate: the Internet) versus autonomous economic agents (substrate: blockchains)? Are these different “intelligences”? Intelligence is environment-dependent, so would these intelligences “evolve” differently?
Is private and decentralised AI the way to harmful AI? An AI that controls a wallet without anyone knowing it’s an AI, consuming and accumulating resources, and it can never be turned off? It sounds bad, no?
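As promised, a toy back-of-the-envelope for the two overhead questions above. Every number here is an assumption for illustration, not market data:

```python
# Stacked-overhead viability check. Assumed multipliers: decentralised
# orchestration costs 5x a centralised baseline, and a PET layer
# (FHE/MPC/ZK) adds another 2x on top, i.e. 10x total.

baseline = 1.00               # $/job on a centralised cloud (assumed)
decentralised_overhead = 5.0  # orchestration + redundancy (assumed)
privacy_overhead = 2.0        # PET layer on top (assumed)

decentralised_cost = baseline * decentralised_overhead              # 5x
private_decentralised_cost = decentralised_cost * privacy_overhead  # 10x

# Viability = does some customer's premium for the property
# (decentralisation, privacy) exceed the stacked overhead?
for premium in (2.0, 6.0, 12.0):
    d_ok = premium * baseline >= decentralised_cost
    dp_ok = premium * baseline >= private_decentralised_cost
    print(
        f"customer pays up to {premium:>4}x: "
        f"decentralised {'viable' if d_ok else 'not viable'}, "
        f"decentralised+private {'viable' if dp_ok else 'not viable'}"
    )
```

On these assumed numbers, a customer willing to pay a 6x premium sustains decentralised AI but not decentralised + private AI; the open question is where the real multipliers land.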
Startups to watch
Zama (FHE-ML) (Lunar portfolio)
Modulus Labs (ZK-ML)
Nevermined (I’m an angel investor)
Ocean Protocol (previous investment with Outlier)
Fetch.ai (previous investment with Outlier)
2030 Prediction
A top-3 large language model by the number of users will be a decentralised AI
For those interested in the state of the art, there is a decentralised AI hackathon in Paris for EthCC: https://www.augmenthack.xyz.
Thank you, goodnight.