☎️ Interview: Flavio Bergamaschi, Private AI and Analytics at Intel on the State of Privacy-Enhancing Technologies #002
On why integrity is just as important as confidentiality; why the importance of crypto-agility is overlooked; and the five groups you need to convince to sell data collaboration software
Collaborative computing is the next trillion-dollar market. We are at the beginning of fundamentally reshaping how data is used in the economy. When data can be shared internally and externally without barriers, the value of all data assets can be maximized for private and public value.
To explore this vision more deeply, I spoke with Flavio Bergamaschi, Director, Private AI and Analytics at Intel. Highlights include:
Why integrity is just as important as confidentiality
Why the importance of crypto-agility is overlooked
The five groups you need to convince to sell data collaboration software
"There is no general principle to say distributed computing is more secure or more private than centralised computing. It will depend on the configuration of the overall system, the requirements of the application, and the threat you are protecting against."
Your role at Intel is about turning R&D into products. What do you think about customer needs and the role privacy-enhancing technologies play?
We are almost like an interface, working with the cryptographers at Intel on one side and customers on the other. We get needs and requirements from customers and then feed that back into the R&D work. It’s a strong feedback loop, and we are careful to ensure the stuff being created solves customer needs. There are so many different customer needs that privacy tools can play some role in addressing. There isn’t a killer app for PETs as such because PETs are so broad. Right now, we are seeing use cases where customers want to use privacy tools to make existing processes more private in some way. The real value, and we aren’t there yet, is when we think about the stuff that can’t be done today and can only be done with these tools. Things like data collaboration are probably somewhere in between in that we can imagine what the environment has to look like. Still, the end-to-end solution with solid privacy guarantees isn’t here yet.
What is the most common barrier you encounter when speaking to customers about how they can use privacy tools?
It’s the complex nature of how all the software and hardware interact to achieve privacy guarantees. In most cases, customers want to solve specific problems like sharing personal data from one database to another. Still, you have to understand the whole system to guarantee confidentiality. This is challenging primarily because vendors and customers aren’t used to thinking through their systems when buying a bit of software, even more so with software as a service and APIs connecting things together. So when you have to take a system approach, you have to work with more stakeholders across more departments, and it gets more complicated.
And that’s just the confidentiality side of things, obviously, for many use cases, we want to have integrity of computation, too, right?
Exactly, yes. This is where the hardware side of things comes into it. We can use homomorphic encryption, for example, to secure the confidentiality of the data, but somebody can still corrupt the processing. This is where a trusted execution environment like Intel SGX can come in and combine with homomorphic encryption. For cloud-based computing, we are basically outsourcing work, and so we need assurances that the work is going to be performed as we asked.
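To make the confidentiality half of this concrete, here is a toy additively homomorphic scheme (textbook Paillier with deliberately tiny, insecure primes). It is a sketch only, not Intel's HE stack, and it shows confidentiality alone: the server can add encrypted values without seeing them, but nothing here proves the server actually did the addition honestly; that integrity gap is what a TEE with attestation addresses.

```python
import math, random

# Toy Paillier keypair with tiny primes -- illustrative only, never secure.
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    # The standard Paillier "L" function.
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # decryption constant

def encrypt(m):
    # c = g^m * r^n mod n^2, with r random and coprime to n.
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts,
# so an untrusted server can compute on data it cannot read.
c1, c2 = encrypt(12), encrypt(30)
print(decrypt((c1 * c2) % n2))  # → 42
```

Note what the server here could still do undetected: multiply in a ciphertext of its own choosing and corrupt the result. Running the same computation inside an attested enclave is one way to close that hole.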
For private analytics, then, and for our purposes, data collaboration, customers need to think about confidentiality and integrity. Still, there is a more prosaic challenge here, isn’t there: the management of cryptographic keys?
Yes, we can think about this as foundational to using cryptography securely in any organisation. The generation, distribution and management of cryptographic keys isn’t a trivial problem. This is sometimes called crypto-agility. This is when the security team has some sort of crypto inventory and is aware of all of the algorithms, keys, crypto libraries and protocols in use in their infrastructure and applications. Without sound foundations of key management, it’s challenging for customers to start using PETs and getting value from things like shared processing and data sharing.
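One way to picture a crypto inventory is as a simple asset register that can be audited automatically. This is a hypothetical sketch (the record fields and deprecation list are assumptions, not any particular product's schema); the point is that once the inventory exists, swapping out a deprecated primitive becomes a query rather than an archaeology project.

```python
from dataclasses import dataclass

@dataclass
class CryptoAsset:
    # One entry in the inventory: where a primitive is used and how.
    system: str
    algorithm: str
    key_bits: int
    library: str

# Primitives the security team has decided must be retired (illustrative).
DEPRECATED = {"SHA-1", "RSA-1024", "3DES"}

def audit(inventory):
    """Return the systems still using deprecated primitives."""
    return [asset.system for asset in inventory
            if asset.algorithm in DEPRECATED]

inventory = [
    CryptoAsset("billing-api", "AES-256-GCM", 256, "OpenSSL"),
    CryptoAsset("legacy-etl", "3DES", 168, "BouncyCastle"),
]
print(audit(inventory))  # → ['legacy-etl']
```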
Okay so customers are thinking about the basics of key management and then confidentiality and integrity of data processing; that seems like that would touch a lot of people’s roles in a large organisation? Who do you need to convince when selling data collaboration tools?
Well, as always, it depends on the particular product; the best thing to do is to go through the workflow and see who is responsible at each stage. Obviously, the key role here is the Chief Information Security Officer (CISO) because we are talking about cryptography and security. CISOs manage risk, and yes, these tools have potential value-generation implications, but you won’t get anywhere without CISO buy-in. Some firms will have Chief Data Officers (CDOs), and they are the people who will think more about the opportunities around data sharing and monetisation. Below the C-suite, you have five groups: data custodians, security groups, infrastructure, line-of-business, and R&D. Each of these teams will be involved in a decision around data collaboration, so it can be a hard sell. But, as mentioned, we are seeing more and more firms becoming crypto-agile and streamlining processes so they can innovate around data. Therefore we can expect the sales cycle to shorten as the number of stakeholders required is reduced.
Okay, big picture, what cultural, technical or social changes would be required for demand in data collaboration to increase 10/100x?
Assuming we continue to work on the performance side of things, like accelerators for homomorphic encryption, then the assumption is that with some change, millions more people will demand to use private analytics or AI compared to non-private versions. I think we are looking at two drivers: behavioural change and regulation. Users are already becoming privacy-conscious, so that’s not a new thing, but what’s been lacking are alternatives to switch to.
So what you're saying is that there is some latent demand for data collaboration tools, so when the tools are available, then you’ve got your 10/100x increase?
Yes, I think so. Much of this isn’t direct-to-consumer, it’s B2B, but now companies are using privacy tools to be socially responsible and have some PR gains. This seems to be ramping up.
As for regulation, maybe we haven’t seen it ripple through the industry yet because technology performance is lagging behind privacy regulation. GDPR and the CCPA are just the first; we have privacy regulations worldwide, and many require firms to exhaust all technical solutions to protect privacy. So everyone is exploring all viable technologies in a way maybe they wouldn’t have done a few years ago. For many companies, it will be easier to roll out PETs in products and infrastructure to reduce risk.
When considering helping companies utilise their data, a practical framework is governance, sharing, and monetisation. It feels like 95% of companies investing in their data infrastructure are still on data governance, maybe 5% are finding ways to share internally, and <1% are considering monetisation. Does this sound right to you?
These three topics are all discussed today; how they split, though, is unclear. It’s evident that the governance bit is the bit that needs to be fixed first, and this is wrapped up with other efforts like entity resolution and master data records. Companies first have to organise their data assets before they can utilise them. There is a big gap between organising the data, permissioning it, and then sharing it. Think of a data analyst tasked to find out some information, and they think some data they don’t have access to might be useful. First, they ask for permission. This might need a couple of authorisation levels, and they will have to detail exactly what they are using the data for. But in this case, the data analyst might not know until they run the analysis. Even if they get access, there will be rules around when and how the data can be accessed, especially if it contains sensitive or personal data. In some cases, there might be secure environments they have to work in or even air-gapped computers. There will be a lot of friction and a high likelihood that the data isn’t useful. Imagine that process playing out across thousands of companies every day. So data sharing, even internally, is tricky.
As mentioned, few organisations know they can address this problem with PETs. You could design a process in which a sample of data is provided to the data analyst in a privacy-enhanced way, just to test the analysis. The analyst can then test across as many datasets as they need to figure out which datasets they need permissions for. You can extend the process externally too using something like private set intersection, something that is rarely done today because the legal costs are too burdensome.
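The intuition behind private set intersection can be shown with a naive hashed-matching sketch. This toy version (salt, names, and data are all made up) is not a real PSI protocol: low-entropy identifiers like emails can be brute-forced from their hashes, which is why production PSI uses DH- or OPRF-based constructions. But it captures the shape of the idea: parties exchange fingerprints, never raw records, and learn only the overlap.

```python
import hashlib

def hashed_fingerprints(items, shared_salt):
    """Each party hashes its identifiers with an agreed salt.
    Only these fingerprints are exchanged, not the raw values."""
    return {hashlib.sha256((shared_salt + item).encode()).hexdigest(): item
            for item in items}

def private_set_intersection(my_items, their_fingerprints, shared_salt):
    """Return the items both parties hold; non-matching items on the
    other side stay hidden (under this toy's weak threat model)."""
    mine = hashed_fingerprints(my_items, shared_salt)
    return sorted(mine[h] for h in mine.keys() & their_fingerprints)

# Two companies each hold a customer list and want only the overlap.
salt = "agreed-nonce"
a_items = ["alice@example.com", "bob@example.com", "carol@example.com"]
b_items = ["bob@example.com", "dave@example.com", "carol@example.com"]

b_fps = set(hashed_fingerprints(b_items, salt))
print(private_set_intersection(a_items, b_fps, salt))
# → ['bob@example.com', 'carol@example.com']
```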
And then the <1% monetisation, I don’t know. This feels like a different problem, one with bigger non-technical questions around ownership of data. Right now, collectors of data appear to have the right to package it up and sell it, but it feels like that is shifting somehow. But we are in the very early stages of this.
Finally, it feels like the data consolidation model is coming to an end. Do you think data federation is the future?
That is probably too simplistic. It will depend on the application requirements and constraints. Some use cases will require extremely low latency, and so it makes sense to do as much of the computing as close to the edge as possible. But others will still need data consolidation for efficiency and cost reasons. Certainly with federated learning we have tools that mean we can do more learning at the edges, which has efficiency and security benefits, but federated learning is the frontier. We still have lots of organisations moving to the cloud and using data warehouses or lakes, and that trend isn’t going away. So maybe the question should be different: what do you want to protect when you want to process distributed data? Each node might have different policies, for example, so it’s not the case that distributed processing is necessarily more secure or private. Any of these nodes, if we are talking about personal smartphones or edge devices, will have been configured by a highly variable range of expertise. There is no general principle to say distributed computing is more secure or more private than centralised computing. It will depend on the configuration of the overall system, the requirements of the application, and the threat you are protecting against.
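The federated pattern mentioned above can be reduced to a minimal sketch (an assumed toy setup, not any specific framework): each node computes a local summary over its own data, and only those aggregates, never the raw records, travel to the coordinator.

```python
def local_update(data):
    # Runs on each node: only a (sum, count) summary leaves the device.
    return sum(data), len(data)

def federated_average(node_datasets):
    # Runs at the coordinator: combines summaries without ever
    # seeing any node's raw records.
    summaries = [local_update(d) for d in node_datasets]
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count

# Three nodes, each holding private measurements.
nodes = [[2.0, 4.0], [6.0], [8.0, 10.0]]
print(federated_average(nodes))  # → 6.0
```

As the answer above notes, this shifts rather than removes the security question: each node's configuration, and whether its summaries can be trusted or inverted, becomes the new attack surface.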