☎️ Interview: Christine Huang, Data Privacy & Protection at SAP on the State of Privacy-Enhancing Technologies #001
On why data-sharing is cultural; why global data-sharing rules are unlikely; and why a future tech stack won't be edge vs cloud
Collaborative computing is the next trillion-dollar market. We are at the beginning of fundamentally reshaping how data is used in the economy. When data can be shared internally and externally without barriers, the value of all data assets can be maximized for private and public value.
To explore the concept of Collaborative Computing in-depth, I spoke with Christine Huang, Data Privacy & Protection at SAP. Highlights include:
Why data sharing is cultural, so you want to encourage communication, collaboration and transparency
Why global data-sharing rules are unlikely because of different values
Why a future tech stack will be designed for business needs rather than a binary edge vs cloud dichotomy
"Once you have legal, compliance, tech and business aligned on using encryption-based data sharing tools, onboarding suppliers is the obvious next step. Once you’ve gone beyond the organisational boundary, you create a full data ecosystem."
How do you help customers inside and outside SAP make the most of their data assets?
It’s a journey that you go on with different people. The critical thing to remember is that every function inside an organisation has a different objective. Security folks see data assets as a way to make systems more secure; that’s how they will approach them. It’s not that they don’t care about or understand the importance of customer privacy or the value of data for analytics; it’s just not their primary concern. The same goes for the compliance team: they want to reduce risk and ensure processes around data follow the letter of the law. They want the analytics teams to have access to data, but within the guardrails they have set up. And the line-of-business folks want to access and use the data they need to make better decisions. None of this is inherently in conflict, but you can see how trade-offs will be made about using data assets.
Now, on a day-to-day basis, what this means is that data ends up siloed. Business folks do not want to tell compliance exactly how they use data. Not because they are doing anything wrong, but because they don’t want to jump through a load of hoops and slow the team down. Everyone has deadlines to meet, and the reality is that orchestration and communication with all the relevant teams can be slow. So the inevitable outcome is underutilised data assets. The systems aren’t as secure as they could be. The company is taking more compliance risk than it should. In theory, everyone is worse off. How do we solve this? Well, it’s easy to say and much harder to implement. You want clear business objectives, rather than business-unit objectives, to be prioritised. Much of this is cultural, so you want to encourage communication, collaboration and transparency. You want clear lines of communication between teams operating with good-faith intentions.
The term “data governance” is often overused, and I get the impression it’s used as a catch-all for many different activities. What do you mean when you use the term?
Right, precisely. There is a clear definition, but people have taken the term and extended it to mean lots of different things. There are maybe three ways of thinking about this. The first is size: startups versus corporates, small companies versus big ones. Small companies need to survive first and foremost and are building for a small group of customers to start. They don’t and shouldn’t be thinking about three years from now, when they have more customers with more data and need to meet all these different rules and regulations. That’s not to say they shouldn’t be thinking about data governance at all; it makes sense to put processes around consent, provenance, et cetera in place to avoid technical debt. But the job is to move fast, and sometimes that means cutting corners. On the other hand, big organisations with big legal, compliance, security and data infrastructure teams are responsible for not cutting corners. The big guys have internal policies and frameworks.
The second factor is industry. How strong data governance needs to be depends on whether you are in financial services or logistics, for example. There are different regulations, and the costs of poor data governance are higher in industries like finance and healthcare. Maybe industry is more important than size when it comes to data governance; a startup in financial services will be thinking more about data governance than an SME in manufacturing.
The third factor is like industry but more granular: what is the business unit, or, to take it further, what is the objective of the business? At SAP, for example, SuccessFactors will have a very different data governance strategy to my multi-cloud team. SuccessFactors works in HR, so it has different needs and objectives. I think this point is important because vendors can’t just come in and, say, fix data governance for an entire organisation, let alone an entire market. Data processes, culture and objectives are too different to address with one single solution.
What are your main challenges when it comes to data sharing internally?
Right, so yes, if we think specifically about making the most of data assets, usually the conversation turns to sharing data internally. People may not even think that they are “making the most of the data” when they email a customer record to a colleague or share the status of a procurement project over instant messaging. But that’s what we’re talking about: people constantly share information across the organisation. An odd thing here is that organisations might talk about data minimisation and carefully consider how they interact with third-party vendors, yet internally, data is all over the place. People know the challenges of moving data from the EU to the US, for example, and there are many rules around that externally. But internally, how often do people think about where their colleagues are based and what can and can’t be shared? If it’s within the same organisation, there is no contractual relationship, so people think very differently about what can and can’t be shared.
When we think about finding ways to make data sharing more secure, confidential computing and the wider set of PET tools have an essential role. It used to be that you didn’t have to worry about data in use, only data in transit and at rest. Now organisations are forced, because of regulation, to worry about third-party vendors and how they use the data you send them. So this is where you can imagine a handy tool that limits what third parties can do with the data you send them. That can be keeping it encrypted or verifying the integrity of the processing. Different problems will have different requirements, and again, the level of regulatory requirement matters here too.
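As a concrete illustration of the “keep it encrypted” pattern Christine describes, here is a minimal sketch using the Python cryptography package (my own illustrative choice, not an SAP tool): the data owner encrypts a record before sharing it, and only parties holding the key can read it. Confidential computing goes a step further by protecting the data even while it is being processed, which this sketch does not cover.

```python
# Minimal sketch of encryption-based data sharing (illustrative only).
# Assumes the third-party "cryptography" package: pip install cryptography
from cryptography.fernet import Fernet

# The data owner generates a symmetric key and shares it only with
# parties that are allowed to read the record.
key = Fernet.generate_key()
fernet = Fernet(key)

customer_record = b'{"customer_id": 42, "country": "DE", "status": "active"}'

# The record stays encrypted at rest and in transit; a vendor without
# the key only ever sees ciphertext.
ciphertext = fernet.encrypt(customer_record)

# An authorised recipient holding the key can recover the plaintext.
plaintext = fernet.decrypt(ciphertext)
assert plaintext == customer_record
```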
Okay, so how big is the leap from tools making it easier to share data internally to then sharing it externally? The same cryptography that allows for internal data sharing can easily be opened up.
Yes, that sounds right. It isn't a large leap: once you protect data in use for internal sharing, it's easy to extend that externally. Ultimately, it’s the same tools. The big challenge is getting organisational buy-in and changing processes. But once you have legal, compliance, tech and business aligned on using encryption-based data-sharing tools, bringing in suppliers is the obvious next step. Once you’ve gone beyond the organisational boundary to suppliers, you are really talking about an ecosystem rather than internal and external data, so you are in a different world.
How important is geography when thinking about the future of data sharing?
This is arguably the most important point about data governance and data sharing. Data localisation is already one of the most important considerations for managing data assets. Governments like China and Russia and blocs like the EU all want data to sit where they have some control over it. Also, there are no global standards on encryption, for example, and countries have different export control requirements, which will increasingly apply to data. So you have countries that want to control data, but data isn’t static; it flows in and out of pipelines, so you have a very complex environment with unintended consequences. Then you have questions related to confidential computing, like: can you still move data to a cloud computing company in the US even if the data processor can’t read the data? What about federated learning techniques? You might never move the data from the endpoint, but the value is in the trained model, which sits somewhere else. So do we actually want model localisation?
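To make the federated learning point concrete, here is a toy sketch of federated averaging (assuming numpy; purely illustrative, not any SAP or vendor implementation): raw data never leaves each endpoint, and only locally trained model weights travel to a coordinator, which averages them into a global model. The open question Christine raises is where that averaged model is allowed to live.

```python
# Toy federated averaging sketch (illustrative only; assumes numpy).
import numpy as np

def local_update(weights, features, labels, lr=0.1, epochs=20):
    """Train a simple linear model on data that never leaves the endpoint."""
    w = weights.copy()
    for _ in range(epochs):
        preds = features @ w
        grad = features.T @ (preds - labels) / len(labels)
        w -= lr * grad
    return w

# Each "endpoint" holds its own private data.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]

global_weights = np.zeros(3)
for _ in range(5):
    # Only model weights travel; the raw features and labels stay local.
    client_weights = [local_update(global_weights, X, y) for X, y in clients]
    global_weights = np.mean(client_weights, axis=0)  # federated averaging

print("Trained global model:", global_weights)
```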
There are many open questions here, and it’s certainly not the case that we will end up with global data-sharing rules, because people have different values. In China, people trust the Government, so they want the Government to be able to see data and protect them, for example. In the EU, there is a focus on the individual as the locus of control, so it’s about using technology to protect the freedoms of the individual. Those are just two extreme examples, but every country has its own culture and set of values that it will apply to data.
Right, so this vision outlined in the Collaborative Computing paper of these global, liquid data markets is just not realistic?
Well, I see a pathway, yes, but it’s quite difficult to get there. I have two major concerns: one is philosophical, and the other is practical. On the philosophical side, putting a price on data is problematic. Firstly, you have the issue of determining who gets to sell the data. Who does it actually belong to, and who has the right to sell it? It works for some data where the owner collects it directly, but most data has a long trail, from who collected it to who moved it to the service providers. It’s not a technical problem; it’s more of a legal one, and it’s hard, even putting aside the regional challenges. The second big problem is that the rich get richer. What would stop the big companies from buying up all the data they can? What about nation-states? Companies and states go to great lengths to get hold of data today; if we have markets for them to buy it, it’s hard to imagine a world in which the market for data isn’t unequal. You can also imagine these markets having privacy features that make it hard to know who the buyers and sellers of the data are. We need legal frameworks to consider these issues before creating markets.
But even putting aside the philosophical concerns, this is much harder than it seems because you would need globally agreed-upon standards. Standardisation is hard and takes a long time anyway, and “data for trade” is too broad a thing to standardise. It would be almost impossible to agree on a standard that supports the needs of everyone using data today. So you would want to narrow down the requirements, and as you do, you will find the big players of today wanting to bend the standards towards their needs. And the big players are incentivised to maintain the status quo because they do pretty well from the data value chain as it is. So it’s hard to see a short-term or long-term pathway for data markets.
It feels like the data consolidation model that has been at the forefront of data utilisation strategies has perhaps reached its limitations in terms of efficacy. With the emergence of “Data Mesh”, Collaborative Computing and, more generally, customer centricity, do you see a horizon where a data federation model plays a more significant role in the lifecycle of your data estate?
This is driven by business needs more than anything else. We want to do more stuff at the edge, so latency and performance matter more now than they did. Car and manufacturing companies want to process data locally and make decisions quickly. Sending data back and forth across the Internet takes time and, in many cases, isn’t necessary. But that isn’t true for all use cases. Many, if not most, use cases around data don’t need immediate action. Data can be sent back to a data warehouse for processing and analytics. We will probably end up talking about “edge computing” a lot. However, the reality will be that the vast majority of data will still be consolidated, and use cases that need to be done at the edge will be done at the edge.
So yes, I guess a data federation model will be more critical in the future, but as part of a model that uses a tech stack designed for business needs rather than a binary edge vs cloud dichotomy.