☎️ Interview: Dr Hyoduk Shin, Professor of Innovation at UC San Diego on the State of Privacy-Enhancing Technologies #003
On realigning incentives towards data sharing; why culture will be the most important driver; and why we will end up with a globally fragmented data economy
Collaborative computing is the next trillion-dollar market. We are at the beginning of fundamentally reshaping how data is used in the economy. When data can be shared internally and externally without barriers, the value of all data assets can be maximised for both private and public benefit.
To explore this vision more deeply, I spoke with Dr Hyoduk Shin, an Associate Professor of Innovation, Information Technology and Operations at the Rady School of Management at UC San Diego. Shin’s research interests include forecast information sharing and investment in supply chain management, competitive strategies under operational constraints, and the economics of information technology.
Highlights include:
How to realign incentives towards data sharing;
Why, besides technology, culture will be the most important driver;
Why we will end up with a globally fragmented data economy.
“If we look to the future, it is indeed possible that tech renders the regulation useless. Or at least, the technology enables firms to programmatically do what regulation intended. Users can then choose between firms in the market.”
Let’s start at the macro level: the dominant data strategy is obviously to hoard rather than collaborate. Some of this is just because firms lack tools to share, but a lot of it is just plain old business strategy: if data is valuable, then you need to collect and keep more of it, right? Is this changing, and if so, why?
I do see that it’s getting more collaborative, yes. But at the same time, as more firms consider collaborative approaches, it raises issues that were maybe not fully recognised at first. Privacy is of course one that is top of mind. Privacy concerns and regulation are definitely a constraint on collaboration. Just keeping the data stored somewhere, rather than figuring out how to share it, is often the easiest thing to do. In many cases, even if a person or team really sees value in a particular dataset being shared with another team or pooled with other datasets, the process of actually getting sign-off for it might not be worth the hassle. That said, just storing the data somewhere is no longer necessarily the easy option. With more data being collected and stored, the value but also the risks of using the data have increased. So you have all these questions related to data governance which companies are still trying to figure out.
The biggest challenge isn’t privacy, however; it’s cultural, or more specifically about incentives. For companies, there is not a strong enough incentive to share data. The interesting question to think through is how to realign incentives to share. I think there are three main ways to do that:
The first is to build up a long-term relationship between the data platform and the data supplier. Things like data unions and data trusts have a role to play here.
The second is to work on culture. Firms can create a non-exploitative data-sharing environment. Leadership can encourage staff to find ways to get the maximum value from their data assets, culturally at first, but perhaps also through performance indicators.
Finally, you can use time as a way to segment data. Data is likely to be more valuable the closer you are to when it was captured, so we could think about sharing older data as routine practice while keeping fresher data in-house.
Right, so are there use cases where you have already seen data sharing happen and is there anything we can learn from those early adopters?
I can talk about two specific areas where we have seen tools used: the semiconductor industry and marketing, each with their own commercial challenges. My work has explored how suppliers can share information with vendors and customers. The semiconductor industry had a problem. Customers typically have to place soft orders in advance of committing to an order, so they have got very good at forecasting, but they are still forecasting. And because it’s a soft order, it’s rational for them to over-order. That’s what we observed in the industry: a customer would put in a soft order for 5k units and then actually order 3k units, leaving the vendor holding inventory. This is an obvious problem for which some form of data sharing makes sense. It makes sense for the vendor to encourage customers to share real-time demand so they can match supply and demand more efficiently and cut costs.
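A minimal sketch of that gap, with purely illustrative figures (none of the customer names, quantities or the unit cost below come from the interview), of what building to inflated soft orders costs a vendor versus building to shared real-time demand:

```python
# Illustrative only: the cost to a vendor of building to inflated soft orders
# compared with building to shared real-time demand. All figures are made up.

soft_orders   = {"customer_a": 5000, "customer_b": 8000}   # non-binding forecasts
actual_orders = {"customer_a": 3000, "customer_b": 6500}   # firm orders placed later
unit_cost = 12.0                                           # hypothetical cost per unit built

# If the vendor builds to the soft orders, the excess inventory is the gap.
excess_units = sum(soft_orders[c] - actual_orders[c] for c in soft_orders)
print(f"Excess inventory: {excess_units} units "
      f"(~${excess_units * unit_cost:,.0f} of stranded cost)")

# With customers sharing real-time demand signals, the vendor could build
# closer to actual_orders and most of that stranded cost disappears.
```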
Marketing is another use case. We have seen examples of retailers tracking customer demand and sharing that data with brands, who then resell it on to hedge funds. That’s less an example of data collaboration and more the creation of data products, and of the value of those ultimately to a customer. There are challenges with this supply chain, in that the buyer at the retail end has no idea how their transaction data is being used. But it speaks to the value of data and the opportunities if firms can package it up and sell it as products.
I’ve found financial services and healthcare to be relatively early adopters of data collaboration tools mainly because of the regulation around privacy and data security. Have you found the same and what other verticals do you expect to be the next adopters?
Confidentiality is the driver here. These industries have highly confidential information that they need to protect. In the case of financial services, there are huge financial gains to be had by sharing data for things like KYC and fraud detection. Healthcare is less about financial gains, although it is for the pharmaceutical industry, and more about the public health gains from aggregating data. A good example of organisations that attempt to strike a balance between confidentiality and aggregation benefits are trade associations. They collect confidential information from multiple parties, and those parties enter into an agreement with the association knowing they are providing some value in the expectation that they will get more back. So the model is not new; it’s just that we don’t yet have the equivalent structures or organisations for sharing data.
When thinking about helping companies utilise their data, a sensible framework is: governance, sharing, and monetisation. It feels like 95% of companies investing in their data infrastructure are still on data governance, maybe 5% are finding ways to share internally, and <1% are even thinking about monetisation yet. Does this sound right to you?
Yes, so far it is rare to see much monetisation. Many companies are investing in governance, as I mentioned earlier. This is sort of stage one before you can even think of sharing or monetisation. It is a known problem, and there are lots of companies addressing this need, so we will see it more or less solved relatively soon. How quickly firms then move to data sharing depends on a host of factors; yes, technology matters and we are indeed lacking infrastructure, but culture will be the driver. Culture is typically the driver for adoption of new technologies in large organisations, with inertia a particularly strong dynamic. But with data collaboration you have capability and talent issues as well as the need to develop, teach and learn new processes and workflows. I think we will see start-ups at the data sharing and monetisation stage long before larger organisations, for these cultural reasons.
It feels like the data consolidation model that has been at the forefront of data utilisation strategies has perhaps reached its limitations in terms of efficacy. With the emergence of “Data Mesh”, Collaborative Computing, and, more generally, customer centricity, do you see a horizon where a data federation model plays a more significant role in the lifecycle of data estates?
Honestly, it will take time. My feeling is that for many companies, the capability just isn’t there for any federation work. There is definitely a gap between cutting-edge thought leadership and market adoption when it comes to data collaboration. Large organisations will lack the capacity to do this. A cloud-based end-to-end solution that integrates nicely with existing software is probably the way we will see adoption, but, as mentioned, that only solves the technology; the cultural part is crucial too.
What cultural, technical or social change would be required for demand for data collaboration to increase 10-100x?
This is a policy question. I don’t see anything technically changing inertia. But regulation is good at forcing change, as we have seen with GDPR. The difficulty is that it can’t be done country by country, so we would want some agreed standard on data structure or data governance, maybe at the G8 or OECD level, something like a WTO for data.
As countries are throwing up regulatory barriers to data storage and sharing, PETs like federated learning, fully homomorphic encryption and others are making it easier to process data without moving or even reading it. How do you think about a future regulatory landscape when data can be shared without being moved or read?
If we look to the future, it is indeed possible that tech renders the regulation useless. Or at least, the technology enables firms to programmatically do what regulation intended. Users can then choose between firms in the market. The reality, though, is that different regions will use regulation for different purposes, and so we will see different data outcomes. A likely scenario is regional variation with a different balance of power: South East Asia designing regulation to empower state capacity, the EU to empower individuals, and the US, most likely, corporations. That is certainly too simplistic, but the point is that tech won’t render regulation useless.
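To make the “shared without being moved or read” idea in the question above concrete, here is a minimal federated-averaging sketch, assuming a simple linear model and entirely made-up data: each party trains locally and only model updates travel to the coordinator.

```python
import numpy as np

# Toy federated-averaging sketch (all data and parameters are hypothetical):
# each party keeps its raw records local and shares only a model update.

rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0, 2.0])            # ground truth, used only to generate toy data

def local_update(global_w, x, y, lr=0.1):
    """One gradient step for a linear model on a party's private data."""
    grad = x.T @ (x @ global_w - y) / len(y)
    return global_w - lr * grad

# Three parties with private datasets that never leave their premises.
parties = []
for _ in range(3):
    x = rng.normal(size=(200, 3))
    y = x @ true_w + rng.normal(scale=0.1, size=200)
    parties.append((x, y))

weights = np.zeros(3)
for _ in range(50):
    # Each party computes an update locally; only the updated weights move.
    updates = [local_update(weights, x, y) for x, y in parties]
    weights = np.mean(updates, axis=0)         # the coordinator averages the updates

print("Aggregated model weights:", np.round(weights, 2))   # close to true_w
```

The raw rows never leave each party; in practice the updates themselves would typically also be protected, for example with secure aggregation or differential privacy.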
And finally, if we play out our collaborative computing vision, what are some of the implications of a global marketplace of data? Do you imagine companies managing data as an asset and selling more or less to manage budgets for example?
Everything will be priced and will therefore become a financial asset. This is a challenge for lots of reasons. Turning everything into software and pricing everything will tend towards monopoly, and, even with data, there will be economies of scale. Tons of power will come from that. The marketplaces themselves will be uniquely powerful in determining what can be traded, and if all data is a tradable asset you could have a censorship challenge even bigger than the one social media firms face today. We can speculate on interesting consequences of data assets; for example, we will need to adapt financial reporting to include them, and there might need to be rules around how and when data can be sold as it relates to the financial year. We aren’t even beginning to think about the consequences of these sorts of things.
As a parting thought, if we think about putting data on the balance sheet, it’s not impossible to imagine the FAANGs increasing their value by 5-10x. If we think about valuations in terms of data, one of Apple, Microsoft, Amazon or Google might already be a $10 trillion company.