Blockchain to enter $1Tn cloud market, once it solves data storage security problem (and here's how it will)
This article has a new home: https://crosshash.hashnode.dev/web3-secure-cloud-data-storage
Modern blockchains strive to replace clouds by offering developers an opportunity to build and host apps onchain. For example, the decentralised social network http://distrikt.io/ is hosted completely onchain on the Internet Computer blockchain by Dfinity. It’s fast, and it gives you mobile apps and a web interface with a decent user experience.
There’s one catch, however: since it’s hosted on a blockchain, node owners have direct read access to the backend.
Social networks like Twitter won’t give anyone direct read access to their databases, as this would kill their business model. That’s why they host their backend on private servers, not in the cloud and definitely not on a blockchain: doing so would conflict with their objective of keeping their biggest asset, the users’ data, secure.
But how can you possibly keep something secure if it is hosted on thousands of blockchain computers worldwide, computers that belong neither to you nor to your trusted partners? This is what a blockchain backend looks like these days.
Well, obviously you might want to encrypt the data stored in the blockchain backend. This works perfectly for closed business apps, where users can encrypt all their messages, files and videos, and keep the keys safely on their devices, where node owners can’t reach them. In this case node owners just store encrypted databases and files and serve clients’ requests and queries, without any access to unencrypted data (both symmetric and asymmetric encryption are possible here).
So what’s the problem?
There are actually 3 types of data here:
First: public data, whose leak harms neither your users nor your business.
Second: data that your users don’t want the public to see, such as private messages, what they share in closed groups, authentication and financial data, and contact details. This requires end-to-end encryption and secure storage. Nothing complicated here: it can simply be stored in encrypted databases onchain, and performing user queries over encrypted data is not a complicated task. Neither the public nor node owners can access the data, but users can, because they own the keys (both symmetric and asymmetric encryption are possible here).
Third, the tricky one: data that is strategic to you and your business, but not necessarily private for users, for example:
users’ social graph, network and connections, friends and followers,
their interests, likes, ad preferences, political views, beliefs,
app usage patterns and device info,
their public posts and comments,
communities they belong to and events they attend.
Why is the third type of data strategic to you as an app developer / business owner? Because you don’t want anyone to parse, analyze or replicate it, and you especially don’t want anyone to clone/fork your app and run it fully packed with real data.
To prevent troublemakers from doing so, you can also encrypt all the strategic data, but who will hold the keys in this case: you, the users or the nodes?
Case 1: you as an app developer / business owner hold the keys. In this case, the story is no longer decentralised. You become the authority here: you require users to trust you, and you may start blocking users or information you don’t like, which defeats the whole purpose of switching from cloud to blockchain.
Case 2: users hold the keys to your strategic data. A bad choice for many reasons, but primarily because you as an app developer / business owner will need direct read access to the strategic data, for example to run analytics queries, and you can’t rely on users always being online to provide you with the keys.
Case 3: nodes hold the keys to your strategic data. This may sound counterintuitive, but it looks like the only choice here, as long as we guarantee that a particular node never holds the keys to the data it stores.
In this case you, as an app developer / business owner, cannot interfere with what your users do in the app, but you can still contribute to the app’s development and use the app to generate income.
Here we come to the main question of this paper: how do we ensure that node owners can store the encrypted database, store the keys, and run queries, yet still cannot read the data?
Now that’s a trillion dollar question. Alright, let’s develop a solution to it. And let’s do it so that anyone can do the same in their respective blockchains.
Basic principles we’re following while developing the solution:
All nodes are equal: there are no privileged nodes and no centralization of any kind
Databases and keys are replicated several times in different locations to ensure geo-availability and recovery
A particular node can’t hold keys to decrypt the data it stores, but it can hold keys to decrypt data stored on other nodes
Nodes never store whole databases, tables, columns or rows, only parts of those
Every row in a database table can have its own key (public, private or symmetric)
A client can send a query to any node; that node becomes responsible for returning the result
The responsible node can split the query into parts and send them to other nodes for execution
The responsible node gathers the query results from those nodes, and gathers the keys to decrypt the results from other nodes
All nodes know where the requested data is located and replicated; this metadata is spread throughout the network
The responsible node will be able to observe the unencrypted result of the query
Now let’s see how these principles are implemented:
Let’s create a backend for a networking app with just one collection, Users, which has 3 fields (user_id, name, friends) and 1000 records:
Let our blockchain have 10 nodes and let us have 2 replicas of the database.
How would you solve it?
Step 1: For each row in the collection, generate a public and a private key and encrypt the row with the public key. Don’t encrypt the index, and add fields for the private and public keys. Name the new collection UsersEncrypted:
Step 2: Split the collection by fields into 3 separate tables:
Here only the encryption table has private keys to decrypt each row.
Step 3: Further split each table by rows into 10 separate minitables. We now have a total of 30 minitables with 100 rows in each:
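Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows hold placeholder strings instead of real ciphertext, user_ids are assumed to run from 0 to 999, the table name `names` is hypothetical (the article only names `friends` and `encryption`), and the public key field is left out because the article does not say which table it ends up in.

```python
from collections import defaultdict

ROWS_PER_MINITABLE = 100  # 1000 rows / 10 minitables per table, as in the text

# Hypothetical stand-in for UsersEncrypted: placeholder strings, not real ciphertext.
users_encrypted = [
    {
        "user_id": uid,
        "name": f"<enc name {uid}>",
        "friends": f"<enc friends {uid}>",
        "private_key": f"<private key {uid}>",
    }
    for uid in range(1000)
]

# Step 2: one table per field; the table name "names" is an assumption.
FIELD_OF_TABLE = {"names": "name", "friends": "friends", "encryption": "private_key"}


def split_collection(rows):
    """Return {minitable_name: {user_id: value}} for all three tables."""
    minitables = defaultdict(dict)
    for row in rows:
        # Step 3: rows 0-99 go to minitable 0, rows 100-199 to minitable 1, ...
        minitable_id = row["user_id"] // ROWS_PER_MINITABLE
        for table, field in FIELD_OF_TABLE.items():
            minitables[f"{table}_{minitable_id}"][row["user_id"]] = row[field]
    return dict(minitables)


minitables = split_collection(users_encrypted)
print(len(minitables))                  # 30
print(len(minitables["friends_3"]))     # 100
print(minitables["encryption_3"][350])  # <private key 350>
```

Every minitable keeps the unencrypted user_id as its index, so a node can look up a row without being able to read it.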
Step 4: Create a routing table, which maps user_id to minitable_id. This routing table will be stored on each of the 10 blockchain nodes and will be used to resolve which node to request:
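A routing table like this can be as small as a dictionary. The sketch below assumes user_ids run from 0 to 999 and that rows were split in order, 100 per minitable; the function names and the `names` table name are hypothetical.

```python
ROWS_PER_MINITABLE = 100  # assumption: rows were split in user_id order


def build_routing_table(user_ids):
    """Map every user_id to the minitable_id that holds its rows."""
    return {uid: uid // ROWS_PER_MINITABLE for uid in user_ids}


def minitables_for(user_id, routing_table):
    """Names of the minitables holding the pieces of one user's row."""
    minitable_id = routing_table[user_id]
    return {
        table: f"{table}_{minitable_id}"
        for table in ("names", "friends", "encryption")
    }


routing_table = build_routing_table(range(1000))
print(routing_table[350])                              # 3
print(minitables_for(350, routing_table)["friends"])   # friends_3
```

Since the table only maps user_id to minitable_id, replicating it to every node does not reveal any row content.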
Step 5: Now the complicated part. Distribute the resulting 30 minitables among the 10 blockchain nodes so that every record is stored in two copies (remember that we wanted 2 replicas of the database) and no node has the private keys for the tables it hosts. This starts to look like a game of sudoku:
Let’s now test a query from the frontend:
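One way to get a layout with these properties is a simple rotation: put each minitable on two neighbouring nodes and shift each table by a different offset. The offsets below are hypothetical. They were picked so that friends_3 lands on nodes 0 and 9, which matches the example later in the article, but the rest of the layout is an assumption and not necessarily the distribution used there. The `violations` helper checks the rule that no node holds the keys for data it stores.

```python
NUM_NODES = 10
NUM_MINITABLES = 10

# Hypothetical rotation offsets; "names" is an assumed table name.
REPLICA_OFFSETS = {"names": (0, 1), "friends": (6, 7), "encryption": (4, 5)}


def build_placement():
    """Return {minitable_name: [node_id, node_id]} with 2 replicas each."""
    placement = {}
    for table, offsets in REPLICA_OFFSETS.items():
        for i in range(NUM_MINITABLES):
            placement[f"{table}_{i}"] = sorted((i + off) % NUM_NODES for off in offsets)
    return placement


def violations(placement):
    """List (node, minitable) pairs where a node holds data and its keys."""
    bad = []
    for i in range(NUM_MINITABLES):
        key_nodes = set(placement[f"encryption_{i}"])
        for table in ("names", "friends"):
            for node in key_nodes & set(placement[f"{table}_{i}"]):
                bad.append((node, f"{table}_{i}"))
    return bad


placement = build_placement()
print(placement["friends_3"])     # [0, 9]
print(placement["encryption_3"])  # [7, 8]
print(violations(placement))      # []
```

With this rotation every node ends up hosting 6 minitables, so the load is even and no node is special.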
From my mobile frontend app I ask node 6 to give me the friends of user_id = 350. Node 6 is responsible for handling my request. It first needs to find out which nodes to request next. It looks up the minitable_id in the routing table:
The minitable_id is 3. Then the requested node looks in the “sudoku” table to find which nodes contain the minitables friends_3 and encryption_3.
Let’s do this task ourselves, because it’s important to understand. Go to the “sudoku” table from step 5 and find the nodes which store the minitable friends_3. Check your answer in the following table:
As you can see, the minitable friends_3 can be found either in node 0 or in node 9. Let’s summarize which nodes contain data about the user with user_id=350:
Alright, now, to handle the query and return the list of friends of user_id=350, the responsible node makes the following requests:
After node 6 receives the responses, it uses the private key to decrypt the list of friends and returns it securely to my frontend mobile app. Mission accomplished.
The responsible node could only see the information needed to respond to a user query and nothing more.
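The whole round trip can be simulated in a single Python process. Treat this as a rough sketch under loud assumptions: the node layout is hypothetical (it only agrees with the article in that friends_3 sits on nodes 0 and 9), the friend list is made up, and `toy_cipher` is an insecure symmetric stand-in built from SHA-256 that is there only so the example runs without third-party libraries. A real system would use a proper public/private key scheme as in step 1.

```python
import hashlib
import json

NUM_NODES = 10
# Hypothetical layout: minitable i of a table lives on nodes (i + offset) % 10.
OFFSETS = {"friends": (6, 7), "encryption": (4, 5)}


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream; applying it twice decrypts. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def holders(table, minitable_id):
    """Nodes that store a replica of the given minitable."""
    return [(minitable_id + off) % NUM_NODES for off in OFFSETS[table]]


# Local storage of each node: {minitable_name: {user_id: bytes}}
nodes = [dict() for _ in range(NUM_NODES)]


def store(table, user_id, value):
    """Write one value to both replicas of the right minitable."""
    minitable_id = user_id // 100
    for node_id in holders(table, minitable_id):
        nodes[node_id].setdefault(f"{table}_{minitable_id}", {})[user_id] = value


# One made-up user: encrypted friend list in friends_3, key in encryption_3.
row_key = hashlib.sha256(b"demo key for user 350").digest()
store("friends", 350, toy_cipher(row_key, json.dumps([12, 77, 901]).encode()))
store("encryption", 350, row_key)


def query_friends(user_id):
    """What the responsible node does: route, fetch both parts, decrypt."""
    minitable_id = user_id // 100  # routing table lookup
    fetched = {}
    for table in ("friends", "encryption"):
        source = holders(table, minitable_id)[0]  # any replica will do
        fetched[table] = nodes[source][f"{table}_{minitable_id}"][user_id]
    return json.loads(toy_cipher(fetched["encryption"], fetched["friends"]))


print(query_friends(350))  # [12, 77, 901]
```

In this layout node 6 stores neither friends_3 nor encryption_3, so everything it learns comes from the two requests it makes for this one row.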
This is a simplified view of the cloud-over-web3 we’re building at the moment. If you’re interested in collaboration, let us know here: https://airtable.com/shrvBOW4H6dcLzqFB