Data is the core of the 21st-century economy, the digital economy. It is the fuel on which the internet runs. At least 2.5 quintillion bytes (2.5 million terabytes) of data are produced every day [1]. That’s 2.5 × 10^18 bytes, a staggering number with 19 digits! Companies worth trillions of dollars that dominate the stock market have made the extraction, storage, and analysis of data their bread and butter. However, a nascent movement is gaining traction that seeks to empower data owners and establish a more egalitarian data economy. In this article, we provide a brief introduction to this infant industry.
Every user of the internet and of apps builds up an online identity, a digital avatar. And to an extent that most people do not realize, data aggregation has reached terrifying proportions. It ranges from tracking your wake-up time, to knowing your favorite breakfast recipe and the location of the school where you drop off your children each morning, to increasingly intimate matters: your current and former relationships, your favorite brand, or your cousin’s pregnancy status. There is hardly any area of life the digital realm does not span [2].
At the web’s current stage, unfortunately, these vast amounts of data are siloed on centralized servers and harvested by very few companies, for a couple of reasons. For starters, because data becomes more useful in larger quantities, data monopolies have emerged. The more information Facebook and Instagram have about your search behavior, the more precisely they can tailor their content to your likes, which will likely prompt you to keep using their service in the future as well. The same goes for Google’s search engine, which controls over 90% of the global search engine market. Every search request gives Google more information to improve its search algorithm. Google has processed so many more searches than any other search engine that it has gained a significant advantage over competitors in understanding what customers want. It’s a self-reinforcing cycle that capitalizes on a lock-in effect: once inside an ecosystem, only a few users leave it again.
However, data stored centrally by a single party has only limited value. It is locked within that party’s ecosystem and therefore cannot be used for other productive causes. Wouldn’t it be far more sensible if users could decide independently what data they are willing to share and, equally important, for what particular purpose? This is a key question worth diving deeper into, but first let’s rewind and take a look at the different stages of the web, as recent developments in that area are a key driver of the “dataverse”.
The web’s evolution
Figure 2: How the data aggregation business becomes increasingly powerful over time
Web 1 (1989-2005)
Web 1.0 refers to the initial stage of the World Wide Web. It was characterized by very few content creators and many content consumers. Web pages were static, and personal web pages were common. Compared to today, there was hardly any interaction between a website’s host and its visitors; the sole purpose was to display information.
Web 2.0 is all about user engagement and interaction. User-generated content has moved into the spotlight, and social media is booming. Content has become increasingly dynamic and responsive to the user’s behavior, and far more personal opinions are being shared and demanded, from podcasts to blogs, videos, and crowdsourcing [3]. Complex algorithms that analyze and predict users’ behavior have become a driving force. Middlemen have risen to the top and become indispensable service providers, aggregating vast quantities of user data, be it in social media (Facebook), web search (Google), or online payments (PayPal).
Web 3.0 describes a movement striving to rebuild the internet’s infrastructure in a decentralized way using distributed ledger technology (DLT), primarily blockchains. DLT, in a nutshell, provides a shared, immutable, transparent ledger that stores data. In most cases, this is done in a permissionless way, meaning anyone can join the network and access the ledger by operating a so-called node. The movement started with the invention of Bitcoin in 2008, an open-source, decentralized global payment system, and has since turned into an ideological movement disrupting areas such as finance, art, and data storage. Web3 seeks to give ownership back to the users and disintermediate current structures.
Figure 3: Brief overview of the different stages of the internet
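To make the “shared, immutable ledger” idea behind Web 3.0 a little more tangible, here is a minimal sketch of a hash-chained ledger in Python. It is a toy illustration under simplifying assumptions, not how Bitcoin or any real blockchain is implemented: there is no network, no consensus, and no mining, and the function names (block_hash, append_block, is_valid) are made up for this example. It only shows why changing an old entry is easy to detect.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that points back to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check every back-link."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical example entries.
ledger = []
append_block(ledger, "alice pays bob 5")
append_block(ledger, "bob pays carol 2")
valid_before = is_valid(ledger)

# Tampering with an earlier entry no longer matches its stored hash.
ledger[0]["data"] = "alice pays bob 500"
valid_after = is_valid(ledger)
```

Because every block’s hash covers the previous block’s hash, rewriting one entry would require recomputing all later blocks. In a real network, many independent nodes hold copies of the ledger, which is what makes such a rewrite impractical.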
Some key movements within the Web3 space are DeFi, NFTs, the Metaverse, and DAOs. Because they form the foundation for many use cases, it’s vital to have a decent understanding of each of these terms.
DeFi, short for decentralized finance, revolves around the idea of providing various kinds of financial services on the blockchain in a permissionless, trustless, and decentralized way, leveraging blockchain technology and its inherent benefits.
NFT is short for non-fungible token: a unique token stored on the blockchain that can represent digital and real-world assets (RWAs) such as art, real estate, tickets, or in-game items. NFTs are essential for the metaverse, a digital avatar-based universe where users can pursue activities together as they would in the real world and claim full ownership of any items in their possession, as proven by the blockchain.
DAOs, short for decentralized autonomous organizations, are set to disrupt the way organizations are structured. Instead of relying on hierarchical order, where ultimately, very few are in a position of power by representing and commanding all below them, DAOs aim to make the decision-making process fully egalitarian through member-owned communities with equal voting rights, and without centralized leadership.
Why we should reclaim ownership of our data
The rise of blockchain technology and decentralization has prompted many to critically question how big tech is harnessing our data. Why should they make billions using data that we as users generate? Sure, we all know sayings such as “If it’s free, you are the product” or “there ain’t no free lunch”, but still, the question needs to be posed: Is there a way for us to use services and take full ownership of the traces we leave while doing so?
In order for that to happen, we need to focus on recreating the data economy. We need marketplaces and simple ways to filter, monetize, and make use of our data in tailored ways. We need a world where people own the platforms they are using, earn money fairly for content they are generating, and are in full control of their data rather than held hostage by it.
This will start with users claiming more rights over their data, and ultimately lead to a world where data is seamlessly traded, monetized, and utilized, most likely on widely used exchanges and marketplaces. Eventually, data is likely to emerge as its own asset class. If you have deep expertise in the AI field, why not invest in a particular data set offered on a marketplace that you think is ideal for training state-of-the-art algorithms? As bizarre as this sounds, this is where we are heading. It goes without saying that such a novel way of monetizing and using data is only possible given certain encryption and anonymization standards, as privacy remains the highest good.
Figure 4: This is a simplified model for a new data economy, where the data producers are the data owners.
Let’s look at some of the impacts a user-owned dataverse will have.
1) Data Access & Ownership
Data is fairly easy to replicate, which makes its ownership and usage hard to track. By tokenizing access to data and combining this with the transparent nature of the blockchain, we will be able to track ownership and usage of data on open markets [4].
Using a DAO structure, data publishers will have a say in the future development of the community, the marketplace, its guidelines, customer selection, and so on.
3) Data monetization
At the internet’s current stage, dominated by the tech titans that emerged in the Web2 environment, users have no real way to monetize their data, let alone systematically. Decentralized data marketplaces and unions are trying to change that by allowing individuals to sell selected data, which, after proper anonymization, can be used to train algorithms, develop marketing personas, and so on.
4) Data confluence
Algorithms will be trained on a wider array of data gathered from various services rather than from only a select few. This variety will ultimately help sharpen the algorithms for real-world applications.
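The first point above, tokenized access with trackable usage, can be sketched in a few lines of Python. This is a minimal in-memory model under stated assumptions, not the API of any real marketplace or blockchain: the DataMarket class, its method names, and the example dataset and participants are all hypothetical, and a plain Python list stands in for the transparent on-chain record.

```python
from dataclasses import dataclass, field


@dataclass
class DataMarket:
    """Toy stand-in for an on-chain access registry (hypothetical)."""

    owners: dict = field(default_factory=dict)        # dataset_id -> owner
    access_tokens: set = field(default_factory=set)   # (dataset_id, holder)
    usage_log: list = field(default_factory=list)     # append-only event record

    def publish(self, dataset_id, owner):
        """Register a dataset under its owner."""
        self.owners[dataset_id] = owner

    def grant_access(self, dataset_id, owner, holder):
        """Issue an access token; only the registered owner may do so."""
        if self.owners.get(dataset_id) != owner:
            raise PermissionError("only the owner can grant access")
        self.access_tokens.add((dataset_id, holder))
        self.usage_log.append(("grant", dataset_id, holder))

    def use(self, dataset_id, holder, purpose):
        """Check the token and record who used the data for what purpose."""
        if (dataset_id, holder) not in self.access_tokens:
            raise PermissionError("no access token for this dataset")
        self.usage_log.append(("use", dataset_id, holder, purpose))


market = DataMarket()
market.publish("fitness-2023", owner="alice")
market.grant_access("fitness-2023", owner="alice", holder="research-lab")
market.use("fitness-2023", holder="research-lab", purpose="model training")

# A party without a token is rejected, and nothing is logged for it.
try:
    market.use("fitness-2023", holder="ad-broker", purpose="profiling")
    denied = False
except PermissionError:
    denied = True
```

The design choice worth noting is that access and usage are recorded as separate events: the owner decides who may use the data, and each use carries a stated purpose that anyone reading the log can audit.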
Some real world examples
Lastly, it makes sense to provide an overview of some of the major platforms and communities existing today that seek to tackle the aforementioned issues, mainly within the data storage space.
Ocean Protocol: The Ocean Protocol has the potential to be at the forefront of the new data economy, offering highly promising solutions for individuals to make use of their aggregated data. It is a platform and marketplace that facilitates the decentralized provision and exchange of data, using blockchain technology that enables data to be shared and sold for AI training purposes in a secure and transparent manner. This is why we at WeDataNation leverage the infrastructure of the Ocean Protocol, enabling individuals to participate in this paradigm shift and the opportunities it creates.
IPFS & Filecoin: Filecoin is a peer-to-peer network providing decentralized storage space. This storage capacity is hosted by storage providers, computers that are responsible for storing files and proving they have stored the files correctly over time. Because the network is permissionless, anyone able to meet the necessary hardware and computing requirements can join and offer their storage space. Filecoin facilitates open markets for storing and retrieving files and for setting the market price of storage. The native Filecoin cryptocurrency (FIL) works as an incentive layer on top of the IPFS network (InterPlanetary File System). Storage providers earn FIL for storing files, and the blockchain records all transactions as well as proof that the files are stored properly [5].
Storj: Storj offers crypto-powered cloud storage as well. Storj allows any computer running its software to rent out spare hard drive capacity to users looking to store data. Again, instead of being stored on centralized servers, the files are encrypted, split into pieces, and distributed across a global network run by many nodes. When a file is requested, it is securely reassembled and made available for download. The Storj token is an ERC-20 token [6].
Akash Network: Akash Network is an open-source cloud computing platform for developers that enables access to cloud computing services similar to AWS, Google Cloud, or MS Azure. Its mission is to help deploy decentralized apps as easily as possible. The network provides computing power by letting data centers lease out their spare computational resources for hosting Docker containers [7] (a container, in very simple terms, is a software package that makes it fairly convenient to run an application). Unlike the Storj token, the Akash token (AKT) is not an ERC-20 token; it is the native token of Akash’s own blockchain, which is built with the Cosmos SDK.
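The storage flow described for Storj above (split a file into pieces, spread them over many nodes, reassemble on request) can be sketched as follows. This is a simplified illustration and not Storj’s or Filecoin’s actual implementation. The function names and the round-robin placement are assumptions made for this example, and encryption and redundancy are deliberately left out: a real system would encrypt the data before splitting it and store pieces redundantly. The per-piece SHA-256 digests hint at how a client can check that a node returned exactly what was stored.

```python
import hashlib


def split_into_pieces(data: bytes, piece_size: int):
    """Cut data into fixed-size pieces and record a SHA-256 digest for each."""
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)]
    manifest = [hashlib.sha256(piece).hexdigest() for piece in pieces]
    return pieces, manifest


def distribute(pieces, node_count: int):
    """Assign pieces round-robin to hypothetical storage nodes."""
    nodes = [dict() for _ in range(node_count)]
    for index, piece in enumerate(pieces):
        nodes[index % node_count][index] = piece
    return nodes


def reassemble(nodes, manifest):
    """Fetch each piece from its node, verify its digest, and join the pieces."""
    restored = []
    for index, expected in enumerate(manifest):
        piece = nodes[index % len(nodes)][index]
        if hashlib.sha256(piece).hexdigest() != expected:
            raise ValueError(f"piece {index} failed verification")
        restored.append(piece)
    return b"".join(restored)


original = b"user-owned data, stored across many nodes"
pieces, manifest = split_into_pieces(original, piece_size=8)
nodes = distribute(pieces, node_count=3)
restored = reassemble(nodes, manifest)
```

No single node holds the whole file, and the manifest of digests stays with the data owner, so a node that alters or loses a piece is caught when the file is put back together.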
The dataverse is just getting started
This article was meant to provide a brief introduction to the emerging decentralized data economy and to generate some excitement. We will dive deeper into specific topics in the weeks and months ahead. The niche is still in its infancy, and the core infrastructure is being built as you read this. At the dataverse’s core is the pursuit of independence, ownership, and privacy, paving the way for the further evolution of Web3, the people’s web.
WeDataNation’s mission is to enable individuals to unlock the true value and potential of the data they generate. The world’s first AI-generated avatar, based on each user’s unique dataset of interests, behaviors, interactions, and more, is just one of the many visualization features we offer. With WeDataNation, users will finally receive the tools to easily analyze their data and generate passive income.