Web3: The web reinvented

Salim Haniff
Dec 21, 2021

Introduction

Web technologies are constantly reinventing themselves. If we look at the history of these technologies since 2000, we can trace how the web has evolved. We first had physical servers hosted in data centres, and these workloads eventually migrated to virtual servers. Virtualization reduced costs and allowed organizations to provision data centres through software, an approach known as Infrastructure-as-Code (IaC). With virtual servers, organizations could quickly stand up the servers needed to implement their solutions.

Eventually, public clouds became the norm for many organizations. Organizations could now stand up servers promptly without racking hardware in data centres, and could focus on software solutions rather than the technicalities of the underlying hardware. This shift has let developers concentrate on software that addresses customer needs. Combined with DevOps practices, cloud technologies have increased the speed of innovation, so companies can now explore and test new ideas quickly. One technology that has quickly adapted to these recent trends is Web 3.

Web 3 technologies emphasize decentralized solutions, in contrast to Web 2, which was geared towards centralized ones. For example, with most Web 2 SaaS applications, users authenticate to a centralized authority, upload their data to a specific server and have a centralized body govern the management of that data. Web 3, by contrast, aims to empower users to control who accesses their data and how, essentially removing centralized governance bodies. However, the move from centralized to decentralized does change how users experience and interact with software solutions.

We have already seen examples of decentralized technologies in the past. BitTorrent and peer-to-peer video conferencing applications using multicast are two popular examples. Web 3 ushers in new technologies such as blockchain, cryptocurrency, DApps and the InterPlanetary File System (IPFS). As mentioned earlier, incorporating these decentralized technologies requires a rethink of the traditional software architecture used in Web 2. For example, without a centralized authentication system, we need another way to authenticate and authorize users before they can access the system. This article describes how Web3 changes the way we authenticate to the environment, store data, and communicate in a decentralized ecosystem.

Authentication / Identity

One key point is that Web 3 is not necessarily designing new systems. Instead, Web 3 adopts proven technologies from the past. Readers familiar with the underpinnings of SSH or OpenPGP will understand the concept of private/public keys. A private/public key pair creates a cryptographic identity for an entity, which could be a user or an application agent. When the user or agent needs to authenticate with another system, the parties exchange public keys, which are then validated and verified. Additionally, the two parties use the keys to encrypt, sign and verify the messages passed between them.
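The sign-and-verify flow described above can be illustrated with a deliberately tiny textbook RSA example. This is only a sketch for intuition, not how any particular Web 3 wallet works: the primes, the exponent and the function names (`sign`, `verify`) are assumptions chosen so the example runs with the standard library alone, and the parameters are far too weak for real use. Production systems rely on vetted cryptographic libraries and typically on other algorithms, such as elliptic-curve signatures.

```python
import hashlib
from typing import Tuple

# Toy key generation from two well-known Mersenne primes.
# INSECURE: chosen only so the example needs no third-party library.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (must stay secret)

PUBLIC_KEY = (N, E)    # shared with anyone who needs to verify
PRIVATE_KEY = (N, D)   # held only by the identity's owner


def _digest(message: bytes, modulus: int) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


def sign(message: bytes, private_key: Tuple[int, int]) -> int:
    """Produce a signature that only the private-key holder can create."""
    n, d = private_key
    return pow(_digest(message, n), d, n)


def verify(message: bytes, signature: int, public_key: Tuple[int, int]) -> bool:
    """Check a signature using only the public key."""
    n, e = public_key
    return pow(signature, e, n) == _digest(message, n)


signature = sign(b"transfer 1 token to Bob", PRIVATE_KEY)
is_valid = verify(b"transfer 1 token to Bob", signature, PUBLIC_KEY)
```

The point to notice is the asymmetry: `sign` needs the private key, while `verify` needs only the public key, so any party can check a message without being able to forge one.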

Private/public keys are used for identity across many Web 3 application domains. The most popular has been decentralized finance (DeFi), where users create a crypto wallet to hold their digital currency. Creating the wallet involves a series of steps that produce a private/public key pair, which is used for signing and for establishing an identity in the ecosystem. Once the keys are generated, the user can send cryptocurrency to, or accept it from, any other user in the ecosystem.

Developers can quickly add identity management to their applications through software libraries designed for Web3, such as Web3.js and ethers.js, which offer methods for integrating identity management into applications that talk to Ethereum-based blockchains. A web browser plugin known as MetaMask is another solution that helps manage identity and the user’s private/public keys. MetaMask provides hooks between the web application and the Ethereum network. Web applications typically interact with MetaMask via the Web3.js or ethers.js libraries. MetaMask also eases a UX issue: users are no longer burdened with figuring out how to connect to the Ethereum network.

In the decentralized model, users are now responsible for protecting and backing up their own private/public keys. In the centralized model, the authority has to secure the user’s authentication and provide strong authorization mechanisms within its Identity and Access Management (IAM) service. Centralized authentication typically requires users to prove their identity with a username and password, often combined with multi-factor authentication. When the user successfully authenticates, the system passes some information back in a data structure that the user can present to other services. The standard data structure in modern applications is the JSON Web Token (JWT). Web3 and decentralization change this: such data structures are always signed and verified using public/private keys, with third-party operators like miners taking part in validation. In later blog posts, we will dive deeper into authentication and message signing used in Web3.
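To make the centralized side of this comparison concrete, here is a minimal sketch of how a server might issue and check a JWT. It assumes the HS256 variant, where the token is signed with a secret held by the central service; the function names, the claim and the secret are hypothetical, and checks such as token expiry are omitted for brevity.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature, signed with the server's secret."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes) -> dict:
    """Return the claims if the signature is valid, otherwise raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    payload = signing_input.split(".")[1]
    return json.loads(_b64url_decode(payload))


token = issue_token({"sub": "alice"}, b"server-side-secret")
claims = verify_token(token, b"server-side-secret")
```

In this variant only a holder of the shared secret can validate the token, which is exactly the dependence on a central party that the key-pair approach is meant to remove.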

Data Management

With no centralized authority acting as the sole keeper of data, data management in the Web3 context requires rethinking. Interestingly, the main goals of data management in the Web3 environment coincide with some cybersecurity practices. For example, the CIA Triad (Confidentiality, Integrity and Availability) can be applied in this space to help ensure that data is always available and that its integrity is preserved. Depending on the use case of the DApp, confidentiality can be maintained through additional architecture, such as private distributed networks.

The key concept in Web3 data management is content-addressable data. The Web2 approach relied on location-addressable data, where the user has to specify the exact location of the data they wish to consume. The best example of location addressing is the URL of an image on the Internet. To access data this way, the user enters the URL in their web browser, and the browser goes to that location to retrieve the data. Security is enforced using HTTPS certificates. The HTTPS certificate indicates that the site is “safe” according to a trusted centralized party, and the data integrity might be preserved. We say “might” here because responsibility for file integrity rests with the content provider. Diligent content providers might publish SHA checksums so that users can verify that a file’s contents have not changed, but this practice is not widespread.
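The checksum practice mentioned above takes only a few lines. The sketch below assumes the provider publishes a SHA-256 checksum as a hexadecimal string; the function names are illustrative.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_checksum(data: bytes, published_hex: str) -> bool:
    """Compare the downloaded bytes against the provider's published checksum."""
    normalized = published_hex.strip().lower()
    return hmac.compare_digest(sha256_hex(data), normalized)


downloaded = b"example file contents"
published = sha256_hex(b"example file contents")  # stands in for the provider's value
unchanged = matches_published_checksum(downloaded, published)
```

Note that the user still has to trust the place where the checksum itself is published, which is part of why integrity in this model depends on the content provider.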

Content addressing works on the concept that files are accessed through a unique ID on the decentralized network. Due to decentralization, the requester does not know in advance exactly where the data is stored. Instead, a broadcast message is sent to the network asking for the content based on the ID generated by the file content encoding mechanisms. Once the content is located, the node storing the content passes it to the requester.
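A toy in-memory model can make the idea concrete. In the sketch below, the ID is simply the SHA-256 hex digest of the content, and "asking the network" is a loop over a list of nodes; both are simplifying assumptions, and the class and function names are hypothetical rather than part of any real protocol.

```python
import hashlib
from typing import Dict, Iterable


class ContentNode:
    """A node that stores blocks keyed by the hash of their content."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content and return its content-derived ID."""
        content_id = hashlib.sha256(content).hexdigest()
        self._blocks[content_id] = content
        return content_id

    def has(self, content_id: str) -> bool:
        return content_id in self._blocks

    def get(self, content_id: str) -> bytes:
        return self._blocks[content_id]


def fetch_from_network(nodes: Iterable[ContentNode], content_id: str) -> bytes:
    """Ask each node for the ID; the requester never names a location."""
    for node in nodes:
        if node.has(content_id):
            content = node.get(content_id)
            # The ID doubles as an integrity check on whatever comes back.
            if hashlib.sha256(content).hexdigest() == content_id:
                return content
    raise LookupError("no node returned valid content for " + content_id)


node_a, node_b = ContentNode(), ContentNode()
photo_id = node_b.put(b"cat photo bytes")
photo = fetch_from_network([node_a, node_b], photo_id)
```

Because the ID is derived from the bytes, the same content yields the same ID on any node, and the requester can detect a node that returns something else.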

The W3C Decentralized Identifiers (DID) specification proposes a methodology for addressing content on the Internet through a defined URI scheme. The URI scheme offers a uniform way to address content across the various technologies available on the decentralized web. One of the technologies that can be used with W3C DIDs is the IPFS project. Since the DID specification leaves the implementation of availability and integrity to other technologies, we will now shift focus to the IPFS project.

The IPFS project addresses file integrity and availability. File integrity is maintained through the Content IDentifier (CID), which is created before the file is transmitted to the IPFS network. A CID is a structured string that describes various characteristics of the file stored in IPFS. During CID creation, the file is passed through a SHA2-256 hash function (at the time of writing), and the resulting digest is embedded in the CID. When the user receives the file from the IPFS network, they can verify it by decoding the CID to extract the SHA2-256 digest and then computing a SHA2-256 hash of the received file. If the digest from the CID and the computed hash match, file integrity has been maintained. Another mechanism employed by IPFS is file immutability. Once files are added to IPFS, they cannot be changed. A changed file is added to the network as a new file, and a data structure is required to track the changes, somewhat like Git commits.
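The create-then-verify round trip can be sketched for the simplest case. The example below assumes a CIDv1 with the raw codec, a SHA2-256 multihash and base32 text encoding, and it hashes the content as a single block. Real IPFS tooling may split larger files into several linked blocks, so the CID it reports for a file will generally not equal a flat hash of the whole file. The function names are illustrative.

```python
import base64
import hashlib

# Each of these values fits in one byte, so the varint encoding is the byte itself.
CID_VERSION = 0x01    # CIDv1
RAW_CODEC = 0x55      # multicodec code for raw binary
SHA2_256 = 0x12       # multihash code for sha2-256
DIGEST_LENGTH = 0x20  # 32-byte digest
PREFIX = bytes([CID_VERSION, RAW_CODEC, SHA2_256, DIGEST_LENGTH])


def make_cid(content: bytes) -> str:
    """Build a base32 CIDv1 string for a single raw block."""
    digest = hashlib.sha256(content).digest()
    encoded = base64.b32encode(PREFIX + digest).decode("ascii")
    # Multibase prefix "b" marks lowercase base32 without padding.
    return "b" + encoded.lower().rstrip("=")


def digest_from_cid(cid: str) -> bytes:
    """Decode the CID and return the SHA2-256 digest embedded in it."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDs")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore base32 padding
    cid_bytes = base64.b32decode(body)
    if cid_bytes[:4] != PREFIX or len(cid_bytes) != 4 + DIGEST_LENGTH:
        raise ValueError("unsupported CID layout for this sketch")
    return cid_bytes[4:]


def verify_content(cid: str, content: bytes) -> bool:
    """Recompute the hash of the received content and compare with the CID."""
    return hashlib.sha256(content).digest() == digest_from_cid(cid)


cid = make_cid(b"hello IPFS")
intact = verify_content(cid, b"hello IPFS")
```

Immutability follows directly from this scheme: changing even one byte of the content changes the digest, and therefore produces a different CID.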

Availability is maintained in IPFS through the caching system on each IPFS node. When a node has downloaded the file, it preserves the file in its cache for a particular time before removal. If guaranteed file persistence is required, the IPFS nodes can be configured to pin that data longer. Services available through Filecoin can help keep these files for a longer duration.
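The caching and pinning behaviour described above can be modelled with a small cache. This is a toy model of the description only: the time-based expiry, the class name and the method names are assumptions, and the garbage-collection policy of a real IPFS node may differ.

```python
from typing import Dict, List, Set, Tuple


class NodeCache:
    """Toy cache: unpinned content expires after ttl_seconds, pinned content stays."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # content ID -> (content, time it was stored)
        self._entries: Dict[str, Tuple[bytes, float]] = {}
        self._pinned: Set[str] = set()

    def store(self, content_id: str, content: bytes, now: float) -> None:
        self._entries[content_id] = (content, now)

    def pin(self, content_id: str) -> None:
        """Mark content as exempt from expiry."""
        self._pinned.add(content_id)

    def has(self, content_id: str) -> bool:
        return content_id in self._entries

    def evict_expired(self, now: float) -> List[str]:
        """Remove unpinned entries older than the TTL and return their IDs."""
        expired = [
            content_id
            for content_id, (_, stored_at) in self._entries.items()
            if content_id not in self._pinned and now - stored_at > self.ttl_seconds
        ]
        for content_id in expired:
            del self._entries[content_id]
        return expired


cache = NodeCache(ttl_seconds=60)
cache.store("report", b"quarterly numbers", now=0)
cache.store("scratch", b"temporary data", now=0)
cache.pin("report")
removed = cache.evict_expired(now=120)
```

Passing `now` explicitly, rather than reading the system clock, keeps the sketch deterministic and easy to experiment with.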

Communication

Modern web applications are now media-rich; video conferencing and collaboration-based applications are good examples. Web2 technologies often require authentication to a central server. Once authentication has completed, the server sends a list of participants registered on the service. The user can then select another participant to interact with. Depending on how the application has been developed, the user and the other participant may only be able to communicate with each other through the central server operated by the service provider.

In the past, a network technology called multicast was used to send broadcasts over the Internet. Nodes listening on a specific multicast address/port receive the broadcast message and, depending on its contents, can then reply. Unfortunately, multicast has a number of shortcomings. Most notably, not all networks support multicast, making it difficult for the average user to leverage a multicast solution. Multicast can also add a lot of unnecessary traffic to the Internet, leading to network congestion.

An alternative approach is to use the technology behind the BitTorrent DHT protocol. A node that is new to the network contacts a bootstrap node to download an initial list of peers. The node can then use this list to communicate with the peers and acquire the necessary data. Bootstrap nodes are popular within the Web3 ecosystem, as demonstrated when setting up a private Ethereum network. The libp2p project provides the toolset to implement peer discovery through the web browser or application-specific agents. With libp2p, DApp solutions can be developed using concepts similar to those behind the BitTorrent DHT. This allows developers to fast-track the peer discovery and communication aspects of the Web3 ecosystem.
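The bootstrap step can be illustrated with an in-memory simulation. This is a conceptual sketch only: it does not use libp2p or the BitTorrent DHT wire protocol, it ignores how a real DHT chooses which peers to query, and the `Peer` class, the `discover` function and the addresses are all hypothetical.

```python
from typing import Dict, List, Set


class Peer:
    """A node that remembers the addresses of peers it has heard about."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.known_peers: Set[str] = set()

    def list_peers(self) -> List[str]:
        return sorted(self.known_peers)


def discover(new_address: str, bootstrap_address: str,
             network: Dict[str, Peer], max_rounds: int = 3) -> Set[str]:
    """Start at the bootstrap node, then ask each newly found peer for its peers."""
    discovered: Set[str] = set()
    frontier = [bootstrap_address]
    for _ in range(max_rounds):
        next_frontier: List[str] = []
        for address in frontier:
            if address == new_address or address in discovered:
                continue
            peer = network.get(address)
            if peer is None:  # listed by someone, but not reachable
                continue
            discovered.add(address)
            next_frontier.extend(peer.list_peers())
            peer.known_peers.add(new_address)  # the contacted peer learns about us
        frontier = next_frontier
    return discovered


network = {name: Peer(name) for name in ("bootstrap", "peer-a", "peer-b", "peer-c")}
network["bootstrap"].known_peers.update({"peer-a", "peer-b"})
network["peer-a"].known_peers.add("peer-c")
found = discover("newcomer", "bootstrap", network)
```

The newcomer only needs one well-known address to start with; everything else is learned from the peers themselves, so no central directory of participants is required.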

Future trends

The trend with Web3 is to promote the web browser to a first-class citizen, with an emphasis on developing platform-agnostic web applications that are not tied to OS-specific libraries, alongside traditional mobile app development. Advancements in JavaScript and WebAssembly (WASM) now give web browsers more functionality for building elaborate applications. These advancements help web browsers break away from the traditional model of query and response against a central server (i.e. a centralized web server hosting content) and move towards broadcasting to a decentralized network using a content-addressing scheme.

The move to decentralization will offer more opportunities to develop innovative solutions that aid society by removing the restrictions imposed by centralization. It can help end the situation where the owners of centralized systems own the content created by their user base. Putting users in control of their own data also empowers them to choose who can access it. While we did not cover access control in this article, Web3 can allow users to revoke access after a certain task has been performed, helping to avoid possible future data leaks. We will talk about this in a later post.

In the next series of blog posts, we will go further into authentication/identity, data management and communication.

Salim Haniff

Founder of Factory 127, an Industry 4.0 company. Specializing in cloud, coding and circuits.