The Ultimate Guide to the Decentralized Identity World

Nov 3, 2023

1 Overview and Core Values

Decentralized Identity (DID) denotes a self-managed identity system in which a user’s identity data is not stored on any particular website’s server. This stands in stark contrast to “centralized identity.” Familiar examples of centralized identity include email or phone number-based logins and sign-ins via platforms like Twitter or Google. Both approaches fall under the broader heading of Digital Identity. For a comprehensive history of digital identity, one can refer to the article, A Step Back in Time: The History and Evolution of Digital Identity. Although DID is still far from mainstream, it offers clear advantages over centralized identity: an identity is harder to steal or to trace back to a person (anonymity) and harder for others to misuse (safety).

Before delving deeper, it’s crucial to underscore one of the core values of DID: absolute anonymity or the utmost autonomy in identity. There’s a pertinent analogy to consider: “If you wish to open a window, first advocate for the removal of the roof.” This is used to describe how to safeguard user privacy. We cannot naively rely on the assumption that “data collectors are benevolent” or make presumptuous claims that “most users don’t care about privacy.” Only absolute anonymity offers a foolproof method to protect user privacy.

To illustrate the differences between “centralized” and “decentralized” identity, consider a familiar scenario: If one wishes to review their expenditure in a digital wallet (such as PayPal or Alipay):

With centralized identity: My bills and personal information are stored on the app’s server. The app can potentially sell data about my spending power and habits to third parties for profit, all without my knowledge.
With decentralized identity: My expenditure records are transparently stored on the blockchain while my identity resides on my local mobile device. Unless I disclose my identity, no one can correlate my transactions with “me.”

While centralized identity logins offer a degree of flexibility in the rapidly evolving internet, they inherently lack robust defenses for user data privacy. For a thorough, trust-free safeguarding of user data, privacy, and assets, the pendulum needs to swing to the other extreme: decentralized identity.

1.1 Centralized Identity Systems: Inherent Challenges and Potential Risks

Figure 1: every service provider has stored the identity information of users.

In the widely accepted centralized identity system, when users log in or register on different websites, their credentials (typically username and password) are stored on the respective website’s servers. This approach presents several issues:

When logging into various websites, users must undergo repeated registration processes and remember distinct passwords for each site.
Users’ credentials are housed on the website’s servers, granting website developers the potential capability to access and possibly misuse users’ identity information.
If there’s a data breach on one website, a user’s identity on other websites may also become vulnerable to security risks.

Logging into other sites using platforms such as Twitter (now called X) or WeChat introduces additional concerns:

Identity providers, like Twitter or WeChat, gain the ability to know the list of sites a user has accessed.
Other websites can access a user’s data from platforms like Twitter.

Such vulnerabilities can lead smaller websites, struggling with growth or revenue, to sell user data as a last-resort revenue stream. Major platforms (e.g., Twitter, WeChat), on the other hand, may accumulate ever larger volumes of user information as more and more sites incorporate their easy login options. The unfettered circulation of this data paves the way for pervasive targeted advertising, significantly influencing every individual’s daily life and experiences.

1.2 Decentralized Identity (DID): Grounded in Cryptography

Figure 2: the communication between users and websites is encrypted; user identities are stored locally instead of on servers.

Before grasping the intricacies of DID, it is pivotal to understand the logic behind cryptographic key generation, encompassing public/private keys and the sign/verify process.

Private Key: Within blockchain networks, a private key is a randomly generated 256-bit binary number on a user’s local computer. The maximum number of unique private keys that can be generated is 2 raised to the power of 256. In decimal notation, this is an astonishingly large number:

115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936

The probability of duplicating a private key is even lower than the odds of the Earth imploding, ensuring its unique nature even when generated locally. It’s vital to keep private keys confidential, refraining from exposing them to others.
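
As a minimal sketch of what “randomly generated 256-bit number” means in practice, the snippet below draws a key from the operating system’s cryptographic random source using only Python’s standard library. The function name is illustrative, and the sketch makes one simplifying assumption: real wallets additionally restrict the key to the valid range of their elliptic curve, which is slightly smaller than 2 to the power of 256.

```python
import secrets

KEY_BITS = 256

def generate_private_key() -> int:
    """Return a random 256-bit integer from the OS's cryptographic RNG."""
    return secrets.randbits(KEY_BITS)

key_space = 2 ** KEY_BITS            # number of possible 256-bit values
private_key = generate_private_key()

print(f"{key_space:,}")              # the 78-digit number quoted above
print(f"{private_key:064x}")         # the key written as 64 hex characters
```

Nothing in this process contacts a server, which is why the key can be created entirely on the user’s own device.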

Public Key: Every private key has a corresponding public key, which can be shared openly. The private key is used to sign data, and the public key is used to verify that signature. (Signing is often loosely described as “encrypting with the private key,” but a signature does not hide the data; it only proves who produced it.) In essence, a user can employ their private key to sign a specific piece of text (provided by a verifier). Once signed, this information can be sent over the internet. Other parties can then use the corresponding public key to verify the signature, thus authenticating the user’s identity.
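
The sign/verify flow can be sketched with the standard library alone. Note the substitution: blockchains typically use elliptic-curve signatures, which need a third-party library, so this sketch swaps in Lamport one-time signatures, a much simpler hash-based scheme. Its keys are far larger than the 256-bit key described above and each key pair may sign only one message, so treat it as an illustration of the roles of the two keys rather than as what a wallet actually does. All function names are illustrative.

```python
import hashlib
import secrets

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32))
                   for _ in range(256)]
    public_key = [(sha256(a), sha256(b)) for a, b in private_key]
    return private_key, public_key

def message_bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = int.from_bytes(sha256(message), "big")
    return [(digest >> (255 - i)) & 1 for i in range(256)]

def sign(private_key, message: bytes):
    """Reveal one secret per digest bit (a key pair must sign only once)."""
    return [private_key[i][bit] for i, bit in enumerate(message_bits(message))]

def verify(public_key, message: bytes, signature) -> bool:
    """Hash each revealed secret and compare it with the public key."""
    return len(signature) == 256 and all(
        sha256(sig) == public_key[i][bit]
        for (i, bit), sig in zip(enumerate(message_bits(message)), signature)
    )

# Challenge-response login: the verifier supplies the text to be signed.
private_key, public_key = generate_keypair()
challenge = b"login-challenge:" + secrets.token_hex(16).encode()
signature = sign(private_key, challenge)

print(verify(public_key, challenge, signature))            # True
print(verify(public_key, b"a different text", signature))  # False
```

The verifier never sees the private key; it only needs the public key and the signature over a challenge it chose itself.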

With this foundational understanding, one can discern:

Identities, unique across the internet, can be acquired through the cryptographic generation of public/private keys.
As private keys are generated locally, the actual identity behind these keys remains completely confidential.

Such a mode of identity initialization underpins the essence of DID. If one cannot guarantee user identity anonymity, then any privacy protection measure built upon this identity system loses its significance. Consider an e-commerce example:

Centralized Identity: Even if my shopping list isn’t circulated online, the e-commerce platform knows exactly what “I” have purchased.
Decentralized Identity: Even if items I’ve bought are publicly listed on the internet, no one can ascertain that the purchaser of these items is “me”.

1.3 Decentralized Identity (DID): Bridging Virtual and Real Worlds

For the digital sphere to truly intersect with the tangible world, the two need a point of contact. Services based on real-name verification, such as VIP recognition on social platforms, enterprise entity certification, and the infrastructure of education, medical care, and social security systems, exemplify this. Therefore, alongside preserving anonymity, DID must also ensure:

When I want to prove “who I am” to you, I can self-authenticate.
When I prefer not to disclose “who I am,” I remain anonymous.
When I manifest my identity to another, only that individual can validate it.

This sounds like an evolution beyond the conventional “ID” framework. Contrasting the familiar unique user_id in databases or the unique username on websites, one can make the following analogy:

Logging in via email, mobile number, Twitter, and similar platforms corresponds to the “account system,” akin to public/private keys.
Unique user IDs or usernames on websites belong to the “identity system.”
In the traditional account system, real-name verification is achieved through biometrics, submission of specific documents, etc. These are hotbeds for user data breaches. Hence, a DID-based account system must address the “identity verification” challenge effectively.

In summary, DID isn’t as straightforward as the unique username on conventional websites. It embodies a comprehensive identity verification system. The most widely accepted standard in this realm is the DID protocol by W3C.

2 W3C Decentralized Identifier Protocol

Figure 3: The relationship between roles and permissions in W3C DID.

In the W3C model (the Decentralized Identifier specification together with its companion Verifiable Credentials data model), a user (referred to as the “Holder”) possesses a certificate issued by an “Issuer.” The content of this certificate is stored locally, while the issuance record of the certificate is retained in a tamper-proof public storage known as the Verifiable Data Registry. When a verifier requires identity validation, the user sends the verifier an “anonymized and encrypted” version of the issued certificate (known as a VP, or Verifiable Presentation). The verifier, in turn, cross-references this information with the registry to verify the issuer’s relevant details and credentials, thereby completing the identity verification process.
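
The issue, present, and verify steps can be sketched roughly as follows. This is a deliberately simplified model, not the W3C data format: the registry is an in-memory dictionary, the “issuance record” is assumed to be a SHA-256 digest of the credential, digital signatures are omitted, and the holder presents the whole credential (selective disclosure is the topic of Section 2.2). Every identifier and function name is hypothetical.

```python
import hashlib
import json

def credential_digest(credential: dict) -> str:
    """Hash a canonical JSON encoding so any change alters the digest."""
    encoded = json.dumps(credential, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

# Stand-in for the Verifiable Data Registry: digest -> issuer identifier.
registry = {}

def issue_credential(issuer: str, holder: str, claims: dict) -> dict:
    """Issuer builds the credential and records only its digest publicly."""
    credential = {"issuer": issuer, "holder": holder, "claims": claims}
    registry[credential_digest(credential)] = issuer
    return credential  # the content itself is handed to the holder only

def verify_credential(presented: dict, trusted_issuers: set) -> bool:
    """Verifier looks up the issuance record and checks the issuer is trusted."""
    recorded_issuer = registry.get(credential_digest(presented))
    return recorded_issuer is not None and recorded_issuer in trusted_issuers

# Hypothetical identifiers written in the "did:example" style.
degree = issue_credential(
    issuer="did:example:university",
    holder="did:example:alice",
    claims={"degree": "BSc Computer Science"},
)
trusted = {"did:example:university"}

print(verify_credential(degree, trusted))    # True
forged = dict(degree, claims={"degree": "PhD Computer Science"})
print(verify_credential(forged, trusted))    # False
```

Because the registry holds only a digest, publishing the issuance record does not reveal the content of the certificate.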

This section will not elaborate on the technical principles of W3C’s DID. What is vital to understand is that the W3C DID approach is premised on the concept of a “Trust Chain.” The notion of a trust chain is structured as follows:

The user’s certificate is issued by a trusted issuing authority.
The issuing authority is deemed trustworthy because its institutional certification is provided by an even more trusted institution.
At the pinnacle of this trust hierarchy, there’s a “root institution,” typically a governmental entity.
The “certificate” of this root institution is publicly disclosed on a secure website controlled by the root institution itself.

Adopting this structure establishes a trust chain, effectively mapping real-world societal structures onto the digital realm. Within the W3C DID framework, besides the “trust chain,” it’s also worth noting the importance of “certificates” like academic degrees, identity cards, passports, etc. These certificates play a pivotal role in establishing and verifying identities in both physical and online spaces.
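
A trust chain can be pictured as a lookup that keeps asking “who certified this institution?” until it reaches a trusted root. The sketch below models only that structure; the institution names are hypothetical, and the signature check that a real system would perform at every link is left out.

```python
# Each entry maps an institution to the institution that certified it.
# All names are hypothetical; the root institution certifies itself.
CERTIFIED_BY = {
    "Example University": "Ministry of Education",
    "Ministry of Education": "National Government",
    "National Government": "National Government",
}
TRUSTED_ROOTS = {"National Government"}

def trust_chain(issuer: str) -> list:
    """Follow certifications upward; return the chain, or [] if it is broken."""
    chain, current = [issuer], issuer
    while current not in TRUSTED_ROOTS:
        parent = CERTIFIED_BY.get(current)
        if parent is None or parent in chain:   # unknown issuer or a loop
            return []
        chain.append(parent)
        current = parent
    return chain

print(trust_chain("Example University"))
# ['Example University', 'Ministry of Education', 'National Government']
print(trust_chain("Diploma Mill"))   # []
```

A verifier would accept a certificate only if the chain for its issuer is non-empty.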

2.1 Roles & Permissions

The concept of “identity” is inherently intricate. However, for easier comprehension, we can categorize the identity system into two abstract notions: “Roles” and “Permissions.” For instance:

If I am a librarian, I can permit or deny certain individuals from entering the library. Here, “librarian” is the “role,” and “allowing or denying some individuals entrance to the library” is the “permission.”
When a company’s job posting mentions a requirement for candidates with a background in computer-related disciplines, this “computer-related discipline” criterion acts as a “permission” for the hiring company regarding resume submissions. For the job seeker, it represents a “role” they must possess.

It’s evident that a single individual can have multiple roles, and each role can encompass a variety of permissions. This abstraction pattern is widely recognized and extensively employed, known as RBAC (Role-based access control).
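
The two examples above map directly onto a minimal RBAC table. The user names, role names, and permission names below are made up for illustration.

```python
# role -> permissions granted by that role
ROLE_PERMISSIONS = {
    "librarian": {"admit_visitor", "deny_visitor"},
    "cs_graduate": {"submit_resume"},
}

# user -> roles held by that user (one person can hold several roles)
USER_ROLES = {
    "alice": {"librarian", "cs_graduate"},
    "bob": set(),
}

def is_allowed(user: str, permission: str) -> bool:
    """A user may act if any one of their roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )

print(is_allowed("alice", "admit_visitor"))  # True
print(is_allowed("bob", "submit_resume"))    # False
```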

Within the context of W3C DID, for the Holder, these certificates represent “roles,” while for the Verifier, they signify “permissions.” In common scenarios, roles and permissions can be aligned, but there are also instances where they diverge. In situations where roles and permissions don’t align, there’s a need to “anonymize and encrypt” our roles. This introduces two terms from the certificate realm: VC (Verifiable Credentials) and VP (Verifiable Presentation). To further our understanding, we must delve into the concept of Merkle Proof.

2.2 Merkle Proof

Let’s first discuss the hash function. Processing any file X with a hash function produces a fixed-length string Y. It is computationally infeasible to deduce X from Y, or to find a different file that produces the same Y. This way, as long as you inform someone about Y first and provide them with X later, you can prove that file X was already in your possession before it was disclosed online. Hash functions are commonly utilized for electronic file copyright protection.
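
The “publish Y first, reveal X later” idea looks like this with SHA-256 from Python’s standard library. The choice of SHA-256, the file content, and the function names are assumptions made for the example.

```python
import hashlib

def file_hash(content: bytes) -> str:
    """Return the fixed-length SHA-256 fingerprint of the content."""
    return hashlib.sha256(content).hexdigest()

def proves_prior_possession(revealed: bytes, published_hash: str) -> bool:
    """Check that the revealed file X matches the hash Y published earlier."""
    return file_hash(revealed) == published_hash

document = b"My unpublished manuscript, draft 1"   # X, kept private at first
commitment = file_hash(document)                    # Y, published first

print(len(commitment))                                              # 64
print(proves_prior_possession(document, commitment))                # True
print(proves_prior_possession(b"A slightly edited copy", commitment))  # False
```

The hash is always 64 hexadecimal characters, whatever the size of the file.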

Regarding the Merkle Proof, imagine I possess a graduation certificate that contains the following information:

A: College Name
B: Discipline Name
C: Graduate’s Name

D/E/F/G/H contain other relevant details.

Each piece of information on the certificate (from A to H) can be independently hashed. The resulting hashes are then paired and hashed again, level by level, until they are eventually condensed into a single hash string, referred to as the “Merkle Root.”

Suppose we need to prove to a potential employer that our degree is in “computer-related disciplines.” We only need to send the plaintext information of A, B, and C, while for D to H we send only their hashes. Simultaneously, we’d send the Merkle Root. The potential employer hashes A, B, and C with the same hash algorithm, combines the results with the supplied hashes of D to H, and reconstructs the Merkle Root. By comparing this with the Merkle Root of the certificate that the school recorded in public, verifiable storage, they can validate the authenticity of the information A, B, and C.

In this context, the school-issued certificate containing the information from A to H acts as Verifiable Credentials (VC), while the document used to verify our identity to the company is Verifiable Presentation (VP).
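
A small sketch of this selective disclosure, assuming SHA-256 and eight fields so that the tree pairs up evenly. The field values and function names are invented for the example. One caveat the sketch ignores: a bare hash of a short, guessable field (such as a birth date) can be brute-forced, so a practical scheme would mix a random salt into each leaf.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaf_hashes) -> bytes:
    """Hash adjacent pairs level by level until a single hash remains."""
    level = list(leaf_hashes)
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# The Verifiable Credential: fields A..H (values are made up).
credential = {
    "A": "College: Example University",
    "B": "Discipline: Computer Science",
    "C": "Graduate: Alice",
    "D": "Date of birth: 1995-04-01",
    "E": "Student number: 20130042",
    "F": "Enrolled: 2013",
    "G": "Graduated: 2017",
    "H": "Grade: A",
}

# The school records this root in public, verifiable storage.
published_root = merkle_root(h(v.encode()) for v in credential.values())

def make_presentation(credential: dict, disclose: set) -> list:
    """Plaintext for disclosed fields, only the leaf hash for the rest."""
    return [
        ("plain", value) if name in disclose else ("hash", h(value.encode()))
        for name, value in credential.items()
    ]

def verify_presentation(presentation: list, root: bytes) -> bool:
    """Re-hash the plaintext leaves, reuse the given hashes, compare roots."""
    leaves = [h(item.encode()) if kind == "plain" else item
              for kind, item in presentation]
    return merkle_root(leaves) == root

# The Verifiable Presentation reveals A, B, and C only.
presentation = make_presentation(credential, {"A", "B", "C"})
print(verify_presentation(presentation, published_root))   # True

forged = [("plain", "Discipline: Law") if item == ("plain", credential["B"])
          else item for item in presentation]
print(verify_presentation(forged, published_root))         # False
```

The employer learns A, B, and C in plaintext and sees only opaque hashes for D to H, yet can still confirm that all eight fields belong to the certificate the school issued.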

The aforementioned serves as a rudimentary illustration of how W3C DID conducts identity verification. However, in real-life scenarios, there are instances where we might not wish to provide verifiers with precise details. For example, if an employer mandates that a job seeker must have a degree in computer-related disciplines and be under the age of 35, the applicant might only want to prove they’re under the age of 35 without revealing their exact age.

In such fuzzy verification processes, the Merkle Proof cannot function autonomously. Hence, another concept becomes pivotal: Zero-knowledge proof.

2.3 Zero-knowledge proof

For a colloquial explanation of “Zero-Knowledge Proof,” one can directly refer to the Zero-knowledge proof page on Wikipedia.

Adapting the Wikipedia example, where Alice demonstrates she knows how to unlock a secret door without revealing the key, we can modify the logic to ascertain whether Alice is under 35 years old. The proof of being under 35 years old would require sending a VP to the verifier. The content of VP can only be extracted from the VC, as the verifier should only trust the credential issuer and not necessarily the credential holder.

Therefore, when using ZKP for verification, the validated content must be extracted exclusively from the credentials. For instance:

Age equals 35 years: This information cannot be used for ZKP.
Age less than 35 years: This information can be used for ZKP.
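
Proving an inequality such as “age less than 35” requires range-proof machinery that is too large to show here. As a smaller illustration of the zero-knowledge idea itself, the sketch below uses the Schnorr identification protocol, in which Alice convinces a verifier that she knows a secret number without revealing it. This is a different and simpler statement than the age check, and the parameters (the Mersenne prime 2**127 - 1 and the base 3) are toy values chosen for readability, not security.

```python
import secrets

# Toy parameters, NOT secure: P is the Mersenne prime 2**127 - 1 and
# G is an arbitrary base. Exponents are reduced modulo P - 1 (Fermat).
P = 2 ** 127 - 1
G = 3
ORDER = P - 1

secret = secrets.randbelow(ORDER)   # known only to Alice
public = pow(G, secret, P)          # published by Alice

# 1. Commit: Alice picks a fresh random nonce and sends its commitment.
nonce = secrets.randbelow(ORDER)
commitment = pow(G, nonce, P)

# 2. Challenge: the verifier replies with a random number.
challenge = secrets.randbelow(ORDER)

# 3. Response: the random nonce masks the secret inside the response.
response = (nonce + challenge * secret) % ORDER

def verify(public: int, commitment: int, challenge: int, response: int) -> bool:
    """Check G**response == commitment * public**challenge (mod P)."""
    return pow(G, response, P) == (commitment * pow(public, challenge, P)) % P

print(verify(public, commitment, challenge, response))                 # True
print(verify(public, commitment, challenge, (response + 1) % ORDER))   # False
```

The verifier sees only the commitment, the challenge, and the response; because the nonce is random and used once, these values do not expose the secret.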

3 Various Subfields in the DID Vertical

Presently, prominent contributors and solutions in the DID domain include W3C DID, Microsoft Entra, OpenID, and DIF. The main subfields and their leading projects are as follows:

Soul-bound Token

Example: Galxe
Logic: The non-sensitive data can be presented to users through the issuance of SBTs.
Value: The user’s public activity information can be validated by minting SBTs.
Challenges: SBTs can’t completely eliminate bots. Hence, a combination with KYC-type projects is essential for maximizing value.

Profile

Example: Clique
Logic: Users consolidate their identities from third-party applications (such as Twitter) via OAuth2 or open verification methods, merging them into one identity (like a Web3 wallet).
Value: These profile platforms will issue publicly anonymized SBTs to users or provide user data to third parties via a private API.
Challenges: The identity data originates from third-party applications, which themselves can’t entirely rule out bots. If the product’s primary purpose is to deter bots, its value is limited. If its goal is to establish a Social Graph, striking a balance between the business model and decentralization becomes challenging.

KYC (Know your customer)

Example: Worldcoin
Logic: Users send their real identity information to a third party, which could be a randomly selected group of individuals from a DAO or a specialized institution using machine recognition.
Value: Based on the verification of users’ real identity, SBTs are eventually issued.
Challenges: If identification is done through a DAO’s human verification, efficiency can be low. If it’s via a specialized institution’s machine recognition, there’s an inherent trust issue with the institution acting as a single point of failure, making it impossible to 100% prevent malicious activities by issuers.

ZKP (Zero-knowledge Proof)

Example: zCloak Network
Logic: Users generate specific verifiable information for the project side.
Value: Based on the verification results, the project side issues DID certificates. These certificates can be verified by third parties through W3C DID.
Challenges: It’s challenging for the certificate issuing organization to establish robust credibility or achieve decentralization.

Reputation

Example: RociFi
Logic: Users link their online/offline data to the project side, which calculates scores based on this data.
Value: Scores are eventually provided to third parties for lending purposes either via public SBTs or a private API.
Challenges: The project side faces challenges in ensuring fairness in user identity and score determination.

Domain

Example: Communities ID
Logic: Users register on decentralized ledger networks to obtain a unique nickname to replace their wallet address, e.g., using vitalik.eth instead of a lengthy alphanumeric address.
Value: Wallet addresses aren’t user-friendly in social scenarios. A unique name represents an “identity” far more naturally.
Challenges: The project side struggles to establish technological barriers and must rely on ecosystem barriers and consensus for survival. There’s a tendency for the user “identity” layer to become fragmented.

Figure 6: Landscape of the current DID vertical.

4 Future Prospects

Identity systems have come a long way, from as early as 1990, when Microsoft began contemplating the “SaaS-ification” of identity systems, to the current era, in which users’ privacy concerns have increasingly captured public attention. DIDs (Decentralized Identifiers) are now poised to replace centralized identity systems and emerge as the new norm.

The W3C DID specification has already moved beyond the draft stage: its official version became a W3C Recommendation in July 2022. As WebAuthn and passkeys are progressively adopted by more developers, a growing number of users are already using account systems based on the DID philosophy, often without realizing it.

Once account systems become widespread, the value of the certification system will become increasingly significant, finding applications in various facets of daily life.

Ultimately, the internet world will undergo a “decentralization” process, rebuilt from the ground up. Systems based on privacy protection, such as identity aggregation and reputation systems, will find more common application scenarios. By then, SocialFi and GameFi will become ubiquitous, people will no longer use addresses to identify each other, and the Domain system will become an indispensable foundation within the DID landscape.