Core Concepts

This page explains the foundational concepts that underpin identity tokenization systems. Understanding these building blocks is essential for designing and implementing privacy-preserving identity architectures.

Unique Identification Number (UIN)

A Unique Identification Number (UIN), also known as a Unique Identity Number, is a stable, persistent identifier assigned to an individual within an identity system.[4] The UIN serves as the foundational anchor from which other identifiers (tokens, sectoral IDs) are derived. According to OSIA (Open Standards Identity APIs), all persons recorded in a registry have a UIN that is considered a key to access the person's data for all records.[4]

Key Characteristics

Design Consideration

UIN generation should use cryptographically secure random number generation or deterministic derivation from biometric templates to ensure uniqueness while preventing guessing attacks.
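
As an illustration of the random-generation option, the sketch below draws digits from Python's `secrets` module. Random generation by itself does not rule out collisions, so the sketch also checks each candidate against the set of issued values. The 10-digit format, the `generate_uin` name, and the in-memory `registry` are assumptions made for this example, not part of any standard.

```python
import secrets

UIN_DIGITS = 10  # hypothetical length; real systems choose their own format


def generate_uin(existing: set[str], digits: int = UIN_DIGITS) -> str:
    """Return a random numeric UIN that is not already present in `existing`."""
    while True:
        # secrets draws from the operating system's CSPRNG, so values are unpredictable.
        candidate = "".join(secrets.choice("0123456789") for _ in range(digits))
        # Randomness alone does not guarantee uniqueness, so check the registry.
        if candidate not in existing:
            existing.add(candidate)
            return candidate


registry: set[str] = set()  # stand-in for the identity system's UIN store
first = generate_uin(registry)
second = generate_uin(registry)
```

A production system would typically enforce uniqueness with a database constraint rather than an in-memory set.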

UIN vs. External Identifiers

| Aspect | UIN (Internal) | External Identifiers |
| --- | --- | --- |
| Visibility | Never exposed outside the identity system | Shared with relying parties |
| Stability | Permanent | May be rotatable or context-specific |
| Correlation Risk | Protected by system controls | Designed to prevent cross-context linking |

Tokenization

Tokenization is the process of replacing sensitive data elements with non-sensitive substitutes (tokens) that retain essential information about the data without compromising its security.[1]

How Tokenization Works

  1. Sensitive data (e.g., national ID number) is submitted to the tokenization service
  2. The service generates a token and stores the mapping securely
  3. The token is returned and used in place of the original data
  4. De-tokenization requires access to the tokenization service and appropriate authorization
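
The four steps above can be sketched as a toy in-memory vault. The `TokenVault` class, its method names, and the boolean `authorized` flag are hypothetical simplifications; a real service would use a hardened data store and a proper authorization check.

```python
import secrets


class TokenVault:
    """Toy in-memory token vault, for illustration only."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same input always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it has no mathematical relationship to the value.
        token = secrets.token_hex(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        # De-tokenization is gated on authorization (reduced to a flag here).
        if not authorized:
            raise PermissionError("caller is not authorized to de-tokenize")
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("1234-5678-9012")
original = vault.detokenize(token, authorized=True)
```

Because the token is a random value, recovering the original is only possible through the vault's stored mapping.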

Tokenization vs. Encryption

| Property | Tokenization | Encryption |
| --- | --- | --- |
| Relationship to Original | No mathematical relationship | Mathematically derived |
| Reversibility | Requires lookup in token vault | Requires decryption key |
| Format Preservation | Can preserve format (e.g., same length) | Output typically differs from input |
| Key Management | No cryptographic key for the token itself | Key management is critical |
| Compliance | Tokens may be considered out of PCI DSS scope[10] | Encrypted data remains in scope |

Pseudonymization vs. Anonymization

GDPR distinguishes between pseudonymization and anonymization, with significant implications for data protection obligations.[3]

Pseudonymization

According to GDPR Article 4(5), pseudonymization means "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person."[3][15]

Anonymization

Anonymization transforms data so that the data subject is not, or is no longer, identifiable. Properly anonymized data falls outside the scope of GDPR.[11][16]

Important Distinction

Tokenization typically produces pseudonymized data, not anonymized data, because the tokenization service retains the ability to reverse the process. This means GDPR still applies to tokenized personal data.

Comparison

| Aspect | Pseudonymization | Anonymization |
| --- | --- | --- |
| GDPR Applicability | Still applies | Does not apply |
| Reversibility | Reversible with additional data | Irreversible |
| Data Utility | High (individual records linkable) | Lower (aggregated/generalized) |
| Use Cases | Operational processing, analytics | Statistical analysis, research |
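
To make the contrast concrete, the sketch below applies both approaches to a small, invented dataset. The `pseudonymize` and `anonymize` helpers, the field names, and the sample records are all hypothetical. Pseudonymization keeps one row per person and a separately held key table. The anonymization step here uses aggregation into age bands, which is only one possible technique.

```python
import secrets
from collections import Counter

records = [
    {"name": "Alice", "age": 34, "city": "Lyon"},
    {"name": "Bob", "age": 37, "city": "Lyon"},
    {"name": "Chen", "age": 52, "city": "Nice"},
]


def pseudonymize(rows):
    """Replace names with random pseudonyms; return (rows, separately held key table)."""
    key_table = {}  # the "additional information" that must be stored separately
    output = []
    for row in rows:
        pseudonym = secrets.token_hex(8)
        key_table[pseudonym] = row["name"]
        output.append({"id": pseudonym, "age": row["age"], "city": row["city"]})
    return output, key_table


def anonymize(rows):
    """Keep only counts per (age band, city); individual rows are discarded."""
    return Counter((f"{row['age'] // 10 * 10}s", row["city"]) for row in rows)


pseudonymized, key_table = pseudonymize(records)
aggregated = anonymize(records)
```

Note that aggregation alone does not guarantee anonymity: in this sample the ("50s", "Nice") group contains a single person, who could still be singled out. Real anonymization would need to handle such small groups, for example by suppressing or further generalizing them.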

Selective Disclosure

Selective disclosure enables individuals to reveal only specific claims or attributes from a credential without exposing the entire dataset. This is a key enabler of data minimization.[8]

How Selective Disclosure Works

Consider a driver's license that contains: name, date of birth, address, photo, license class, and expiration date. To verify someone is over 21, traditional approaches require showing the entire license. With selective disclosure:

  1. The credential holder receives a request: "Prove you are over 21"
  2. The holder's wallet creates a proof containing only: "Age >= 21: true"
  3. The verifier receives confirmation without learning the exact birth date, name, or address
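
The data-minimization idea in these steps can be sketched as follows. The credential fields and the `answer_age_request` helper are hypothetical, and the sketch deliberately omits the cryptographic proof that lets a verifier trust the answer; it only shows that the response carries a boolean and nothing else.

```python
from datetime import date

credential = {
    "name": "Jane Doe",
    "date_of_birth": date(1990, 5, 17),
    "address": "1 Example Street",
    "license_class": "B",
}


def age_on(birth: date, today: date) -> int:
    """Whole years elapsed between `birth` and `today`."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def answer_age_request(cred: dict, minimum_age: int, today: date) -> dict:
    """Return only the boolean answer to the predicate, not the underlying attributes."""
    age = age_on(cred["date_of_birth"], today)
    return {f"age_over_{minimum_age}": age >= minimum_age}


response = answer_age_request(credential, 21, today=date(2024, 1, 1))
# response == {"age_over_21": True}
```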

SD-JWT (Selective Disclosure JWT)

SD-JWT is a specification that extends JSON Web Tokens to support selective disclosure.[8] Key features include:

Figure 1: Selective disclosure flow and sectoral identifier derivation
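
The salted-hash mechanism that SD-JWT relies on can be sketched in a few lines. Each claim is encoded as a disclosure (a base64url-encoded JSON array of salt, claim name, and claim value), and only the SHA-256 digest of that disclosure is placed in the `_sd` array of the payload the issuer signs. The sketch below omits the JWT signature and key binding entirely, and the helper names and sample claims are assumptions for illustration.

```python
import base64
import hashlib
import json
import secrets


def b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_disclosure(name: str, value) -> str:
    """Encode [salt, claim name, claim value] as a base64url string."""
    salt = b64url(secrets.token_bytes(16))
    return b64url(json.dumps([salt, name, value]).encode("utf-8"))


def digest(disclosure: str) -> str:
    """Hash the encoded disclosure; only this digest goes in the signed payload."""
    return b64url(hashlib.sha256(disclosure.encode("ascii")).digest())


def open_disclosure(disclosure: str, signed_payload: dict) -> tuple:
    """Verifier side: accept a disclosure only if its digest is in the payload."""
    if digest(disclosure) not in signed_payload["_sd"]:
        raise ValueError("disclosure does not match any digest in the payload")
    padded = disclosure + "=" * (-len(disclosure) % 4)
    _salt, name, value = json.loads(base64.urlsafe_b64decode(padded))
    return name, value


claims = {"given_name": "Jane", "birthdate": "1990-05-17", "age_over_21": True}
disclosures = {name: make_disclosure(name, value) for name, value in claims.items()}

# Issuer side: the payload (which would then be signed) lists digests, not values.
payload = {"_sd": sorted(digest(d) for d in disclosures.values()), "_sd_alg": "sha-256"}

# Holder side: present only the disclosure for the requested claim.
presented = [disclosures["age_over_21"]]

revealed = dict(open_disclosure(d, payload) for d in presented)
# revealed == {"age_over_21": True}
```

The random salt prevents a verifier from guessing undisclosed claim values by hashing likely candidates and comparing them with the digests.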

Sectoral Identifiers

Sectoral identifiers are derived identifiers specific to a particular sector or domain (e.g., health, tax, banking). They enable necessary data sharing within a sector while preventing correlation across sectors.[4]

Derivation Concept

Sectoral identifiers are typically derived using a one-way function (such as HMAC or a key derivation function) that combines the UIN with a sector-specific salt or key.
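
As an illustration of this kind of derivation, the sketch below combines a UIN with a sector-specific secret key using HMAC-SHA-256. The `SECTOR_KEYS` table, the key values, the sample UIN, and the `sectoral_id` function are hypothetical; real keys would be generated randomly and held in an HSM or key management service.

```python
import hashlib
import hmac

# Hypothetical per-sector secret keys, hard-coded here only for illustration.
SECTOR_KEYS = {
    "health": b"example-health-key-not-for-production",
    "tax": b"example-tax-key-not-for-production",
}


def sectoral_id(uin: str, sector: str) -> str:
    """Derive a sector-specific identifier from the UIN with HMAC-SHA-256."""
    mac = hmac.new(SECTOR_KEYS[sector], uin.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


uin = "4829104756"  # made-up UIN
health_id = sectoral_id(uin, "health")
tax_id = sectoral_id(uin, "tax")
```

The same UIN always yields the same identifier within a sector, while the identifiers for different sectors cannot be linked without the corresponding keys.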

Benefits

Concepts in Practice

These concepts work together to create privacy-preserving identity systems:

  1. An individual is enrolled and assigned a UIN
  2. Sectoral identifiers are derived for each sector the individual interacts with
  3. When a service needs identity verification, it requests specific attributes
  4. The individual grants consent, and a consent token is issued
  5. The tokenization service validates the consent and releases only the authorized attributes
  6. Selective disclosure ensures only necessary information is shared
  7. All operations are logged for audit, using pseudonymized identifiers
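
Steps 4 and 5 can be sketched as a signed consent token that the releasing service validates before filtering attributes. Everything here is an assumption made for illustration: the token layout (JSON body plus HMAC signature), the `CONSENT_KEY`, the function names, and the sample record. A real deployment would more likely use a standard signed-token format and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time

CONSENT_KEY = b"example-consent-signing-key"  # hypothetical; not for production


def issue_consent_token(subject_id: str, attributes: list[str], ttl_seconds: int = 300) -> str:
    """Sign a statement of which attributes the individual agreed to share."""
    claims = {"sub": subject_id, "attrs": attributes, "exp": int(time.time()) + ttl_seconds}
    body = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(CONSENT_KEY, body, hashlib.sha256).digest()
    encoded_body = base64.urlsafe_b64encode(body).decode("ascii")
    encoded_signature = base64.urlsafe_b64encode(signature).decode("ascii")
    return encoded_body + "." + encoded_signature


def release_attributes(token: str, record: dict) -> dict:
    """Validate the consent token, then return only the attributes it authorizes."""
    body_b64, signature_b64 = token.split(".")
    body = base64.urlsafe_b64decode(body_b64)
    expected = hmac.new(CONSENT_KEY, body, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(signature_b64)):
        raise PermissionError("consent token signature is invalid")
    consent = json.loads(body)
    if consent["exp"] < time.time():
        raise PermissionError("consent token has expired")
    return {name: record[name] for name in consent["attrs"] if name in record}


record = {"name": "Jane Doe", "date_of_birth": "1990-05-17", "address": "1 Example Street"}
token = issue_consent_token("health:ab12", ["name"])
released = release_attributes(token, record)
# released == {"name": "Jane Doe"}
```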

See the Architecture page for detailed system design, or explore Use Cases for practical applications.

Visual Guide to Core Concepts

The diagrams below provide visual explanations of each foundational concept covered above.

1. Unique Identification Number (UIN) — Anchor Concept

UIN Anchor Concept Diagram

The UIN serves as the stable, internal anchor for an individual's identity within the system. It never leaves the identity authority, ensuring that derived identifiers (tokens, sectoral IDs) can be generated without exposing the foundational identifier. This separation is critical for preventing cross-context tracking.

2. Tokenization — Substitution, Not Encryption

Tokenization Service Diagram

Tokenization replaces sensitive data with a meaningless substitute (token) via a secure lookup service. Unlike encryption, there is no mathematical relationship between the token and the original data, so reversing a token requires access to the tokenization vault. If an external system that stores only tokens is breached, those tokens are worthless to attackers. The vault itself holds the mappings, so a breach of the vault would expose the original data and it must be strongly protected.

3. Pseudonymization — Reversible with Separation

Pseudonymization Concept Diagram

Pseudonymization processes personal data so it can no longer be attributed to a specific individual without additional information kept separately. The data remains personal under GDPR, but the separation of identifiers from attributes reduces risk. Re-identification requires combining multiple data sources under controlled access.

4. Anonymization — Irreversible by Design

Anonymization Concept Diagram

Anonymization renders data truly anonymous such that individuals cannot be re-identified by any reasonably available means. Techniques include aggregation, generalization, and noise injection. Once properly anonymized, data falls outside GDPR's scope—but achieving genuine irreversibility is technically challenging and must account for future re-identification risks.

5. Selective Disclosure — Share Only What Is Needed

Selective Disclosure Concept Diagram

Selective disclosure allows an individual to reveal specific claims or attributes from a credential without exposing the entire dataset. Using cryptographic techniques like SD-JWT (Selective Disclosure JSON Web Tokens), a user can prove they meet certain criteria (e.g., "over 18") without disclosing their exact birthdate or other unrelated information.

6. Sectoral Identifiers — No Cross-Sector Correlation

Sectoral Identifiers Concept Diagram

Each sector (health, tax, banking, education) receives a distinct identifier derived from the UIN using sector-specific salts or keys. Without access to the derivation mechanism, observers cannot correlate an individual's activities across sectors. This architectural choice prevents the emergence of a universal tracking identifier while enabling necessary cross-agency coordination under controlled conditions.

Disclaimer: This website provides educational content about identity tokenization concepts and architectures. It does not constitute legal advice. Organizations should consult qualified legal and technical professionals when implementing identity systems.