Core Concepts

This page explains the foundational concepts that underpin identity tokenization systems. Understanding these building blocks is essential for designing and implementing privacy-preserving identity architectures.


Unique Identification Number (UIN)

A Unique Identification Number (UIN), also known as a Unique Identity Number, is a stable, persistent identifier assigned to an individual within an identity system.[4] The UIN serves as the foundational anchor from which other identifiers (tokens, sectoral IDs) are derived. According to OSIA (Open Standards Identity APIs), all persons recorded in a registry have a UIN that is considered a key to access the person's data for all records.[4]

Key Characteristics

Uniqueness: Each individual receives exactly one UIN within the system
Persistence: The UIN remains stable throughout the individual's lifecycle
Internal Use: To prevent cross-context correlation, the UIN is never shared directly with external parties
Non-Meaningful: The UIN should not encode personal information (avoid semantic identifiers)
System-Specific: The UIN does not have to be the same throughout all registries as long as there is a mechanism to map different UINs among them[4]

Design Consideration

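UIN generation should use a cryptographically secure random number generator, or a deterministic derivation from biometric templates, so that UINs are unique and cannot be guessed. Random generation needs a collision check at issuance; biometric derivation needs care, because it ties the identifier to personal data and repeated biometric captures are never bit-for-bit identical.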
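As a minimal sketch of the random-generation option, the snippet below draws digits from Python's `secrets` module and retries on collision. The 12-digit format, the `generate_uin` name, and the in-memory set of assigned UINs are assumptions made for illustration; a real registry would enforce uniqueness in its own datastore.

```python
import secrets

# Hypothetical format: 12 decimal digits with no embedded personal data.
UIN_DIGITS = 12


def generate_uin(assigned: set[str]) -> str:
    """Draw random candidates until one is not already assigned."""
    while True:
        candidate = "".join(secrets.choice("0123456789") for _ in range(UIN_DIGITS))
        if candidate not in assigned:
            assigned.add(candidate)  # record it so it is never issued twice
            return candidate


assigned_uins: set[str] = set()
uin = generate_uin(assigned_uins)
```

Because the digits come from a cryptographically secure source, knowing one UIN gives no help in guessing another.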

UIN vs. External Identifiers

Aspect	UIN (Internal)	External Identifiers
Visibility	Never exposed outside the identity system	Shared with relying parties
Stability	Permanent	May be rotatable or context-specific
Correlation Risk	Protected by system controls	Designed to prevent cross-context linking

Tokenization

Tokenization is the process of replacing sensitive data elements with non-sensitive substitutes (tokens) that retain essential information about the data without compromising its security.[1]

How Tokenization Works

Sensitive data (e.g., national ID number) is submitted to the tokenization service
The service generates a token and stores the mapping securely
The token is returned and used in place of the original data
De-tokenization requires access to the tokenization service and appropriate authorization
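The four steps above can be sketched with an in-memory vault. The `TokenVault` class, its method names, the `tok_` prefix, and the boolean authorization flag are all hypothetical simplifications; a production service would persist the mapping in hardened storage and authenticate callers properly.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a tokenization service (illustrative only)."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so one input always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it has no mathematical link to the value.
        token = "tok_" + secrets.token_urlsafe(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, caller_authorized: bool) -> str:
        # Authorization is reduced to a flag here for brevity.
        if not caller_authorized:
            raise PermissionError("caller may not de-tokenize")
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("1234-5678-9012")
original = vault.detokenize(token, caller_authorized=True)
```

Downstream systems store only `token`; recovering `original` requires a call back to the vault.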

Tokenization vs. Encryption

Property	Tokenization	Encryption
Relationship to Original	No mathematical relationship	Mathematically derived
Reversibility	Requires lookup in token vault	Requires decryption key
Format Preservation	Can preserve format (e.g., same length)	Ciphertext usually differs in length and format unless format-preserving encryption is used
Key Management	No cryptographic key for the token itself	Key management is critical
Compliance	Tokens may be considered out of PCI DSS scope[10]	Encrypted data generally remains in scope

Pseudonymization vs. Anonymization

GDPR distinguishes between pseudonymization and anonymization, with significant implications for data protection obligations.[3]

Pseudonymization

According to GDPR Article 4(5), pseudonymization means "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person."[3][15]

Data remains personal data under GDPR[16]
Re-identification is possible with additional information
Reduces risk while maintaining utility
Examples: replacing names with codes, using UIN-derived tokens

Anonymization

Anonymization processes data so that the data subject is not, or is no longer, identifiable. Properly anonymized data falls outside the scope of GDPR.[11][16]

Must be irreversible
No reasonable means of re-identification
Techniques include aggregation, generalization, noise addition
Must consider all reasonably likely re-identification attempts
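To illustrate two of the techniques listed above, the sketch below generalizes exact ages into ten-year bands, aggregates them into counts, and suppresses small groups. The function names, the band width, and the minimum group size of 3 are assumptions chosen for the example. Applying these steps does not by itself make a dataset anonymous; that still depends on the re-identification analysis described above.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def aggregate(ages: list[int], min_group: int = 3) -> dict[str, int]:
    """Count people per age band and drop bands smaller than min_group."""
    counts = Counter(age_band(a) for a in ages)
    return {band: n for band, n in counts.items() if n >= min_group}


print(aggregate([31, 34, 38, 39, 52, 57, 58, 71]))
# {'30-39': 4, '50-59': 3}
```

The single 71-year-old is suppressed because a group of one would point to an individual.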

Important Distinction

Tokenization typically produces pseudonymized data, not anonymized data, because the tokenization service retains the ability to reverse the process. This means GDPR still applies to tokenized personal data.

Comparison

Aspect	Pseudonymization	Anonymization
GDPR Applicability	Still applies	Does not apply
Reversibility	Reversible with additional data	Irreversible
Data Utility	High (individual records linkable)	Lower (aggregated/generalized)
Use Cases	Operational processing, analytics	Statistical analysis, research

Selective Disclosure

Selective disclosure enables individuals to reveal only specific claims or attributes from a credential without exposing the entire dataset. This is a key enabler of data minimization.[8]

How Selective Disclosure Works

Consider a driver's license that contains: name, date of birth, address, photo, license class, and expiration date. To verify someone is over 21, traditional approaches require showing the entire license. With selective disclosure:

The credential holder receives a request: "Prove you are over 21"
The holder's wallet creates a proof containing only: "Age >= 21: true"
The verifier receives confirmation without learning the exact birth date, name, or address

SD-JWT (Selective Disclosure JWT)

SD-JWT is a specification that extends JSON Web Tokens to support selective disclosure.[8] Key features include:

Claims can be individually disclosed or hidden
Uses salted hashes to protect undisclosed claims
Verifier can validate disclosed claims without seeing others
Compatible with existing JWT infrastructure
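The salted-hash mechanism can be sketched with the standard library alone. This simplified example follows the general SD-JWT pattern: each disclosure is a base64url-encoded `[salt, name, value]` array, and its SHA-256 digest is placed in an `_sd` list. It omits the issuer signature, key binding, and JWT serialization, and the claim values and helper names are invented for illustration.

```python
import base64
import hashlib
import json
import secrets


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_disclosure(name: str, value) -> str:
    """Encode [salt, claim name, claim value] as a base64url string."""
    salt = b64url(secrets.token_bytes(16))
    return b64url(json.dumps([salt, name, value]).encode("utf-8"))


def digest(disclosure: str) -> str:
    return b64url(hashlib.sha256(disclosure.encode("ascii")).digest())


# Issuer: only the digests go into the credential (which would then be signed).
claims = {"given_name": "Alex", "birthdate": "1990-04-01", "age_over_21": True}
disclosures = {name: make_disclosure(name, value) for name, value in claims.items()}
credential = {"_sd": sorted(digest(d) for d in disclosures.values())}

# Holder: reveal only the age_over_21 disclosure.
presented = [disclosures["age_over_21"]]


def verify(credential: dict, presented: list[str]) -> dict:
    """Return the disclosed claims whose digests appear in the credential."""
    revealed = {}
    for d in presented:
        if digest(d) not in credential["_sd"]:
            raise ValueError("disclosure does not match credential")
        padded = d + "=" * (-len(d) % 4)  # restore stripped base64 padding
        _salt, name, value = json.loads(base64.urlsafe_b64decode(padded))
        revealed[name] = value
    return revealed


print(verify(credential, presented))  # {'age_over_21': True}
```

The verifier sees three digests but can open only one. The random salt stops it from guessing the hidden values by hashing likely candidates.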

Figure 1: Selective disclosure flow and sectoral identifier derivation

Sectoral Identifiers

Sectoral identifiers are derived identifiers specific to a particular sector or domain (e.g., health, tax, banking). They enable necessary data sharing within a sector while preventing correlation across sectors.[4]

Derivation Concept

Sectoral identifiers are typically derived using a one-way function (such as HMAC or a Key Derivation Function) that combines:

The individual's UIN
A sector-specific identifier or key
Optionally, additional context (e.g., year, jurisdiction)
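A minimal sketch of this derivation using HMAC-SHA256 follows. The function name, the `|` separator, the example UIN, and the hard-coded demo keys are assumptions; in practice sector keys would be generated randomly and held in an HSM or key management service.

```python
import hashlib
import hmac


def derive_sectoral_id(uin: str, sector_key: bytes, context: str = "") -> str:
    """HMAC-SHA256 over the UIN plus optional context, keyed per sector."""
    message = f"{uin}|{context}".encode("utf-8")
    return hmac.new(sector_key, message, hashlib.sha256).hexdigest()


# Hypothetical demo keys; never hard-code real keys.
SECTOR_KEYS = {
    "health": b"health-sector-demo-key",
    "tax": b"tax-sector-demo-key",
}

uin = "482019375566"
health_id = derive_sectoral_id(uin, SECTOR_KEYS["health"])
tax_id = derive_sectoral_id(uin, SECTOR_KEYS["tax"])
```

The same UIN yields unrelated identifiers in each sector, and neither identifier can be turned back into the UIN or into the other sector's identifier without the keys.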

Benefits

Privacy: The tax authority cannot use its identifier to look up health records
Compartmentalization: A breach in one sector does not expose identities in others
Auditability: Each sector has distinct identifiers for their audit trails
Interoperability: Central authority can still link records when legally authorized

User Consent Tokens

User Consent Tokens (UCTs) are cryptographically verifiable artifacts that encode an individual's consent for specific data processing activities. They bind consent to specific purposes, recipients, and time periods.[5]

Components of a Consent Token

Component	Description
Subject Identifier	Reference to the data subject (typically tokenized)
Scopes	Specific attributes or operations authorized
Audience	The relying party authorized to use this consent
Purpose	The stated reason for data processing
Issued At	Timestamp of consent grant
Expiry (TTL)	When the consent expires
Signature	Cryptographic proof of authenticity

Consent Token Lifecycle

Request: Relying party requests specific attributes with stated purpose
Review: Data subject reviews the request through their identity wallet or consent portal
Grant: Data subject approves, and consent service issues UCT
Use: Relying party presents UCT when requesting data
Validation: Tokenization service verifies UCT before releasing data
Expiry/Revocation: Consent expires or is revoked by the data subject
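The grant and validation steps of this lifecycle might look like the sketch below. It signs a JSON payload with HMAC-SHA256 purely to stay within the standard library; a real deployment would more likely use an asymmetric signature in a standard token format. The field names, the `issue_uct` and `validate_uct` functions, and the demo key are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SIGNING_KEY = b"demo-consent-service-key"  # hypothetical; never hard-code real keys


def issue_uct(subject: str, scopes: list[str], audience: str, purpose: str,
              ttl_seconds: int, now: Optional[float] = None) -> str:
    """Grant: build the consent payload and sign it."""
    issued_at = int(time.time() if now is None else now)
    payload = {"sub": subject, "scopes": scopes, "aud": audience,
               "purpose": purpose, "iat": issued_at, "exp": issued_at + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode()).decode()
    sig = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def validate_uct(token: str, audience: str, scope: str,
                 now: Optional[float] = None) -> dict:
    """Validation: check signature, expiry, audience, and scope."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("consent expired")
    if payload["aud"] != audience or scope not in payload["scopes"]:
        raise ValueError("consent does not cover this request")
    return payload


uct = issue_uct("tok_abc", ["date_of_birth"], "bank.example", "KYC", ttl_seconds=3600)
granted = validate_uct(uct, audience="bank.example", scope="date_of_birth")
```

A request from a different audience, for a different scope, or after expiry raises `ValueError` instead of releasing data.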

Time-to-Live (TTL) Considerations

The expiry period should balance usability with privacy:

Short TTL (minutes to hours): One-time transactions, high-sensitivity data
Medium TTL (days to weeks): Ongoing service relationships, KYC processes
Long TTL (months): Standing authorizations, healthcare provider relationships

Design Consideration

Even with long-lived consent tokens, systems should support immediate revocation. This requires either short-lived tokens with refresh, or a revocation checking mechanism at validation time.
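The second option, a revocation check at validation time, can be sketched as follows. The `Consent` record, the `RevocationRegistry` class, and the `token_id` field are hypothetical names; a real system would back the registry with shared storage so that every validator sees a revocation immediately.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Consent:
    token_id: str      # hypothetical unique ID carried in the consent token
    expires_at: float  # Unix timestamp


class RevocationRegistry:
    """In-memory stand-in for a revocation list consulted on every validation."""

    def __init__(self) -> None:
        self._revoked: set[str] = set()

    def revoke(self, token_id: str) -> None:
        self._revoked.add(token_id)

    def is_revoked(self, token_id: str) -> bool:
        return token_id in self._revoked


def is_consent_valid(consent: Consent, registry: RevocationRegistry,
                     now: Optional[float] = None) -> bool:
    """A consent is usable only if it is unexpired and not revoked."""
    current = time.time() if now is None else now
    return current < consent.expires_at and not registry.is_revoked(consent.token_id)


registry = RevocationRegistry()
consent = Consent(token_id="uct-001", expires_at=time.time() + 90 * 24 * 3600)
valid_before = is_consent_valid(consent, registry)
registry.revoke("uct-001")
valid_after = is_consent_valid(consent, registry)
```

The trade-off is an extra lookup on every validation, in exchange for revocation that takes effect without waiting for the token to expire.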

Concepts in Practice

These concepts work together to create privacy-preserving identity systems:

An individual is enrolled and assigned a UIN
Sectoral identifiers are derived for each sector the individual interacts with
When a service needs identity verification, it requests specific attributes
The individual grants consent, and a consent token is issued
The tokenization service validates the consent and releases only the authorized attributes
Selective disclosure ensures only necessary information is shared
All operations are logged for audit, using pseudonymized identifiers

See the Architecture page for detailed system design, or explore Use Cases for practical applications.

Visual Guide to Core Concepts

The diagrams below provide visual explanations of each foundational concept covered above.

1. Unique Identification Number (UIN) — Anchor Concept

The UIN serves as the stable, internal anchor for an individual's identity within the system. It never leaves the identity authority, ensuring that derived identifiers (tokens, sectoral IDs) can be generated without exposing the foundational identifier. This separation is critical for preventing cross-context tracking.

2. Tokenization — Substitution, Not Encryption

Tokenization replaces sensitive data with a meaningless substitute (token) via a secure lookup service. Unlike encryption, there is no mathematical relationship between the token and the original data; reversing a token requires access to the tokenization vault. If an external system is breached, the tokens it holds are worthless to attackers without access to the vault, which makes the vault itself the critical asset to protect.

3. Pseudonymization — Reversible with Separation

Pseudonymization processes personal data so it can no longer be attributed to a specific individual without additional information kept separately. The data remains personal under GDPR, but the separation of identifiers from attributes reduces risk. Re-identification requires combining multiple data sources under controlled access.

4. Anonymization — Irreversible by Design

Anonymization processes data so that individuals cannot be re-identified by any reasonably available means. Techniques include aggregation, generalization, and noise injection. Once properly anonymized, data falls outside GDPR's scope, but achieving genuine irreversibility is technically challenging and must account for future re-identification risks.

5. Selective Disclosure — Share Only What Is Needed

Selective disclosure allows an individual to reveal specific claims or attributes from a credential without exposing the entire dataset. Using cryptographic techniques like SD-JWT (Selective Disclosure JSON Web Tokens), a user can prove they meet certain criteria (e.g., "over 18") without disclosing their exact birthdate or other unrelated information.

6. Sectoral Identifiers — No Cross-Sector Correlation

Each sector (health, tax, banking, education) receives a distinct identifier derived from the UIN using sector-specific salts or keys. Without access to the derivation mechanism, observers cannot correlate an individual's activities across sectors. This architectural choice prevents the emergence of a universal tracking identifier while enabling necessary cross-agency coordination under controlled conditions.

Disclaimer: This website provides educational content about identity tokenization concepts and architectures. It does not constitute legal advice. Organizations should consult qualified legal and technical professionals when implementing identity systems.
