Core Concepts
This page explains the foundational concepts that underpin identity tokenization systems. Understanding these building blocks is essential for designing and implementing privacy-preserving identity architectures.
Unique Identification Number (UIN)
A Unique Identification Number (UIN), also known as a Unique Identity Number, is a stable, persistent identifier assigned to an individual within an identity system.[4] The UIN serves as the foundational anchor from which other identifiers (tokens, sectoral IDs) are derived. According to OSIA (Open Standards Identity APIs), all persons recorded in a registry have a UIN that is considered a key to access the person's data for all records.[4]
Key Characteristics
- Uniqueness: Each individual receives exactly one UIN within the system
- Persistence: The UIN remains stable throughout the individual's lifecycle
- Internal Use: The UIN is never shared directly with external parties to prevent cross-context correlation
- Non-Meaningful: The UIN should not encode personal information (avoid semantic identifiers)
- System-Specific: The UIN does not have to be the same throughout all registries as long as there is a mechanism to map different UINs among them[4]
UINs should be generated with a cryptographically secure random number generator, or derived deterministically from biometric templates, and checked against the registry for collisions. This keeps each UIN unique while making it infeasible to guess.
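The random-generation option can be sketched as follows. This is a minimal illustration, not a prescribed scheme: the 12-digit length, the `generate_uin` helper, and the in-memory `registry` set are assumptions made for the example, and a real registry would enforce uniqueness in its database.

```python
import secrets

def generate_uin(existing: set, digits: int = 12) -> str:
    """Return a random numeric UIN that is not already in `existing`.

    `existing` stands in for the registry's uniqueness check (assumption).
    """
    while True:
        # secrets draws from the operating system's CSPRNG, so values are
        # not predictable from previously issued UINs.
        candidate = "".join(secrets.choice("0123456789") for _ in range(digits))
        if candidate not in existing:
            existing.add(candidate)
            return candidate

registry = set()
uin = generate_uin(registry)
```

Because the digits are drawn at random, the value encodes no personal information, which matches the non-meaningful characteristic listed above.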
UIN vs. External Identifiers
| Aspect | UIN (Internal) | External Identifiers |
|---|---|---|
| Visibility | Never exposed outside the identity system | Shared with relying parties |
| Stability | Permanent | May be rotatable or context-specific |
| Correlation Risk | Protected by system controls | Designed to prevent cross-context linking |
Tokenization
Tokenization is the process of replacing sensitive data elements with non-sensitive substitutes (tokens) that retain essential information about the data without compromising its security.[1]
How Tokenization Works
- Sensitive data (e.g., national ID number) is submitted to the tokenization service
- The service generates a token and stores the mapping securely
- The token is returned and used in place of the original data
- De-tokenization requires access to the tokenization service and appropriate authorization
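The steps above can be sketched with a small in-memory vault. The `TokenVault` class, the `tok_` prefix, and the boolean `caller_authorized` flag are hypothetical names chosen for illustration; a production service would use hardened storage and a real authorization decision.

```python
import secrets

class TokenVault:
    """In-memory stand-in for a tokenization service (illustrative only)."""

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same input always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it has no mathematical link to the value.
        token = "tok_" + secrets.token_urlsafe(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, caller_authorized: bool) -> str:
        # De-tokenization is a vault lookup gated by authorization.
        if not caller_authorized:
            raise PermissionError("caller is not authorized to de-tokenize")
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("1234-5678-9012")
original = vault.detokenize(token, caller_authorized=True)
```

Downstream systems would store only `token`; recovering the original value is possible only through the vault.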
Tokenization vs. Encryption
| Property | Tokenization | Encryption |
|---|---|---|
| Relationship to Original | No mathematical relationship | Mathematically derived |
| Reversibility | Requires lookup in token vault | Requires decryption key |
| Format Preservation | Can preserve format (e.g., same length) | Output typically differs from input |
| Key Management | No cryptographic key for the token itself | Key management is critical |
| Compliance | Tokens may be considered out of PCI DSS scope[10] | Encrypted data generally remains in scope |
Pseudonymization vs. Anonymization
GDPR distinguishes between pseudonymization and anonymization, with significant implications for data protection obligations.[3]
Pseudonymization
According to GDPR Article 4(5), pseudonymization means "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person."[3][15]
- Data remains personal data under GDPR[16]
- Re-identification is possible with additional information
- Reduces risk while maintaining utility
- Examples: replacing names with codes, using UIN-derived tokens
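As a rough illustration of replacing names with codes, the sketch below splits a dataset into pseudonymized rows and a lookup table that would be stored separately under its own access controls. The `pseudonymize` helper, the `P-` code format, and the sample records are assumptions for the example.

```python
import secrets

def pseudonymize(records, key_field="name"):
    """Return (pseudonymized rows, lookup table) for the given records."""
    lookup = {}    # code -> original value; the "additional information"
    assigned = {}  # original value -> code, so repeats get the same code
    output = []
    for record in records:
        original = record[key_field]
        if original not in assigned:
            code = "P-" + secrets.token_hex(4)
            while code in lookup:  # extremely unlikely, but avoid reuse
                code = "P-" + secrets.token_hex(4)
            assigned[original] = code
            lookup[code] = original
        row = dict(record)
        row[key_field] = assigned[original]
        output.append(row)
    return output, lookup

records = [
    {"name": "Alice", "diagnosis": "A"},
    {"name": "Bob", "diagnosis": "B"},
    {"name": "Alice", "diagnosis": "C"},
]
pseudonymized, lookup = pseudonymize(records)
```

Rows belonging to the same person remain linkable through the shared code, which is what preserves utility, while re-identification requires the separately held `lookup`.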
Anonymization
Anonymization renders data truly anonymous such that the data subject is not or no longer identifiable. Properly anonymized data falls outside the scope of GDPR.[11][16]
- Must be irreversible
- No reasonable means of re-identification
- Techniques include aggregation, generalization, noise addition
- Must consider all reasonably likely re-identification attempts
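Two of the listed techniques, generalization and aggregation, can be sketched as below. The ten-year age bands and the minimum group size of 3 are illustrative assumptions, and applying them does not by itself establish that a dataset is anonymous in the GDPR sense; that still depends on the re-identification analysis described above.

```python
from collections import Counter

def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '30-39' (generalization)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def aggregate(ages, min_group_size: int = 3) -> dict:
    """Count people per age band and drop bands that are too small."""
    counts = Counter(generalize_age(a) for a in ages)
    # Small groups are suppressed because they make individuals stand out.
    return {band: n for band, n in counts.items() if n >= min_group_size}

ages = [21, 24, 29, 33, 35, 38, 39, 62]
summary = aggregate(ages)
# summary == {"20-29": 3, "30-39": 4}; the single 62-year-old is suppressed
```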
Tokenization typically produces pseudonymized data, not anonymized data, because the tokenization service retains the ability to reverse the process. This means GDPR still applies to tokenized personal data.
Comparison
| Aspect | Pseudonymization | Anonymization |
|---|---|---|
| GDPR Applicability | Still applies | Does not apply |
| Reversibility | Reversible with additional data | Irreversible |
| Data Utility | High (individual records linkable) | Lower (aggregated/generalized) |
| Use Cases | Operational processing, analytics | Statistical analysis, research |
Selective Disclosure
Selective disclosure enables individuals to reveal only specific claims or attributes from a credential without exposing the entire dataset. This is a key enabler of data minimization.[8]
How Selective Disclosure Works
Consider a driver's license that contains: name, date of birth, address, photo, license class, and expiration date. To verify someone is over 21, traditional approaches require showing the entire license. With selective disclosure:
- The credential holder receives a request: "Prove you are over 21"
- The holder's wallet creates a proof containing only: "Age >= 21: true"
- The verifier receives confirmation without learning the exact birth date, name, or address
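The data-minimization step in this flow can be sketched as follows. The field names, the `age_over_21` claim name, and the helper functions are hypothetical; the sketch only shows which information leaves the wallet and omits the cryptographic proof that a real presentation would carry.

```python
from datetime import date

def is_at_least(birth_date: date, years: int, today: date) -> bool:
    """Return True if the person has reached `years` years of age on `today`."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    return age >= years

def build_response(credential: dict, today: date) -> dict:
    """Answer 'prove you are over 21' with a single boolean claim."""
    return {"age_over_21": is_at_least(credential["date_of_birth"], 21, today)}

credential = {
    "name": "Example Holder",
    "date_of_birth": date(2000, 6, 15),
    "address": "1 Example Street",
}
response = build_response(credential, today=date(2024, 1, 1))
# response == {"age_over_21": True}
```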
SD-JWT (Selective Disclosure JWT)
SD-JWT is a specification that extends JSON Web Tokens to support selective disclosure.[8] Key features include:
- Claims can be individually disclosed or hidden
- Uses salted hashes to protect undisclosed claims
- Verifier can validate disclosed claims without seeing others
- Compatible with existing JWT infrastructure
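The salted-hash idea can be sketched with the standard library alone. This is a simplified illustration, assuming SHA-256 and a disclosure encoded as a base64url JSON array of salt, claim name, and value; it leaves out the issuer's signature over the digests and the JWT envelope, so it is not a conforming SD-JWT implementation. The helper names and sample claims are made up for the example.

```python
import base64
import hashlib
import json
import secrets

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_disclosure(name: str, value):
    """Return (disclosure, digest) for one claim."""
    salt = b64url(secrets.token_bytes(16))  # fresh salt per claim
    disclosure = b64url(json.dumps([salt, name, value]).encode("utf-8"))
    digest = b64url(hashlib.sha256(disclosure.encode("ascii")).digest())
    return disclosure, digest

def verify_disclosure(disclosure: str, signed_digests):
    """Check a presented disclosure against the digests the issuer signed."""
    digest = b64url(hashlib.sha256(disclosure.encode("ascii")).digest())
    if digest not in signed_digests:
        raise ValueError("disclosure does not match any signed digest")
    padded = disclosure + "=" * (-len(disclosure) % 4)
    _salt, name, value = json.loads(base64.urlsafe_b64decode(padded))
    return name, value

claims = {"given_name": "Alex", "birthdate": "1990-01-01", "age_over_21": True}
disclosures = {k: make_disclosure(k, v) for k, v in claims.items()}
# In a real credential the issuer signs this list of digests.
signed_digests = [digest for _, digest in disclosures.values()]

# The holder presents only one disclosure; the others stay hidden.
presented = disclosures["age_over_21"][0]
result = verify_disclosure(presented, signed_digests)
```

The verifier sees every digest but can open only the claim whose disclosure it was given; the per-claim salt keeps the remaining digests from being guessed by hashing likely values.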
Figure 1: Selective disclosure flow and sectoral identifier derivation
Sectoral Identifiers
Sectoral identifiers are derived identifiers specific to a particular sector or domain (e.g., health, tax, banking). They enable necessary data sharing within a sector while preventing correlation across sectors.[4]
Derivation Concept
Sectoral identifiers are typically derived using a one-way function (such as HMAC or a Key Derivation Function) that combines:
- The individual's UIN
- A sector-specific identifier or key
- Optionally, additional context (e.g., year, jurisdiction)
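A minimal sketch of this derivation using HMAC-SHA256 is shown below. The `derive_sectoral_id` function, the `|` separator, and the demo keys and UIN are assumptions for illustration; real sector keys would be generated randomly and held in a key management system.

```python
import hashlib
import hmac

def derive_sectoral_id(uin: str, sector_key: bytes, context: str = "") -> str:
    """Derive a sector-specific identifier from the UIN with HMAC-SHA256."""
    # The optional context (e.g. a year or jurisdiction) is mixed into the input.
    message = f"{uin}|{context}".encode("utf-8")
    return hmac.new(sector_key, message, hashlib.sha256).hexdigest()

# Demo values only; these are not real keys or identifiers.
health_key = b"demo-health-sector-key"
tax_key = b"demo-tax-sector-key"
uin = "482913670254"

health_id = derive_sectoral_id(uin, health_key)
tax_id = derive_sectoral_id(uin, tax_key)
```

The same UIN yields unrelated identifiers under different sector keys, and without a key neither identifier can be recomputed or mapped back to the UIN.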
Benefits
- Privacy: The tax authority cannot use its identifier to look up health records
- Compartmentalization: A breach in one sector does not expose identity in others
- Auditability: Each sector has distinct identifiers for its audit trails
- Interoperability: The central authority can still link records when legally authorized
User Consent Tokens
User Consent Tokens (UCTs) are cryptographically verifiable artifacts that encode an individual's consent for specific data processing activities. They bind consent to specific purposes, recipients, and time periods.[5]
Components of a Consent Token
| Component | Description |
|---|---|
| Subject Identifier | Reference to the data subject (typically tokenized) |
| Scopes | Specific attributes or operations authorized |
| Audience | The relying party authorized to use this consent |
| Purpose | The stated reason for data processing |
| Issued At | Timestamp of consent grant |
| Expiry (TTL) | When the consent expires |
| Signature | Cryptographic proof of authenticity |
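The components in the table can be assembled into a signed token roughly as follows. This is a sketch under stated assumptions: the short field names (`sub`, `aud`, `iat`, `exp`), the two-part `body.signature` format, and the use of a shared-secret HMAC are choices made for the example. A deployment would more likely use an asymmetric signature so that relying parties can verify tokens without holding the signing key.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def issue_consent_token(subject, scopes, audience, purpose,
                        ttl_seconds, signing_key, now=None) -> str:
    issued_at = int(time.time()) if now is None else int(now)
    payload = {
        "sub": subject,        # Subject Identifier (tokenized)
        "scopes": scopes,      # Scopes
        "aud": audience,       # Audience
        "purpose": purpose,    # Purpose
        "iat": issued_at,      # Issued At
        "exp": issued_at + ttl_seconds,  # Expiry (TTL)
    }
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(signing_key, body, hashlib.sha256).digest()
    return f"{_b64url(body)}.{_b64url(signature)}"

def verify_consent_token(token: str, signing_key: bytes) -> dict:
    body_b64, sig_b64 = token.split(".")
    body = base64.urlsafe_b64decode(body_b64 + "=" * (-len(body_b64) % 4))
    expected = _b64url(hmac.new(signing_key, body, hashlib.sha256).digest())
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected, sig_b64):
        raise ValueError("signature mismatch")
    return json.loads(body)

key = b"demo-consent-service-key"  # demo value only
token = issue_consent_token("tok_3f9a", ["date_of_birth"], "bank.example",
                            "KYC", 3600, key, now=1_700_000_000)
payload = verify_consent_token(token, key)
```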
Consent Token Lifecycle
- Request: Relying party requests specific attributes with stated purpose
- Review: Data subject reviews the request through their identity wallet or consent portal
- Grant: Data subject approves, and the consent service issues a UCT
- Use: Relying party presents UCT when requesting data
- Validation: Tokenization service verifies UCT before releasing data
- Expiry/Revocation: Consent expires or is revoked by the data subject
Time-to-Live (TTL) Considerations
The expiry period should balance usability with privacy:
- Short TTL (minutes to hours): One-time transactions, high-sensitivity data
- Medium TTL (days to weeks): Ongoing service relationships, KYC processes
- Long TTL (months): Standing authorizations, healthcare provider relationships
Even with long-lived consent tokens, systems should support immediate revocation. This requires either short-lived tokens with refresh, or a revocation checking mechanism at validation time.
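The second option, a revocation check at validation time, might look like the sketch below. It assumes the token's signature has already been verified and that each token carries a unique ID (called `jti` here, a hypothetical field not listed in the component table); the in-memory `revoked_token_ids` set stands in for a shared revocation store.

```python
import time
from typing import Optional

revoked_token_ids = set()  # stand-in for a shared revocation store

def revoke(token_id: str) -> None:
    revoked_token_ids.add(token_id)

def validate_consent(token: dict, audience: str, scope: str,
                     now: Optional[float] = None) -> bool:
    """Return True only if the consent is unrevoked, unexpired, and covers the request."""
    now = time.time() if now is None else now
    if token["jti"] in revoked_token_ids:
        return False  # revocation takes effect immediately
    if now >= token["exp"]:
        return False  # TTL has elapsed
    if token["aud"] != audience:
        return False  # token was issued to a different relying party
    return scope in token["scopes"]

consent = {"jti": "c-1", "aud": "bank.example",
           "scopes": ["date_of_birth"], "exp": 2000}
ok_before = validate_consent(consent, "bank.example", "date_of_birth", now=1000)
revoke("c-1")
ok_after = validate_consent(consent, "bank.example", "date_of_birth", now=1000)
```

Checking the revocation store on every validation lets a long-lived token stop working as soon as the data subject withdraws consent, at the cost of one lookup per request.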
Concepts in Practice
These concepts work together to create privacy-preserving identity systems:
- An individual is enrolled and assigned a UIN
- Sectoral identifiers are derived for each sector the individual interacts with
- When a service needs identity verification, it requests specific attributes
- The individual grants consent, and a consent token is issued
- The tokenization service validates the consent and releases only the authorized attributes
- Selective disclosure ensures only necessary information is shared
- All operations are logged for audit, using pseudonymized identifiers
See the Architecture page for detailed system design, or explore Use Cases for practical applications.
Visual Guide to Core Concepts
The diagrams below provide visual explanations of each foundational concept covered above.
1. Unique Identification Number (UIN) — Anchor Concept
The UIN serves as the stable, internal anchor for an individual's identity within the system. It never leaves the identity authority, ensuring that derived identifiers (tokens, sectoral IDs) can be generated without exposing the foundational identifier. This separation is critical for preventing cross-context tracking.
2. Tokenization — Substitution, Not Encryption
Tokenization replaces sensitive data with a meaningless substitute (token) via a secure lookup service. Unlike encryption, there is no mathematical relationship between the token and the original data; reversing a token requires access to the tokenization vault. If an external system is breached, the tokens it holds remain worthless to attackers as long as the vault itself is not compromised.
3. Pseudonymization — Reversible with Separation
Pseudonymization processes personal data so it can no longer be attributed to a specific individual without additional information kept separately. The data remains personal under GDPR, but the separation of identifiers from attributes reduces risk. Re-identification requires combining multiple data sources under controlled access.
4. Anonymization — Irreversible by Design
Anonymization renders data truly anonymous such that individuals cannot be re-identified by any reasonably available means. Techniques include aggregation, generalization, and noise injection. Once properly anonymized, data falls outside GDPR's scope—but achieving genuine irreversibility is technically challenging and must account for future re-identification risks.
5. Selective Disclosure — Share Only What Is Needed
Selective disclosure allows an individual to reveal specific claims or attributes from a credential without exposing the entire dataset. Using cryptographic techniques like SD-JWT (Selective Disclosure JSON Web Tokens), a user can prove they meet certain criteria (e.g., "over 18") without disclosing their exact birthdate or other unrelated information.
6. Sectoral Identifiers — No Cross-Sector Correlation
Each sector (health, tax, banking, education) receives a distinct identifier derived from the UIN using sector-specific salts or keys. Without access to the derivation mechanism, observers cannot correlate an individual's activities across sectors. This architectural choice prevents the emergence of a universal tracking identifier while enabling necessary cross-agency coordination under controlled conditions.