Consumer Banking Data Model
1. Use case and Functional Requirements
Use case: support retail banking operations and analytics — account onboarding, KYC, transaction ledger, posting & settlement, disputes, monthly statements, fraud detection, and regulatory reporting.
Functional requirements (FR):
- FR1: Onboard accounts and customers with full auditability and point-in-time queries → dimension history required (SCD-II).
- FR2: Capture every financial event (debit/credit) for ledger accuracy, fraud, and reconciliation → append-only transactional fact.
- FR3: Track posting, settlement, dispute lifecycle milestones and SLA → single-row accumulating fact per process.
- FR4: Produce monthly statements and KPI snapshots tolerating late-posts → monthly snapshot with recompute window.
- FR5: Fast operational lookups for current attributes (KYC, account status) → current flag/index on SCD-II dimensions.
- FR6: Auditability and traceability for compliance → include ETL audit columns and natural keys for reconciliation.
2. Schema details
2.1 Conceptual
We will be using star schema model to build functional requirements. Use SCD-II for dim_customer and dim_account where history and point-in-time queries are required. Use SCD-I for non-historical dimensions. Facts include an append-only transactional ledger, an accumulating fact for lifecycle processes, and periodic snapshots for reporting.
Primary tables (purpose & grain):
dim_customer(SCD-II) — grain: customer × version. Stores historical customer attributes for point-in-time joins and audits.dim_account(SCD-II) — grain: account × version. Stores account attributes and product migrations over time.dim_merchant(SCD-I) — grain: merchant (latest). Merchant descriptive attributes.dim_channel(SCD-I) — grain: channel (latest). Channel metadata (ATM, POS, ONLINE).dim_card(SCD-I) — grain: card (latest). Card metadata (brand, BIN).fact_transaction(transactional fact) — grain: one row per transaction event. Detailed ledger for fraud and reconciliation.fact_transaction_lifecycle(accumulating fact) — grain: one row per process/transaction instance. Tracks lifecycle milestones (initiated → authorized → posted → settled → dispute → resolved).fact_monthly_account_activity(monthly snapshot) — grain: account × calendar_month. Monthly KPIs for statements and dashboards.- optional:
dim_date— calendar/date dimension to aid reporting and partitioning.
All dimensions use surrogate keys (*_sk) and retain source natural keys (*_id) for reconciliation.
Schema Structure:
Based on the consumer banking data model, here’s the conceptual schema structure:
2.2 Table Details
Below are the list of tables with column details:
2.2.1 dim_customer (SCD-II)
Description: Stores customer identity and descriptive attributes. Each row represents the state of a customer for a validity period (SCD-II). Supports point-in-time reporting, KYC history, and audit requirements.
| Column Name | Data Type | Description |
|---|---|---|
customer_sk |
bigint (PK) | Surrogate key for the customer version row. |
customer_id |
varchar | Natural key from source system (customer identifier). |
name |
varchar | Customer full name. |
dob |
date | Date of birth. |
primary_phone |
varchar | Primary contact number. |
email |
varchar | Email address. |
address |
varchar | Standardized mailing address (single field or JSON depending on platform). |
kyc_status |
varchar | KYC/compliance status (e.g., VERIFIED, PENDING). |
risk_segment |
varchar | Risk or segment classification. |
effective_from |
timestamp | Inclusive start of validity for this row. |
effective_to |
timestamp | Exclusive end of validity (NULL when current). |
is_current |
boolean | True when row is the current version for the natural key. |
version_number |
integer | Monotonic version counter for the natural key. |
etl_loaded_ts |
timestamp | When this version row was loaded into the warehouse. |
etl_batch_id |
varchar | ETL batch identifier for traceability. |
source_system |
varchar | Source system identifier. |
Notes:
- Insert a new row for any attribute change; close the previous row by setting
effective_toandis_current = false. - Index
is_current+customer_idfor fast operational lookups; create a composite index oncustomer_id, effective_fromfor point-in-time joins.
2.2.2 dim_account (SCD-II)
Description: Stores account-level attributes and product/status history. Each row represents the state of an account for a validity period.
| Column Name | Data Type | Description |
|---|---|---|
account_sk |
bigint (PK) | Surrogate key for the account version row. |
account_id |
varchar | Natural account identifier from source. |
customer_sk |
bigint (FK) | FK to dim_customer.customer_sk (version-specific as needed). |
account_type |
varchar | e.g., CHECKING, SAVINGS, CREDIT_CARD. |
currency |
varchar | Account currency (ISO code). |
status |
varchar | e.g., ACTIVE, CLOSED, DORMANT. |
opened_date |
date | Account opening date. |
closed_date |
date | Account closure date (if any). |
product_code |
varchar | Product or pricing plan code. |
effective_from |
timestamp | Inclusive start of validity for this row. |
effective_to |
timestamp | Exclusive end of validity (NULL when current). |
is_current |
boolean | True when row is the current version. |
version_number |
integer | Monotonic version counter for the account. |
etl_loaded_ts |
timestamp | Load timestamp into the warehouse. |
etl_batch_id |
varchar | ETL batch id for traceability. |
source_system |
varchar | Source system identifier. |
Notes:
- Use SCD-II flow: close prior version and insert new version on attribute changes (product migrations, status changes).
- Index on
account_id, is_currentfor fast lookups; support point-in-time joins viaeffective_from/effective_to.
2.2.3 dim_merchant (SCD-I)
Description: Merchant descriptive attributes used to enrich transactions. Overwrite on change unless historical merchant profiling is required.
| Column Name | Data Type | Description |
|---|---|---|
merchant_sk |
bigint (PK) | Surrogate key. |
merchant_id |
varchar | Natural merchant identifier. |
merchant_name |
varchar | Merchant display name. |
mcc |
varchar | Merchant Category Code. |
category |
varchar | High-level category (e.g., GROCERY, FUEL). |
address |
varchar | Merchant address. |
contact |
varchar | Merchant contact info. |
last_updated_ts |
timestamp | Last update timestamp. |
source_system |
varchar | Source system identifier. |
Notes:
- Overwrite rows on update. Maintain
last_updated_tsfor audit and source reconciliation. - Index
merchant_idandmccfor common queries.
2.2.4 dim_channel (SCD-I)
Description: Channel metadata for transactions (ATM, POS, ONLINE). Small, operational dimension.
| Column Name | Data Type | Description |
|---|---|---|
channel_sk |
bigint (PK) | Surrogate key. |
channel_id |
varchar | Natural channel identifier. |
channel_name |
varchar | e.g., ATM, POS, ONLINE. |
channel_type |
varchar | Further classification when required. |
description |
varchar | Free text description. |
last_updated_ts |
timestamp | Last update timestamp. |
source_system |
varchar | Source system identifier. |
Notes:
- Overwrite on change; small dimension so SCD-I works well.
2.2.5 dim_card (SCD-I)
Description: Card metadata for transactions (brand, BIN range, issuer).
| Column Name | Data Type | Description |
|---|---|---|
card_sk |
bigint (PK) | Surrogate key. |
card_id |
varchar | Natural card identifier (masked token or BIN-based id). |
card_brand |
varchar | Visa, MasterCard, AMEX, etc. |
bin_prefix |
varchar | BIN or BIN range prefix. |
card_type |
varchar | e.g., DEBIT, CREDIT, PREPAID. |
issuer |
varchar | Issuing bank name or code. |
last_updated_ts |
timestamp | Last update timestamp. |
source_system |
varchar | Source system identifier. |
Notes:
- Overwrite on change; include
last_updated_tsandsource_system.
2.2.6 fact_transaction (transactional fact)
Description: The transaction ledger — one row per transaction event (authorization/posting/reversal). This is the primary source for fraud, reconciliation, and transaction analytics.
| Column Name | Data Type | Description |
|---|---|---|
transaction_sk |
bigint (PK) | Surrogate key for the fact row. |
transaction_id |
varchar | Natural transaction identifier from source. |
account_sk |
bigint (FK) | FK to dim_account.account_sk (current or versioned lookup via point-in-time join). |
customer_sk |
bigint (FK) | FK to dim_customer.customer_sk (version-specific for point-in-time). |
merchant_sk |
bigint (FK) | FK to dim_merchant.merchant_sk. |
channel_sk |
bigint (FK) | FK to dim_channel.channel_sk. |
card_sk |
bigint (FK) | FK to dim_card.card_sk. |
transaction_ts |
timestamp | Event timestamp (authorization/transaction time). |
posting_ts |
timestamp | Posting/settlement timestamp. |
amount |
decimal(18,2) | Signed amount (debit negative / credit positive). |
currency |
varchar(3) | ISO currency code. |
transaction_type |
varchar | e.g., POS, ATM, ACH, TRANSFER, FEE. |
status |
varchar | e.g., AUTHORIZED, POSTED, REVERSED. |
auth_code |
varchar | Authorization code if available. |
fee_amount |
decimal(18,2) | Fee portion (if applicable). |
reversal_flag |
boolean | True if this row represents a reversal/adjustment. |
related_transaction_id |
varchar | Link to original txn for reversals/adjustments. |
etl_loaded_ts |
timestamp | ETL load timestamp. |
etl_batch_id |
varchar | ETL batch identifier. |
source_system |
varchar | Source system identifier. |
Notes:
- Append-only inserts. Use
transaction_id|source_systemas dedupe key during ingest. - Partition by
transaction_tsand include indexes onaccount_sk,customer_sk, andtransaction_tsfor common queries.
2.2.7 fact_transaction_lifecycle (accumulating fact)
Description: One row per transaction/process instance that tracks lifecycle milestones useful for SLA and workflow reporting.
| Column Name | Data Type | Description |
|---|---|---|
transaction_lifecycle_sk |
bigint (PK) | Surrogate key. |
transaction_id |
varchar | Natural txn id (links back to fact_transaction). |
account_sk |
bigint (FK) | FK to dim_account. |
customer_sk |
bigint (FK) | FK to dim_customer. |
initiated_ts |
timestamp | When process initiated. |
authorized_ts |
timestamp | Authorization timestamp. |
posted_ts |
timestamp | Posting timestamp. |
settled_ts |
timestamp | Settlement timestamp. |
dispute_opened_ts |
timestamp | When dispute opened (if any). |
dispute_resolved_ts |
timestamp | When dispute resolved (if any). |
current_status |
varchar | Current lifecycle status. |
owner_queue |
varchar | Current processing owner/queue. |
total_amount |
decimal(18,2) | Total amount involved. |
currency |
varchar(3) | Currency code. |
last_updated_ts |
timestamp | Last update to this lifecycle row. |
etl_batch_id |
varchar | ETL batch id for updates. |
source_system |
varchar | Source system identifier. |
Notes:
- Upsert pattern: insert on process start; update milestone columns as events occur. Use conditional updates to avoid overwriting with stale data.
- Include
last_updated_tsandetl_batch_idto support idempotency and reconciliation.
2.2.8 fact_monthly_account_activity (monthly snapshot/fact)
Description: Monthly aggregated rows per account for fast reporting and statement generation. Treated as a monthly fact (recomputed or incrementally maintained).
| Column Name | Data Type | Description |
|---|---|---|
account_sk |
bigint | FK to dim_account. |
year_month |
varchar(7) | YYYY-MM representation for partitioning. |
opening_balance |
decimal(18,2) | Balance at start of month. |
closing_balance |
decimal(18,2) | Balance at end of month. |
total_credits |
decimal(18,2) | Sum of credits in month. |
total_debits |
decimal(18,2) | Sum of debits in month. |
avg_daily_balance |
decimal(18,2) | Average daily balance over the month. |
num_txns |
integer | Number of transactions in the month. |
num_disputes |
integer | Number of disputes opened in the month. |
snapshot_ts |
timestamp | When snapshot was generated. |
etl_batch_id |
varchar | ETL batch id used to compute this row. |
source_system |
varchar | Source system identifier. |
Notes:
- Partition by
year_month. Recompute recent months on a schedule to capture late-posting transactions; either full-month recompute or incremental adjustments.
2.2.9 dim_date (optional)
Description: Calendar/date dimension to support reporting and easy joins.
| Column Name | Data Type | Description |
|---|---|---|
date_sk |
int (PK) | Surrogate date key (YYYYMMDD integer or sequential id). |
date |
date | Actual date. |
day |
int | Day of month. |
month |
int | Month number. |
year |
int | Year. |
quarter |
int | Quarter of year. |
is_business_day |
boolean | Flag for business day. |
Notes:
- Typical supporting dimension for time-based reporting and partition pruning.
3. Schema diagram
4. Implementation details — FR → tables
This section maps each functional requirement (FR) to the tables and describes the concrete implementation patterns used to meet the requirement.
4.1 FR1 — Onboard accounts and customers (auditability & point-in-time)
- Tables used:
dim_customer(SCD-II),dim_account(SCD-II). - How it’s done:
- Use SCD-II versioning: insert a new version row on any attribute change and close the previous row with
effective_toandis_current = false. - Store
etl_loaded_ts,etl_batch_id, andsource_systemfor each version row for full traceability. - Maintain
version_numberand indexes oncustomer_id, is_currentandaccount_id, is_currentfor fast operational lookups. - Support point-in-time joins by matching a fact’s timestamp between
effective_fromandeffective_toon the appropriate dimension.
- Use SCD-II versioning: insert a new version row on any attribute change and close the previous row with
4.2 FR2 — Capture every financial event (ledger accuracy, fraud, reconciliation)
- Tables used:
fact_transactionplus enrichment viadim_account,dim_customer,dim_merchant,dim_channel,dim_card. - How it’s done:
- Ingest transactions as append-only rows into
fact_transaction. - Use deterministic dedupe key (
transaction_id|source_system) andetl_batch_idto ensure idempotency and avoid duplicates. - Keep both
transaction_ts(event time) andposting_ts(posting/settlement time) to support latency-aware analytics and settle-time reporting. - Enrich facts by joining to dimensions (use point-in-time join for SCD-II dims when analyzing historical behavior).
- Partition
fact_transactionbytransaction_tsand index onaccount_sk,customer_sk, andtransaction_tsfor performance.
- Ingest transactions as append-only rows into
4.3 FR3 — Track posting, settlement, dispute lifecycle and SLA
- Tables used:
fact_transaction_lifecycle(accumulating fact),fact_transactionfor event details. - How it’s done:
- Use an accumulating fact
fact_transaction_lifecyclewhere each process/transaction has a single row with milestone columns (initiated_ts,authorized_ts,posted_ts,settled_ts,dispute_opened_ts,dispute_resolved_ts). - Upsert behavior: insert when the process starts, update milestone columns as events arrive; use
last_updated_tsto prevent stale overwrites and conditional updates only when incoming milestone > existing milestone. - Use
fact_transactionevents to populate or validate lifecycle milestones and to provide audit trail/detail for any lifecycle step. - Build views or materialized aggregates for SLA reporting (e.g., avg time from
dispute_opened_tstodispute_resolved_ts).
- Use an accumulating fact
4.4 FR4 — Produce monthly statements and KPI snapshots tolerant of late-posts
- Tables used:
fact_monthly_account_activity(monthly fact/snapshot), derived fromfact_transactionand balances. - How it’s done:
- Compute
fact_monthly_account_activityperaccount_sk×year_montheither via full-month recompute or incremental windowed updates. - To handle late posting, schedule nightly recompute for recent months (e.g., last 3 months) and full recompute for closed historical months as needed.
- Store
etl_batch_idon the monthly rows and keepsnapshot_tsso consumers know when the row was produced. - Partition monthly rows by
year_monthfor performance; use incremental adjustments where feasible to avoid full recompute.
- Compute
4.5 FR5 — Fast operational lookups for current attributes (KYC, account status)
- Tables used:
dim_customer(SCD-II),dim_account(SCD-II), optionally materialized views foris_currentrows. - How it’s done:
- Keep
is_currentboolean on SCD-II dims and index(natural_id, is_current)for O(1) current lookups. - Optionally maintain small
dim_customer_current/dim_account_currentmaterialized views (or filtered physical tables) containing onlyis_current = truerows for the fastest operational queries. - Enforce constraints in ETL so that at most one
is_current = truerow exists per natural key.
- Keep
4.6 FR6 — Auditability and traceability for compliance
- Tables used: all tables include audit columns; reconciliation layers (views) built from natural keys.
- How it’s done:
- Include
etl_loaded_ts,etl_batch_id, andsource_systemon every table row to trace back to source extracts. - Keep original natural keys (
customer_id,account_id,transaction_id) alongside surrogate keys for reconciliation. - Build reconciliation views and incremental checks (row counts, sums by partition) comparing source extracts to warehouse loads.
- Retain audit history according to policy (e.g., keep SCD-II history indefinitely, archive old fact partitions to cheaper storage but keep monthly facts for reporting).
- Include
4.7 Cross-cutting implementation notes
- Point-in-time joins: when reporting as of a date, join
fact_transaction.transaction_ts(or other event timestamp) to SCD-II dims by requiringevent_ts between effective_from and effective_to(oreffective_to IS NULLfor current). - Idempotency: use deterministic keys and
etl_batch_idto make inserts idempotent; use upsert logic for lifecycle and SCD-II insertion. - Partitioning & retention: partition facts by date for pruning; archive older partitions but preserve monthly aggregates as needed for reporting.
- Monitoring & reconciliation: run scheduled reconciliation jobs that compare source counts/amounts to
fact_transactionaggregates and surface mismatches.
5. Conclusion
This schema design order (requirements → conceptual model → implementation details → physical model) ensures the data model is driven by the business needs: keep history where audits and point-in-time are required (SCD-II for customers and accounts), capture full transaction detail for ledger and fraud, use accumulating facts for lifecycle/SLA tracking, and snapshots for performant reporting. The ETL patterns (SCD-II versioning, append-only transactions, upserts for lifecycle, scheduled snapshot recompute) deliver traceability, accuracy, and operational performance.
© 2026 atanuconsulting.in