Investigation: Customer Table Lock Contention & Performance Degradation
BC Dialing Application — v2.0.0.12 through v2.3.0.0
| Field | Value |
|---|---|
| Extension | BC Dialing Application (Cambay Solutions) |
| Period | February 26 – March 6, 2026 |
| Severity | Critical — blocking CS agent workflow |
| Status | Root cause identified and resolved in v2.3.0.0 |
| Versions Involved | v2.0.0.12 (original), v2.1.0.0, v2.2.0.0, v2.2.1.0, v2.3.0.0 |
Executive Summary
CS agents reported intermittent failures when saving Customer records in Business Central. The error — "The information on this page is out of date" — appeared during business hours and correlated with the Nextiva Integration Job Queue processing in the background. Investigation revealed multiple compounding issues in the original extension code: a single long-running database transaction that held Customer table locks for minutes at a time, an overly broad table lock introduced as a fix that made things worse, a permissions gap that caused 65-77% of API calls to fail after a subsequent fix, and — most critically — database writes interleaved with HTTP I/O that held locks for 15-40 seconds per record even after the transaction was broken up.
Four releases over nine days addressed these issues iteratively, with each version revealing the next layer of the problem. The final fix (v2.3.0.0) restructured the core Job Queue codeunit into a two-phase architecture that completes all network I/O before any database write, reducing lock duration from 15-40 seconds to under 1 second per record.
Timeline
February 26, 2026 — v2.1.0.0: Transaction Isolation
What was reported: CS agents could not save changes to Customer records. Clicking Save produced a lock timeout error. The issue was intermittent but frequent, and always during business hours.
What was found: The Nextiva Integration Job Queue codeunit (CU-60003) processed all unsynced phone and email log records in a single database transaction. For each record, it made multiple HTTP calls to the Nextiva API, uploaded files to Azure Blob Storage, wrote Record Links to Customer records, and updated processing flags — all within one atomic operation spanning the entire batch. Database locks acquired at the beginning of the batch were held until the entire batch completed. With dozens of pending records and several seconds of network latency per HTTP call, the total transaction duration could run to minutes. Any CS agent editing a Customer during that window was blocked.
What was fixed:
- Added `Commit()` after each record iteration to break the single transaction into micro-transactions (one per record)
- Added `TryFunction` wrappers around per-record processing for error isolation — one failed record no longer aborted the entire batch
- Fixed a copy-paste bug in email log error logging that referenced phone log variables
- Added Application Insights telemetry configuration to `app.json` for extension-level monitoring
- Added `LockTable()` on API Pages 80000 (ReceivePhoneNumber) and 80003 (ReceiveEmail) as a concurrency guard against duplicate Customer creation
Assessment: The Commit() per iteration was the right direction — it limited the lock window between records. However, it did not address the lock duration within each record, where HTTP I/O and database writes were still interleaved. The LockTable() addition created a new problem (see next section).
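A minimal sketch of the v2.1.0.0 micro-transaction pattern, in AL. The procedure names, the `Processed` flag, and the error-logging stub are assumptions for illustration, not the shipped code:

```al
codeunit 60090 "Micro Transaction Sketch"
{
    local procedure ProcessPendingPhoneLogs()
    var
        PhoneLog: Record CustomerPhoneLog;
    begin
        PhoneLog.SetRange(Processed, false); // field name assumed
        if PhoneLog.FindSet() then
            repeat
                if not TryProcessRecord(PhoneLog) then
                    LogProcessingError(PhoneLog); // one failed record no longer aborts the batch
                Commit(); // micro-transaction: release locks after every record
            until PhoneLog.Next() = 0;
    end;

    [TryFunction]
    local procedure TryProcessRecord(var PhoneLog: Record CustomerPhoneLog)
    begin
        // per-record HTTP calls, blob uploads, and DB writes went here in v2.1.0.0
    end;

    local procedure LogProcessingError(var PhoneLog: Record CustomerPhoneLog)
    begin
        // record the failure for retry; details omitted
    end;
end;
```

The `Commit()` placement is the key point: it caps how long locks survive *between* records, but everything inside `TryProcessRecord` still runs under whatever locks it acquires.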
March 4, 2026 — v2.2.0.0: LockTable Removal & Data Integrity Fixes
What was reported: After v2.1.0.0 deployed, agents continued to report "information on this page is out of date" errors. The pattern appeared worse during moderate call volume — suggesting the API pages were involved, not just the Job Queue.
What was found: The LockTable() added in v2.1.0.0 on the API pages was acquiring an exclusive lock on the entire Customer table for every incoming phone call and email. While each lock was held for only milliseconds, the volume of incoming calls created enough serialization pressure to block CS agents from saving Customer records. The cure was worse than the disease.
Investigation also revealed six additional data integrity and performance issues in the original extension code that predated v2.1.0.0:
- `Insert()` without triggers — API pages used `Insert()` instead of `Insert(true)`, skipping No. Series validation, default fields, and Contact creation
- No Name on new customers — API-created Customer records had blank Name fields, making them unidentifiable in lists
- `SetLoadFields` bleed-through — the same record variable was used for read-only lookup and Customer creation, causing `SetLoadFields` constraints to persist into Insert and template application (risking partial field loading)
- Customer No. not populated on log records — phone/email log records for existing customers had blank Customer No. fields because the assignment only existed in the new-customer branch
- EndCall Customer No. discarded — Page 80001 performed a Customer lookup but discarded the `FindFirst()` result
- Job Queue index misalignment — `SetCurrentKey("No.")` overrode BC's index selection, preventing the newly added ProcessingStatus secondary keys from being used
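The `SetLoadFields` bleed-through and trigger issues combine into one corrected pattern: separate record variables for lookup and creation, with triggers fired on insert. A sketch in AL, with procedure and field details assumed for illustration:

```al
local procedure FindOrCreateCustomerByPhone(PhoneNo: Text[30]): Code[20]
var
    LookupCustomer: Record Customer;  // read-only: partial load is safe here
    NewCustomer: Record Customer;     // separate variable: no SetLoadFields bleed-through
begin
    LookupCustomer.SetLoadFields("No.");
    LookupCustomer.SetRange("Phone No.", PhoneNo);
    if LookupCustomer.FindFirst() then
        exit(LookupCustomer."No.");

    NewCustomer.Init();
    NewCustomer.Name := 'New Customer';      // placeholder so the record is identifiable
    NewCustomer.Validate("Phone No.", PhoneNo);
    NewCustomer.Insert(true);                // run OnInsert: No. Series, defaults, Contact creation
    exit(NewCustomer."No.");
end;
```

Because `LookupCustomer` and `NewCustomer` are distinct variables, the `SetLoadFields` constraint on the lookup can never produce a partially loaded record on the creation path.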
What was fixed (14 changes total):
- Removed `LockTable()` from Pages 80000 and 80003
- Changed `Insert()` to `Insert(true)` for proper BC customer setup
- Added `Name := 'New Customer'` placeholder
- Separated read-only lookup and creation into distinct record variables to prevent `SetLoadFields` bleed-through
- Populated `Customer No.` on log records in all three API pages
- Switched `SetCurrentKey` to align with the ProcessingStatus secondary keys
- Eliminated duplicate `FindFirst()` in `AddCustomerBlobLink`
- Added `SetLoadFields` to all read-only Customer lookups
- Added ProcessingStatus secondary keys on CustomerPhoneLog and CustomerEmailLog
- Added Phone No. and E-Mail secondary indexes on the Customer table via table extension
- Re-applied phone/email after template application as a defensive measure
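The Customer-side indexes described above would take the shape of a table extension adding secondary keys. A sketch, with the object ID and key names assumed:

```al
tableextension 80010 "Customer Dialer Ext" extends Customer
{
    keys
    {
        key(PhoneNoIdx; "Phone No.") { }  // speeds lookup-by-phone for incoming calls
        key(EmailIdx; "E-Mail") { }       // speeds lookup-by-email for incoming mail
    }
}
```

With these in place, filtering on `"Phone No."` or `"E-Mail"` no longer forces a table scan, and the Job Queue's `SetCurrentKey` can be aligned with the ProcessingStatus keys instead of pinning the primary key.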
Assessment: Removing LockTable() was correct and necessary. The secondary indexes and SetLoadFields optimizations improved query performance. However, the core locking issue within each Job Queue iteration — database writes interleaved with HTTP I/O — remained unaddressed. The Insert(true) fix was correct but introduced a new permissions requirement (see next section).
March 5, 2026 — v2.2.1.0: Permission Fix & Transaction Scope Alignment
What was reported: Within hours of deploying v2.2.0.0, incoming phone calls and emails began failing at a high rate. Application Insights telemetry showed 65% of ReceivePhoneNumber requests and 77% of ReceiveEmail requests returning HTTP 403.
What was found: The Insert(true) change in v2.2.0.0 correctly fired the standard BC OnInsert trigger — but that trigger chain accessed 13 standard BC tables (No. Series, Contact, Business Relation, Dimension Value, etc.) that the API pages did not have permissions for. Every API call that reached the Customer creation path failed with NavPermissionException.
Telemetry evidence (8 hours post-deployment, 11:25–19:25 UTC):
| Page | Successes (201) | Permission Denied (403) | Lock Timeout (409) | Failure Rate |
|---|---|---|---|---|
| 80000 (ReceivePhoneNumber) | 74 | 138 | 1 | 65% |
| 80003 (ReceiveEmail) | 13 | 43 | 1 | 77% |
| 80001 (EndCall) | 112 | 0 | 0 | 0% |
| 80004 (EndMailCall) | 21 | 0 | 0 | 0% |
EndCall and EndMailCall were unaffected because they don't create Customer records.
Additionally, a SourceTableTemporary inconsistency was identified. Page 80000 (phone) had SourceTableTemporary = true, narrowing its transaction scope. Page 80003 (email) had SourceTableTemporary = false, meaning the CustomerEmailLog insert and Customer creation ran in the same transaction — widening the lock window. Two HTTP 409 errors at 16:42 UTC with exactly 30.5 seconds execution each (BC's lock timeout threshold) were consistent with this wider transaction scope.
What was fixed:
- Added inline Permissions for all 13 standard BC tables accessed by `Insert(true)` and `ApplyCustomerTemplate`
- Set `SourceTableTemporary = true` on Page 80003 to match Page 80000
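In AL, both fixes live in the page object's property block. A sketch of the relevant properties on Page 80003; the permission list is abbreviated to four of the 13 tables and the page layout is omitted:

```al
page 80003 ReceiveEmail
{
    PageType = API;
    SourceTable = CustomerEmailLog;
    SourceTableTemporary = true;  // log insert runs in explicit code, narrowing the transaction scope
    Permissions =
        tabledata "No. Series" = RIM,
        tabledata "No. Series Line" = RIM,
        tabledata Contact = RIM,
        tabledata "Contact Business Relation" = RIM;
    // remaining properties, fields, and the other nine tabledata entries omitted
}
```

With `SourceTableTemporary = true`, the API payload lands in a temporary record first, so the real CustomerEmailLog insert and the Customer creation no longer share one wide page transaction.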
Assessment: This was a regression fix for v2.2.0.0. The permission gap was significant — over 180 calls failed in 8 hours. The fix was straightforward once diagnosed. Telemetry was critical to identifying the scope and pattern of the failure.
March 6, 2026 — v2.3.0.0: Root Cause Resolution
What was reported: Even after v2.2.1.0, CS agents continued to report intermittent locking issues. CU-60003 Job Queue execution times in Application Insights ranged from 12 seconds to 3 minutes 47 seconds, trending upward through the day.
What was found — the actual root cause: A line-by-line analysis of CU-60003's processing flow revealed that database writes were interleaved with HTTP I/O within each record iteration. The Commit() per iteration (v2.1.0.0) released locks between records, but within each record:
1. `SaveSummary` called the Nextiva API, uploaded HTML to Azure Blob, then wrote a Record Link to the Customer record (first DB lock acquired)
2. `SaveRecording` called the Nextiva API, downloaded an MP3, uploaded to Azure Blob, wrote another Record Link (locks from step 1 still held)
3. `SaveTranscript` called the Nextiva API, uploaded HTML to Azure Blob, wrote another Record Link (locks from steps 1 and 2 still held)
4. Processing flags updated (locks finally released at Commit)
The first RecRef.AddLink() call in SaveSummary acquired a write lock on the Record Link system table and a read lock on the Customer record. All subsequent HTTP calls, blob uploads, and database writes in SaveRecording and SaveTranscript ran under those locks. With network latency, Nextiva API response times, and Azure Blob upload times, the lock window was 15-40 seconds per record.
This explained why the issue persisted despite breaking into micro-transactions — each micro-transaction still held locks for the duration of its HTTP I/O.
Additional issues found during this analysis:
- 4 redundant Customer lookups per record — each Save* procedure performed its own Customer FindFirst by phone number
- 4 redundant Modify calls per record — flag updates performed as separate Get+Modify pairs
- Hardcoded credentials — Nextiva Login URL, AuthText, Azure Blob SAS token embedded as string constants
- Manual Sync regression — the date-based path called legacy Save* stubs that only performed Phase 1, producing uploads without Record Links or flag updates
- Page 80004 (EndMailCall) empty trigger — email end-of-call events created orphaned records with blank Customer No.
- CU-60000 (legacy) missing SetLoadFields, Modify, and Permissions — same bugs fixed in Pages 80000/80003 in earlier versions
What was fixed (13 changes):
The central fix was restructuring CU-60003 into a two-phase architecture:
- Phase 1 — FetchAndUpload (all HTTP I/O, zero database writes): three new procedures (`FetchAndUploadSummary`, `FetchAndUploadRecording`, `FetchAndUploadTranscript`) handle all Nextiva API calls and Azure Blob uploads. Each returns a blob URL in a `var` parameter. No database tables are touched, so no locks are acquired.
- Phase 2 — Write (all database operations, zero I/O): two new procedures handle all writes in a tight batch:
  - `WriteBlobLinksToCustomer` — a single Customer Get by primary key, then up to 3 `RecRef.AddLink()` calls
  - `UpdateAllPhoneLogFlags` — a single Get + single Modify that sets all three boolean flags
The restructured flow per record:
```
Phase 1: FetchAndUploadSummary    → SummaryUrl    (HTTP + blob, no DB)
         FetchAndUploadRecording  → RecordingUrl  (HTTP + blob, no DB)
         FetchAndUploadTranscript → TranscriptUrl (HTTP + blob, no DB)
Phase 2: WriteBlobLinksToCustomer(...)            (DB writes only, < 1s)
         UpdateAllPhoneLogFlags(...)              (DB writes only)
         Commit()                                 (release locks)
```
Impact: Lock duration reduced from 15-40 seconds to under 1 second per record. The Commit() per iteration continues to release locks between records.
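The two-phase loop can be sketched in AL as follows. The `FetchAndUpload*`/`Write*` procedure names come from the report; the surrounding loop scaffolding and the URL variables are assumptions:

```al
// Inside the restructured CU-60003 processing loop (sketch)
repeat
    // Phase 1: all HTTP I/O, zero database writes, zero locks held
    FetchAndUploadSummary(PhoneLog, SummaryUrl);
    FetchAndUploadRecording(PhoneLog, RecordingUrl);
    FetchAndUploadTranscript(PhoneLog, TranscriptUrl);

    // Phase 2: all database writes in a tight batch, locks held under 1 second
    WriteBlobLinksToCustomer(PhoneLog, SummaryUrl, RecordingUrl, TranscriptUrl);
    UpdateAllPhoneLogFlags(PhoneLog);

    Commit(); // release locks before the next record's network I/O begins
until PhoneLog.Next() = 0;
```

The invariant is that no `AddLink` or `Modify` executes until every HTTP response for the record is already in memory, so lock duration no longer depends on network latency.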
Additional fixes:
- Single Customer lookup per record (4 → 1)
- Single Modify per record for processing flags (4 → 1)
- All credentials externalized to NextivaConfig table
- Upgrade codeunit (CU-60005) auto-populates config during deployment
- Manual Sync date-based path rewritten to call Phase 1 + Phase 2 directly
- Legacy Save* stubs removed from CU-60003 (dead code)
- EndMailCall trigger populated with Customer email lookup
- CU-60000 legacy codeunit patched with SetLoadFields, Modify, and Permissions
- BCDialerPermissions updated with new objects
Root Cause Analysis
The root cause was a fundamental architectural flaw in the original CU-60003 design: database writes interleaved with unbounded HTTP I/O within a single transaction scope. Even after breaking the batch into micro-transactions (v2.1.0.0), each micro-transaction still performed multiple HTTP calls and blob uploads while holding database locks. The lock duration was governed by external network latency — something that varies and can't be bounded.
The iterative fix history illustrates how compounding issues can mask the root cause:
| Version | What it fixed | What it missed |
|---|---|---|
| v2.1.0.0 | Lock duration between records (Commit per iteration) | Lock duration within each record (interleaved I/O and DB writes) |
| v2.1.0.0 | — | Introduced LockTable() that made API page contention worse |
| v2.2.0.0 | LockTable removal, data integrity bugs, query optimizations | Insert(true) introduced a permissions requirement |
| v2.2.1.0 | Permissions gap, transaction scope inconsistency | Still didn't address the core interleaved I/O pattern |
| v2.3.0.0 | The actual root cause — separated all I/O from all DB writes | — |
The fundamental lesson: when database locks and network I/O share a transaction scope, lock duration becomes a function of network latency. The only reliable fix is to complete all I/O before acquiring any locks.
Impact Assessment
CS Agent Impact
- Before fixes (v2.0.0.12): Customer saves intermittently blocked during business hours whenever the Job Queue was processing. Lock timeouts of 30+ seconds caused work to be lost.
- v2.1.0.0: Reduced contention between records but introduced API-page-level table locking.
- v2.2.0.0: Removed table locking and improved data integrity, but the Insert(true) change introduced a permission gap that caused API calls to fail at a high rate until v2.2.1.0 was deployed.
- v2.2.1.0: 65-77% of incoming calls/emails failing with HTTP 403 for 8 hours post-deployment. Fixed within 24 hours.
- v2.3.0.0 (expected): Lock duration under 1 second per record. No table-level locks. CS agents should experience zero lock contention from the dialer integration during normal operation.
Data Integrity Impact
Issues discovered and fixed across the investigation:
- Blank-name Customer records created by API pages (v2.0.0.12, fixed v2.2.0.0)
- Customer No. not populated on phone/email log records for existing customers (v2.0.0.12, fixed v2.2.0.0)
- EndCall log records missing Customer No. (v2.0.0.12, fixed v2.2.0.0)
- EndMailCall log records missing Customer No. (v2.0.0.12, fixed v2.3.0.0)
- No. Series validation skipped on new customers (v2.0.0.12, fixed v2.2.0.0)
- Contact records not created for API-generated customers (v2.0.0.12, fixed v2.2.0.0)
- ~181 failed API calls during v2.2.0.0 permission gap window (March 4-5, may have left partial state)
Performance Metrics
| Metric | Before (v2.0.0.12) | After v2.1.0.0 | After v2.3.0.0 (expected) |
|---|---|---|---|
| Lock duration per record | Full batch (minutes) | 15-40 seconds | < 1 second |
| Customer lookups per record | 4 (by phone number) | 4 | 1 (by primary key) |
| DB Modifies per record | 4+ | 4+ | 2 |
| Job Queue cycle time | Minutes | 12s – 3m 47s | < 5 seconds (target) |
Telemetry & Evidence
Application Insights telemetry was added in v2.1.0.0 and proved critical throughout the investigation:
- v2.2.1.0 diagnosis: HTTP status code distribution revealed the 65-77% failure rate within hours of deployment. Without telemetry, this would have been reported anecdotally by agents and taken longer to diagnose.
- Lock timeout correlation: Two HTTP 409 errors at 16:42 UTC with exactly 30.5 seconds execution time confirmed the BC SQL lock timeout threshold and pointed to the SourceTableTemporary inconsistency.
- Job Queue performance trending: CU-60003 execution times (RT0018 telemetry) showed increasing SQL executes (38 → 532) and rows read (646 → 1,440) over 13 consecutive runs, confirming a growing backlog from the 403 failures.
- Root cause timing evidence: Execution time ranges of 12 seconds to 3 minutes 47 seconds per Job Queue cycle, even after v2.2.1.0 fixed the permissions issue, confirmed that the within-record lock duration was the remaining problem.
Remaining Items
Deployed and Verified
- v2.1.0.0 through v2.2.1.0 are deployed to Production
- v2.3.0.0 compiles cleanly (2 pre-existing UseReadySAS deprecation warnings only)
Pending for v2.3.0.0
- Independent code review
- UAT test plan update
- Deployment to sandbox for functional verification
- Production deployment and monitoring
Post-Deployment Monitoring
All hardcoded fallback branches emit Session.LogMessage warnings to Application Insights (event IDs BCDIALER-0001 through BCDIALER-0003). After deployment, query for these events — if none appear, the config-driven migration is working correctly and the fallbacks can be safely replaced with Error() calls in v2.4.0.0.
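A sketch of what one such fallback branch looks like in AL. The `BCDIALER-0001` event ID comes from this report; the `NextivaConfig` field name, message text, and custom dimension are assumptions:

```al
// Config-first credential resolution with an instrumented legacy fallback
if NextivaConfig.Get() and (NextivaConfig.LoginUrl <> '') then
    LoginUrl := NextivaConfig.LoginUrl
else begin
    LoginUrl := HardcodedLoginUrlTok; // legacy fallback, slated for removal in v2.4.0.0
    Session.LogMessage('BCDIALER-0001', 'NextivaConfig missing or blank: using hardcoded Login URL',
        Verbosity::Warning, DataClassification::SystemMetadata,
        TelemetryScope::ExtensionPublisher, 'Category', 'NextivaIntegration');
end;
```

Zero occurrences of these warnings in Application Insights is the evidence that every environment has migrated to config-driven credentials, which is the precondition for replacing the `else` branches with `Error()` in v2.4.0.0.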
Follow-Up for v2.4.0.0
- Remove all hardcoded credential fallbacks — replace `else` branches with `Error()` calls (safe to proceed once telemetry confirms zero `BCDIALER-000*` events)
- Add Azure Storage Account and Container Name fields to NextivaConfig
- Remove seed credentials from CU-60005 upgrade codeunit
- Migrate `UseReadySAS` from `Text` to `SecretText` parameter type
Data Cleanup
- Review and update blank-name Customer records created by API pages prior to v2.2.0.0
- Review phone/email log records created during the v2.2.0.0 permission gap window (March 4-5) for partial state
Related Documents
| Document | Description |
|---|---|
| CHANGE-v2.1.0.0.md | Transaction isolation, TryFunction wrappers, telemetry |
| CHANGE-v2.2.0.0.md | LockTable removal, data integrity fixes, secondary indexes |
| CHANGE-v2.2.1.0.md | Permission fix, SourceTableTemporary alignment, telemetry evidence |
| CHANGE-v2.3.0.0.md | Phase 1/Phase 2 restructure, credential externalization, config-driven deployment |
| CHANGELOG.md | Summary changelog for all versions |