Investigation: Customer Table Lock Contention & Performance Degradation
BC Dialing Application — v2.0.0.12 through v2.3.0.0
| Field | Value |
|---|---|
| Extension | BC Dialing Application (Cambay Solutions) |
| Period | February 26 – March 6, 2026 |
| Severity | Critical — blocking CS agent workflow |
| Status | Root cause identified and resolved in v2.3.0.0 |
| Versions Involved | v2.0.0.12 (original), v2.1.0.0, v2.2.0.0, v2.2.1.0, v2.3.0.0 |
Executive Summary
CS agents reported intermittent failures when saving Customer records in Business Central. The error — "The information on this page is out of date" — appeared during business hours and correlated with the Nextiva Integration Job Queue processing in the background. Investigation revealed multiple compounding issues in the original extension code: a single long-running database transaction that held Customer table locks for minutes at a time, an overly broad table lock introduced as a fix that made things worse, a permissions gap that caused 65-77% of API calls to fail after a subsequent fix, and — most critically — database writes interleaved with HTTP I/O that held locks for 15-40 seconds per record even after the transaction was broken up.
Four releases over nine days addressed these issues iteratively, with each version revealing the next layer of the problem. The final fix (v2.3.0.0) restructured the core Job Queue codeunit into a two-phase architecture that completes all network I/O before any database write, reducing lock duration from 15-40 seconds to under 1 second per record.
Timeline
February 26, 2026 — v2.1.0.0: Transaction Isolation
What was reported: CS agents could not save changes to Customer records. Clicking Save produced a lock timeout error. The issue was intermittent but frequent, and always during business hours.
What was found: The Nextiva Integration Job Queue codeunit (CU-60003) processed all unsynced phone and email log records in a single database transaction. For each record, it made multiple HTTP calls to the Nextiva API, uploaded files to Azure Blob Storage, wrote Record Links to Customer records, and updated processing flags — all within one atomic operation spanning the entire batch. Database locks acquired at the beginning of the batch were held until the entire batch completed. With dozens of pending records and several seconds of network latency per HTTP call, the total transaction duration could run to minutes. Any CS agent editing a Customer during that window was blocked.
What was fixed:
- Added `Commit()` after each record iteration to break the single transaction into micro-transactions (one per record)
- Added `TryFunction` wrappers around per-record processing for error isolation — one failed record no longer aborted the entire batch
- Fixed a copy-paste bug in email log error logging that referenced phone log variables
- Added Application Insights telemetry configuration to `app.json` for extension-level monitoring
- Added `LockTable()` on API Pages 80000 (ReceivePhoneNumber) and 80003 (ReceiveEmail) as a concurrency guard against duplicate Customer creation
Assessment: The Commit() per iteration was the right direction — it limited the lock window between records. However, it did not address the lock duration within each record, where HTTP I/O and database writes were still interleaved. The LockTable() addition created a new problem (see next section).
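A minimal sketch of the v2.1.0.0 micro-transaction pattern, in AL. The procedure names, the `Processed` flag, and the error-logging stub are assumptions for illustration, not the shipped code:

```al
codeunit 60090 "Micro Transaction Sketch"
{
    local procedure ProcessPendingPhoneLogs()
    var
        PhoneLog: Record CustomerPhoneLog;
    begin
        PhoneLog.SetRange(Processed, false); // field name assumed
        if PhoneLog.FindSet() then
            repeat
                if not TryProcessRecord(PhoneLog) then
                    LogProcessingError(PhoneLog); // one failed record no longer aborts the batch
                Commit(); // micro-transaction: release locks after every record
            until PhoneLog.Next() = 0;
    end;

    [TryFunction]
    local procedure TryProcessRecord(var PhoneLog: Record CustomerPhoneLog)
    begin
        // per-record HTTP calls, blob uploads, and DB writes went here in v2.1.0.0
    end;

    local procedure LogProcessingError(var PhoneLog: Record CustomerPhoneLog)
    begin
        // record the failure for retry; details omitted
    end;
end;
```

The `Commit()` placement is the key point: it caps how long locks survive *between* records, but everything inside `TryProcessRecord` still runs under whatever locks it acquires.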
March 4, 2026 — v2.2.0.0: LockTable Removal & Data Integrity Fixes
What was reported: After v2.1.0.0 deployed, agents continued to report "information on this page is out of date" errors. The pattern appeared worse during moderate call volume — suggesting the API pages were involved, not just the Job Queue.
What was found: The LockTable() added in v2.1.0.0 on the API pages was acquiring an exclusive lock on the entire Customer table for every incoming phone call and email. While each lock was held for only milliseconds, the volume of incoming calls created enough serialization pressure to block CS agents from saving Customer records. The cure was worse than the disease.
Investigation also revealed six additional data integrity and performance issues in the original extension code that predated v2.1.0.0:
- `Insert()` without triggers — API pages used `Insert()` instead of `Insert(true)`, skipping No. Series validation, default fields, and Contact creation
- No Name on new customers — API-created Customer records had blank Name fields, making them unidentifiable in lists
- `SetLoadFields` bleed-through — the same record variable was used for read-only lookup and Customer creation, causing `SetLoadFields` constraints to persist into Insert and template application (risking partial field loading)
- Customer No. not populated on log records — phone/email log records for existing customers had blank Customer No. fields because the assignment only existed in the new-customer branch
- EndCall Customer No. discarded — Page 80001 performed a Customer lookup but discarded the `FindFirst()` result
- Job Queue index misalignment — `SetCurrentKey("No.")` overrode BC's index selection, preventing the newly added ProcessingStatus secondary keys from being used
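The `SetLoadFields` bleed-through and trigger issues combine into one corrected pattern: separate record variables for lookup and creation, with triggers fired on insert. A sketch in AL, with procedure and field details assumed for illustration:

```al
local procedure FindOrCreateCustomerByPhone(PhoneNo: Text[30]): Code[20]
var
    LookupCustomer: Record Customer;  // read-only: partial load is safe here
    NewCustomer: Record Customer;     // separate variable: no SetLoadFields bleed-through
begin
    LookupCustomer.SetLoadFields("No.");
    LookupCustomer.SetRange("Phone No.", PhoneNo);
    if LookupCustomer.FindFirst() then
        exit(LookupCustomer."No.");

    NewCustomer.Init();
    NewCustomer.Name := 'New Customer';      // placeholder so the record is identifiable
    NewCustomer.Validate("Phone No.", PhoneNo);
    NewCustomer.Insert(true);                // run OnInsert: No. Series, defaults, Contact creation
    exit(NewCustomer."No.");
end;
```

Because `LookupCustomer` and `NewCustomer` are distinct variables, the `SetLoadFields` constraint on the lookup can never produce a partially loaded record on the creation path.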
What was fixed (14 changes total):
- Removed `LockTable()` from Pages 80000 and 80003
- Changed `Insert()` to `Insert(true)` for proper BC customer setup
- Added `Name := 'New Customer'` placeholder
- Separated read-only lookup and creation into distinct record variables to prevent `SetLoadFields` bleed-through
- Populated `Customer No.` on log records in all three API pages
- Switched `SetCurrentKey` to align with the ProcessingStatus secondary keys
- Eliminated duplicate `FindFirst()` in `AddCustomerBlobLink`
- Added `SetLoadFields` to all read-only Customer lookups
- Added ProcessingStatus secondary keys on CustomerPhoneLog and CustomerEmailLog
- Added Phone No. and E-Mail secondary indexes on the Customer table via table extension
- Re-applied phone/email after template application as a defensive measure
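The Customer-side indexes described above would take the shape of a table extension adding secondary keys. A sketch, with the object ID and key names assumed:

```al
tableextension 80010 "Customer Dialer Ext" extends Customer
{
    keys
    {
        key(PhoneNoIdx; "Phone No.") { }  // speeds lookup-by-phone for incoming calls
        key(EmailIdx; "E-Mail") { }       // speeds lookup-by-email for incoming mail
    }
}
```

With these in place, filtering on `"Phone No."` or `"E-Mail"` no longer forces a table scan, and the Job Queue's `SetCurrentKey` can be aligned with the ProcessingStatus keys instead of pinning the primary key.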
Assessment: Removing LockTable() was correct and necessary. The secondary indexes and SetLoadFields optimizations improved query performance. However, the core locking issue within each Job Queue iteration — database writes interleaved with HTTP I/O — remained unaddressed. The Insert(true) fix was correct but introduced a new permissions requirement (see next section).
March 5, 2026 — v2.2.1.0: Permission Fix & Transaction Scope Alignment
What was reported: Within hours of deploying v2.2.0.0, incoming phone calls and emails began failing at a high rate. Application Insights telemetry showed 65% of ReceivePhoneNumber requests and 77% of ReceiveEmail requests returning HTTP 403.
What was found: The Insert(true) change in v2.2.0.0 correctly fired the standard BC OnInsert trigger — but that trigger chain accessed 13 standard BC tables (No. Series, Contact, Business Relation, Dimension Value, etc.) that the API pages did not have permissions for. Every API call that reached the Customer creation path failed with NavPermissionException.
Telemetry evidence (8 hours post-deployment, 11:25–19:25 UTC):
| Page | Successes (201) | Permission Denied (403) | Lock Timeout (409) | Failure Rate |
|---|---|---|---|---|
| 80000 (ReceivePhoneNumber) | 74 | 138 | 1 | 65% |
| 80003 (ReceiveEmail) | 13 | 43 | 1 | 77% |
| 80001 (EndCall) | 112 | 0 | 0 | 0% |
| 80004 (EndMailCall) | 21 | 0 | 0 | 0% |
EndCall and EndMailCall were unaffected because they don't create Customer records.
Additionally, a SourceTableTemporary inconsistency was identified. Page 80000 (phone) had SourceTableTemporary = true, narrowing its transaction scope. Page 80003 (email) had SourceTableTemporary = false, meaning the CustomerEmailLog insert and Customer creation ran in the same transaction — widening the lock window. Two HTTP 409 errors at 16:42 UTC with exactly 30.5 seconds execution each (BC's lock timeout threshold) were consistent with this wider transaction scope.
What was fixed:
- Added inline Permissions for all 13 standard BC tables accessed by `Insert(true)` and `ApplyCustomerTemplate`
- Set `SourceTableTemporary = true` on Page 80003 to match Page 80000
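In AL, both fixes live in the page object's property block. A sketch of the relevant properties on Page 80003; the permission list is abbreviated to four of the 13 tables and the page layout is omitted:

```al
page 80003 ReceiveEmail
{
    PageType = API;
    SourceTable = CustomerEmailLog;
    SourceTableTemporary = true;  // log insert runs in explicit code, narrowing the transaction scope
    Permissions =
        tabledata "No. Series" = RIM,
        tabledata "No. Series Line" = RIM,
        tabledata Contact = RIM,
        tabledata "Contact Business Relation" = RIM;
    // remaining properties, fields, and the other nine tabledata entries omitted
}
```

With `SourceTableTemporary = true`, the API payload lands in a temporary record first, so the real CustomerEmailLog insert and the Customer creation no longer share one wide page transaction.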
Assessment: This was a regression fix for v2.2.0.0. The permission gap was significant — over 180 calls failed in 8 hours. The fix was straightforward once diagnosed. Telemetry was critical to identifying the scope and pattern of the failure.
March 6, 2026 — v2.3.0.0: Root Cause Resolution
What was reported: Even after v2.2.1.0, CS agents continued to report intermittent locking issues. CU-60003 Job Queue execution times in Application Insights ranged from 12 seconds to 3 minutes 47 seconds, trending upward through the day.
What was found — the actual root cause: A line-by-line analysis of CU-60003's processing flow revealed that database writes were interleaved with HTTP I/O within each record iteration. The Commit() per iteration (v2.1.0.0) released locks between records, but within each record:
1. `SaveSummary` called the Nextiva API, uploaded HTML to Azure Blob, then wrote a Record Link to the Customer record (first DB lock acquired)
2. `SaveRecording` called the Nextiva API, downloaded an MP3, uploaded to Azure Blob, wrote another Record Link (locks from step 1 still held)
3. `SaveTranscript` called the Nextiva API, uploaded HTML to Azure Blob, wrote another Record Link (locks from steps 1 and 2 still held)
4. Processing flags updated (locks finally released at Commit)
The first RecRef.AddLink() call in SaveSummary acquired a write lock on the Record Link system table and a read lock on the Customer record. All subsequent HTTP calls, blob uploads, and database writes in SaveRecording and SaveTranscript ran under those locks. With network latency, Nextiva API response times, and Azure Blob upload times, the lock window was 15-40 seconds per record.
This explained why the issue persisted despite breaking into micro-transactions — each micro-transaction still held locks for the duration of its HTTP I/O.
Additional issues found during this analysis:
- 4 redundant Customer lookups per record — each Save* procedure performed its own Customer FindFirst by phone number
- 4 redundant Modify calls per record — flag updates performed as separate Get+Modify pairs
- Hardcoded credentials — Nextiva Login URL, AuthText, Azure Blob SAS token embedded as string constants
- Manual Sync regression — the date-based path called legacy Save* stubs that only performed Phase 1, producing uploads without Record Links or flag updates
- Page 80004 (EndMailCall) empty trigger — email end-of-call events created orphaned records with blank Customer No.
- CU-60000 (legacy) missing SetLoadFields, Modify, and Permissions — same bugs fixed in Pages 80000/80003 in earlier versions
What was fixed (13 changes):
The central fix was restructuring CU-60003 into a two-phase architecture:
- Phase 1 — FetchAndUpload (all HTTP I/O, zero database writes): three new procedures (`FetchAndUploadSummary`, `FetchAndUploadRecording`, `FetchAndUploadTranscript`) handle all Nextiva API calls and Azure Blob uploads. Each returns a blob URL in a `var` parameter. No database tables are touched, so no locks are acquired.
- Phase 2 — Write (all database operations, zero I/O): two new procedures handle all writes in a tight batch:
  - `WriteBlobLinksToCustomer` — a single Customer Get by primary key, then up to 3 `RecRef.AddLink()` calls
  - `UpdateAllPhoneLogFlags` — a single Get + single Modify that sets all three boolean flags
The restructured flow per record:
```
Phase 1: FetchAndUploadSummary    → SummaryUrl    (HTTP + blob, no DB)
         FetchAndUploadRecording  → RecordingUrl  (HTTP + blob, no DB)
         FetchAndUploadTranscript → TranscriptUrl (HTTP + blob, no DB)
Phase 2: WriteBlobLinksToCustomer(...)            (DB writes only, < 1s)
         UpdateAllPhoneLogFlags(...)              (DB writes only)
         Commit()                                 (release locks)
```
Impact: Lock duration reduced from 15-40 seconds to under 1 second per record. The Commit() per iteration continues to release locks between records.
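The two-phase loop can be sketched in AL as follows. The `FetchAndUpload*`/`Write*` procedure names come from the report; the surrounding loop scaffolding and the URL variables are assumptions:

```al
// Inside the restructured CU-60003 processing loop (sketch)
repeat
    // Phase 1: all HTTP I/O, zero database writes, zero locks held
    FetchAndUploadSummary(PhoneLog, SummaryUrl);
    FetchAndUploadRecording(PhoneLog, RecordingUrl);
    FetchAndUploadTranscript(PhoneLog, TranscriptUrl);

    // Phase 2: all database writes in a tight batch, locks held under 1 second
    WriteBlobLinksToCustomer(PhoneLog, SummaryUrl, RecordingUrl, TranscriptUrl);
    UpdateAllPhoneLogFlags(PhoneLog);

    Commit(); // release locks before the next record's network I/O begins
until PhoneLog.Next() = 0;
```

The invariant is that no `AddLink` or `Modify` executes until every HTTP response for the record is already in memory, so lock duration no longer depends on network latency.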
Additional fixes:
- Single Customer lookup per record (4 → 1)
- Single Modify per record for processing flags (4 → 1)
- All credentials externalized to NextivaConfig table
- Upgrade codeunit (CU-60005) auto-populates config during deployment
- Manual Sync date-based path rewritten to call Phase 1 + Phase 2 directly
- Legacy Save* stubs removed from CU-60003 (dead code)
- EndMailCall trigger populated with Customer email lookup
- CU-60000 legacy codeunit patched with SetLoadFields, Modify, and Permissions
- BCDialerPermissions updated with new objects
Root Cause Analysis
The root cause was a fundamental architectural flaw in the original CU-60003 design: database writes interleaved with unbounded HTTP I/O within a single transaction scope. Even after breaking the batch into micro-transactions (v2.1.0.0), each micro-transaction still performed multiple HTTP calls and blob uploads while holding database locks. The lock duration was governed by external network latency — something that varies and can't be bounded.
The iterative fix history illustrates how compounding issues can mask the root cause:
| Version | What it fixed | What it missed |
|---|---|---|
| v2.1.0.0 | Lock duration between records (Commit per iteration) | Lock duration within each record (interleaved I/O and DB writes) |
| v2.1.0.0 | — | Introduced LockTable() that made API page contention worse |
| v2.2.0.0 | LockTable removal, data integrity bugs, query optimizations | Insert(true) introduced a permissions requirement |
| v2.2.1.0 | Permissions gap, transaction scope inconsistency | Still didn't address the core interleaved I/O pattern |
| v2.3.0.0 | The actual root cause — separated all I/O from all DB writes | — |
The fundamental lesson: when database locks and network I/O share a transaction scope, lock duration becomes a function of network latency. The only reliable fix is to complete all I/O before acquiring any locks.
Impact Assessment
CS Agent Impact
- Before fixes (v2.0.0.12): Customer saves intermittently blocked during business hours whenever the Job Queue was processing. Lock timeouts of 30+ seconds caused work to be lost.
- v2.1.0.0: Reduced contention between records but introduced API-page-level table locking.
- v2.2.0.0: Removed table locking and improved data integrity, but the Insert(true) change introduced a permission gap that caused API calls to fail at a high rate until v2.2.1.0 was deployed.
- v2.2.1.0: 65-77% of incoming calls/emails failing with HTTP 403 for 8 hours post-deployment. Fixed within 24 hours.
- v2.3.0.0 (expected): Lock duration under 1 second per record. No table-level locks. CS agents should experience zero lock contention from the dialer integration during normal operation.
Data Integrity Impact
Issues discovered and fixed across the investigation:
- Blank-name Customer records created by API pages (v2.0.0.12, fixed v2.2.0.0)
- Customer No. not populated on phone/email log records for existing customers (v2.0.0.12, fixed v2.2.0.0)
- EndCall log records missing Customer No. (v2.0.0.12, fixed v2.2.0.0)
- EndMailCall log records missing Customer No. (v2.0.0.12, fixed v2.3.0.0)
- No. Series validation skipped on new customers (v2.0.0.12, fixed v2.2.0.0)
- Contact records not created for API-generated customers (v2.0.0.12, fixed v2.2.0.0)
- ~181 failed API calls during v2.2.0.0 permission gap window (March 4-5, may have left partial state)
Performance Metrics
| Metric | Before (v2.0.0.12) | After v2.1.0.0 | After v2.3.0.0 (expected) |
|---|---|---|---|
| Lock duration per record | Full batch (minutes) | 15-40 seconds | < 1 second |
| Customer lookups per record | 4 (by phone number) | 4 | 1 (by primary key) |
| DB Modifies per record | 4+ | 4+ | 2 |
| Job Queue cycle time | Minutes | 12s – 3m 47s | < 5 seconds (target) |
Telemetry & Evidence
Application Insights telemetry was added in v2.1.0.0 and proved critical throughout the investigation:
- v2.2.1.0 diagnosis: HTTP status code distribution revealed the 65-77% failure rate within hours of deployment. Without telemetry, this would have been reported anecdotally by agents and taken longer to diagnose.
- Lock timeout correlation: Two HTTP 409 errors at 16:42 UTC with exactly 30.5 seconds execution time confirmed the BC SQL lock timeout threshold and pointed to the SourceTableTemporary inconsistency.
- Job Queue performance trending: CU-60003 execution times (RT0018 telemetry) showed increasing SQL executes (38 → 532) and rows read (646 → 1,440) over 13 consecutive runs, confirming a growing backlog from the 403 failures.
- Root cause timing evidence: Execution time ranges of 12 seconds to 3 minutes 47 seconds per Job Queue cycle, even after v2.2.1.0 fixed the permissions issue, confirmed that the within-record lock duration was the remaining problem.
Remaining Items
Deployed and Verified
- v2.1.0.0 through v2.2.1.0 are deployed to Production
- v2.3.0.0 compiles cleanly (2 pre-existing UseReadySAS deprecation warnings only)
Pending for v2.3.0.0
- Independent code review
- UAT test plan update
- Deployment to sandbox for functional verification
- Production deployment and monitoring
Post-Deployment Monitoring
All hardcoded fallback branches emit Session.LogMessage warnings to Application Insights (event IDs BCDIALER-0001 through BCDIALER-0003). After deployment, query for these events — if none appear, the config-driven migration is working correctly and the fallbacks can be safely replaced with Error() calls in v2.4.0.0.
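A sketch of what one such fallback branch looks like in AL. The `BCDIALER-0001` event ID comes from this report; the `NextivaConfig` field name, message text, and custom dimension are assumptions:

```al
// Config-first credential resolution with an instrumented legacy fallback
if NextivaConfig.Get() and (NextivaConfig.LoginUrl <> '') then
    LoginUrl := NextivaConfig.LoginUrl
else begin
    LoginUrl := HardcodedLoginUrlTok; // legacy fallback, slated for removal in v2.4.0.0
    Session.LogMessage('BCDIALER-0001', 'NextivaConfig missing or blank: using hardcoded Login URL',
        Verbosity::Warning, DataClassification::SystemMetadata,
        TelemetryScope::ExtensionPublisher, 'Category', 'NextivaIntegration');
end;
```

Zero occurrences of these warnings in Application Insights is the evidence that every environment has migrated to config-driven credentials, which is the precondition for replacing the `else` branches with `Error()` in v2.4.0.0.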
Follow-Up for v2.4.0.0
- Remove all hardcoded credential fallbacks — replace `else` branches with `Error()` calls (safe to proceed once telemetry confirms zero `BCDIALER-000*` events)
- Add Azure Storage Account and Container Name fields to NextivaConfig
- Remove seed credentials from CU-60005 upgrade codeunit
- Migrate `UseReadySAS` from `Text` to `SecretText` parameter type
Data Cleanup
- Review and update blank-name Customer records created by API pages prior to v2.2.0.0
- Review phone/email log records created during the v2.2.0.0 permission gap window (March 4-5) for partial state
Related Documents
| Document | Description |
|---|---|
| CHANGE-v2.1.0.0.md | Transaction isolation, TryFunction wrappers, telemetry |
| CHANGE-v2.2.0.0.md | LockTable removal, data integrity fixes, secondary indexes |
| CHANGE-v2.2.1.0.md | Permission fix, SourceTableTemporary alignment, telemetry evidence |
| CHANGE-v2.3.0.0.md | Phase 1/Phase 2 restructure, credential externalization, config-driven deployment |
| CHANGELOG.md | Summary changelog for all versions |