Development Guide

Environment Setup

Python

Python 3.9+ is required. The project has been tested with Python 3.9.6.

Virtual environment

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Environment variables

Copy .env.example to .env and fill in your credentials:

cp .env.example .env

The .env file is local-only and excluded by .gitignore. See the README for the full list of variables.

Layered .env loading: __main__.py loads the repository root .env first (shared BC OAuth2 credentials used across all extensions), then the local bc_dev_tools/.env with override=True. This means you only need to set bc_dev_tools-specific variables in the local file; shared credentials are inherited from the root.
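The precedence described above can be illustrated with a small stdlib-only sketch. This is not the project's loader: the real code relies on python-dotenv, and the helper names (parse_env_file, load_layered_env, demo) and the BC_CLIENT_ID variable are hypothetical. It only shows the ordering rule: root values are applied first without clobbering existing variables, then local values override them.

```python
import os
import tempfile
from pathlib import Path


def parse_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines; skip blanks and # comments (hypothetical helper)."""
    values = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_layered_env(root_env: Path, local_env: Path, environ=None):
    """Apply root values first, then let local values override them."""
    environ = os.environ if environ is None else environ
    for key, value in parse_env_file(root_env).items():
        environ.setdefault(key, value)  # root layer: keep anything already set
    for key, value in parse_env_file(local_env).items():
        environ[key] = value            # local layer: behaves like override=True
    return environ


def demo() -> dict:
    """Write two throwaway .env files and return the merged result."""
    with tempfile.TemporaryDirectory() as tmp:
        root_env = Path(tmp) / "root.env"
        local_env = Path(tmp) / "local.env"
        # BC_CLIENT_ID is a made-up variable name used only for illustration.
        root_env.write_text("BC_CLIENT_ID=shared-id\nCSM_DB_NAME=root.db\n", encoding="utf-8")
        local_env.write_text("CSM_DB_NAME=local.db\n", encoding="utf-8")
        return load_layered_env(root_env, local_env, environ={})
```

In the demo, the shared credential comes from the root file while CSM_DB_NAME takes the local value, which is the behaviour the layered loading is meant to give.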

Shared tools directory

The package depends on API clients in the repository's tools/ directory:

bc_odata_client.py
faq_api_client.py
csm_1_api_client.py

By default, __main__.py resolves this directory to <repo-root>/tools/. If your checkout has a different structure, set the BC_TOOLS_DIR environment variable to the directory containing these files.

At startup, __main__.py inserts the tools directory into sys.path so that downstream modules can import the clients directly (e.g., from csm_1_api_client import Csm1ApiClient).
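A minimal sketch of this resolution logic, assuming the BC_TOOLS_DIR override simply replaces the default path. The function names are hypothetical, and inserting at position 0 of sys.path is an assumption; the actual code in __main__.py may differ.

```python
import os
import sys
from pathlib import Path


def resolve_tools_dir(repo_root: Path, environ=None) -> Path:
    """Return BC_TOOLS_DIR if set, otherwise <repo-root>/tools."""
    environ = os.environ if environ is None else environ
    override = environ.get("BC_TOOLS_DIR")
    return Path(override).expanduser() if override else repo_root / "tools"


def add_tools_to_path(tools_dir: Path) -> None:
    """Make the client modules importable by bare name."""
    entry = str(tools_dir)
    if entry not in sys.path:
        # Assumption: front of the path, so these modules win name clashes.
        sys.path.insert(0, entry)
```

After add_tools_to_path(resolve_tools_dir(repo_root)) has run, an import such as from csm_1_api_client import Csm1ApiClient resolves against that directory.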

Module Reference

Module	Description
`__init__.py`	Package marker (empty)
`__main__.py`	Entry point: loads `.env` files, configures `sys.path`, delegates to `cli.main()`
`cli.py`	Argparse CLI with three subcommand groups (top-level, import, resolve); dispatches to handler functions
`main.py`	Interactive menu loop, orchestration functions (`complete_pipeline`, `import_faq_api_with_normalization`, etc.), and non-interactive CLI wrappers
`config.py`	Reads all configuration from environment variables; defines table names, column mappings, thresholds, and file paths
`database_manager.py`	`DatabaseManager` class: SQLite connection management, `import_from_excel`, `import_from_api`, `create_table_from_dataframe`, `get_table_data`, `execute_query`, `add_columns_to_table`
`faqbc_data_matcher.py`	`FaqBcDataMatcher` class: multi-column fuzzy matching using `difflib.SequenceMatcher` with configurable thresholds
`exporter.py`	`ResultsExporter` class: writes comparison results to multi-sheet Excel workbooks via `pandas.ExcelWriter` and `openpyxl`
`bc_data_normalizer.py`	`BcDataNormalizer` class: breaks flat BC data into dimension tables (`bc_fault_areas`, `bc_symptom_codes`, `bc_fault_codes_dim`, `bc_descriptions`) and creates SQL views
`faq_data_normalizer.py`	`FaqDataNormalizer` class: breaks flat FAQ data into dimension tables (`faq_main_categories`, `faq_sub_categories`, `faq_issue_types`, `faq_item_parts`) and creates SQL views
`failed_csm_resolver.py`	End-to-end resolver pipeline: fetches from BC OData, matches against FAQ hierarchy, writes spreadsheet, optionally submits to CSM and updates BC. Has its own argparse for standalone execution.
`analyze_resolver.py`	Reads the resolver spreadsheet and prints statistical analysis of output patterns
`analyze_human_corrections.py`	Reads the resolver spreadsheet and analyzes manual corrections made by operators
`test_faq_auth.py`	Standalone debug script for testing FAQ API authentication and token retrieval

Dual-Mode Execution

main.py supports two import strategies:

try:
    from bc_dev_tools.config import ...
except ImportError:
    from config import ...

Package mode (python -m bc_dev_tools): The preferred invocation. __main__.py handles .env loading and sys.path setup before any module code runs.
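Standalone mode (python main.py from within the bc_dev_tools/ directory): Falls back to plain top-level imports (from config import ...), which resolve because the script's own directory is on sys.path. Because __main__.py does not run in this mode, you must ensure yourself that the environment variables are set and that the tools/ directory is on your PYTHONPATH. This mode exists for backwards compatibility.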

Database

The SQLite database defaults to csm_data.db in the working directory. Override with the CSM_DB_NAME environment variable.

DatabaseManager is the central interface for all database operations:

import_from_excel(file_path, table_name) — reads an Excel file into a DataFrame and creates/replaces a SQLite table
import_from_api() — calls the FAQ API client and stores results
create_table_from_dataframe(df, table_name) — creates a table from any pandas DataFrame
get_table_data(table_name) — returns table contents as a DataFrame
execute_query(sql) — runs arbitrary SQL and returns results
add_columns_to_table(table_name, columns) — adds new columns to an existing table
list_all_tables() — returns all table names in the database
import_multiple_excel_files(file_list) — batch import
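To make two of these operations concrete, here is a rough sketch of how list_all_tables and add_columns_to_table could be written against the sqlite3 standard library. These are standalone functions, not the DatabaseManager methods themselves. The connection argument, the assumption that columns maps column names to SQL types, and the bc_data table name in the usage note are all hypothetical.

```python
import sqlite3


def list_all_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def add_columns_to_table(conn: sqlite3.Connection, table_name: str, columns: dict) -> None:
    """Add each missing column; columns maps name -> SQL type (assumed shape)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table_name}")')}
    for name, sql_type in columns.items():
        if name not in existing:  # skip columns that are already present
            conn.execute(f'ALTER TABLE "{table_name}" ADD COLUMN "{name}" {sql_type}')
    conn.commit()
```

For example, with conn = sqlite3.connect(":memory:") and a table created by CREATE TABLE bc_data (code TEXT), calling add_columns_to_table(conn, "bc_data", {"match_score": "REAL"}) adds one column and leaves existing ones untouched. Table and column names are interpolated into the SQL here, so they should come from trusted configuration rather than user input.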

Adding New Imports

To import a new Excel file into SQLite:

(Optional) Add the filename to OTHER_EXCEL_FILES in config.py so that the import excel subcommand picks it up when run without flags.
Call DatabaseManager.import_from_excel(file_path, table_name=...) from your handler function.

Adding New Subcommands

The CLI is defined in cli.py. To add a new subcommand:

Add the argparse definition in main() under the appropriate group (top-level, import, or resolve).
Add a dispatch branch in the if/elif chain at the bottom of main().
Create a _cmd_<name> handler function in cli.py that lazy-imports the target module and calls the appropriate function.
Implement the function that the handler calls in the target module (typically main.py for non-interactive wrappers).

The lazy import pattern keeps --help fast and avoids import errors when credentials are missing:

def _cmd_my_command(args: argparse.Namespace) -> None:
    from bc_dev_tools.main import my_function
    my_function(param=args.param)
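The steps above can be sketched end to end as follows. This is a simplified stand-in for cli.py, not a copy of it: the my-command subcommand, its --param flag, and the build_parser helper are hypothetical, and the handler returns a string instead of calling into bc_dev_tools.main so that the sketch runs on its own.

```python
import argparse


def _cmd_my_command(args: argparse.Namespace) -> str:
    # In the real CLI, the lazy import of the target module would go here,
    # so that it only runs when this subcommand is actually selected.
    return f"my-command param={args.param}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bc_dev_tools")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Step 1: the argparse definition for the new subcommand.
    my_cmd = subparsers.add_parser("my-command", help="Hypothetical example")
    my_cmd.add_argument("--param", default="default")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Step 2: a dispatch branch in the if/elif chain.
    if args.command == "my-command":
        return _cmd_my_command(args)
    raise SystemExit(f"unknown command: {args.command}")
```

Calling main(["my-command", "--param", "x"]) dispatches to the handler, while main(["--help"]) prints usage without touching any target module.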

Matching Logic

Matching lives in faqbc_data_matcher.py and uses:

COLUMN_MAPPING from config.py — maps BC columns to FAQ columns
difflib.SequenceMatcher — computes similarity ratios for each column pair

If you change Excel headers or API response fields, update COLUMN_MAPPING in config.py and keep the documentation in sync.
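As an illustration of per-column fuzzy matching with difflib.SequenceMatcher, the sketch below scores one BC row against one FAQ row. The mapping entries, the 0.8 threshold, the normalisation step, and the rule that every column must pass are assumptions made for this example; the actual values and combination logic live in config.py and faqbc_data_matcher.py.

```python
from difflib import SequenceMatcher

# Hypothetical mapping of BC column names to FAQ column names.
COLUMN_MAPPING = {"Fault Area": "main_category", "Symptom": "issue_type"}
MATCH_THRESHOLD = 0.8  # hypothetical threshold


def similarity(a, b) -> float:
    """Similarity ratio in [0, 1] after trimming and lower-casing both values."""
    return SequenceMatcher(None, str(a).strip().lower(), str(b).strip().lower()).ratio()


def score_pair(bc_row: dict, faq_row: dict, mapping=COLUMN_MAPPING) -> dict:
    """Compute one ratio per mapped column pair, keyed by the BC column name."""
    return {
        bc_col: similarity(bc_row.get(bc_col, ""), faq_row.get(faq_col, ""))
        for bc_col, faq_col in mapping.items()
    }


def is_match(scores: dict, threshold: float = MATCH_THRESHOLD) -> bool:
    """Assumed rule: a pair matches only if every column meets the threshold."""
    return all(score >= threshold for score in scores.values())
```

Renaming a header on either side without updating the mapping makes row.get return an empty string, so the affected column scores 0.0 against any non-empty value. This is why the mapping has to track header changes.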

Exporting

Export logic lives in exporter.py and uses openpyxl via pandas.ExcelWriter. The exported workbook contains three sheets: All Results, Matched, and Unmatched.

Logging

Avoid logging secrets or tokens.
Use HTTP_DEBUG=1 only when debugging API connectivity or protocol issues. It enables http.client debug output and urllib3 DEBUG-level logging.
The python-dotenv package is a runtime dependency — it loads .env files at startup in __main__.py.
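For reference, an HTTP_DEBUG switch of the kind described above is commonly wired up as in the sketch below. The function name and the exact check for the value "1" are assumptions; only the two effects (http.client debug output and urllib3 DEBUG-level logging) come from the description above.

```python
import http.client
import logging
import os


def configure_http_debug(environ=None) -> bool:
    """Enable verbose HTTP tracing when HTTP_DEBUG=1; return whether it is on."""
    environ = os.environ if environ is None else environ
    enabled = environ.get("HTTP_DEBUG") == "1"
    if enabled:
        # http.client prints request and response lines to stdout at this level.
        http.client.HTTPConnection.debuglevel = 1
        # Ensure a handler exists, then raise only urllib3 to DEBUG.
        logging.basicConfig()
        logging.getLogger("urllib3").setLevel(logging.DEBUG)
    return enabled
```

Because http.client's debug output includes request headers, traces produced this way can contain authorization tokens and should not be shared unredacted.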

Dependencies

All dependencies are pinned in requirements.txt. Key packages:

Package	Purpose
`pandas`	DataFrame operations, Excel reading/writing
`openpyxl`	Excel file format support for pandas
`numpy`	Numeric operations, used by pandas and data processing
`requests`	HTTP client for API calls
`python-dotenv`	`.env` file loading
`phonenumbers`	Phone number parsing (used in data normalization)
