Development Guide

Environment Setup

Python

Python 3.9+ is required. The project has been tested with Python 3.9.6.

Virtual environment

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Environment variables

Copy .env.example to .env and fill in your credentials:

cp .env.example .env

The .env file is local-only and excluded by .gitignore. See the README for the full list of variables.

Layered .env loading: __main__.py loads the repository root .env first (shared BC OAuth2 credentials used across all extensions), then the local bc_dev_tools/.env with override=True. This means you only need to set bc_dev_tools-specific variables in the local file; shared credentials are inherited from the root.
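The precedence rules above can be illustrated with a small stdlib-only sketch. It is a stand-in for python-dotenv rather than the code in __main__.py: parse_env_file and load_layered are hypothetical helpers, BC_CLIENT_ID is an invented variable name, and the parser only handles plain KEY=VALUE lines.

```python
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    if not Path(path).is_file():
        return values
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_layered(root_env, local_env, environ):
    """Apply the root file first, then let the local file win on conflicts."""
    # Root layer: only fills in names that are not already set,
    # which mirrors loading without override.
    for key, value in parse_env_file(root_env).items():
        environ.setdefault(key, value)
    # Local layer: replaces anything already present (override=True).
    environ.update(parse_env_file(local_env))
    return environ
```

With a root file containing BC_CLIENT_ID=shared and CSM_DB_NAME=root.db, and a local file containing only CSM_DB_NAME=local.db, the result keeps the shared credential from the root and takes the database name from the local file.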

Shared tools directory

The package depends on API clients in the repository's tools/ directory:

  • bc_odata_client.py
  • faq_api_client.py
  • csm_1_api_client.py

By default, __main__.py resolves this as <repo-root>/tools/. If your checkout has a different structure, set the BC_TOOLS_DIR environment variable to the directory containing these files.

At startup, __main__.py inserts the tools directory into sys.path so that downstream modules can import the clients directly (e.g., from csm_1_api_client import Csm1ApiClient).
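A minimal sketch of that resolution logic, assuming the bc_dev_tools/ package directory sits directly under the repository root. The function names resolve_tools_dir and add_to_sys_path are hypothetical; the actual code in __main__.py may differ.

```python
import os
import sys
from pathlib import Path


def resolve_tools_dir(package_dir, environ=os.environ):
    """Return BC_TOOLS_DIR if set, otherwise <repo-root>/tools."""
    override = environ.get("BC_TOOLS_DIR")
    if override:
        return Path(override).expanduser().resolve()
    # Assumption: the package directory is a direct child of the repo root.
    return Path(package_dir).resolve().parent / "tools"


def add_to_sys_path(tools_dir):
    """Put the tools directory at the front of sys.path exactly once."""
    entry = str(tools_dir)
    if entry not in sys.path:
        sys.path.insert(0, entry)
    return entry
```

Inserting at position 0 means the shared clients take priority over any same-named module elsewhere on the path; appending instead would be the more conservative choice if name clashes are a concern.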

Module Reference

| Module | Description |
| --- | --- |
| __init__.py | Package marker (empty) |
| __main__.py | Entry point: loads .env files, configures sys.path, delegates to cli.main() |
| cli.py | Argparse CLI with three subcommand groups (top-level, import, resolve); dispatches to handler functions |
| main.py | Interactive menu loop, orchestration functions (complete_pipeline, import_faq_api_with_normalization, etc.), and non-interactive CLI wrappers |
| config.py | Reads all configuration from environment variables; defines table names, column mappings, thresholds, and file paths |
| database_manager.py | DatabaseManager class: SQLite connection management, import_from_excel, import_from_api, create_table_from_dataframe, get_table_data, execute_query, add_columns_to_table |
| faqbc_data_matcher.py | FaqBcDataMatcher class: multi-column fuzzy matching using difflib.SequenceMatcher with configurable thresholds |
| exporter.py | ResultsExporter class: writes comparison results to multi-sheet Excel workbooks via pandas.ExcelWriter and openpyxl |
| bc_data_normalizer.py | BcDataNormalizer class: breaks flat BC data into dimension tables (bc_fault_areas, bc_symptom_codes, bc_fault_codes_dim, bc_descriptions) and creates SQL views |
| faq_data_normalizer.py | FaqDataNormalizer class: breaks flat FAQ data into dimension tables (faq_main_categories, faq_sub_categories, faq_issue_types, faq_item_parts) and creates SQL views |
| failed_csm_resolver.py | End-to-end resolver pipeline: fetches from BC OData, matches against FAQ hierarchy, writes spreadsheet, optionally submits to CSM and updates BC. Has its own argparse for standalone execution. |
| analyze_resolver.py | Reads the resolver spreadsheet and prints statistical analysis of output patterns |
| analyze_human_corrections.py | Reads the resolver spreadsheet and analyzes manual corrections made by operators |
| test_faq_auth.py | Standalone debug script for testing FAQ API authentication and token retrieval |

Dual-Mode Execution

main.py supports two import strategies:

try:
    from bc_dev_tools.config import ...
except ImportError:
    from config import ...

  • Package mode (python -m bc_dev_tools): The preferred invocation. __main__.py handles .env loading and sys.path setup before any module code runs.
  • Standalone mode (python main.py from within the bc_dev_tools/ directory): Falls back to importing sibling modules by their bare names (from config import ...). In this mode nothing loads the .env files or adjusts sys.path for you, so you must set the environment variables yourself and make sure the tools/ directory is on your PYTHONPATH. This mode exists for backwards compatibility.

Database

The SQLite database defaults to csm_data.db in the working directory. Override with the CSM_DB_NAME environment variable.

DatabaseManager is the central interface for all database operations:

  • import_from_excel(file_path, table_name) — reads an Excel file into a DataFrame and creates/replaces a SQLite table
  • import_from_api() — calls the FAQ API client and stores results
  • create_table_from_dataframe(df, table_name) — creates a table from any pandas DataFrame
  • get_table_data(table_name) — returns table contents as a DataFrame
  • execute_query(sql) — runs arbitrary SQL and returns results
  • add_columns_to_table(table_name, columns) — adds new columns to an existing table
  • list_all_tables() — returns all table names in the database
  • import_multiple_excel_files(file_list) — batch import
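To make two of these operations concrete, here is a sketch built on the standard sqlite3 module. It is not the DatabaseManager implementation: the functions take a connection explicitly, the params argument is an addition for safe value binding, and the sample row is invented for illustration.

```python
import sqlite3


def list_all_tables(conn):
    """Return the names of all user tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def execute_query(conn, sql, params=()):
    """Run one SQL statement and return all result rows as tuples."""
    with conn:  # commits on success, rolls back if the statement raises
        return conn.execute(sql, params).fetchall()


# An in-memory database keeps the example self-contained; the real tool
# opens the file named by CSM_DB_NAME (csm_data.db by default).
conn = sqlite3.connect(":memory:")
execute_query(conn, "CREATE TABLE bc_fault_areas (id INTEGER PRIMARY KEY, name TEXT)")
execute_query(conn, "INSERT INTO bc_fault_areas (name) VALUES (?)", ("Display",))
print(list_all_tables(conn))
print(execute_query(conn, "SELECT name FROM bc_fault_areas"))
```

Binding values through params rather than formatting them into the SQL string avoids quoting bugs and SQL injection when the values come from spreadsheets or API responses.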

Adding New Imports

To import a new Excel file into SQLite:

  1. (Optional) Add the filename to OTHER_EXCEL_FILES in config.py so that the import excel subcommand picks it up when run without flags.
  2. Call DatabaseManager.import_from_excel(file_path, table_name=...) from your handler function.

Adding New Subcommands

The CLI is defined in cli.py. To add a new subcommand:

  1. Add the argparse definition in main() under the appropriate group (top-level, import, or resolve).
  2. Add a dispatch branch in the if/elif chain at the bottom of main().
  3. Create a _cmd_<name> handler function that lazy-imports the target module and calls the appropriate function.
  4. Implement the handler function in the target module (typically main.py for non-interactive wrappers).

The lazy import pattern keeps --help fast and avoids import errors when credentials are missing:

def _cmd_my_command(args: argparse.Namespace) -> None:
    from bc_dev_tools.main import my_function
    my_function(param=args.param)
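Putting the four steps together, a nested subcommand could look roughly like the sketch below. It is a standalone illustration, not an excerpt from cli.py: the --file flag is hypothetical, the handler imports json as a stand-in for the real target module, and it returns a string only so the result is easy to inspect.

```python
import argparse


def _cmd_import_excel(args):
    # Lazy import: in the real CLI this would be the target module
    # (for example bc_dev_tools.main); json is a stand-in so the sketch runs anywhere.
    import json

    return json.dumps({"command": "import excel", "file": args.file})


def build_parser():
    parser = argparse.ArgumentParser(prog="bc_dev_tools")
    groups = parser.add_subparsers(dest="command", required=True)

    # Step 1: define the subcommand under its group.
    import_group = groups.add_parser("import", help="import data into SQLite")
    import_cmds = import_group.add_subparsers(dest="import_command", required=True)
    excel = import_cmds.add_parser("excel", help="import an Excel file")
    excel.add_argument("--file", default=None, help="hypothetical flag: path to the workbook")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Step 2: dispatch branch; steps 3 and 4 live in the handler above.
    if args.command == "import" and args.import_command == "excel":
        return _cmd_import_excel(args)
    raise SystemExit(f"unhandled command: {args.command}")
```

Because the handler's import only runs once its branch is selected, building the parser and printing --help never touches modules that need credentials.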

Matching Logic

Matching lives in faqbc_data_matcher.py and uses:

  • COLUMN_MAPPING from config.py — maps BC columns to FAQ columns
  • difflib.SequenceMatcher — computes similarity ratios for each column pair

If you change Excel headers or API response fields, update COLUMN_MAPPING in config.py and keep the documentation in sync.
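As a rough illustration of column-wise fuzzy matching, the sketch below scores one BC row against one FAQ row. The column names in COLUMN_MAPPING, the 0.8 threshold, and the choice to average the per-column ratios are all assumptions made for the example; FaqBcDataMatcher may combine the scores differently.

```python
from difflib import SequenceMatcher

# Hypothetical mapping of BC column names to FAQ column names.
COLUMN_MAPPING = {
    "fault_area": "main_category",
    "symptom": "issue_type",
}
THRESHOLD = 0.8  # assumed cut-off; the real thresholds come from config.py


def similarity(left, right):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, left.strip().lower(), right.strip().lower()).ratio()


def score_rows(bc_row, faq_row, mapping=COLUMN_MAPPING):
    """Average the per-column ratios over every mapped column pair."""
    ratios = [
        similarity(str(bc_row.get(bc_col, "")), str(faq_row.get(faq_col, "")))
        for bc_col, faq_col in mapping.items()
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


def is_match(bc_row, faq_row, threshold=THRESHOLD):
    """Treat the pair as matched when the averaged score reaches the threshold."""
    return score_rows(bc_row, faq_row) >= threshold
```

If a header is renamed in either source without updating the mapping, the lookup falls back to an empty string and the score for that column becomes meaningless, which is why the mapping has to track header changes.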

Exporting

Export logic lives in exporter.py and uses openpyxl via pandas.ExcelWriter. The exported workbook contains three sheets: All Results, Matched, and Unmatched.

Logging

  • Avoid logging secrets or tokens.
  • Use HTTP_DEBUG=1 only when debugging API connectivity or protocol issues. It enables http.client debug output and urllib3 DEBUG-level logging.
  • The python-dotenv package is a runtime dependency — it loads .env files at startup in __main__.py.
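The HTTP_DEBUG switch could be wired up along these lines. This is a sketch using only the standard library; configure_http_debug is a hypothetical name, and the check for the literal value "1" is an assumption about how the flag is read.

```python
import http.client
import logging
import os


def configure_http_debug(environ=os.environ):
    """Enable verbose HTTP tracing when HTTP_DEBUG=1; return whether it is on."""
    enabled = environ.get("HTTP_DEBUG") == "1"
    if enabled:
        # http.client prints request and response headers to stdout.
        http.client.HTTPConnection.debuglevel = 1
        # urllib3 logs connection details at DEBUG level.
        logging.basicConfig(level=logging.INFO)
        logging.getLogger("urllib3").setLevel(logging.DEBUG)
    return enabled
```

Note that http.client's debug output includes request headers, so bearer tokens can end up in the terminal or in captured logs; that is the reason to keep the flag off outside of connectivity debugging.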

Dependencies

All dependencies are pinned in requirements.txt. Key packages:

| Package | Purpose |
| --- | --- |
| pandas | DataFrame operations, Excel reading/writing |
| openpyxl | Excel file format support for pandas |
| numpy | Numeric operations, used by pandas and data processing |
| requests | HTTP client for API calls |
| python-dotenv | .env file loading |
| phonenumbers | Phone number parsing (used in data normalization) |