Development Guide
Environment Setup
Python
Python 3.9+ is required. The project has been tested with Python 3.9.6.
Virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Environment variables
Copy .env.example to .env and fill in your credentials:
cp .env.example .env
The .env file is local-only and excluded by .gitignore. See the README for the full list of variables.
Layered .env loading: __main__.py loads the repository root .env first (shared BC OAuth2 credentials used across all extensions), then the local bc_dev_tools/.env with override=True. This means you only need to set bc_dev_tools-specific variables in the local file; shared credentials are inherited from the root.
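The override order can be illustrated with a small standard-library sketch. It does not call python-dotenv; read_env_file and apply_env are hypothetical stand-ins that mimic loading a file without and then with override, and BC_CLIENT_ID is a made-up variable name used only for the demonstration.

```python
import tempfile
from pathlib import Path


def read_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def apply_env(values: dict, environ: dict, override: bool) -> None:
    """Copy values into environ; existing keys win unless override is True."""
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value


def load_layered(root_env: Path, local_env: Path, environ: dict) -> None:
    # Root file first (shared values), then the local file with override.
    apply_env(read_env_file(root_env), environ, override=False)
    apply_env(read_env_file(local_env), environ, override=True)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root_env = Path(tmp) / "root.env"
        local_env = Path(tmp) / "local.env"
        root_env.write_text("BC_CLIENT_ID=shared-id\nCSM_DB_NAME=root.db\n")
        local_env.write_text("CSM_DB_NAME=local.db\n")
        environ = {}  # a plain dict stands in for os.environ
        load_layered(root_env, local_env, environ)
        return environ


env = demo()
```

In this sketch the shared value survives from the root file, while the value set in both files ends up with the local one.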
Shared tools directory
The package depends on API clients in the repository's tools/ directory:
- bc_odata_client.py
- faq_api_client.py
- csm_1_api_client.py
By default, __main__.py resolves this as <repo-root>/tools/. If your checkout has a different structure, set the BC_TOOLS_DIR environment variable to the directory containing these files.
At startup, __main__.py inserts the tools directory into sys.path so that downstream modules can import the clients directly (e.g., from csm_1_api_client import Csm1ApiClient).
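A minimal sketch of the resolution and sys.path step described above, assuming the logic is roughly "use BC_TOOLS_DIR if set, otherwise <repo-root>/tools/". The function names resolve_tools_dir and add_to_sys_path are hypothetical and may not match what __main__.py actually contains.

```python
import os
import sys
from pathlib import Path


def resolve_tools_dir(repo_root: Path, environ=os.environ) -> Path:
    """Return BC_TOOLS_DIR when it is set, otherwise <repo_root>/tools."""
    override = environ.get("BC_TOOLS_DIR")
    return Path(override) if override else repo_root / "tools"


def add_to_sys_path(tools_dir: Path) -> None:
    """Put the tools directory at the front of sys.path, once."""
    entry = str(tools_dir)
    if entry not in sys.path:
        sys.path.insert(0, entry)
```

After add_to_sys_path runs, a plain import such as from csm_1_api_client import Csm1ApiClient can resolve, provided the file exists in that directory.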
Module Reference
| Module | Description |
|---|---|
| __init__.py | Package marker (empty) |
| __main__.py | Entry point: loads .env files, configures sys.path, delegates to cli.main() |
| cli.py | Argparse CLI with three subcommand groups (top-level, import, resolve); dispatches to handler functions |
| main.py | Interactive menu loop, orchestration functions (complete_pipeline, import_faq_api_with_normalization, etc.), and non-interactive CLI wrappers |
| config.py | Reads all configuration from environment variables; defines table names, column mappings, thresholds, and file paths |
| database_manager.py | DatabaseManager class: SQLite connection management, import_from_excel, import_from_api, create_table_from_dataframe, get_table_data, execute_query, add_columns_to_table |
| faqbc_data_matcher.py | FaqBcDataMatcher class: multi-column fuzzy matching using difflib.SequenceMatcher with configurable thresholds |
| exporter.py | ResultsExporter class: writes comparison results to multi-sheet Excel workbooks via pandas.ExcelWriter and openpyxl |
| bc_data_normalizer.py | BcDataNormalizer class: breaks flat BC data into dimension tables (bc_fault_areas, bc_symptom_codes, bc_fault_codes_dim, bc_descriptions) and creates SQL views |
| faq_data_normalizer.py | FaqDataNormalizer class: breaks flat FAQ data into dimension tables (faq_main_categories, faq_sub_categories, faq_issue_types, faq_item_parts) and creates SQL views |
| failed_csm_resolver.py | End-to-end resolver pipeline: fetches from BC OData, matches against FAQ hierarchy, writes spreadsheet, optionally submits to CSM and updates BC. Has its own argparse for standalone execution. |
| analyze_resolver.py | Reads the resolver spreadsheet and prints statistical analysis of output patterns |
| analyze_human_corrections.py | Reads the resolver spreadsheet and analyzes manual corrections made by operators |
| test_faq_auth.py | Standalone debug script for testing FAQ API authentication and token retrieval |
Dual-Mode Execution
main.py supports two import strategies:
try:
    from bc_dev_tools.config import ...
except ImportError:
    from config import ...
- Package mode (python -m bc_dev_tools): The preferred invocation. __main__.py handles .env loading and sys.path setup before any module code runs.
- Standalone mode (python main.py from within the bc_dev_tools/ directory): Falls back to plain top-level imports (from config import ...). You must manually ensure environment variables are set and that the tools/ directory is on your PYTHONPATH. This mode exists for backwards compatibility.
Database
The SQLite database defaults to csm_data.db in the working directory. Override with the CSM_DB_NAME environment variable.
DatabaseManager is the central interface for all database operations:
- import_from_excel(file_path, table_name): reads an Excel file into a DataFrame and creates/replaces a SQLite table
- import_from_api(): calls the FAQ API client and stores results
- create_table_from_dataframe(df, table_name): creates a table from any pandas DataFrame
- get_table_data(table_name): returns table contents as a DataFrame
- execute_query(sql): runs arbitrary SQL and returns results
- add_columns_to_table(table_name, columns): adds new columns to an existing table
- list_all_tables(): returns all table names in the database
- import_multiple_excel_files(file_list): batch import
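To make a few of these operations concrete, here is a hedged sketch of how they might map onto SQLite using only the standard library. It is not the DatabaseManager implementation: the functions are free-standing, the columns argument is assumed to be a dict of column name to SQL type, and the table and column names (bc_data, match_score) are invented for the example.

```python
import os
import sqlite3


def open_database(environ=os.environ) -> sqlite3.Connection:
    """Open the database named by CSM_DB_NAME, defaulting to csm_data.db."""
    return sqlite3.connect(environ.get("CSM_DB_NAME", "csm_data.db"))


def list_all_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def add_columns_to_table(conn: sqlite3.Connection, table_name: str, columns: dict) -> None:
    """Add each missing column via ALTER TABLE; existing columns are left alone.

    Identifiers are interpolated into the SQL text, so table and column names
    must come from trusted configuration, never from user input.
    """
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table_name}")')}
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f'ALTER TABLE "{table_name}" ADD COLUMN "{name}" {sql_type}')


# Demonstration against an in-memory database instead of csm_data.db.
conn = open_database({"CSM_DB_NAME": ":memory:"})
conn.execute("CREATE TABLE bc_data (id INTEGER)")
add_columns_to_table(conn, "bc_data", {"match_score": "REAL"})
tables = list_all_tables(conn)
column_names = [row[1] for row in conn.execute('PRAGMA table_info("bc_data")')]
```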
Adding New Imports
To import a new Excel file into SQLite:
- (Optional) Add the filename to OTHER_EXCEL_FILES in config.py so it is picked up by import excel without flags.
- Call DatabaseManager.import_from_excel(file_path, table_name=...) from your handler function.
Adding New Subcommands
The CLI is defined in cli.py. To add a new subcommand:
- Add the argparse definition in main() under the appropriate group (top-level, import, or resolve).
- Add a dispatch branch in the if/elif chain at the bottom of main().
- Create a _cmd_<name> handler function that lazy-imports the target module and calls the appropriate function.
- Implement the function that the handler calls in the target module (typically main.py for non-interactive wrappers).
The lazy import pattern keeps --help fast and avoids import errors when credentials are missing:
def _cmd_my_command(args: argparse.Namespace) -> None:
    from bc_dev_tools.main import my_function
    my_function(param=args.param)
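The steps for adding a subcommand can be sketched end to end as follows. This is a self-contained illustration, not the contents of cli.py: the my-command subcommand and its --param argument are hypothetical, the parser is built in a separate build_parser helper for readability, and textwrap stands in for the project module that a real handler would lazy-import.

```python
import argparse


def _cmd_my_command(args: argparse.Namespace) -> str:
    # Lazy import: nothing is loaded until this subcommand actually runs.
    # textwrap is only a stand-in for a project import such as
    # "from bc_dev_tools.main import my_function".
    from textwrap import shorten

    return shorten(args.param, width=40)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bc_dev_tools")
    subparsers = parser.add_subparsers(dest="command", required=True)
    my_command = subparsers.add_parser("my-command", help="Hypothetical example subcommand")
    my_command.add_argument("--param", required=True, help="Hypothetical argument")
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    # Dispatch chain: one branch per subcommand.
    if args.command == "my-command":
        return _cmd_my_command(args)
    raise SystemExit(f"unknown command: {args.command}")


result = main(["my-command", "--param", "hello"])
```

The handler returns a value here only so the example can be checked; the handler shown earlier returns None.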
Matching Logic
Matching lives in faqbc_data_matcher.py and uses:
- COLUMN_MAPPING from config.py: maps BC columns to FAQ columns
- difflib.SequenceMatcher: computes similarity ratios for each column pair
If you change Excel headers or API response fields, update COLUMN_MAPPING in config.py and keep the documentation in sync.
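As a rough illustration of multi-column matching with difflib.SequenceMatcher, the sketch below scores each mapped column pair and averages the ratios. The averaging rule, the 0.8 threshold, the column names in the mapping, and the function names are all assumptions made for the example; FaqBcDataMatcher may combine scores differently.

```python
from difflib import SequenceMatcher

# Hypothetical mapping of BC column names to FAQ column names.
COLUMN_MAPPING = {"Fault Area": "main_category", "Symptom": "issue_type"}
THRESHOLD = 0.8  # hypothetical minimum average similarity


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score_pair(bc_row: dict, faq_row: dict, mapping: dict) -> float:
    """Average the per-column ratios across all mapped column pairs."""
    scores = [
        similarity(str(bc_row.get(bc_col, "")), str(faq_row.get(faq_col, "")))
        for bc_col, faq_col in mapping.items()
    ]
    return sum(scores) / len(scores)


def best_match(bc_row: dict, faq_rows: list, mapping: dict, threshold: float):
    """Return (row, score) for the best candidate, or (None, score) below threshold."""
    best_row, best_score = None, 0.0
    for faq_row in faq_rows:
        score = score_pair(bc_row, faq_row, mapping)
        if score > best_score:
            best_row, best_score = faq_row, score
    if best_score >= threshold:
        return best_row, best_score
    return None, best_score


faq_rows = [
    {"main_category": "display", "issue_type": "cracked screen"},
    {"main_category": "Battery", "issue_type": "Won't charge"},
]
bc_row = {"Fault Area": "Display", "Symptom": "Cracked screen"}
match, score = best_match(bc_row, faq_rows, COLUMN_MAPPING, THRESHOLD)
```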
Exporting
Export logic lives in exporter.py and uses openpyxl via pandas.ExcelWriter. The exported workbook contains three sheets: All Results, Matched, and Unmatched.
Logging
- Avoid logging secrets or tokens.
- Use HTTP_DEBUG=1 only when debugging API connectivity or protocol issues. It enables http.client debug output and urllib3 DEBUG-level logging.
- The python-dotenv package is a runtime dependency: it loads .env files at startup in __main__.py.
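For reference, the two effects attributed to HTTP_DEBUG=1 can be reproduced with the standard library alone. This is a sketch of one way to wire it up; the function name configure_http_debug is hypothetical and the project's actual switch may differ.

```python
import http.client
import logging
import os


def configure_http_debug(environ=os.environ) -> bool:
    """Turn on verbose HTTP output when HTTP_DEBUG is set to 1."""
    enabled = environ.get("HTTP_DEBUG") == "1"
    if enabled:
        # http.client prints request and response headers when debuglevel > 0.
        http.client.HTTPConnection.debuglevel = 1
        # Make sure a handler exists so DEBUG records are actually emitted.
        logging.basicConfig()
        logging.getLogger("urllib3").setLevel(logging.DEBUG)
    return enabled
```

Because this output can include request headers, it should stay off outside of debugging sessions, in line with the rule about not logging secrets or tokens.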
Dependencies
All dependencies are pinned in requirements.txt. Key packages:
| Package | Purpose |
|---|---|
| pandas | DataFrame operations, Excel reading/writing |
| openpyxl | Excel file format support for pandas |
| numpy | Numeric operations, used by pandas and data processing |
| requests | HTTP client for API calls |
| python-dotenv | .env file loading |
| phonenumbers | Phone number parsing (used in data normalization) |