CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

A single-script Python tool that extracts credit card transactions from BAC Costa Rica statement PDFs. It parses sections B (purchases), D (other charges), and E (voluntary services) and writes the results as JSON.

Dependencies

  • pdfplumber (>=0.10.0)

Commands

# Run tests
python testStatements/run_tests.py

# Run extractor
python bac_extract.py <pdf_file> [options]

# Examples
python bac_extract.py EstadodeCuenta.pdf --pretty
python bac_extract.py statement.pdf -o output.json -v

Options:

  • -o, --output: Output JSON path (default: transactions.json)
  • --pretty: Pretty-print JSON
  • -v, --verbose: Enable debug logging
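
The options above map naturally onto an argparse parser. The following is a minimal sketch, not the script's actual code: the function name build_parser and the help strings are assumptions, while the flags and the default output path come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI options."""
    parser = argparse.ArgumentParser(
        prog="bac_extract.py",
        description="Extract transactions from a BAC Costa Rica statement PDF.",
    )
    parser.add_argument("pdf_file", help="Path to the statement PDF")
    parser.add_argument(
        "-o", "--output", default="transactions.json", help="Output JSON path"
    )
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON")
    parser.add_argument(
        "-v", "--verbose", action="store_true", help="Enable debug logging"
    )
    return parser


# Equivalent to: python bac_extract.py statement.pdf -o output.json -v
args = build_parser().parse_args(["statement.pdf", "-o", "output.json", "-v"])
print(args.pdf_file, args.output, args.pretty, args.verbose)
# prints: statement.pdf output.json False True
```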

Architecture

The extraction pipeline:

  1. Validates that the PDF is a BAC statement (is_bac_statement)
  2. Iterates pages line-by-line, detecting section boundaries via SECTIONS dict patterns
  3. Parses transactions matching TRANSACTION_PATTERN regex
  4. Outputs card holders, transactions by section, and summaries
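
Steps 2 and 3 amount to a small state machine over text lines. The sketch below illustrates the idea on lines that have already been extracted from the PDF. Everything specific in it is an assumption made for illustration: the section marker regexes, the output key names, the transaction line layout, and the function name extract_transactions. The real SECTIONS and TRANSACTION_PATTERN in bac_extract.py may look different.

```python
import re

# Hypothetical section markers and output keys, for illustration only.
SECTIONS = {
    "B": {"start": [r"^B\)"], "end": [r"^C\)"], "key": "purchases"},
    "D": {"start": [r"^D\)"], "end": [r"^E\)"], "key": "other_charges"},
    "E": {"start": [r"^E\)"], "end": [r"^F\)"], "key": "voluntary_services"},
}

# Hypothetical line layout: date, description, amount (optional trailing minus).
TRANSACTION_PATTERN = re.compile(
    r"^(?P<date>\d{2}-[A-Z]{3}-\d{2})\s+(?P<description>.+?)\s+"
    r"(?P<amount>[\d,]+\.\d{2}-?)$"
)


def extract_transactions(lines):
    """Group transaction lines by the section they appear in."""
    result = {spec["key"]: [] for spec in SECTIONS.values()}
    current = None  # ID of the section being read, or None when outside one
    for raw in lines:
        line = raw.strip()
        # Close the current section when one of its end patterns matches.
        if current and any(re.search(p, line) for p in SECTIONS[current]["end"]):
            current = None
        # The same line may also open a new section (D ends where E starts).
        started = next(
            (sid for sid, spec in SECTIONS.items()
             if any(re.search(p, line) for p in spec["start"])),
            None,
        )
        if started:
            current = started
            continue
        if current is None:
            continue
        match = TRANSACTION_PATTERN.match(line)
        if match:
            result[SECTIONS[current]["key"]].append(match.groupdict())
    return result


sample = [
    "B) Compras",
    "15-ENE-25 SUPERMERCADO 1,234.56",
    "C) Pagos",
    "16-ENE-25 PAGO 100.00-",  # section C is not tracked, so this is skipped
    "D) Otros cargos",
    "17-ENE-25 COMISION 5.00",
    "E) Servicios voluntarios",
    "18-ENE-25 SEGURO 10.00",
]
result = extract_transactions(sample)
print(result["purchases"])
```

Checking end patterns before start patterns lets one header line both close a section and open the next.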

Key data structures:

  • SECTIONS: Maps section IDs (B/D/E) to start/end regex patterns and output keys
  • SPANISH_MONTHS: Spanish month abbreviations for date parsing

Key parsing functions:

  • parse_spanish_date: Converts "15-ENE-25" to "2025-01-15"
  • parse_amount: Handles "1,234.56" and trailing negatives "100.00-"
  • matches_patterns: Generic regex pattern matcher for section detection
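
For orientation, the three helpers could look roughly like the sketch below. Only the function names and the example inputs and outputs come from the descriptions above. The signatures, the shape of SPANISH_MONTHS, the assumption that two-digit years fall in the 2000s, and the float return type are guesses rather than the script's actual implementation.

```python
import re
from typing import Iterable

# Assumed shape: abbreviation -> month number. "SET" is included alongside
# "SEP" as a guess, since both abbreviations are in use for September.
SPANISH_MONTHS = {
    "ENE": 1, "FEB": 2, "MAR": 3, "ABR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AGO": 8, "SEP": 9, "SET": 9, "OCT": 10, "NOV": 11, "DIC": 12,
}


def parse_spanish_date(text: str) -> str:
    """Convert a date like '15-ENE-25' to ISO format '2025-01-15'."""
    day, month, year = text.strip().upper().split("-")
    # Assumption: two-digit years belong to the 2000s.
    return f"{2000 + int(year):04d}-{SPANISH_MONTHS[month]:02d}-{int(day):02d}"


def parse_amount(text: str) -> float:
    """Parse '1,234.56' or a trailing-negative amount such as '100.00-'."""
    text = text.strip()
    negative = text.endswith("-")
    value = float(text.rstrip("-").replace(",", ""))
    return -value if negative else value


def matches_patterns(line: str, patterns: Iterable[str]) -> bool:
    """Return True if any regex in patterns is found in line."""
    return any(re.search(pattern, line) for pattern in patterns)


print(parse_spanish_date("15-ENE-25"))  # prints: 2025-01-15
print(parse_amount("100.00-"))          # prints: -100.0
```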