# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Single-script Python tool that extracts credit card transactions from BAC Costa Rica statement PDFs. Parses sections B (purchases), D (other charges), and E (voluntary services) and outputs JSON.

## Dependencies

- pdfplumber (>=0.10.0)

## Commands

```bash
# Run tests
python testStatements/run_tests.py

# Run extractor
python bac_extract.py <pdf_file> [options]

# Examples
python bac_extract.py EstadodeCuenta.pdf --pretty
python bac_extract.py statement.pdf -o output.json -v
```

Options:
- `-o, --output`: Output JSON path (default: `transactions.json`)
- `--pretty`: Pretty-print JSON
- `-v, --verbose`: Enable debug logging
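These options map directly onto the standard-library `argparse` module. The following is a minimal sketch that assumes the script parses its arguments this way, which the source does not confirm. `build_parser` and `dump_json` are hypothetical names, and `ensure_ascii=False` is an assumed choice that keeps Spanish characters such as "ñ" readable in the output.

```python
import argparse
import json
from typing import Any, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """CLI parser mirroring the documented bac_extract.py options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Extract credit card transactions from a BAC Costa Rica statement PDF."
    )
    parser.add_argument("pdf_file", help="Path to the statement PDF")
    parser.add_argument(
        "-o", "--output",
        default="transactions.json",
        help="Output JSON path (default: transactions.json)",
    )
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable debug logging")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


def dump_json(data: Any, pretty: bool) -> str:
    """Serialize results; --pretty switches on two-space indentation."""
    return json.dumps(data, indent=2 if pretty else None, ensure_ascii=False)
```

For example, `parse_args(["statement.pdf", "-o", "output.json", "-v"])` yields `output="output.json"`, `verbose=True`, and `pretty=False`.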

## Architecture

The extraction pipeline:
- Validates that the PDF is a BAC statement (`is_bac_statement`)
- Iterates over pages line by line, detecting section boundaries via the `SECTIONS` dict patterns
- Parses transactions matching the `TRANSACTION_PATTERN` regex
- Outputs card holders, transactions by section, and summaries
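The section-tracking loop described above can be sketched as follows. This version assumes each `SECTIONS` entry holds `start`/`end` pattern lists plus an output `key`, and that the transaction regex uses named groups. The field names, `split_sections`, and `iter_pdf_lines` are hypothetical and are not taken from `bac_extract.py`. Only `pdfplumber.open()` and `Page.extract_text()` are real pdfplumber calls.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional


def iter_pdf_lines(pdf_path: str) -> Iterator[str]:
    """Yield text lines page by page (hypothetical helper)."""
    import pdfplumber  # imported lazily so split_sections has no hard dependency

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""  # guard against an empty or None result
            yield from text.splitlines()


def split_sections(
    lines: Iterable[str],
    sections: Dict[str, dict],
    transaction_re: re.Pattern,
) -> Dict[str, List[Dict[str, Optional[str]]]]:
    """Group transaction matches by the section that is open when they appear."""
    results: Dict[str, List[Dict[str, Optional[str]]]] = {
        spec["key"]: [] for spec in sections.values()
    }
    active: Optional[str] = None  # ID of the section currently open, if any

    for raw in lines:
        line = raw.strip()
        # Close the active section when one of its end patterns matches.
        if active is not None and any(re.search(p, line) for p in sections[active]["end"]):
            active = None
            continue
        # Open a section on a start pattern; the header line is not a transaction.
        opened = next(
            (sid for sid, spec in sections.items()
             if any(re.search(p, line) for p in spec["start"])),
            None,
        )
        if opened is not None:
            active = opened
            continue
        # Inside a section, keep only lines that look like transactions.
        if active is not None:
            match = transaction_re.match(line)
            if match:
                results[sections[active]["key"]].append(match.groupdict())
    return results
```

The loop checks `end` patterns before `start` patterns so that a closing line (for example, a section total) is never counted as a transaction. Lines outside any section are ignored.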
Key data structures:
- `SECTIONS`: Maps section IDs (B/D/E) to start/end regex patterns and output keys
- `SPANISH_MONTHS`: Spanish month abbreviations for date parsing
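The following sketch shows one plausible shape for these structures. The `SectionSpec` field names, the output keys, and the regex patterns are placeholders, not the values in `bac_extract.py`. `SET` is included on the assumption that some Costa Rican statements abbreviate *setiembre* that way.

```python
from typing import Dict, List, TypedDict


class SectionSpec(TypedDict):
    """Assumed shape of one SECTIONS entry (hypothetical field names)."""
    start: List[str]  # regexes that open the section
    end: List[str]    # regexes that close it
    key: str          # output JSON key for the section's transactions


# Placeholder patterns and keys, not the script's real values.
SECTIONS: Dict[str, SectionSpec] = {
    "B": {"start": [r"^B\)"], "end": [r"^Total"], "key": "purchases"},
    "D": {"start": [r"^D\)"], "end": [r"^Total"], "key": "other_charges"},
    "E": {"start": [r"^E\)"], "end": [r"^Total"], "key": "voluntary_services"},
}

# Spanish month abbreviation -> month number ("SET" is an assumed alias for 9).
SPANISH_MONTHS: Dict[str, int] = {
    "ENE": 1, "FEB": 2, "MAR": 3, "ABR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AGO": 8, "SEP": 9, "SET": 9, "OCT": 10, "NOV": 11, "DIC": 12,
}
```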
Key parsing functions:
- `parse_spanish_date`: Converts `"15-ENE-25"` to `"2025-01-15"`
- `parse_amount`: Handles `"1,234.56"` and trailing negatives (`"100.00-"`)
- `matches_patterns`: Generic regex pattern matcher for section detection
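Here is a minimal sketch of the three helpers. It assumes that two-digit years fall in 2000 to 2099, that amounts are returned as `float`, and that `matches_patterns` uses `re.search` semantics. `_MONTHS` is a local stand-in for `SPANISH_MONTHS`. These are illustrations of the documented behavior, not the script's actual implementations.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Local stand-in for SPANISH_MONTHS ("SET" is an assumed alias for September).
_MONTHS = {
    "ENE": 1, "FEB": 2, "MAR": 3, "ABR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AGO": 8, "SEP": 9, "SET": 9, "OCT": 10, "NOV": 11, "DIC": 12,
}


def parse_spanish_date(text: str) -> Optional[str]:
    """Convert 'DD-MMM-YY' with a Spanish month (e.g. '15-ENE-25') to ISO 'YYYY-MM-DD'."""
    m = re.fullmatch(r"(\d{1,2})-([A-Za-z]{3})-(\d{2})", text.strip())
    if not m:
        return None
    month = _MONTHS.get(m.group(2).upper())
    if month is None:
        return None
    try:
        # Assumes two-digit years belong to 2000-2099.
        return date(2000 + int(m.group(3)), month, int(m.group(1))).isoformat()
    except ValueError:  # e.g. day 31 in a 30-day month
        return None


def parse_amount(text: str) -> float:
    """Parse '1,234.56' -> 1234.56 and a trailing-minus '100.00-' -> -100.0."""
    s = text.strip()
    negative = s.endswith("-")
    if negative:
        s = s[:-1]
    value = float(s.replace(",", ""))  # drop thousands separators
    return -value if negative else value


def matches_patterns(line: str, patterns: Iterable[str]) -> bool:
    """Return True if any regex in `patterns` matches anywhere in `line`."""
    return any(re.search(p, line) for p in patterns)
```

Using `decimal.Decimal` instead of `float` would avoid binary rounding on cents, at the cost of needing a custom encoder when writing JSON.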