update docs

Fabian Montero 2026-03-09 17:04:24 -06:00
parent 7297abadcd
commit 12e818b82c
Signed by: fabian
GPG key ID: 8036F30EDBAC8447
2 changed files with 79 additions and 20 deletions

@ -4,11 +4,14 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Overview
Single-script Python tool that extracts credit card transactions from BAC Costa Rica statement PDFs. Parses sections B (purchases), D (other charges), and E (voluntary services) and outputs JSON.
Python tools for BAC Costa Rica credit card statement processing:
- `bac_extract.py`: Extracts transactions from statement PDFs to JSON
- `bac_analyze.py`: Analyzes JSON output with categorization and graphs
## Dependencies
- pdfplumber (>=0.10.0)
- pdfplumber (>=0.10.0) - PDF extraction
- matplotlib (>=3.5.0) - graphs (optional, only for bac_analyze.py --graph)
## Commands
@ -16,22 +19,22 @@ Single-script Python tool that extracts credit card transactions from BAC Costa
# Run tests
python testStatements/run_tests.py
# Run extractor
python bac_extract.py <pdf_file> [options]
# Examples
python bac_extract.py EstadodeCuenta.pdf --pretty
# Extract transactions from PDF
python bac_extract.py statement.pdf --pretty
python bac_extract.py statement.pdf -o output.json -v
```
Options:
- `-o, --output`: Output JSON path (default: transactions.json)
- `--pretty`: Pretty-print JSON
- `-v, --verbose`: Enable debug logging
# Analyze transactions (supports multiple JSON files)
python bac_analyze.py transactions.json
python bac_analyze.py *.json --graph all
python bac_analyze.py *.json --graph bar -o spending.png
python bac_analyze.py *.json --categories my_categories.json
```
## Architecture
The extraction pipeline:
### bac_extract.py
Extraction pipeline:
1. Validates PDF is a BAC statement (`is_bac_statement`)
2. Iterates pages line-by-line, detecting section boundaries via `SECTIONS` dict patterns
3. Parses transactions matching `TRANSACTION_PATTERN` regex
@ -41,7 +44,10 @@ Key data structures:
- `SECTIONS`: Maps section IDs (B/D/E) to start/end regex patterns and output keys
- `SPANISH_MONTHS`: Spanish month abbreviations for date parsing
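As a rough illustration of how a `SECTIONS`-style table can drive section detection, here is a minimal sketch. The regex patterns, the output key names, and the `split_sections` helper are all hypothetical; the actual values in `bac_extract.py` are not shown in this diff and may differ.

```python
import re

# Hypothetical patterns and output keys, for illustration only.
SECTIONS = {
    "B": {"start": [r"^B\)"], "end": [r"^C\)"], "key": "purchases"},
    "D": {"start": [r"^D\)"], "end": [r"^E\)"], "key": "other_charges"},
    "E": {"start": [r"^E\)"], "end": [r"^F\)"], "key": "voluntary_services"},
}


def matches_patterns(line, patterns):
    """Return True if any regex in `patterns` matches somewhere in `line`."""
    return any(re.search(p, line) for p in patterns)


def split_sections(lines):
    """Group lines by the section they fall in (hypothetical helper)."""
    result = {cfg["key"]: [] for cfg in SECTIONS.values()}
    current = None
    for line in lines:
        # Close the active section when one of its end patterns matches.
        if current and matches_patterns(line, SECTIONS[current]["end"]):
            current = None
        # A line may also open a new section; the header itself is not kept.
        started = next(
            (sid for sid, cfg in SECTIONS.items()
             if matches_patterns(line, cfg["start"])),
            None,
        )
        if started:
            current = started
            continue
        if current:
            result[SECTIONS[current]["key"]].append(line)
    return result
```

Checking the end patterns before the start patterns lets a single header line (for example the start of section E) both close section D and open section E.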
Key parsing functions:
- `parse_spanish_date`: Converts "15-ENE-25" to "2025-01-15"
- `parse_amount`: Handles "1,234.56" and trailing negatives "100.00-"
- `matches_patterns`: Generic regex pattern matcher for section detection
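The two conversions described above could look roughly like the following. This is a sketch under assumptions, not the code in `bac_extract.py`: the month table contents, the two-digit-year rule (always 20xx), and the `float` return type are guesses made for illustration.

```python
from datetime import date

# Assumed month table; the real SPANISH_MONTHS may use different keys.
# "SET" is included on the assumption that statements may abbreviate
# "setiembre" that way.
SPANISH_MONTHS = {
    "ENE": 1, "FEB": 2, "MAR": 3, "ABR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AGO": 8, "SEP": 9, "SET": 9, "OCT": 10, "NOV": 11, "DIC": 12,
}


def parse_spanish_date(text):
    """Convert 'DD-MMM-YY' with a Spanish month abbreviation to ISO format."""
    day, month, year = text.strip().upper().split("-")
    # Assumption: two-digit years always belong to the 2000s.
    return date(2000 + int(year), SPANISH_MONTHS[month], int(day)).isoformat()


def parse_amount(text):
    """Parse '1,234.56' or a trailing-negative '100.00-' into a float."""
    cleaned = text.strip().replace(",", "")
    negative = cleaned.endswith("-")
    if negative:
        cleaned = cleaned[:-1]
    value = float(cleaned)
    return -value if negative else value
```

For example, `parse_spanish_date("15-ENE-25")` yields `"2025-01-15"` and `parse_amount("100.00-")` yields `-100.0`. A real implementation might prefer `decimal.Decimal` over `float` for currency values.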
### bac_analyze.py
Analysis pipeline:
1. Loads transactions from one or more JSON files (purchases only)
2. Categorizes by matching description against patterns in `categories.json`
3. Aggregates by category and month, keeping CRC/USD separate
4. Outputs text summary and optional graphs (bar/pie/timeline/all)
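Steps 2 and 3 of the analysis pipeline can be sketched as below. The category table layout, the `uncategorized` fallback, and the transaction field names (`description`, `date`, `currency`, `amount`) are assumptions for illustration; the real `categories.json` schema and JSON output format are not shown in this diff.

```python
import re
from collections import defaultdict

# Hypothetical category table standing in for categories.json.
CATEGORIES = {
    "groceries": [r"AUTO\s*MERCADO", r"WALMART"],
    "transport": [r"UBER", r"GASOLINERA"],
}


def categorize(description, categories=CATEGORIES):
    """Return the first category whose patterns match the description."""
    for name, patterns in categories.items():
        if any(re.search(p, description, re.IGNORECASE) for p in patterns):
            return name
    return "uncategorized"


def aggregate(transactions):
    """Sum amounts per (category, YYYY-MM, currency), keeping currencies apart."""
    totals = defaultdict(float)
    for tx in transactions:
        key = (categorize(tx["description"]), tx["date"][:7], tx["currency"])
        totals[key] += tx["amount"]
    return dict(totals)
```

Including the currency in the aggregation key is one simple way to keep CRC and USD totals from being mixed.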