# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Single-script Python tool that extracts credit card transactions from BAC Costa Rica statement PDFs. Parses sections B (purchases), D (other charges), and E (voluntary services) and outputs JSON.

## Dependencies

- pdfplumber (>=0.10.0)

## Commands

```bash
# Run tests
python testStatements/run_tests.py

# Run extractor
python bac_extract.py <pdf_file> [options]

# Examples
python bac_extract.py EstadodeCuenta.pdf --pretty
python bac_extract.py statement.pdf -o output.json -v
```

Options:
- `-o, --output`: Output JSON path (default: transactions.json)
- `--pretty`: Pretty-print JSON
- `-v, --verbose`: Enable debug logging

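For orientation, the options above could be declared roughly as follows. This is a minimal sketch that assumes the script uses `argparse`; the `build_parser` name is hypothetical, and only the flags and the default come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        prog="bac_extract.py",
        description="Extract transactions from a BAC statement PDF.",
    )
    parser.add_argument("pdf_file", help="Path to the statement PDF")
    parser.add_argument(
        "-o", "--output",
        default="transactions.json",
        help="Output JSON path (default: transactions.json)",
    )
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable debug logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["statement.pdf", "-o", "output.json", "-v"])
print(args.pdf_file, args.output, args.pretty, args.verbose)
# prints: statement.pdf output.json False True
```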
## Architecture

The extraction pipeline:
1. Validates PDF is a BAC statement (`is_bac_statement`)
2. Iterates pages line-by-line, detecting section boundaries via `SECTIONS` dict patterns
3. Parses transactions matching `TRANSACTION_PATTERN` regex
4. Outputs card holders, transactions by section, and summaries

Key data structures:
- `SECTIONS`: Maps section IDs (B/D/E) to start/end regex patterns and output keys
- `SPANISH_MONTHS`: Spanish month abbreviations for date parsing

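Steps 2 and 3 of the pipeline can be pictured with the sketch below. It is illustrative only: the regex patterns, the output key names, the sample statement lines, and the `extract_sections` function are all assumptions made for this example, and the real `SECTIONS` and `TRANSACTION_PATTERN` in `bac_extract.py` will differ.

```python
import re

# Hypothetical stand-ins for the real structures in bac_extract.py.
SECTIONS = {
    "B": {"start": [r"^B\)"], "end": [r"^C\)"], "key": "purchases"},
    "D": {"start": [r"^D\)"], "end": [r"^E\)"], "key": "other_charges"},
    "E": {"start": [r"^E\)"], "end": [r"^F\)"], "key": "voluntary_services"},
}

# Assumed line layout: "<DD-MMM-YY> <description> <amount, optional trailing minus>".
TRANSACTION_PATTERN = re.compile(
    r"^(?P<date>\d{2}-[A-Z]{3}-\d{2})\s+(?P<description>.+?)\s+(?P<amount>[\d,]+\.\d{2}-?)$"
)


def extract_sections(lines):
    """Group transaction lines under the output key of the section they appear in."""
    results = {spec["key"]: [] for spec in SECTIONS.values()}
    current = None  # ID of the section being read, or None outside B/D/E
    for raw in lines:
        line = raw.strip()
        # Close the current section first; the same line may open the next one.
        if current and any(re.search(p, line) for p in SECTIONS[current]["end"]):
            current = None
        started = next(
            (sid for sid, spec in SECTIONS.items()
             if any(re.search(p, line) for p in spec["start"])),
            None,
        )
        if started:
            current = started
            continue
        if current is None:
            continue
        match = TRANSACTION_PATTERN.match(line)
        if match:
            results[SECTIONS[current]["key"]].append(match.groupdict())
    return results


# Made-up statement lines, not taken from a real PDF.
sample = [
    "A) RESUMEN",
    "B) COMPRAS",
    "15-ENE-25 SUPERMERCADO EJEMPLO 1,234.56",
    "C) INTERESES",
    "20-ENE-25 FUERA DE SECCION 10.00",
    "D) OTROS CARGOS",
    "22-ENE-25 REVERSION 100.00-",
    "E) SERVICIOS",
    "25-ENE-25 SEGURO 5.00",
]
result = extract_sections(sample)
print({key: len(rows) for key, rows in result.items()})
# prints: {'purchases': 1, 'other_charges': 1, 'voluntary_services': 1}
```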
Key parsing functions:
- `parse_spanish_date`: Converts "15-ENE-25" to "2025-01-15"
- `parse_amount`: Handles "1,234.56" and trailing negatives "100.00-"
- `matches_patterns`: Generic regex pattern matcher for section detection
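The behaviour described for these helpers could look like the sketch below. The function names and the example inputs come from the list above; the signatures, the month table contents, the `Decimal` return type, and the assumption that two-digit years mean 20xx are guesses rather than the script's actual implementation.

```python
import re
from decimal import Decimal

# Assumed abbreviations; the real SPANISH_MONTHS table may use other spellings.
SPANISH_MONTHS = {
    "ENE": 1, "FEB": 2, "MAR": 3, "ABR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AGO": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DIC": 12,
}


def parse_spanish_date(text: str) -> str:
    """Convert "15-ENE-25" to ISO "2025-01-15" (two-digit years assumed to be 20xx)."""
    day, month, year = text.strip().upper().split("-")
    return f"{2000 + int(year):04d}-{SPANISH_MONTHS[month]:02d}-{int(day):02d}"


def parse_amount(text: str) -> Decimal:
    """Parse "1,234.56" or a trailing-negative "100.00-" into a Decimal."""
    text = text.strip()
    negative = text.endswith("-")
    value = Decimal(text.rstrip("-").replace(",", ""))
    return -value if negative else value


def matches_patterns(line: str, patterns: list[str]) -> bool:
    """Return True if any regex in `patterns` matches somewhere in `line`."""
    return any(re.search(pattern, line) for pattern in patterns)


print(parse_spanish_date("15-ENE-25"))  # prints: 2025-01-15
print(parse_amount("1,234.56"))         # prints: 1234.56
print(parse_amount("100.00-"))          # prints: -100.00
```

`Decimal` is used here to avoid floating-point rounding on currency values; since the tool writes JSON, the actual script may well return `float` instead.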