update docs
parent 7297abadcd
commit 12e818b82c
2 changed files with 79 additions and 20 deletions
40
CLAUDE.md
|
|
@@ -4,11 +4,14 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
|
||||||
|
|
||||||
## Project Overview
|
## Project Overview
|
||||||
|
|
||||||
Single-script Python tool that extracts credit card transactions from BAC Costa Rica statement PDFs. Parses sections B (purchases), D (other charges), and E (voluntary services) and outputs JSON.
|
Python tools for BAC Costa Rica credit card statement processing:
|
||||||
|
- `bac_extract.py`: Extracts transactions from statement PDFs to JSON
|
||||||
|
- `bac_analyze.py`: Analyzes JSON output with categorization and graphs
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
- pdfplumber (>=0.10.0)
|
- pdfplumber (>=0.10.0) - PDF extraction
|
||||||
|
- matplotlib (>=3.5.0) - graphs (optional, only for bac_analyze.py --graph)
|
||||||
|
|
||||||
## Commands
|
## Commands
|
||||||
|
|
||||||
|
|
@@ -16,22 +19,22 @@ Single-script Python tool that extracts credit card transactions from BAC Costa
|
||||||
# Run tests
|
# Run tests
|
||||||
python testStatements/run_tests.py
|
python testStatements/run_tests.py
|
||||||
|
|
||||||
# Run extractor
|
# Extract transactions from PDF
|
||||||
python bac_extract.py <pdf_file> [options]
|
python bac_extract.py statement.pdf --pretty
|
||||||
|
|
||||||
# Examples
|
|
||||||
python bac_extract.py EstadodeCuenta.pdf --pretty
|
|
||||||
python bac_extract.py statement.pdf -o output.json -v
|
python bac_extract.py statement.pdf -o output.json -v
|
||||||
```
|
|
||||||
|
|
||||||
Options:
|
# Analyze transactions (supports multiple JSON files)
|
||||||
- `-o, --output`: Output JSON path (default: transactions.json)
|
python bac_analyze.py transactions.json
|
||||||
- `--pretty`: Pretty-print JSON
|
python bac_analyze.py *.json --graph all
|
||||||
- `-v, --verbose`: Enable debug logging
|
python bac_analyze.py *.json --graph bar -o spending.png
|
||||||
|
python bac_analyze.py *.json --categories my_categories.json
|
||||||
|
```
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
The extraction pipeline:
|
### bac_extract.py
|
||||||
|
|
||||||
|
Extraction pipeline:
|
||||||
1. Validates PDF is a BAC statement (`is_bac_statement`)
|
1. Validates PDF is a BAC statement (`is_bac_statement`)
|
||||||
2. Iterates pages line-by-line, detecting section boundaries via `SECTIONS` dict patterns
|
2. Iterates pages line-by-line, detecting section boundaries via `SECTIONS` dict patterns
|
||||||
3. Parses transactions matching `TRANSACTION_PATTERN` regex
|
3. Parses transactions matching `TRANSACTION_PATTERN` regex
|
||||||
|
|
@@ -41,7 +44,10 @@ Key data structures:
|
||||||
- `SECTIONS`: Maps section IDs (B/D/E) to start/end regex patterns and output keys
|
- `SECTIONS`: Maps section IDs (B/D/E) to start/end regex patterns and output keys
|
||||||
- `SPANISH_MONTHS`: Spanish month abbreviations for date parsing
|
- `SPANISH_MONTHS`: Spanish month abbreviations for date parsing
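A minimal sketch of how a month table like `SPANISH_MONTHS` could drive date and amount parsing, using the documented inputs "15-ENE-25", "1,234.56", and "100.00-". The table contents, the function bodies, and the rule that two-digit years fall in the 2000s are assumptions for illustration, not the actual code in `bac_extract.py`.

```python
# Hypothetical month table; the real SPANISH_MONTHS in bac_extract.py may differ.
# "SET" is included as an assumed alternate abbreviation for September.
SPANISH_MONTHS = {
    "ENE": 1, "FEB": 2, "MAR": 3, "ABR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AGO": 8, "SEP": 9, "SET": 9, "OCT": 10, "NOV": 11, "DIC": 12,
}


def parse_spanish_date(text: str) -> str:
    """Convert a DD-MMM-YY date with a Spanish month to ISO YYYY-MM-DD."""
    day, month, year = text.strip().upper().split("-")
    # Assumption: two-digit years belong to the 2000s.
    return f"{2000 + int(year):04d}-{SPANISH_MONTHS[month]:02d}-{int(day):02d}"


def parse_amount(text: str) -> float:
    """Parse an amount with thousands separators and an optional trailing minus."""
    cleaned = text.strip().replace(",", "")
    if cleaned.endswith("-"):
        # A trailing "-" marks a negative amount.
        return -float(cleaned[:-1])
    return float(cleaned)


print(parse_spanish_date("15-ENE-25"))
print(parse_amount("1,234.56"))
print(parse_amount("100.00-"))
```

An unknown month abbreviation raises `KeyError` in this sketch; a real parser might prefer to log and skip the line instead.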
|
||||||
|
|
||||||
Key parsing functions:
|
### bac_analyze.py
|
||||||
- `parse_spanish_date`: Converts "15-ENE-25" to "2025-01-15"
|
|
||||||
- `parse_amount`: Handles "1,234.56" and trailing negatives "100.00-"
|
Analysis pipeline:
|
||||||
- `matches_patterns`: Generic regex pattern matcher for section detection
|
1. Loads transactions from one or more JSON files (purchases only)
|
||||||
|
2. Categorizes by matching description against patterns in `categories.json`
|
||||||
|
3. Aggregates by category and month, keeping CRC/USD separate
|
||||||
|
4. Outputs text summary and optional graphs (bar/pie/timeline/all)
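The aggregation step above can be sketched as follows. This is only an illustration: the transaction field names (`date`, `amount`, `currency`, `category`) and the `aggregate` function are hypothetical, since the JSON schema is not shown here. The point is that the currency is part of the grouping key, so CRC and USD totals are never summed together.

```python
from collections import defaultdict


def aggregate(transactions):
    """Sum amounts per (currency, category, month) without mixing currencies."""
    totals = defaultdict(float)
    for tx in transactions:
        month = tx["date"][:7]  # "2025-01-15" -> "2025-01"
        key = (tx["currency"], tx["category"], month)
        totals[key] += tx["amount"]
    return dict(totals)


# Hypothetical sample data; field names are assumptions.
sample = [
    {"date": "2025-01-15", "amount": 1000.0, "currency": "CRC", "category": "Groceries"},
    {"date": "2025-01-20", "amount": 2500.0, "currency": "CRC", "category": "Groceries"},
    {"date": "2025-01-21", "amount": 12.5, "currency": "USD", "category": "Groceries"},
    {"date": "2025-02-02", "amount": 800.0, "currency": "CRC", "category": "Gas"},
]

for key, total in sorted(aggregate(sample).items()):
    print(key, total)
```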
|
||||||
|
|
|
||||||
59
README.md
|
|
@@ -1,13 +1,16 @@
|
||||||
# BAC Statement Extractor
|
# BAC Statement Tools
|
||||||
|
|
||||||
Extracts credit card transactions from BAC Costa Rica statement PDFs. Parses sections B (purchases), D (other charges), and E (voluntary services) and outputs JSON.
|
Tools for processing BAC Costa Rica credit card statement PDFs.
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
- Python 3.10+
|
- Python 3.10+
|
||||||
- pdfplumber (>=0.10.0)
|
- pdfplumber (>=0.10.0)
|
||||||
|
- matplotlib (>=3.5.0) - optional, for graphs
|
||||||
|
|
||||||
## Usage
|
## Extraction
|
||||||
|
|
||||||
|
Extract transactions from statement PDFs to JSON.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python bac_extract.py <pdf_file> [options]
|
python bac_extract.py <pdf_file> [options]
|
||||||
|
|
@@ -24,6 +27,56 @@ python bac_extract.py statement.pdf --pretty
|
||||||
python bac_extract.py statement.pdf -o output.json -v
|
python bac_extract.py statement.pdf -o output.json -v
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Analysis
|
||||||
|
|
||||||
|
Analyze extracted transactions with category breakdowns and graphs.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python bac_analyze.py <json_files...> [options]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Options:**
|
||||||
|
- `--graph {bar,pie,timeline,all}`: Generate graph(s)
|
||||||
|
- `-o, --output`: Output file for graph (default: spending_<type>.png)
|
||||||
|
- `--show`: Display graph interactively
|
||||||
|
- `--categories`: Custom categories file (default: categories.json)
|
||||||
|
|
||||||
|
**Examples:**
|
||||||
|
```bash
|
||||||
|
# Text summary
|
||||||
|
python bac_analyze.py transactions.json
|
||||||
|
|
||||||
|
# Analyze multiple statements
|
||||||
|
python bac_analyze.py *.json
|
||||||
|
|
||||||
|
# Generate all graphs
|
||||||
|
python bac_analyze.py *.json --graph all
|
||||||
|
|
||||||
|
# Generate bar chart with custom output
|
||||||
|
python bac_analyze.py *.json --graph bar -o spending.png
|
||||||
|
|
||||||
|
# Use custom categories
|
||||||
|
python bac_analyze.py *.json --categories my_categories.json
|
||||||
|
```
|
||||||
|
|
||||||
|
## Categories
|
||||||
|
|
||||||
|
Create a `categories.json` file to customize spending categories. Each category maps to a list of merchant name patterns (case-insensitive substring match).
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"Groceries": ["SUPERMARKET", "WALMART", "FRESH MARKET"],
|
||||||
|
"Gas": ["SERVICENTRO", "DELTA", "SHELL"],
|
||||||
|
"Restaurants": ["RESTAURANT", "CAFE", "PIZZA", "SUSHI"],
|
||||||
|
"Transportation": ["UBER", "TAXI", "PARKING"],
|
||||||
|
"Entertainment": ["CINEMA", "NETFLIX", "STEAM"],
|
||||||
|
"Utilities": ["ELECTRIC", "WATER", "INTERNET"],
|
||||||
|
"Subscriptions": ["SPOTIFY", "YOUTUBE", "CHATGPT"]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Transactions not matching any pattern are categorized as "Other".
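A minimal sketch of the matching rule described above: a case-insensitive substring match with "Other" as the fallback. The `categorize` function is hypothetical, and the choice that the first matching category in file order wins is an assumption, not confirmed behavior of `bac_analyze.py`.

```python
def categorize(description: str, categories: dict[str, list[str]]) -> str:
    """Return the first category with a pattern contained in the description."""
    text = description.upper()
    for category, patterns in categories.items():
        # Case-insensitive substring match: compare both sides in upper case.
        if any(pattern.upper() in text for pattern in patterns):
            return category
    return "Other"


categories = {
    "Groceries": ["SUPERMARKET", "WALMART", "FRESH MARKET"],
    "Transportation": ["UBER", "TAXI", "PARKING"],
}

print(categorize("Walmart Escazu", categories))
print(categorize("FERRETERIA EPA", categories))
```

Under this first-match assumption, the order of categories matters: a description such as "UBER EATS" would land in "Transportation" unless a more specific pattern appears in an earlier category.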
|
||||||
|
|
||||||
## Output Format
|
## Output Format
|
||||||
|
|
||||||
```json
|
```json
|
||||||
|
|
|
||||||