diff --git a/CLAUDE.md b/CLAUDE.md
index 5b058a9..e4f77b3 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,11 +4,14 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-Single-script Python tool that extracts credit card transactions from BAC Costa Rica statement PDFs. Parses sections B (purchases), D (other charges), and E (voluntary services) and outputs JSON.
+Python tools for BAC Costa Rica credit card statement processing:
+- `bac_extract.py`: Extracts transactions from statement PDFs to JSON
+- `bac_analyze.py`: Analyzes JSON output with categorization and graphs
 
 ## Dependencies
 
-- pdfplumber (>=0.10.0)
+- pdfplumber (>=0.10.0) - PDF extraction
+- matplotlib (>=3.5.0) - graphs (optional, only for bac_analyze.py --graph)
 
 ## Commands
 
@@ -16,22 +19,22 @@ Single-script Python tool that extracts credit card transactions from BAC Costa
 # Run tests
 python testStatements/run_tests.py
 
-# Run extractor
-python bac_extract.py [options]
-
-# Examples
-python bac_extract.py EstadodeCuenta.pdf --pretty
+# Extract transactions from PDF
+python bac_extract.py statement.pdf --pretty
 python bac_extract.py statement.pdf -o output.json -v
-```
 
-Options:
-- `-o, --output`: Output JSON path (default: transactions.json)
-- `--pretty`: Pretty-print JSON
-- `-v, --verbose`: Enable debug logging
+# Analyze transactions (supports multiple JSON files)
+python bac_analyze.py transactions.json
+python bac_analyze.py *.json --graph all
+python bac_analyze.py *.json --graph bar -o spending.png
+python bac_analyze.py *.json --categories my_categories.json
+```
 
 ## Architecture
 
-The extraction pipeline:
+### bac_extract.py
+
+Extraction pipeline:
 1. Validates PDF is a BAC statement (`is_bac_statement`)
 2. Iterates pages line-by-line, detecting section boundaries via `SECTIONS` dict patterns
 3. Parses transactions matching `TRANSACTION_PATTERN` regex
@@ -41,7 +44,10 @@ Key data structures:
 - `SECTIONS`: Maps section IDs (B/D/E) to start/end regex patterns and output keys
 - `SPANISH_MONTHS`: Spanish month abbreviations for date parsing
 
-Key parsing functions:
-- `parse_spanish_date`: Converts "15-ENE-25" to "2025-01-15"
-- `parse_amount`: Handles "1,234.56" and trailing negatives "100.00-"
-- `matches_patterns`: Generic regex pattern matcher for section detection
+### bac_analyze.py
+
+Analysis pipeline:
+1. Loads transactions from one or more JSON files (purchases only)
+2. Categorizes by matching description against patterns in `categories.json`
+3. Aggregates by category and month, keeping CRC/USD separate
+4. Outputs text summary and optional graphs (bar/pie/timeline/all)
diff --git a/README.md b/README.md
index a56332a..709c00d 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,16 @@
-# BAC Statement Extractor
+# BAC Statement Tools
 
-Extracts credit card transactions from BAC Costa Rica statement PDFs. Parses sections B (purchases), D (other charges), and E (voluntary services) and outputs JSON.
+Tools for processing BAC Costa Rica credit card statement PDFs.
 
 ## Dependencies
 
 - Python 3.10+
 - pdfplumber (>=0.10.0)
+- matplotlib (>=3.5.0) - optional, for graphs
 
-## Usage
+## Extraction
+
+Extract transactions from statement PDFs to JSON.
 
 ```bash
 python bac_extract.py [options]
@@ -24,6 +27,56 @@ python bac_extract.py statement.pdf --pretty
 python bac_extract.py statement.pdf -o output.json -v
 ```
 
+## Analysis
+
+Analyze extracted transactions with category breakdowns and graphs.
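The categorize and aggregate steps (2 and 3) of the `bac_analyze.py` analysis pipeline described in CLAUDE.md can be sketched roughly as follows. This is a hypothetical illustration, not the tool's actual code: the transaction field names (`date`, `description`, `amount`, `currency`) and the "first matching category wins" rule are assumptions; only the case-insensitive substring match, the "Other" fallback, and keeping CRC and USD totals separate come from the docs.

```python
"""Hypothetical sketch of bac_analyze.py's categorize/aggregate steps.

Assumed (not confirmed by the docs): each transaction is a dict with
"date" (YYYY-MM-DD), "description", "amount" and "currency" keys.
"""
from collections import defaultdict


def categorize(description: str, categories: dict[str, list[str]]) -> str:
    """Return the first category with a pattern that is a case-insensitive substring."""
    text = description.upper()
    for category, patterns in categories.items():
        if any(pattern.upper() in text for pattern in patterns):
            return category
    return "Other"  # fallback for merchants matching no pattern


def aggregate(transactions: list[dict], categories: dict[str, list[str]]) -> dict:
    """Sum amounts per (currency, category, month) so CRC and USD never mix."""
    totals: dict[tuple[str, str, str], float] = defaultdict(float)
    for tx in transactions:
        category = categorize(tx["description"], categories)
        month = tx["date"][:7]  # "2025-01-15" -> "2025-01"
        totals[(tx["currency"], category, month)] += tx["amount"]
    return dict(totals)


if __name__ == "__main__":
    categories = {"Groceries": ["WALMART"], "Transportation": ["UBER"]}
    sample = [
        {"date": "2025-01-15", "description": "Walmart Escazu", "amount": 25000.0, "currency": "CRC"},
        {"date": "2025-01-20", "description": "UBER TRIP", "amount": 12.5, "currency": "USD"},
        {"date": "2025-02-02", "description": "Farmacia", "amount": 8000.0, "currency": "CRC"},
    ]
    for key, total in sorted(aggregate(sample, categories).items()):
        print(key, total)
```

Keying totals by currency first mirrors the documented rule of never summing colones and dollars together; a real implementation might prefer `decimal.Decimal` over `float` for money.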
+ +```bash +python bac_analyze.py [options] +``` + +**Options:** +- `--graph {bar,pie,timeline,all}`: Generate graph(s) +- `-o, --output`: Output file for graph (default: spending_.png) +- `--show`: Display graph interactively +- `--categories`: Custom categories file (default: categories.json) + +**Examples:** +```bash +# Text summary +python bac_analyze.py transactions.json + +# Analyze multiple statements +python bac_analyze.py *.json + +# Generate all graphs +python bac_analyze.py *.json --graph all + +# Generate bar chart with custom output +python bac_analyze.py *.json --graph bar -o spending.png + +# Use custom categories +python bac_analyze.py *.json --categories my_categories.json +``` + +## Categories + +Create a `categories.json` file to customize spending categories. Each category maps to a list of merchant name patterns (case-insensitive substring match). + +```json +{ + "Groceries": ["SUPERMARKET", "WALMART", "FRESH MARKET"], + "Gas": ["SERVICENTRO", "DELTA", "SHELL"], + "Restaurants": ["RESTAURANT", "CAFE", "PIZZA", "SUSHI"], + "Transportation": ["UBER", "TAXI", "PARKING"], + "Entertainment": ["CINEMA", "NETFLIX", "STEAM"], + "Utilities": ["ELECTRIC", "WATER", "INTERNET"], + "Subscriptions": ["SPOTIFY", "YOUTUBE", "CHATGPT"] +} +``` + +Transactions not matching any pattern are categorized as "Other". + ## Output Format ```json