bac_tools/CLAUDE.md
2026-03-09 13:24:41 -06:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

A single-script Python tool that extracts credit card transactions from BAC Costa Rica statement PDFs. It parses section "B) Detalle de compras del periodo" and writes the transactions as JSON.

Dependencies

  • pdfplumber (>=0.10.0)

Usage

python bac_extract.py <pdf_file> <card_suffix> [options]

# Examples
python bac_extract.py EstadodeCuenta.pdf 1234 --pretty
python bac_extract.py statement.pdf 1234 -o output.json -v

Options:

  • -o, --output: Output JSON path (default: transactions.json)
  • --pretty: Pretty-print JSON
  • -v, --verbose: Enable debug logging
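
For orientation, the command line described above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual code: the function name build_parser and the help strings are assumptions, and only the positional arguments, flags, and default come from the usage text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI of bac_extract.py."""
    parser = argparse.ArgumentParser(
        prog="bac_extract.py",
        description="Extract credit card transactions from a BAC statement PDF.",
    )
    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("pdf_file", help="Path to the statement PDF")
    parser.add_argument("card_suffix", help="Last 4 digits of the card")
    # Optional flags, with the documented default for the output path.
    parser.add_argument("-o", "--output", default="transactions.json",
                        help="Output JSON path")
    parser.add_argument("--pretty", action="store_true",
                        help="Pretty-print JSON")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable debug logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Keeping card_suffix as a string rather than an integer preserves leading zeros in suffixes such as "0042".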

Architecture

The extraction pipeline:

  1. Validates that the PDF is a BAC statement (is_bac_statement)
  2. Locates section B via regex patterns (find_section_b_start, is_section_end)
  3. Extracts tables page-by-page using pdfplumber
  4. Filters transactions by card suffix (last 4 digits)
  5. Parses Spanish dates (D-MMM-YY format) and amounts with comma separators
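
Steps 2 and 4 can be illustrated on plain text lines, without pdfplumber. The sketch below is not the script's implementation: the function names section_b_lines and lines_for_card, the end-of-section pattern (a following lettered heading such as "C) ..."), and the idea that a masked card line introduces each card's transactions are all assumptions made for illustration. Only the section B heading text and the masked card pattern come from this document.

```python
import re

# Heading text taken from the statement section named in the overview.
SECTION_B_RE = re.compile(r"B\)\s*Detalle de compras del periodo", re.IGNORECASE)
# Assumption: section B ends at the next lettered heading, e.g. "C) ...".
SECTION_END_RE = re.compile(r"^[C-Z]\)\s+\S")
# Masked card number: twelve asterisks followed by the last 4 digits.
CARD_HEADER_RE = re.compile(r"\*{12}(\d{4})\b")


def section_b_lines(lines):
    """Return the lines between the section B heading and the next heading."""
    inside = False
    collected = []
    for line in lines:
        if not inside:
            if SECTION_B_RE.search(line):
                inside = True
            continue
        if SECTION_END_RE.match(line.strip()):
            break
        collected.append(line)
    return collected


def lines_for_card(lines, card_suffix):
    """Keep only lines that follow a header for the requested card suffix."""
    current = None
    kept = []
    for line in lines:
        header = CARD_HEADER_RE.search(line)
        if header:
            current = header.group(1)
            continue
        if current == card_suffix:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "A) Resumen",
        "B) Detalle de compras del periodo",
        "************1234 ANA",
        "15-ENE-25 TIENDA 1,234.56",
        "************9999 LUIS",
        "16-ENE-25 CAFE 10.00",
        "C) Otros cargos",
    ]
    print(lines_for_card(section_b_lines(sample), "1234"))
```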

Key parsing functions:

  • parse_spanish_date: Converts "15-ENE-25" to "2025-01-15"
  • parse_amount: Handles "1,234.56" and trailing negatives "100.00-"
  • extract_card_holder: Matches "************1234 NAME" pattern
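
The behavior of these three functions can be sketched as follows. The function names and the example inputs come from the list above; the bodies are assumptions, not the script's actual code. In particular, the month table, the choice to map two-digit years to the 2000s, the Decimal return type, and the (suffix, name) return shape are illustrative guesses.

```python
import re
from datetime import date
from decimal import Decimal

# Assumed three-letter Spanish month abbreviations; the script's table may differ.
SPANISH_MONTHS = {
    "ENE": 1, "FEB": 2, "MAR": 3, "ABR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AGO": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DIC": 12,
}

# Twelve asterisks, the last 4 digits, then the holder name.
CARD_HOLDER_RE = re.compile(r"\*{12}(\d{4})\s+(.+)")


def parse_spanish_date(text: str) -> str:
    """Convert a D-MMM-YY date such as "15-ENE-25" to ISO format."""
    day, month, year = text.strip().upper().split("-")
    # Assumption: two-digit years belong to the 2000s.
    return date(2000 + int(year), SPANISH_MONTHS[month], int(day)).isoformat()


def parse_amount(text: str) -> Decimal:
    """Parse "1,234.56" or a trailing-negative amount such as "100.00-"."""
    cleaned = text.strip().replace(",", "")
    negative = cleaned.endswith("-")
    if negative:
        cleaned = cleaned[:-1]
    value = Decimal(cleaned)
    return -value if negative else value


def extract_card_holder(line: str):
    """Return (suffix, name) for a "************1234 NAME" line, else None."""
    match = CARD_HOLDER_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


if __name__ == "__main__":
    print(parse_spanish_date("15-ENE-25"))
    print(parse_amount("100.00-"))
    print(extract_card_holder("************1234 NAME"))
```

Decimal is used here to avoid floating-point rounding on currency values; a float would also serialize to JSON more directly, so the real script may well make the other choice.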