# Aumentum Data Structures & Database Schema

## Overview

Aumentum is a property tax and land registry management system developed by Thomson Reuters (now owned by Harris Computer Corporation). This document outlines what we know about its data storage and how to access it.

## Current Understanding

Based on the conversion scripts (`convert_script_db.py`, `convert_bin.py`), here's what we've discovered:

### 1. Document Storage Structure

**File System (Contentstore):**
```
contentstore/
  └── <YEAR>/
        └── <MONTH>/
              └── <DAY>/
                    └── <HOUR>/
                          └── <MIN>/
                                └── <UUID>.bin
```

**Example:**
- `contentstore/2014/11/03/15/42/0c21fb5b-8fec-433b-b5f1-e79d1bbe12e3.bin`

**Characteristics:**
- Files use UUID/GUID naming conventions
- Temporal organization by year/month/day
- `.bin` extension but contain JPEG data (LEADTOOLS format)
- Magic bytes: `FF D8 FF` (JPEG signatures)

### 2. Database Schema

**Database:** MySQL on host `10.10.10.3`

**Known Table: `doc_storage`**

| Column | Type | Description |
|--------|------|-------------|
| `content_path` | varchar/text | Relative path to file (e.g., "2014/11/03/file.bin") |

**Query Pattern (supports deeper filtering):**
```sql
SELECT content_path
FROM doc_storage
WHERE content_path LIKE '%/2014/%'           -- year
  AND content_path LIKE '%/2014/11/%'        -- month
  AND content_path LIKE '%/2014/11/03/%'     -- day
  AND content_path LIKE '%/2014/11/03/15/%'  -- hour
  AND content_path LIKE '%/2014/11/03/15/42/%' -- minute
```

### 3. Missing Schema Information

We need to discover:
- **Property/Parcel Tables**: Likely named `property`, `parcel`, `lot`, `assessment`
- **Owner Tables**: `owner`, `owner_name`, `taxpayer`
- **Document-Property Links**: Junction table linking `doc_storage` to properties
- **Relationship Fields**: Foreign keys and indexes

## Conversion Process

### LEADTOOLS .bin to PDF

The `.bin` files contain JPEG images wrapped in LEADTOOLS proprietary format. Conversion process:

```python
# 1. Read as JPEG (PIL recognizes LEADTOOLS JPEG)
img = Image.open(bin_file)

# 2. Convert to RGB
rgb = img.convert("RGB")

# 3. Save as PDF
rgb.save(pdf_path, "PDF", resolution=100.0)
```

**Key Point**: Pillow (PIL) can read LEADTOOLS `.bin` files directly as JPEG, even though they have `.bin` extension.

## How Documents Relate to Properties

**Current Knowledge:**
- Documents are stored in `contentstore/` with date-based paths
- Document paths are referenced in `doc_storage.content_path`
- **Missing**: How documents link to properties/parcels

**Likely Structure** (to be confirmed):
```
property_table
  ├── property_id
  ├── apn
  ├── owner_id
  └── ...

doc_storage
  ├── doc_id
  ├── content_path
  ├── property_id (likely foreign key)
  └── ...

document_property_link (junction table - if exists)
  ├── doc_id
  ├── property_id
  └── document_type
```

## Fetching Properties from Third-Party Applications

### Current Approach

1. **Direct Database Access** (via `aumentum_browser_service.py`):
   - Connect to MySQL at `10.10.10.3`
   - Query `doc_storage` table
   - Resolve file paths
   - Convert `.bin` to PDF on-demand

2. **REST API** (via `aumentum_api.py`):
   - FastAPI server exposing endpoints
   - Browser extension can call API
   - PDFs served as HTTP responses

### Browser Extension Integration

The browser extension (`browser_extension/`) enables:
- Lookup by document path
- Lookup by APN (once property joins are discovered)
- Direct PDF viewing in browser tabs
- No need to install Aumentum desktop client

## Next Steps to Complete Integration

### 1. Database Schema Discovery

```bash
# Connect to database
mysql -h 10.10.10.3 -u sa -p aumentum

# List all tables
SHOW TABLES;

# Look for property-related tables
SHOW TABLES LIKE '%property%';
SHOW TABLES LIKE '%parcel%';
SHOW TABLES LIKE '%owner%';

# Describe key tables
DESCRIBE property;  # (if exists)
DESCRIBE parcel;    # (if exists)
DESCRIBE doc_storage;

# Find relationships
SELECT * FROM information_schema.KEY_COLUMN_USAGE 
WHERE TABLE_SCHEMA = 'aumentum' 
AND REFERENCED_TABLE_NAME = 'property';
```

### 2. Query Examples to Try

```sql
-- Look for property-document relationships
SELECT * FROM doc_storage LIMIT 10;

-- Search for columns that might link to properties
SELECT COLUMN_NAME, DATA_TYPE 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'aumentum' 
AND COLUMN_NAME LIKE '%property%' 
OR COLUMN_NAME LIKE '%parcel%' 
OR COLUMN_NAME LIKE '%apn%';

-- Look for document types or categories
SELECT DISTINCT doc_type FROM doc_storage;  -- (if exists)
```

### 3. Update Service with Discovered Schema

Once schema is discovered, update `aumentum_browser_service.py`:

```python
def get_documents_by_property(self, property_id=None, apn=None):
    conn = mysql.connector.connect(**self.db_config)
    cursor = conn.cursor()
    
    # Updated query with proper joins
    query = """
        SELECT ds.content_path
        FROM doc_storage ds
        JOIN property_documents pd ON ds.doc_id = pd.doc_id
        JOIN property p ON pd.property_id = p.property_id
        WHERE 1=1
    """
    
    if apn:
        query += " AND p.apn = %s"
        cursor.execute(query, (apn,))
    elif property_id:
        query += " AND p.property_id = %s"
        cursor.execute(query, (property_id,))
    
    results = [r[0] for r in cursor.fetchall()]
    return results
```

## Alternative Access Methods

### 1. Aumentum Web Interface

If Aumentum has a web interface:
- Inspect network requests to understand API
- Extract session tokens
- Parse HTML to extract property IDs

### 2. Direct File System Access

If you have server access:
- Mount contentstore via NFS/SMB
- Parse directory structure
- Build document index

### 3. Aumentum API (if available)

Some versions of Aumentum have REST APIs:
- Check documentation
- Look for API endpoints in web interface
- May require authentication tokens

## Security Considerations

- **Read-only database user**: Create dedicated user with SELECT only
- **Path validation**: Sanitize file paths to prevent directory traversal
- **Rate limiting**: Add rate limiting to API endpoints
- **Authentication**: Implement authentication for production use
- **Network security**: Use HTTPS in production, restrict API access

## References

- `convert_script_db.py`: Database query patterns
- `convert_bin.py`: File conversion logic
- `aumentum_browser_service.py`: Unified service layer
- `aumentum_api.py`: REST API endpoints
- `README_BROWSER_ACCESS.md`: Setup and usage guide

## Example Usage

```python
from aumentum_browser_service import AumentumBrowserService

service = AumentumBrowserService(
    db_config={"host": "10.10.10.3", "user": "sa", "database": "aumentum"},
    contentstore_base="/mnt/contentstore"
)

# Discover schema
schema = service.discover_database_schema()
print(schema['property_tables'])

# Lookup document
result = service.lookup_document("2014/11/03/file.bin", convert_to_pdf=True)
print(f"PDF created: {result['pdf_path']}")

# Get property documents (once schema discovered)
docs = service.get_documents_by_property(apn="123-456-789")
for doc_path in docs:
    print(doc_path)
```

---

**Last Updated**: Based on analysis of existing conversion scripts
**Next Action**: Discover full database schema to enable property-based lookups

