# 🔍 Finding ALL Wrong Associations (If Needed)

## 🎯 **When to Use This**

Use this if:
- BP102 still shows wrong content after the fix
- You discover other documents with wrong associations
- You need to identify the full scope of the 2015 scanning error

---

## 📊 **Pattern Recognition**

From what we've discovered, documents mislabeled in 2015 have these characteristics:

### **Common Traits:**
- ✅ All created in 2015 (March-July)
- ✅ Sequential mislabeling (each gets next document's file)
- ✅ Affects multiple document types per document number
- ✅ Created by users: alamba, uofem
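
The traits above can be turned into a quick filter over API records. The following is a minimal sketch, not the project's own code: it assumes each record is a dict whose `create_date` starts with `YYYY-MM-DD` (the format used for the dates in this guide) and that the creating user is exposed in a field here called `created_by`, which is a hypothetical name. The exact day boundaries of the March-July window are also an assumption.

```python
from datetime import date

# Users named in the traits list above.
SUSPECT_USERS = {"alamba", "uofem"}
# March through July 2015; the exact day boundaries are an assumption.
WINDOW_START = date(2015, 3, 1)
WINDOW_END = date(2015, 7, 31)


def matches_2015_pattern(record: dict) -> bool:
    """Return True if a record fits the known traits of the 2015 scanning error."""
    raw = record.get("create_date") or ""
    try:
        created = date.fromisoformat(raw[:10])
    except (TypeError, ValueError):
        return False  # missing or unparseable date
    # "created_by" is a hypothetical field name; adjust to the real API payload.
    return (
        WINDOW_START <= created <= WINDOW_END
        and record.get("created_by") in SUSPECT_USERS
    )
```

A record that matches is only a candidate: the PDF content still has to be checked before it is treated as mislabeled.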

### **Document Numbers Affected So Far:**
- PL11089 (created 2015-03-09)
- PL689 (created 2015-03-09)
- BP102 (created 2015-04-28)
- PL6204 (created 2015-07-10)
- PL12321 (created 2015-??-??)

---

## 🧪 **How to Find More Wrong Associations**

### **Step 1: Get All Documents from 2015 Scanning Period**

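```bash
# List documents created in 2015 (the filter matches the whole year)
curl -s "http://localhost:8001/lrs/source-documents?limit=200" | \
  jq -r '.items[] | select((.create_date // "") | startswith("2015")) | "\(.document_number): Created \(.create_date)"' | \
  sort -u
```

This lists the 2015 documents among the first 200 results. Records with a missing `create_date` are skipped instead of aborting the `jq` filter. If the output looks truncated, raise `limit`.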

### **Step 2: Sample Test Multiple Documents**

Test a sample of documents to see if they show correct content:

```bash
# Fetch the PDF for one document number and confirm the response is a PDF
test_sample() {
    local doc=$1
    local doc_id
    doc_id=$(curl -s "http://localhost:8001/documents/by-document-number?document_number=$doc" | jq -r '.items[0].id // empty')

    if [ -z "$doc_id" ]; then
        echo "Skipping $doc: no document ID found"
        return 1
    fi

    echo "Testing $doc (ID: $doc_id)..."
    curl -s "http://localhost:8001/documents/pdf-by-document-number?document_number=$doc&document_id=$doc_id" \
      -o "/tmp/sample_${doc}.pdf"

    # This only confirms that a PDF came back, not that its content is correct
    if file -b "/tmp/sample_${doc}.pdf" | grep -q "PDF"; then
        echo "  ✅ PDF generated: /tmp/sample_${doc}.pdf"
    else
        echo "  ❌ Failed"
    fi
}

# Test a sample
for doc in PL100 PL200 PL500 PL1000 BP100 BP200 BP300; do
    test_sample "$doc"
done
```
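
If the sampling is driven from Python instead of the shell, the `file ... | grep PDF` check can be approximated by looking at the file header. This is a minimal sketch; `looks_like_pdf` is a hypothetical helper name, and it assumes the PDF signature sits at the very start of the file, which is the usual case.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        # Well-formed PDF files begin with the bytes "%PDF-".
        return fh.read(5) == b"%PDF-"
```

As with the shell version, a passing check only means a PDF was returned. Whether it is the right document is decided in Step 3.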

### **Step 3: Manual Verification of Samples**

Open each sample PDF and check if the document number matches:

```bash
# xdg-open takes one file at a time, so loop over the samples
for f in /tmp/sample_PL*.pdf /tmp/sample_BP*.pdf; do
    [ -e "$f" ] && xdg-open "$f"
done
```

For each one:
- Does the PDF show the correct document number?
- Or does it show a different document?
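
When there are many samples, the first pass of this check can be partly automated. The sketch below assumes the PDF text has already been extracted separately (for example with a tool such as `pdftotext`, or OCR for scans without a text layer) and that document numbers look like the `PL...` and `BP...` examples in this guide. The function name and the three result labels are hypothetical.

```python
import re

# Document numbers in this guide look like PL689 or BP102. The prefixes are
# taken from those examples and may not be exhaustive.
DOC_NUMBER_RE = re.compile(r"\b(?:PL|BP)\d+\b")


def check_pdf_text(expected: str, text: str) -> str:
    """Classify extracted PDF text as 'match', 'mismatch', or 'unknown'."""
    found = set(DOC_NUMBER_RE.findall(text.upper()))
    if expected.upper() in found:
        return "match"
    if found:
        return "mismatch"  # a different document number appears instead
    return "unknown"  # no number found, e.g. a scan without a text layer
```

Anything other than `match` still needs a manual look, since a page can reference other document numbers or carry no machine-readable text at all.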

---

## 📋 **Building Complete Mapping**

If you find more wrong associations:

### **Create Mapping Table:**

| Queried | PDF Shows | Correct Store URL |
|---------|-----------|-------------------|
| PL689 | PL689 ✅ | `store://2015/3/26/.../3eee6f3f...fed.bin` |
| BP102 | BP102 ? | `store://2015/3/17/.../879dcd53...275.bin` |
| PL6204 | PL6204 ✅ | `store://2015/4/28/.../df4050c2...878b.bin` |
| PL12321 | PL12321 ✅ | `store://2015/7/10/.../a57f38d9...4d13.bin` |
| PL11089 | PL689 ❌ | ??? (not found) |
| [New Doc] | [Wrong] | ??? (to be found) |

### **Add to CORRECT_FILE_MAPPING:**

```python
CORRECT_FILE_MAPPING = {
    'PL689': { ... },   # Existing
    'BP102': { ... },   # Existing
    'PL6204': { ... },  # Existing
    'PL12321': { ... }, # Existing
    
    # Add new discoveries:
    'NEW_DOC': {
        'correct_url': 'store://....',
        'wrong_label': 'WHAT_ITS_LABELED_AS',
        'reason': 'Description of the issue'
    },
}
```
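
How the mapping is consumed is not shown in this guide. As a hedged sketch, a lookup could prefer the corrected URL when an entry exists and otherwise fall back to the URL from the database, and a small validator could catch entries with missing keys before deployment. `resolve_store_url`, `invalid_entries`, and `EXAMPLE_MAPPING` are hypothetical names used only for illustration.

```python
from typing import Optional

# Placeholder entry only; real entries follow the structure shown above.
EXAMPLE_MAPPING = {
    "NEW_DOC": {
        "correct_url": "store://....",
        "wrong_label": "WHAT_ITS_LABELED_AS",
        "reason": "Description of the issue",
    },
}

REQUIRED_KEYS = {"correct_url", "wrong_label", "reason"}


def resolve_store_url(doc_number: str, db_url: Optional[str], mapping: dict) -> Optional[str]:
    """Prefer the corrected URL when one exists, else keep the database value."""
    entry = mapping.get(doc_number)
    if entry is None:
        return db_url
    return entry["correct_url"]


def invalid_entries(mapping: dict) -> list:
    """Return the document numbers whose entries lack a required key."""
    return sorted(
        doc for doc, entry in mapping.items() if not REQUIRED_KEYS <= set(entry)
    )
```

Running `invalid_entries` over the full mapping after each addition is a cheap way to catch typos in the keys.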

---

## 🚨 **If Many Documents Are Wrong**

If you discover 10+ documents with wrong associations, choose one of these approaches:

### **Option A: Systematic Database Correction**
- Export complete node→label mapping
- Create SQL UPDATE script
- Fix database permanently
- Timeline: 2-3 days with testing

### **Option B: Comprehensive Python Mapping**
- Create complete CORRECT_FILE_MAPPING for all affected documents
- Keep code-based workaround
- Easier to test and revert
- Timeline: 1-2 days

---

## 💡 **Quick Scope Assessment**

```bash
# How many documents were created in March-July 2015?
# Assumes create_date starts with YYYY-MM, as in the dates listed above
curl -s "http://localhost:8001/lrs/source-documents?limit=500" | \
  jq '[.items[] | select((.create_date // "") | test("^2015-0[3-7]"))] | length'

# This tells us the potential scope (capped by the limit of 500)
```

If result is:
- < 50 documents → Manageable with Python mapping
- 50-200 documents → Consider database correction
- > 200 documents → Definitely need database fix
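
The thresholds above can be written down as a small helper so the decision is applied consistently. This is only a sketch of the rule of thumb in this section; the function name and the returned labels are hypothetical.

```python
def recommend_strategy(count: int) -> str:
    """Map the number of candidate documents to the strategy suggested above."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count < 50:
        return "python mapping"
    if count <= 200:
        return "consider database correction"
    return "database fix"
```

For example, `recommend_strategy(30)` returns `"python mapping"`, while `recommend_strategy(350)` returns `"database fix"`.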

---

## 📞 **Next Steps**

1. **Verify BP102**: Open `/tmp/CHECK_BP102.pdf` and check content
2. **Report findings**: What does BP102 show?
3. **Identify scope**: Are there many more affected documents?
4. **Choose strategy**: Python mapping vs database correction

---

**Quick Verification:**
```bash
xdg-open /tmp/CHECK_BP102.pdf
```

What do you see? Report back! 🔍

