Document Processing
How SiteWhiz AI processes and indexes your documents.
Overview
When you upload documents, SiteWhiz automatically:
- Extracts text - Reads all content from the document
- Performs OCR - Converts scanned pages and images to text, when needed
- Creates embeddings - Prepares content for AI search
- Indexes content - Makes everything searchable
Processing Pipeline
Stage 1: Upload
- File is received and stored securely
- Basic validation checks file type and size
- Document appears in the Documents list as "Processing"
Stage 2: Text Extraction
For text-based documents:
- PDF text is extracted directly
- Word documents are parsed
- Excel data is converted to searchable text
Stage 3: OCR (If Needed)
For image-based content:
- Scanned pages are analyzed
- Text is recognized using OCR
- Tables and figures are identified
Stage 4: AI Processing
- Content is analyzed for meaning
- Semantic embeddings are created
- Related concepts are linked
Stage 5: Indexing
- Full-text search index is built
- Document is marked as "Ready"
- Content is available for AI queries
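The five stages can be modeled as an ordered sequence in which text extraction and OCR run only when the content calls for them. This is a minimal sketch under stated assumptions: the `Stage` names and the `stages_for` helper are hypothetical and do not describe SiteWhiz internals.

```python
from enum import Enum


class Stage(Enum):
    """Pipeline stages in the order described above (names are hypothetical)."""
    UPLOAD = 1
    TEXT_EXTRACTION = 2
    OCR = 3
    AI_PROCESSING = 4
    INDEXING = 5


def stages_for(has_text_layer: bool, has_images: bool) -> list[Stage]:
    """Return the stages a document is assumed to pass through, in order."""
    stages = [Stage.UPLOAD]
    if has_text_layer:
        stages.append(Stage.TEXT_EXTRACTION)  # PDF text, Word, Excel parsed directly
    if has_images:
        stages.append(Stage.OCR)  # only for scanned pages and images
    # Every document ends with embeddings and then the search index.
    stages += [Stage.AI_PROCESSING, Stage.INDEXING]
    return stages
```

For a native text PDF with no scanned pages, `stages_for(True, False)` skips OCR; a fully scanned PDF, `stages_for(False, True)`, skips direct text extraction.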
Processing Time
| Document Type | Typical Time |
|---|---|
| Text PDF (10 pages) | 30 seconds |
| Scanned PDF (10 pages) | 2-3 minutes |
| Large document (100+ pages) | 5-10 minutes |
| Excel spreadsheet | 1-2 minutes |
Processing Queue
Multiple documents process in parallel. Large batches may take longer during peak times.
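One way to see why large batches take longer is to assume a fixed pool of workers, where each document waits for the first free worker. This is a sketch only: the worker count is a hypothetical parameter (SiteWhiz does not publish one), and the per-document durations would come from the Processing Time table above.

```python
import heapq


def batch_finish_time(durations_s: list[float], workers: int) -> float:
    """Estimate when the last document in a batch finishes, in seconds.

    Each document is assigned to whichever worker frees up first
    (greedy list scheduling). `workers` is a hypothetical pool size.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    free_at = [0.0] * workers  # time at which each worker becomes free
    finish = 0.0
    for duration in durations_s:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish
```

With a hypothetical pool of 4 workers, `batch_finish_time([30] * 8, 4)` returns `60.0`: eight 10-page text PDFs take about twice as long as one, because the second four wait for the first four to finish.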
Document Status
Checking Status
In the Documents list, each file shows its status:
| Status | Meaning |
|---|---|
| Uploading | File transfer in progress |
| Processing | AI extraction in progress |
| Ready | Fully searchable |
| Error | Processing failed |
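The statuses only move forward. A minimal sketch, assuming that only the processing step can end in Error and that Ready and Error are final (to reprocess, you delete and re-upload the document, as described under Reprocessing Documents). The transition table and function names are hypothetical.

```python
# Hypothetical transition table for the statuses in the table above.
# Ready and Error have no successors: reprocessing means deleting the
# document and uploading it again, not changing its status in place.
NEXT_STATUSES = {
    "Uploading": {"Processing"},
    "Processing": {"Ready", "Error"},
    "Ready": set(),
    "Error": set(),
}


def can_move(current: str, new: str) -> bool:
    """True if a document may go from `current` to `new` status."""
    return new in NEXT_STATUSES[current]


def is_final(status: str) -> bool:
    """True once no further status change is expected."""
    return not NEXT_STATUSES[status]
```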
Status Details
Click on a document to see detailed status:
- Processing stage
- Time elapsed
- Any warnings or errors
Quality Factors
Best Results
Documents process best when they are:
- Text-based (not scanned images)
- High resolution (300 DPI minimum for scans)
- Properly oriented (not rotated)
- Clean (no handwriting over text)
- Set in standard fonts
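To check a scan against the 300 DPI guideline before uploading, divide the image's pixel dimensions by the page's physical size in inches. This is plain arithmetic rather than a SiteWhiz feature; the function names are illustrative.

```python
def scan_dpi(width_px: int, height_px: int,
             page_width_in: float, page_height_in: float) -> float:
    """Effective resolution of a scanned page in dots per inch.

    Uses the lower of the horizontal and vertical resolutions.
    """
    return min(width_px / page_width_in, height_px / page_height_in)


def meets_minimum(dpi: float, minimum: float = 300.0) -> bool:
    """True if the scan meets the recommended minimum resolution."""
    return dpi >= minimum
```

A US Letter page (8.5 × 11 in) scanned at 2550 × 3300 pixels gives exactly 300 DPI; the same page at 1275 × 1650 pixels gives 150 DPI and falls below the guideline.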
Challenging Documents
These may have reduced accuracy:
- Low-quality scans
- Handwritten content
- Complex tables
- Multi-column layouts
- Unusual fonts or languages
Reprocessing Documents
When to Reprocess
Consider reprocessing if:
- The original file was of poor quality
- A better version of the document is available
- Processing failed
How to Reprocess
1. Delete the document from SiteWhiz.
2. Upload the document again.
3. Wait for processing to complete.
Supported Languages
Text extraction supports:
| Language | Support Level |
|---|---|
| English | Full |
| Dutch | Full |
| German | Full |
| French | Full |
| Spanish | Full |
| Other Western languages | Good |
Data Security
During Processing
- Documents are encrypted in transit
- Processing happens on secure servers
- No data is shared with third parties
After Processing
- Original files are stored securely
- Extracted text is encrypted
- Access is controlled by permissions
Troubleshooting
Processing Stuck
If a document stays in "Processing" too long:
- Wait at least 15 minutes for large files
- Refresh the page to check status
- Delete and re-upload if still stuck
- Contact support for persistent issues
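The Processing Time table and the 15-minute guidance above can be combined into a rough "is it overdue?" check. A sketch, assuming the hypothetical document-type labels shown and using the upper end of each typical range; passing the limit means the document is worth re-checking, not that it has failed.

```python
# Upper end of each "Typical Time" range from the Processing Time table, in seconds.
# The keys are hypothetical labels, not SiteWhiz identifiers.
TYPICAL_MAX_S = {
    "text_pdf_10_pages": 30,
    "scanned_pdf_10_pages": 3 * 60,
    "excel": 2 * 60,
}
# Large documents (100+ pages): wait at least 15 minutes before acting.
LARGE_FILE_WAIT_S = 15 * 60


def is_overdue(doc_kind: str, elapsed_s: float) -> bool:
    """True once a document has stayed in "Processing" longer than expected.

    Use doc_kind "large" for documents of 100+ pages.
    """
    limit = LARGE_FILE_WAIT_S if doc_kind == "large" else TYPICAL_MAX_S[doc_kind]
    return elapsed_s > limit
```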
Poor Text Quality
If extracted text has errors:
- Check original document quality
- Ensure scan resolution is adequate
- Try a cleaner copy of the document
- Keep in mind that some formatting may not convert well
Processing Failed
If processing fails:
- Check that the file isn't corrupted
- Verify that the file format is supported
- Try converting to PDF first
- Contact support with error details
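A quick way to catch a corrupted or mislabeled file before uploading is to compare its first bytes with the signature its extension implies: PDF files normally begin with `%PDF-`, and .docx and .xlsx files are ZIP containers that begin with `PK\x03\x04`. The format list below is an assumption limited to formats named in this guide (see Uploading Documents for what SiteWhiz actually accepts), and a matching header does not prove the rest of the file is intact.

```python
from pathlib import Path

# Expected leading bytes ("magic numbers") per extension.
# Assumption: only formats named in this guide are listed.
SIGNATURES = {
    ".pdf": b"%PDF-",        # PDF header
    ".docx": b"PK\x03\x04",  # Word: ZIP container
    ".xlsx": b"PK\x03\x04",  # Excel: ZIP container
}


def check_header(filename: str, head: bytes) -> str:
    """Return "ok", or a short reason the file may fail processing."""
    suffix = Path(filename).suffix.lower()
    expected = SIGNATURES.get(suffix)
    if expected is None:
        return f"unknown format: {suffix or '(no extension)'}"
    if not head.startswith(expected):
        return "header does not match extension (corrupted or mislabeled?)"
    return "ok"


def check_file(path: str) -> str:
    """Read the first bytes of a local file and check its header."""
    with open(path, "rb") as f:
        return check_header(path, f.read(8))
```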
Related Topics
- Uploading Documents - Add documents
- Viewing Documents - Use the viewer
- Cloud Storage - Automatic sync
- AI Assistant - Search processed documents