Documents¶
Manage your project documents through cloud storage synchronization and email integration. This guide covers supported formats, classification, processing, and best practices.
What Are Documents?¶
Documents in SiteWhiz are your project's knowledge base:
- Specifications - Technical requirements and standards
- Contracts - Agreements and legal documents
- Drawings - Floor plans and technical drawings
- Reports - Inspection reports and progress updates
- Photos - Site photos and documentation
- Emails - Project communication and decisions
AI Assistant
To ask questions about your documents or create items via chat, see the AI Assistant Guide →.
Supported File Formats¶
SiteWhiz supports the following file formats for document processing:
| Format | Extensions | Processing |
|---|---|---|
| PDF | .pdf | Full text extraction + OCR |
| Word | .doc, .docx | Full text extraction |
| Excel | .xls, .xlsx | Table and data extraction |
| Text | .txt | Direct text extraction |
File size limit:
- Maximum per file: 50 MB
- Larger files may be rejected or process slowly
Other file types
Other file types (such as PowerPoint, images, archives) are not processed as documents. Only PDF, Word, Excel, and text files are analyzed and made searchable.
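As a rough illustration of the rules above, the following sketch checks a file against the supported extensions and the 50 MB limit before you place it in a synced folder. The function name `is_processable` and the interpretation of 1 MB as 1024 × 1024 bytes are assumptions made for this example; this is not SiteWhiz's own validation code.

```python
from pathlib import Path

# Extensions listed in this guide: PDF, Word, Excel, and plain text.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".txt"}

# Documented per-file maximum of 50 MB (assumption: 1 MB = 1024 * 1024 bytes).
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def is_processable(filename: str, size_bytes: int) -> bool:
    """Return True if the extension is supported and the size is within the limit."""
    extension = Path(filename).suffix.lower()  # compare case-insensitively
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_FILE_SIZE_BYTES


print(is_processable("quantity_survey.xlsx", 2 * 1024 * 1024))  # supported type, small file
print(is_processable("presentation.pptx", 2 * 1024 * 1024))     # PowerPoint is not processed
```

A check like this can be run locally over a folder before syncing to spot files that would be skipped.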
Multiple formats of the same document
If you have the same document in multiple formats (for example, both PDF and Excel or Word), add +-+ to the PDF filename so that SiteWhiz ignores it. The AI understands Excel and Word documents better than PDFs, especially for complex documents such as quantity surveys, so using only the Excel or Word version reduces chatbot errors.
Document Classifications¶
SiteWhiz automatically classifies your documents into different types. This classification helps the AI understand documents better and extract the right information.
Available Classifications¶
| Classification | Description | Examples |
|---|---|---|
| price_breakdown | Detailed price lists | Quantity surveys, detailed quotes, progress reports |
| financial | Financial documents | Invoices, quotes, price estimates, cost estimates |
| contract | Contractual documents | Contracts, agreements, construction contracts |
| technical | Technical documentation | Calculations, specifications, standards |
| drawing | Drawings and plans | Construction drawings, floor plans, installation diagrams |
| survey | Surveys and investigations | Land surveys, soil investigations, soil reports |
| permit | Permits and certificates | Permits, approvals, inspections |
| safety | Safety documents | Risk analyses, safety plans, procedures |
| report | Reports | Site reports, minutes, progress reports |
| correspondence | Correspondence | Emails, letters, communication |
| administrative | Administrative documents | Forms, registrations |
How classification works:
- SiteWhiz automatically analyzes the content of each document
- The AI determines the most likely type based on text, structure, and context
- Classification happens during processing
- You can filter documents by type on the Documents page
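To make the classification values concrete, here is a minimal sketch of filtering a document list by type. The enum values are copied from the table above; the `Document` record and the `filter_by_classification` helper are hypothetical and only illustrate the idea, since this guide does not describe SiteWhiz's internal data model.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(str, Enum):
    """Classification values copied from the table above."""
    PRICE_BREAKDOWN = "price_breakdown"
    FINANCIAL = "financial"
    CONTRACT = "contract"
    TECHNICAL = "technical"
    DRAWING = "drawing"
    SURVEY = "survey"
    PERMIT = "permit"
    SAFETY = "safety"
    REPORT = "report"
    CORRESPONDENCE = "correspondence"
    ADMINISTRATIVE = "administrative"


@dataclass
class Document:
    """Hypothetical record used only for this example."""
    name: str
    classification: Classification


def filter_by_classification(documents, wanted):
    """Return the documents whose classification matches `wanted`."""
    return [doc for doc in documents if doc.classification is wanted]


documents = [
    Document("construction_contract.pdf", Classification.CONTRACT),
    Document("quantity_survey.xlsx", Classification.PRICE_BREAKDOWN),
    Document("site_report_week_12.docx", Classification.REPORT),
]
contracts = filter_by_classification(documents, Classification.CONTRACT)
print([doc.name for doc in contracts])
```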
Cloud Storage Integration¶
Automatically sync documents from Dropbox or OneDrive.
Supported providers:
- Dropbox (personal and business)
- OneDrive (personal and business)
What syncs:
- New files uploaded to the selected folder
- Modified files (updated in SiteWhiz)
- Folder structure (preserved)
What doesn't sync:
- Deleted files (remain in SiteWhiz)
- Files outside the selected folder
- Anything from SiteWhiz back to the cloud (sync is one-way, from cloud to SiteWhiz)
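The sync rules above can be modeled in a few lines. This is a simplified sketch, assuming each side is represented as a mapping from file path to a version tag; the function `sync_from_cloud` and this representation are hypothetical and do not reflect how SiteWhiz stores files.

```python
from __future__ import annotations


def sync_from_cloud(cloud: dict[str, str], sitewhiz: dict[str, str]) -> dict[str, str]:
    """Return the SiteWhiz state after a one-way sync from the cloud folder.

    Both arguments map a file path to a version tag (an assumption for this sketch).
    """
    result = dict(sitewhiz)  # start from the current state: files deleted in the cloud remain
    result.update(cloud)     # new files are added, modified files are overwritten
    return result            # the cloud mapping is never written to (one-way sync)


cloud = {"specs/a.pdf": "v2", "specs/c.pdf": "v1"}        # a.pdf modified, c.pdf new
sitewhiz = {"specs/a.pdf": "v1", "specs/b.pdf": "v1"}     # b.pdf was deleted in the cloud
print(sync_from_cloud(cloud, sitewhiz))
```

In this model `specs/b.pdf` stays in SiteWhiz even though it no longer exists in the cloud folder, which mirrors the "deleted files remain" rule.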
Detailed Instructions
For complete step-by-step instructions on cloud storage integration, including connecting, configuring and managing, see the Projects Guide →.
Email Integration¶
Process documents received via email automatically.
Per-User Integration
Email integration is configured per user in your user settings. Each team member connects their own Outlook account.
How it works:
- Connect your Outlook account (Microsoft 365) in your user settings
- Configure which Outlook folder to watch for each project
- Sort emails into that folder in Outlook
- AI automatically processes email content and attachments
Setting Up Email Integration
Connecting Outlook (User Settings)¶
- Go to Settings → Integrations (your user settings, not project settings)
- Click Connect Outlook
- Log in to your Microsoft account
- Authorize SiteWhiz access
- Connection confirmed
Per User
Each team member must connect their own Outlook account. Email integration is user-specific, not project-wide.
Configuring Project Folder (Per Project)¶
- Go to Project Settings → Email Configuration
- Enter the Outlook folder path for this specific project (e.g., /Archive/Projects/Building A)
- Click Save
Folder path examples:
How the system works:
- You connect your Outlook account once in user settings (Settings → Integrations)
- For each project, you configure the FOLDER PATH in project settings
- SiteWhiz automatically watches emails arriving in that folder
- All emails in the configured folder (and subfolders) are processed
- Emails in other folders are ignored
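The folder rule above (configured folder and its subfolders are processed, everything else is ignored) can be sketched as a path-prefix check. The function name `is_in_watched_folder` is hypothetical, and the sketch assumes that folder paths are slash-separated and compared case-sensitively, which this guide does not specify.

```python
def is_in_watched_folder(email_folder: str, configured_folder: str) -> bool:
    """Return True if the email's folder is the configured folder or one of its subfolders."""
    configured = configured_folder.rstrip("/")
    folder = email_folder.rstrip("/")
    # Require an exact match or a match up to a path separator, so that
    # "/Archive/Projects/Building AB" is not treated as part of "Building A".
    return folder == configured or folder.startswith(configured + "/")


watched = "/Archive/Projects/Building A"
print(is_in_watched_folder("/Archive/Projects/Building A/Quotes", watched))  # subfolder
print(is_in_watched_folder("/Inbox", watched))                               # other folder
```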
Using Email Integration¶
Step 1: Receive Email
- Get email with site report, quote, or specification
- Email may include PDF attachments
Step 2: Sort in Outlook
- Move email to the configured Outlook folder
- Drag from Inbox to project folder, or use "Move to folder"
- You do this in Outlook - NOT in SiteWhiz
Step 3: Automatic Processing
- SiteWhiz detects the new email automatically (within a few minutes)
- AI reads email content
- Analyzes attachments (PDFs, images, documents)
- Extracts action items and decisions
- Creates suggested items in AI Extraction section
Your time savings: From 10 minutes of typing to 5 seconds of dragging.
Email Thread Processing¶
Full context understanding:
- The AI reads entire email threads
- Understands conversation flow
- Extracts final decisions
- Creates comprehensive items
Instead of 5 separate items, you get one item with full history.
Review Before Confirmation¶
Emails appear as AI suggestions first:
- Go to AI Extraction in your project
- Review suggested items
- Edit if needed
- Check relevant items
- Click Confirm
Privacy and Security¶
- Only emails you move to the project folder are processed
- The AI does not access the rest of your inbox
- Processed on secure servers
- No data shared with third parties
Ignoring Documents¶
Sometimes you want SiteWhiz to not process certain documents. This can be useful for:
- Temporary files
- Test documents
- Old versions you want to keep but not process
How to use:
- Add +-+ to the filename in your cloud storage
- For example: report.pdf → report +-+.pdf
- SiteWhiz will automatically ignore this document during synchronization
- The document will be marked as deleted and embeddings will be cleaned up
Examples:
- quote_v1 +-+.pdf - Old version you don't want processed
- test +-+ document.pdf - Test file
- temporary +-+ file.docx - Temporary document
Renaming files
You can rename the file in Dropbox or OneDrive to add the +-+ pattern. SiteWhiz detects this automatically during the next synchronization.
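If you need to mark many files at once, the naming convention above is easy to script. The sketch below assumes the marker is detected as a plain substring anywhere in the filename, which is consistent with the examples above but is not confirmed by this guide; `is_ignored` and `mark_ignored` are hypothetical helper names.

```python
from pathlib import PurePosixPath

IGNORE_MARKER = "+-+"


def is_ignored(filename: str) -> bool:
    """Return True if the filename contains the ignore marker (assumed substring match)."""
    return IGNORE_MARKER in filename


def mark_ignored(filename: str) -> str:
    """Insert the marker before the extension, e.g. report.pdf -> report +-+.pdf."""
    if is_ignored(filename):
        return filename  # already marked, leave unchanged
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem} {IGNORE_MARKER}{path.suffix}"))


print(mark_ignored("report.pdf"))
print(is_ignored("test +-+ document.pdf"))
```

The function only computes the new name; the actual rename still happens in Dropbox or OneDrive.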
Processing Status¶
Each document shows its processing status:
| Status | Meaning |
|---|---|
| Processing | Document is currently being processed by the AI |
| Ready | Document is fully processed and searchable |
| Error | Processing failed (try again or contact support) |
Processing time:
- Small documents (< 10 pages): 30 seconds to 2 minutes
- Large documents (100+ pages): 5-10 minutes
- Depends on document type and complexity
Document Processing Pipeline¶
Understanding how SiteWhiz processes your documents.
During the project:
- Reports (site reports, architect reports) → Added to your task list as action items
- Manuals, plans → Make your chatbot smarter (searchable knowledge base)
- Financial documents → Analyzed for budget management
5-Stage Pipeline:
Processing Stages Explained
Stage 1: Receipt¶
- File detected from cloud sync or email integration
- File received and stored securely
- Basic validation checks file type and size
- Document appears in list as "Processing"
Stage 2: Text Extraction¶
For text-based documents:
- PDF text extracted directly
- Word documents parsed
- Excel data converted to searchable text
Best results with:
- Text-based PDFs (not scanned images)
- Standard fonts
- Properly formatted documents
Stage 3: OCR (If Needed)¶
For image-based content:
- Scanned pages analyzed
- Text recognized using OCR
- Tables and figures identified
Best quality:
- 300 DPI minimum resolution
- High contrast
- Clean scans (no handwriting over text)
Stage 4: AI Processing¶
- Content analyzed for meaning
- Semantic embeddings created
- Related concepts linked
- Context understood
Stage 5: Indexing¶
- Full-text search index built
- Document marked as "Ready"
- Content available for AI queries
- Searchable across all fields
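The five stages can be summarized as an ordered sequence in which OCR runs only when needed. The sketch below is a conceptual model, not SiteWhiz's implementation: the `Stage` enum and the helpers `stages_for` and `status_after` are hypothetical, and the status names follow the "Processing" and "Ready" labels used in this guide.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    """The five pipeline stages, in the order described above."""
    RECEIPT = 1
    TEXT_EXTRACTION = 2
    OCR = 3
    AI_PROCESSING = 4
    INDEXING = 5


def stages_for(needs_ocr: bool) -> list[Stage]:
    """Return the stages a document passes through; OCR is skipped unless needed."""
    return [stage for stage in Stage if stage is not Stage.OCR or needs_ocr]


def status_after(completed: list[Stage]) -> str:
    """A document shows "Processing" from receipt until indexing completes, then "Ready"."""
    if completed and completed[-1] is Stage.INDEXING:
        return "Ready"
    return "Processing"


print([stage.name for stage in stages_for(needs_ocr=False)])
print(status_after([Stage.RECEIPT, Stage.TEXT_EXTRACTION]))
```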
Quality Factors¶
Best results with:
- Text-based documents (not scanned)
- High resolution (300 DPI+ for scans)
- Properly oriented (not rotated)
- Clean formatting
- Standard fonts
Challenging documents:
- Low-quality scans
- Handwritten content
- Complex tables
- Multi-column layouts
- Unusual fonts
Supported Languages¶
| Language | Support Level |
|---|---|
| English | Full |
| Dutch | Full |
Best Practices¶
Organization:
- Use clear, descriptive filenames
- Remove outdated versions or mark them with +-+
- Group related documents in folders
- Keep documents current
Quality:
- Use text-based PDFs when possible (not scanned images)
- Ensure scans are high resolution (300 DPI+)
- Use complete documents (not excerpts)
Access:
- Use cloud sync for automatic updates
Troubleshooting¶
For detailed troubleshooting for documents, including processing, cloud sync and more, see the Troubleshooting Guide →.
Next Steps¶
- AI Assistant → - Ask questions and create items via chat
- Creating Items → - Other ways to create items
- Projects Guide → - Organize your projects
- Mobile App → - Use documents on-site