Documents

Manage your project documents through cloud storage synchronization and email integration. This guide covers supported formats, classification, syncing, email processing, and the processing pipeline.


What Are Documents?

Documents in SiteWhiz are your project's knowledge base:

  • Specifications - Technical requirements and standards
  • Contracts - Agreements and legal documents
  • Drawings - Floor plans and technical drawings
  • Reports - Inspection reports and progress updates
  • Photos - Site photos and documentation
  • Emails - Project communication and decisions

AI Assistant

To ask questions about your documents or create items via chat, see the AI Assistant Guide →.


Supported File Formats

SiteWhiz supports the following file formats for document processing:

Format | Extensions | Processing
PDF | .pdf | Full text extraction + OCR
Word | .doc, .docx | Full text extraction
Excel | .xls, .xlsx | Table and data extraction
Text | .txt | Direct text extraction

File size limit:

  • Maximum per file: 50 MB
  • Larger files may be rejected or process slowly

Other file types

Other file types (such as PowerPoint, images, archives) are not processed as documents. Only PDF, Word, Excel, and text files are analyzed and made searchable.
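
As a rough illustration of the rules above, the following sketch checks a file against the documented extensions and size limit before you add it to a synced folder. The function name is hypothetical and not part of SiteWhiz, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
from pathlib import PurePath

# Extensions and size limit as documented above.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".txt"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def within_documented_limits(filename: str, size_bytes: int) -> bool:
    """Return True if the file type is supported and the size is at most 50 MB."""
    extension = PurePath(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_FILE_SIZE_BYTES
```

For example, `within_documented_limits("slides.pptx", 1000)` returns `False` because PowerPoint files are not processed as documents.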

Multiple formats of the same document

If you have the same document in multiple formats (for example, both PDF and Excel or Word), add +-+ to the PDF filename to ignore it. The AI understands Excel and Word documents better than PDFs, especially for complex documents like quantity surveys. By ignoring the PDF and using only the Excel/Word version, the chatbot makes fewer errors.
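
A minimal sketch of how you might find such duplicates in a folder listing before renaming them. It assumes that files sharing a base name are the same document, which may not hold for your project; the function name is hypothetical.

```python
from pathlib import PurePath

IGNORE_MARKER = "+-+"
PREFERRED_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx"}


def suggest_pdf_renames(filenames: list[str]) -> dict[str, str]:
    """Map each PDF that has a Word/Excel sibling to a name carrying the ignore marker."""
    # Base names that exist in a Word or Excel version (assumption: same base name = same document).
    preferred_stems = {
        PurePath(name).stem.lower()
        for name in filenames
        if PurePath(name).suffix.lower() in PREFERRED_EXTENSIONS
    }
    renames = {}
    for name in filenames:
        path = PurePath(name)
        is_unmarked_pdf = path.suffix.lower() == ".pdf" and IGNORE_MARKER not in path.name
        if is_unmarked_pdf and path.stem.lower() in preferred_stems:
            renames[name] = f"{path.stem} {IGNORE_MARKER}{path.suffix}"
    return renames
```

The sketch only proposes new names; the actual rename still happens in Dropbox or OneDrive.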


Document Classifications

SiteWhiz automatically classifies your documents into different types. This classification helps the AI understand documents better and extract the right information.

Available Classifications

Classification | Description | Examples
price_breakdown | Detailed price lists | Quantity surveys, detailed quotes, progress reports
financial | Financial documents | Invoices, quotes, price estimates, cost estimates
contract | Contractual documents | Contracts, agreements, construction contracts
technical | Technical documentation | Calculations, specifications, standards
drawing | Drawings and plans | Construction drawings, floor plans, installation diagrams
survey | Surveys and investigations | Land surveys, soil investigations, soil reports
permit | Permits and certificates | Permits, approvals, inspections
safety | Safety documents | Risk analyses, safety plans, procedures
report | Reports | Site reports, minutes, progress reports
correspondence | Correspondence | Emails, letters, communication
administrative | Administrative documents | Forms, registrations

How classification works:

  • SiteWhiz automatically analyzes the content of each document
  • The AI determines the most likely type based on text, structure, and context
  • Classification happens during processing
  • You can filter documents by type on the Documents page
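
If you work with an exported list of documents, filtering by type could look like the sketch below. The `Document` record and the function are hypothetical illustrations, not a SiteWhiz API; only the classification names come from the table above.

```python
from dataclasses import dataclass

# Classification names as listed in the table above.
CLASSIFICATIONS = {
    "price_breakdown", "financial", "contract", "technical", "drawing", "survey",
    "permit", "safety", "report", "correspondence", "administrative",
}


@dataclass
class Document:
    """Hypothetical record for one document and its assigned type."""
    name: str
    classification: str


def filter_by_classification(documents: list[Document], classification: str) -> list[Document]:
    """Return the documents with the given type, rejecting unknown type names."""
    if classification not in CLASSIFICATIONS:
        raise ValueError(f"Unknown classification: {classification}")
    return [doc for doc in documents if doc.classification == classification]
```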


Cloud Storage Integration

Automatically sync documents from Dropbox or OneDrive.

Supported providers:

  • Dropbox (personal and business)
  • OneDrive (personal and business)

What syncs:

  • New files uploaded to the selected folder
  • Modified files (the copy in SiteWhiz is updated)
  • Folder structure (preserved)

What doesn't sync:

  • Deleted files (they remain in SiteWhiz)
  • Files outside the selected folder
  • Changes made in SiteWhiz (sync is one-way, from cloud to SiteWhiz)
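
The sync rules above can be modeled in a few lines. This is a simplified sketch of the described behavior, not SiteWhiz's implementation: files are represented as a hypothetical mapping from path to version label.

```python
def sync_one_way(
    cloud_files: dict[str, str],
    sitewhiz_files: dict[str, str],
    selected_folder: str,
) -> dict[str, str]:
    """Return the SiteWhiz state after a one-way sync from the cloud folder."""
    prefix = selected_folder.rstrip("/") + "/"
    # Start from the current SiteWhiz state, so files deleted in the cloud remain.
    synced = dict(sitewhiz_files)
    for path, version in cloud_files.items():
        # New and modified files inside the selected folder are copied over;
        # files outside it are skipped.
        if path.startswith(prefix):
            synced[path] = version
    return synced
```

Nothing is ever written back to `cloud_files`, which reflects the one-way direction.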

Detailed Instructions

For complete step-by-step instructions on cloud storage integration, including connecting, configuring and managing, see the Projects Guide →.


Email Integration

Process documents received via email automatically.

Per-User Integration

Email integration is configured per user in your user settings. Each team member connects their own Outlook account.

How it works:

  1. Connect your Outlook account (Microsoft 365) in your user settings
  2. Configure which Outlook folder to watch for each project
  3. Sort emails into that folder in Outlook
  4. AI automatically processes email content and attachments

Setting Up Email Integration

Connecting Outlook (User Settings)

  1. Go to Settings → Integrations (your user settings, not project settings)
  2. Click Connect Outlook
  3. Log in to your Microsoft account
  4. Authorize SiteWhiz access
  5. Wait for the connection to be confirmed

Per User

Each team member must connect their own Outlook account. Email integration is user-specific, not project-wide.

Configuring Project Folder (Per Project)

  1. Go to Project Settings → Email Configuration
  2. Enter the Outlook folder path for this specific project (e.g., /Archive/Projects/Building A)
  3. Click Save

Folder path examples:

/Inbox
/Archive/Projects/Building A
/Shared Folders/Client Name/Documents

How the system works:

  • You connect your Outlook account once in user settings (Settings → Integrations)
  • For each project, you configure the folder path in project settings
  • SiteWhiz automatically watches emails arriving in that folder
  • All emails in the configured folder (and its subfolders) are processed
  • Emails in other folders are ignored
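
The folder rule can be illustrated with a small path check. This is a sketch under assumptions: the function name is hypothetical, and matching is treated as case-sensitive, which the guide does not specify.

```python
def is_watched(email_folder: str, configured_folder: str) -> bool:
    """Return True if the email's folder is the configured folder or one of its subfolders."""
    configured = configured_folder.rstrip("/")
    folder = email_folder.rstrip("/")
    # Compare whole path segments, so "/A/Building AB" does not match "/A/Building A".
    return folder == configured or folder.startswith(configured + "/")
```

With `/Archive/Projects/Building A` configured, an email in `/Archive/Projects/Building A/Quotes` matches, while one in `/Inbox` does not.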

Using Email Integration

Step 1: Receive Email

  • Get email with site report, quote, or specification
  • Email may include PDF attachments

Step 2: Sort in Outlook

  • Move email to the configured Outlook folder
  • Drag from Inbox to project folder, or use "Move to folder"
  • You do this in Outlook - NOT in SiteWhiz

Step 3: Automatic Processing

  • SiteWhiz detects the new email automatically (within a few minutes)
  • AI reads email content
  • Analyzes attachments (PDFs, images, documents)
  • Extracts action items and decisions
  • Creates suggested items in AI Extraction section

Your time savings: From 10 minutes of typing to 5 seconds of dragging.

Email Thread Processing

Full context understanding:

  • The AI reads entire email threads
  • Understands conversation flow
  • Extracts final decisions
  • Creates comprehensive items

Instead of 5 separate items, you get one item with full history.

Review Before Confirmation

Emails appear as AI suggestions first:

  1. Go to AI Extraction in your project
  2. Review suggested items
  3. Edit if needed
  4. Check relevant items
  5. Click Confirm

Privacy and Security

  • Only emails you move to the configured project folder are processed
  • AI has no access to your entire inbox
  • Processed on secure servers
  • No data shared with third parties

Ignoring Documents

Sometimes you do not want SiteWhiz to process certain documents. This can be useful for:

  • Temporary files
  • Test documents
  • Old versions you want to keep but not process

How to use:

  • Add +-+ to the filename in your cloud storage
  • For example: report.pdf → report +-+.pdf
  • SiteWhiz will automatically ignore this document during synchronization
  • The document will be marked as deleted and embeddings will be cleaned up

Examples:

  • quote_v1 +-+.pdf - Old version you don't want processed
  • test +-+ document.pdf - Test file
  • temporary +-+ file.docx - Temporary document

Renaming files

You can rename the file in Dropbox or OneDrive to add the +-+ pattern. SiteWhiz detects this automatically during the next synchronization.
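
To preview which files in a folder listing would be skipped, a check like the following is enough. It assumes, based on the examples above, that the marker may appear anywhere in the filename; the function name is hypothetical.

```python
IGNORE_MARKER = "+-+"


def split_ignored(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split filenames into (to_process, ignored) based on the +-+ marker."""
    to_process = [name for name in filenames if IGNORE_MARKER not in name]
    ignored = [name for name in filenames if IGNORE_MARKER in name]
    return to_process, ignored
```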


Processing Status

Each document shows its processing status:

Status | Meaning
Processing | Document is currently being processed by the AI
Ready | Document is fully processed and searchable
Error | Processing failed (try again or contact support)

Processing time:

  • Small documents (< 10 pages): 30 seconds to 2 minutes
  • Large documents (100+ pages): 5 to 10 minutes
  • Actual time depends on document type and complexity


Document Processing Pipeline

Understanding how SiteWhiz processes your documents.

During the project:

  • Reports (site reports, architect reports) → Added to your task list as action items
  • Manuals, plans → Make your chatbot smarter (searchable knowledge base)
  • Financial documents → Analyzed for budget management

5-Stage Pipeline: Receipt → Text Extraction → OCR (if needed) → AI Processing → Indexing

Processing Stages Explained

Stage 1: Receipt

  • File detected from cloud sync or email integration
  • File received and stored securely
  • Basic validation checks file type and size
  • Document appears in list as "Processing"

Stage 2: Text Extraction

For text-based documents:

  • PDF text extracted directly
  • Word documents parsed
  • Excel data converted to searchable text

Best results with:

  • Text-based PDFs (not scanned images)
  • Standard fonts
  • Properly formatted documents

Stage 3: OCR (If Needed)

For image-based content:

  • Scanned pages analyzed
  • Text recognized using OCR
  • Tables and figures identified

Best quality:

  • 300 DPI minimum resolution
  • High contrast
  • Clean scans (no handwriting over text)

Stage 4: AI Processing

  • Content analyzed for meaning
  • Semantic embeddings created
  • Related concepts linked
  • Context understood

Stage 5: Indexing

  • Full-text search index built
  • Document marked as "Ready"
  • Content available for AI queries
  • Searchable across all fields
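
The five stages can be summarized as an ordered sequence in which OCR runs only when needed. The sketch below is an illustration of that ordering, not SiteWhiz's internal code; the enum and function names are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    """The five documented stages, in processing order."""
    RECEIPT = 1
    TEXT_EXTRACTION = 2
    OCR = 3
    AI_PROCESSING = 4
    INDEXING = 5


def stages_for(has_scanned_pages: bool) -> list[Stage]:
    """Return the stages a document passes through; OCR is skipped for text-only documents."""
    return [stage for stage in Stage if stage is not Stage.OCR or has_scanned_pages]
```

A text-based PDF goes through four stages, while a scanned document goes through all five, which is one reason scans tend to take longer.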

Quality Factors

Best results with:

  • Text-based documents (not scanned)
  • High resolution (300 DPI+ for scans)
  • Properly oriented (not rotated)
  • Clean formatting
  • Standard fonts

Challenging documents:

  • Low-quality scans
  • Handwritten content
  • Complex tables
  • Multi-column layouts
  • Unusual fonts

Supported Languages

Language | Support Level
English | Full
Dutch | Full

Best Practices

Organization:

  • Use clear, descriptive filenames
  • Remove outdated versions or mark them with +-+
  • Group related documents in folders
  • Keep documents current

Quality:

  • Use text-based PDFs when possible (not scanned images)
  • Ensure scans are high resolution (300 DPI+)
  • Use complete documents (not excerpts)

Access:

  • Use cloud sync for automatic updates


Troubleshooting

For detailed troubleshooting for documents, including processing, cloud sync and more, see the Troubleshooting Guide →.

