Document Search with RAG

This tutorial demonstrates how to build a semantic document search system using Maitento’s RAG (Retrieval-Augmented Generation) pipeline. You’ll learn how to create partitions, upload documents, wait for embeddings to be generated, perform semantic search, and use the results in agent interactions.


Table of Contents

  1. Overview
  2. Prerequisites
  3. Step 1: Create a Partition
  4. Step 2: Upload Documents
  5. Step 3: Wait for Ingestion
  6. Step 4: Perform Semantic Search
  7. Step 5: Use Results in Agent Interactions
  8. Complete Examples
  9. Best Practices
  10. Troubleshooting

Overview

Maitento’s RAG pipeline provides a complete solution for semantic document search:

File Upload -> Text Extraction -> Chunking -> Vector Generation -> Hybrid Search
     |                |               |               |                  |
  WebAPI         MarkItDown/     TextChunker       OpenAI             MongoDB
                 Pandoc/etc     (1750 chars)     Embeddings        Vector Search
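
To get a feel for the chunking stage, the sketch below splits text into consecutive pieces of at most 1750 characters and counts them. It is an illustration only: `NaiveChunker` is a hypothetical helper, not part of the Maitento SDK, and the real TextChunker may split on sentence or paragraph boundaries or overlap chunks, which this code does not attempt to reproduce.

```csharp
using System;
using System.Collections.Generic;

// A 4,000-character document yields three chunks: 1750 + 1750 + 500.
List<string> chunks = NaiveChunker.Split(new string('a', 4000));
Console.WriteLine($"{chunks.Count} chunks, last has {chunks[^1].Length} chars");
// prints: 3 chunks, last has 500 chars

// Hypothetical helper for illustration only; not part of the Maitento SDK.
public static class NaiveChunker
{
    // Chunk size taken from the pipeline diagram above.
    public const int ChunkSize = 1750;

    // Splits text into consecutive pieces of at most chunkSize characters.
    public static List<string> Split(string text, int chunkSize = ChunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var result = new List<string>();
        for (int start = 0; start < text.Length; start += chunkSize)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            result.Add(text.Substring(start, length));
        }
        return result;
    }
}
```

An estimate like this can help you guess which row of the processing timeline a document falls into before you upload it.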

Supported File Types

Type         Extensions     Parser
PDF          .pdf           MarkItDown (Python)
Word         .docx          Pandoc
Excel        .xlsx          MarkItDown
PowerPoint   .pptx          MarkItDown
Markdown     .md            Plain Text
Plain Text   .txt           Plain Text
HTML         .html, .htm    ReverseMarkdown
CSV          .csv           MarkItDown
JSON         .json          MarkItDown
XML          .xml           MarkItDown

File Size Limit: 10 MB
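
You can catch unsupported or oversized files before they reach the API. The following is a minimal client-side sketch: `UploadPrecheck` is a hypothetical helper, the extension list is copied from the table above, and it assumes "10 MB" means 10 * 1024 * 1024 bytes. The server remains the source of truth for what it accepts.

```csharp
using System;
using System.IO;
using System.Linq;

Console.WriteLine(UploadPrecheck.Check("guide.pdf", 524_288) ?? "OK");
// prints: OK
Console.WriteLine(UploadPrecheck.Check("scan.png", 1_000) ?? "OK");
// prints: Unsupported extension '.png'

// Hypothetical pre-upload check; not part of the Maitento SDK.
public static class UploadPrecheck
{
    // Assumption: the 10 MB limit is 10 * 1024 * 1024 bytes.
    public const long MaxBytes = 10L * 1024 * 1024;

    // Extensions from the Supported File Types table.
    private static readonly string[] Supported =
    {
        ".pdf", ".docx", ".xlsx", ".pptx", ".md", ".txt",
        ".html", ".htm", ".csv", ".json", ".xml"
    };

    // Returns null when the file looks uploadable, otherwise a reason.
    public static string? Check(string fileName, long sizeBytes)
    {
        string ext = Path.GetExtension(fileName).ToLowerInvariant();
        if (!Supported.Contains(ext))
            return $"Unsupported extension '{ext}'";
        if (sizeBytes > MaxBytes)
            return $"File is {sizeBytes} bytes; limit is {MaxBytes}";
        return null;
    }
}
```

In a real upload loop you would pass `new FileInfo(path).Length` as the size and skip any file for which `Check` returns a reason.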

Processing Timeline

File Size                  Time to Searchable
Small (<750 words)         1-3 seconds
Medium (750-5000 words)    3-8 seconds
Large (20+ chunks)         10-20 seconds

Prerequisites

Before starting, ensure you have:

  • A Maitento account with API access
  • API credentials (API key or username/password)
  • Documents to upload (PDF, markdown, etc.)

Authentication Setup

Shell:

api-url-set https://api.maitento.com
api-login

SDK (C#):

// Using API Key
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, "https://api.maitento.com");

// Or using credentials
var client = MaitentoClient.CreateWithCredentials("user@example.com", "password", "https://api.maitento.com");

API:

PUT /auth HTTP/1.1
Host: api.maitento.com
Content-Type: application/json

{
  "email": "user@example.com",
  "password": "your-password"
}

Step 1: Create a Partition

Partitions provide logical grouping for your documents. All files in a partition are searchable together, allowing you to organize documents by purpose (e.g., “Product Docs”, “Support KB”, “Legal”).

Shell

# Create a partition for product documentation
partition-create "Product Documentation"

# Output:
# Id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Name: Product Documentation
# Created: 2024-03-15T10:30:00Z

# List all partitions
partition-list

SDK (C#)

using Maitento.Sdk;
using Maitento.Entities.Partitions;

// Create the client
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);

// Create a partition
Partition partition = await client.Partitions.CreateAsync("Product Documentation");

Console.WriteLine($"Created partition: {partition.Id}");
Console.WriteLine($"Name: {partition.Name}");

// List all partitions
Partition[] partitions = await client.Partitions.ListAsync();
foreach (var p in partitions)
{
    Console.WriteLine($"- {p.Name} ({p.Id})");
}

API

Create Partition:

PUT /partitions HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "Product Documentation"
}

Response:

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "dateCreated": "2024-03-15T10:30:00Z",
  "dateUpdated": "2024-03-15T10:30:00Z",
  "tenantId": "98765432-abcd-ef01-2345-6789abcdef01",
  "userCreatedById": "12345678-abcd-ef01-2345-6789abcdef01",
  "name": "Product Documentation"
}

List Partitions:

GET /partitions HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>

Step 2: Upload Documents

Upload your documents to the partition. Maitento automatically extracts text, generates chunks for large documents, and creates vector embeddings.

Shell

# Upload a single PDF
file-upload ./installation-guide.pdf --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890

# Output:
# File uploaded successfully
# Id: f1e2d3c4-b5a6-9087-6543-210fedcba987
# Name: installation-guide.pdf
# Type: Pdf
# Size: 524288 bytes
# Status: Processing

# Upload multiple files
file-upload ./user-manual.pdf --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
file-upload ./faq.md --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
file-upload ./release-notes.txt --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890

SDK (C#)

using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;

var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
Guid partitionId = Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890");

// Upload a PDF file
byte[] pdfContent = await File.ReadAllBytesAsync("installation-guide.pdf");
IngestedFile pdfFile = await client.Files.IngestFileAsync(
    partitionId,
    pdfContent,
    "installation-guide.pdf",
    "application/pdf"
);
Console.WriteLine($"Uploaded: {pdfFile.Name} (ID: {pdfFile.Id})");

// Upload a Markdown file
byte[] mdContent = await File.ReadAllBytesAsync("faq.md");
IngestedFile mdFile = await client.Files.IngestFileAsync(
    partitionId,
    mdContent,
    "faq.md",
    "text/markdown"
);
Console.WriteLine($"Uploaded: {mdFile.Name} (ID: {mdFile.Id})");

// Upload multiple files from a directory
string[] supportedExtensions = { ".pdf", ".md", ".txt", ".docx" };
foreach (string filePath in Directory.GetFiles("./docs"))
{
    string ext = Path.GetExtension(filePath).ToLowerInvariant();
    if (!supportedExtensions.Contains(ext)) continue;

    byte[] content = await File.ReadAllBytesAsync(filePath);
    string fileName = Path.GetFileName(filePath);
    string contentType = GetContentType(fileName);

    IngestedFile file = await client.Files.IngestFileAsync(
        partitionId,
        content,
        fileName,
        contentType
    );
    Console.WriteLine($"Uploaded: {file.Name}");
}

static string GetContentType(string fileName)
{
    var extension = Path.GetExtension(fileName).ToLowerInvariant();
    return extension switch
    {
        ".pdf" => "application/pdf",
        ".docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ".txt" => "text/plain",
        ".md" => "text/markdown",
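        ".html" or ".htm" => "text/html",
        ".json" => "application/json",
        ".csv" => "text/csv",
        ".xml" => "application/xml",
        ".xlsx" => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ".pptx" => "application/vnd.openxmlformats-officedocument.presentationml.presentation",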
        _ => "application/octet-stream"
    };
}

API

Upload File:

PUT /files/partitions/a1b2c3d4-e5f6-7890-abcd-ef1234567890 HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
Content-Type: multipart/form-data; boundary=----FormBoundary

------FormBoundary
Content-Disposition: form-data; name="file"; filename="installation-guide.pdf"
Content-Type: application/pdf

<binary file content>
------FormBoundary--

Response:

{
  "id": "f1e2d3c4-b5a6-9087-6543-210fedcba987",
  "dateCreated": "2024-03-15T14:30:00Z",
  "dateUpdated": "2024-03-15T14:30:00Z",
  "name": "installation-guide.pdf",
  "contentType": "application/pdf",
  "size": 524288,
  "type": "Pdf",
  "text": "Installation Guide\n\nChapter 1: Getting Started...",
  "isVectorGenerated": false,
  "vectorGenerationTime": null,
  "isChunkingRequired": true,
  "chunkGenerationTime": null,
  "partitionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Check for Duplicates Before Upload

To avoid uploading duplicate files, check by checksum first:

SDK (C#):

using System.Security.Cryptography;

// Calculate the file checksum (SHA-256)
byte[] fileBytes = await File.ReadAllBytesAsync("document.pdf");
byte[] hash = SHA256.HashData(fileBytes);

// Assumption: the checksum is lowercase hex, as in the API example below.
// Base64 output can contain '/' and '+', which are unsafe in a URL path.
string checksum = Convert.ToHexString(hash).ToLowerInvariant();

// Check if file exists
IngestedFileSummary? existing = await client.Files.GetByChecksumAsync(checksum);
if (existing != null)
{
    Console.WriteLine($"File already exists: {existing.Name}");
}
else
{
    // Proceed with upload
    await client.Files.IngestFileAsync(partitionId, fileBytes, "document.pdf", "application/pdf");
}

API:

GET /files/by-checksum/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
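
The checksum in the request above is a hex-encoded SHA-256 digest (that particular value is the digest of empty input). The same hash can also remove duplicates within a local batch before any API call is made. The sketch below assumes .NET 5 or later; `LocalDedupe` is a hypothetical helper and not part of the Maitento SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

var batch = new List<(string Name, byte[] Content)>
{
    ("a.txt", Encoding.UTF8.GetBytes("hello")),
    ("copy-of-a.txt", Encoding.UTF8.GetBytes("hello")),
    ("b.txt", Encoding.UTF8.GetBytes("world")),
};
Console.WriteLine(string.Join(", ", LocalDedupe.DistinctByContent(batch)));
// prints: a.txt, b.txt

// Hypothetical helper for illustration only.
public static class LocalDedupe
{
    // Lowercase hex SHA-256 of the content.
    public static string Sha256Hex(byte[] content)
        => Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();

    // Keeps the first file name seen for each distinct content hash.
    public static List<string> DistinctByContent(
        IEnumerable<(string Name, byte[] Content)> files)
    {
        var seen = new HashSet<string>();
        var keep = new List<string>();
        foreach (var (name, content) in files)
        {
            if (seen.Add(Sha256Hex(content)))
                keep.Add(name);
        }
        return keep;
    }
}
```

Local deduplication only covers the files in the current batch; the by-checksum lookup is still needed to detect files that were uploaded earlier.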

Step 3: Wait for Ingestion

After uploading, documents go through text extraction, chunking, and vector generation. You need to wait for isVectorGenerated to become true before searching.

Shell

# Check file status
file-get f1e2d3c4-b5a6-9087-6543-210fedcba987

# Output:
# Id: f1e2d3c4-b5a6-9087-6543-210fedcba987
# Name: installation-guide.pdf
# Type: Pdf
# Size: 524288 bytes
# Vectors Generated: true
# Vector Generation Time: 2024-03-15T14:35:00Z

# Subscribe to file notifications for real-time updates
notifications file

SDK (C#)

Polling Approach:

var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
Guid fileId = Guid.Parse("f1e2d3c4-b5a6-9087-6543-210fedcba987");

// Poll until vectors are ready
IngestedFile? fileDetails;
int maxAttempts = 30;
int attempt = 0;

do
{
    fileDetails = await client.Files.GetDetailsAsync(fileId);

    if (fileDetails == null)
    {
        throw new Exception("File not found");
    }

    if (fileDetails.IsVectorGenerated)
    {
        Console.WriteLine("Vectors ready! File is searchable.");
        break;
    }

    Console.WriteLine($"Processing... (attempt {++attempt}/{maxAttempts})");
    await Task.Delay(1000); // Wait 1 second

} while (attempt < maxAttempts);

if (!fileDetails.IsVectorGenerated)
{
    Console.WriteLine("Warning: Vector generation took longer than expected");
}
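
A fixed one-second delay is fine for a single file, but a growing delay reduces API calls when documents take longer. The following is a generic sketch of polling with exponential backoff. `Poller` is a hypothetical helper, and the readiness check is passed in as a delegate, so it makes no assumptions about the SDK beyond the `GetDetailsAsync` call already shown above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in check that succeeds on the third call. With the SDK you could pass:
//   async () => (await client.Files.GetDetailsAsync(fileId))?.IsVectorGenerated == true
int calls = 0;
bool ready = await Poller.WaitUntilAsync(
    () => Task.FromResult(++calls >= 3),
    timeout: TimeSpan.FromSeconds(5),
    initialDelay: TimeSpan.FromMilliseconds(10),
    maxDelay: TimeSpan.FromMilliseconds(40));
Console.WriteLine($"ready={ready} after {calls} checks");
// prints: ready=True after 3 checks

// Hypothetical helper for illustration only.
public static class Poller
{
    // Calls isReady until it returns true or the next wait would pass the
    // deadline, doubling the delay after each attempt up to maxDelay.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> isReady,
        TimeSpan timeout,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        CancellationToken ct = default)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        TimeSpan delay = initialDelay;

        while (true)
        {
            if (await isReady()) return true;
            if (DateTime.UtcNow + delay > deadline) return false;

            await Task.Delay(delay, ct);
            delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
        }
    }
}
```

Given the processing timeline, a timeout of around 30 seconds with delays growing from 500 ms to a few seconds is a reasonable starting point; treat those numbers as a suggestion rather than a documented recommendation.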

Real-Time Notifications Approach:

using Maitento.Sdk;
using Maitento.Sdk.Notifications;
using Maitento.Entities.IngestedFiles;

var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);

// Start notification service
await client.Notifications.StartAsync();

// Upload the file first so that its ID is known
Guid partitionId = Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890");
byte[] content = await File.ReadAllBytesAsync("document.pdf");
IngestedFile file = await client.Files.IngestFileAsync(partitionId, content, "document.pdf", "application/pdf");

// Subscribe a handler to updates for the uploaded file
var handler = new FileUpdateHandler();
var subscription = client.Notifications.Subscribe(handler, file.Id);

// Small files may finish before the subscription is active, so check once
// before waiting on the handler
IngestedFile? current = await client.Files.GetDetailsAsync(file.Id);
if (current?.IsVectorGenerated != true)
{
    await handler.WaitForVectorGeneration(TimeSpan.FromSeconds(60));
}

// Cleanup
client.Notifications.Unsubscribe(subscription);

// Handler implementation
public class FileUpdateHandler : INotificationClientHandlerIngestedFileUpdated
{
    private readonly TaskCompletionSource<bool> _tcs = new();

    public Task OnIngestedFileUpdated(IngestedFile file)
    {
        Console.WriteLine($"File update: {file.Name} - Vectors: {file.IsVectorGenerated}");

        if (file.IsVectorGenerated)
        {
            _tcs.TrySetResult(true);
        }

        return Task.CompletedTask;
    }

    public async Task WaitForVectorGeneration(TimeSpan timeout)
    {
        var timeoutTask = Task.Delay(timeout);
        var completedTask = await Task.WhenAny(_tcs.Task, timeoutTask);

        if (completedTask == timeoutTask)
        {
            throw new TimeoutException("Vector generation timed out");
        }
    }
}

API

Get File Details:

GET /files/by-id/f1e2d3c4-b5a6-9087-6543-210fedcba987/details HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>

Response (Processing):

{
  "id": "f1e2d3c4-b5a6-9087-6543-210fedcba987",
  "name": "installation-guide.pdf",
  "isVectorGenerated": false,
  "vectorGenerationTime": null,
  "isChunkingRequired": true,
  "chunkGenerationTime": null
}

Response (Complete):

{
  "id": "f1e2d3c4-b5a6-9087-6543-210fedcba987",
  "name": "installation-guide.pdf",
  "isVectorGenerated": true,
  "vectorGenerationTime": "2024-03-15T14:35:00Z",
  "isChunkingRequired": true,
  "chunkGenerationTime": "2024-03-15T14:32:00Z"
}

Step 4: Perform Semantic Search

Once embeddings are generated, use semantic search to find relevant documents. The search uses hybrid retrieval that combines vector similarity (70%) with full-text search (30%).
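
One way to read the 70/30 split is as a weighted sum of two component scores. The sketch below assumes both scores are already normalized to the range 0 to 1; how Maitento actually normalizes and combines them internally is not specified here, so treat `HybridScore` as a hypothetical illustration of the idea rather than the service's formula.

```csharp
using System;

// A strong semantic match (0.9) with a moderate keyword match (0.5):
// 0.7 * 0.9 + 0.3 * 0.5 = 0.63 + 0.15 = 0.78
Console.WriteLine(HybridScore.Combine(0.9, 0.5).ToString("F2"));
// prints: 0.78

// Hypothetical helper for illustration only.
public static class HybridScore
{
    // Weights taken from the description above.
    public const double VectorWeight = 0.7;
    public const double TextWeight = 0.3;

    // Weighted sum of a vector-similarity score and a full-text score,
    // both assumed to lie in [0, 1].
    public static double Combine(double vectorScore, double textScore)
        => VectorWeight * vectorScore + TextWeight * textScore;
}
```

Under this reading, a document that matches the query's meaning but shares few exact words can still outrank one with many keyword hits and little semantic overlap.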

Shell

# Search for documents about installation
file-search "How do I install the product on Windows?" --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890

# Output:
# Found 3 relevant documents
#
# 1. installation-guide.pdf (Score: 0.892)
#    Type: Pdf | Size: 524288 bytes
#    Match: "Windows Installation\n\nStep 1: Download the installer..."
#
# 2. quick-start.md (Score: 0.756)
#    Type: Markdown | Size: 8192 bytes
#    Match: "# Quick Start Guide\n\n## Installation\n\nFor Windows users..."
#
# 3. faq.md (Score: 0.634)
#    Type: Markdown | Size: 4096 bytes
#    Match: "## Q: What are the system requirements?\n\nA: Windows 10..."

SDK (C#)

using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;

var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
Guid partitionId = Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890");

// Perform semantic search
string query = "How do I install the product on Windows?";
EmbeddingsSearchResult[] results = await client.Files.SearchEmbeddingsAsync(partitionId, query);

Console.WriteLine($"Found {results.Length} relevant documents\n");

foreach (var result in results)
{
    Console.WriteLine($"--- {result.Name} (Score: {result.Score:F4}) ---");
    Console.WriteLine($"Type: {result.Type}");
    Console.WriteLine($"Size: {result.Size} bytes");
    Console.WriteLine($"Content-Type: {result.ContentType}");

    // Show first 200 characters of matched text
    string preview = result.Text.Length > 200
        ? result.Text.Substring(0, 200) + "..."
        : result.Text;
    Console.WriteLine($"Match: {preview}\n");
}

// Filter results by score threshold
var highQualityResults = results.Where(r => r.Score > 0.7).ToArray();
Console.WriteLine($"{highQualityResults.Length} high-quality matches (score > 0.7)");

API

Search Embeddings:

PUT /files/embeddings/search HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "query": "How do I install the product on Windows?",
  "partitionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Response:

[
  {
    "id": "f1e2d3c4-b5a6-9087-6543-210fedcba987",
    "name": "installation-guide.pdf",
    "contentType": "application/pdf",
    "size": 524288,
    "type": "Pdf",
    "text": "Windows Installation\n\nStep 1: Download the installer from our website.\nStep 2: Run the installer as Administrator.\nStep 3: Follow the setup wizard...",
    "score": 0.892
  },
  {
    "id": "a2b3c4d5-e6f7-8901-abcd-ef2345678901",
    "name": "quick-start.md",
    "contentType": "text/markdown",
    "size": 8192,
    "type": "Markdown",
    "text": "# Quick Start Guide\n\n## Installation\n\nFor Windows users, download the MSI installer and run it with administrator privileges...",
    "score": 0.756
  }
]

Step 5: Use Results in Agent Interactions

Combine search results with AI agents to create powerful question-answering systems. The search results provide context for the agent to generate accurate, grounded responses.

Shell

# First, search for relevant documents
file-search "installation requirements" --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890

# Then use the results in an interaction with your agent
interaction-run support-agent --question "What are the system requirements for installation?" --context "..."

SDK (C#)

using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;

var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
Guid partitionId = Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890");

// User question
string userQuestion = "What are the minimum system requirements?";

// Step 1: Search for relevant documents
var searchResults = await client.Files.SearchEmbeddingsAsync(partitionId, userQuestion);

// Step 2: Build context from top results
var topResults = searchResults.Take(3).ToArray();
string context = string.Join("\n\n---\n\n", topResults.Select(r =>
    $"Source: {r.Name}\n{r.Text}"
));

// Step 3: Create prompt with context
string systemPrompt = @"You are a helpful product support assistant.
Answer questions based ONLY on the provided context.
If the context doesn't contain the answer, say so.
Always cite your sources.";

string userPrompt = $@"Context:
{context}

Question: {userQuestion}

Please provide a helpful answer based on the documentation above.";

// Step 4: Run interaction with the context-enriched prompt
// (Assumes you have a support-agent interaction defined)
var result = await client.Interactions.RunAsync(
    "support-agent",
    new Dictionary<string, object>
    {
        ["system_context"] = systemPrompt,
        ["user_query"] = userPrompt
    }
);

Console.WriteLine("Assistant Response:");
Console.WriteLine(result.Output);

// Include source citations
Console.WriteLine("\nSources:");
foreach (var source in topResults)
{
    Console.WriteLine($"- {source.Name} (relevance: {source.Score:P0})");
}

Complete RAG Service Example

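using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;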

public class DocumentRAGService
{
    private readonly IMaitentoClient _client;
    private readonly Guid _partitionId;
    private readonly double _scoreThreshold;
    private readonly int _maxResults;

    public DocumentRAGService(
        IMaitentoClient client,
        Guid partitionId,
        double scoreThreshold = 0.5,
        int maxResults = 5)
    {
        _client = client;
        _partitionId = partitionId;
        _scoreThreshold = scoreThreshold;
        _maxResults = maxResults;
    }

    public async Task<RAGResponse> AskAsync(string question)
    {
        // Search for relevant documents
        var searchResults = await _client.Files.SearchEmbeddingsAsync(_partitionId, question);

        // Filter by score threshold and limit results
        var relevantDocs = searchResults
            .Where(r => r.Score >= _scoreThreshold)
            .Take(_maxResults)
            .ToArray();

        if (relevantDocs.Length == 0)
        {
            return new RAGResponse
            {
                Answer = "I couldn't find any relevant information in the documentation to answer your question.",
                Sources = Array.Empty<DocumentSource>(),
                Confidence = 0
            };
        }

        // Build context
        string context = BuildContext(relevantDocs);

        // Run the interaction
        var interactionResult = await _client.Interactions.RunAsync(
            "rag-qa-agent",
            new Dictionary<string, object>
            {
                ["context"] = context,
                ["question"] = question
            }
        );

        return new RAGResponse
        {
            Answer = interactionResult.Output?.ToString() ?? "No response generated",
            Sources = relevantDocs.Select(r => new DocumentSource
            {
                Name = r.Name,
                Id = r.Id,
                Score = r.Score,
                Excerpt = r.Text.Length > 200 ? r.Text.Substring(0, 200) + "..." : r.Text
            }).ToArray(),
            Confidence = relevantDocs.Average(r => r.Score)
        };
    }

    private string BuildContext(EmbeddingsSearchResult[] results)
    {
        var sb = new StringBuilder();

        foreach (var result in results)
        {
            sb.AppendLine($"### Document: {result.Name}");
            sb.AppendLine(result.Text);
            sb.AppendLine();
            sb.AppendLine("---");
            sb.AppendLine();
        }

        return sb.ToString();
    }
}

public class RAGResponse
{
    public string Answer { get; set; } = string.Empty;
    public DocumentSource[] Sources { get; set; } = Array.Empty<DocumentSource>();
    public double Confidence { get; set; }
}

public class DocumentSource
{
    public string Name { get; set; } = string.Empty;
    public Guid Id { get; set; }
    public double Score { get; set; }
    public string Excerpt { get; set; } = string.Empty;
}

Usage:

var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
var ragService = new DocumentRAGService(
    client,
    partitionId: Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890"),
    scoreThreshold: 0.6,
    maxResults: 3
);

// Ask questions
var response = await ragService.AskAsync("What are the system requirements?");

Console.WriteLine($"Answer: {response.Answer}");
Console.WriteLine($"Confidence: {response.Confidence:P0}");
Console.WriteLine("\nSources:");
foreach (var source in response.Sources)
{
    Console.WriteLine($"- {source.Name} (score: {source.Score:F2})");
}

Complete Examples

Shell: End-to-End Workflow

#!/bin/bash
# Complete RAG setup and search workflow

# 1. Configure and authenticate
api-url-set https://api.maitento.com
api-login

# 2. Create a partition for documentation
partition-create "Knowledge Base"
# Note the partition ID from output

PARTITION_ID="a1b2c3d4-e5f6-7890-abcd-ef1234567890"

# 3. Upload all documentation files
for file in ./docs/*.pdf ./docs/*.md ./docs/*.txt; do
    if [ -f "$file" ]; then
        echo "Uploading: $file"
        file-upload "$file" --partition-id $PARTITION_ID
    fi
done

# 4. Wait for processing (monitor notifications)
echo "Waiting for files to be processed..."
sleep 10  # Simple wait, or use: notifications file

# 5. Search the knowledge base
file-search "How do I configure authentication?" --partition-id $PARTITION_ID
file-search "What are the API rate limits?" --partition-id $PARTITION_ID
file-search "troubleshooting common errors" --partition-id $PARTITION_ID

SDK (C#): Full Application

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;
using Maitento.Entities.Partitions;

public class DocumentSearchApplication
{
    private readonly IMaitentoClient _client;

    public DocumentSearchApplication(string clientId, string clientSecret, string baseUrl)
    {
        _client = MaitentoClient.CreateWithApiKey(
            Guid.Parse(clientId),
            clientSecret,
            baseUrl
        );
    }

    public async Task<Guid> SetupKnowledgeBaseAsync(string name, string docsDirectory)
    {
        // Create partition
        Console.WriteLine($"Creating partition: {name}");
        var partition = await _client.Partitions.CreateAsync(name);
        Console.WriteLine($"Created partition: {partition.Id}");

        // Upload all supported files
        var supportedExtensions = new[] { ".pdf", ".md", ".txt", ".docx", ".html" };
        var files = Directory.GetFiles(docsDirectory)
            .Where(f => supportedExtensions.Contains(Path.GetExtension(f).ToLower()));

        var uploadedFiles = new List<IngestedFile>();

        foreach (var filePath in files)
        {
            Console.WriteLine($"Uploading: {Path.GetFileName(filePath)}");

            var content = await File.ReadAllBytesAsync(filePath);
            var fileName = Path.GetFileName(filePath);
            var contentType = GetContentType(fileName);

            var file = await _client.Files.IngestFileAsync(
                partition.Id,
                content,
                fileName,
                contentType
            );

            uploadedFiles.Add(file);
        }

        // Wait for all files to be processed
        Console.WriteLine("Waiting for vector generation...");
        await WaitForProcessingAsync(uploadedFiles.Select(f => f.Id).ToArray());

        Console.WriteLine("Knowledge base ready!");
        return partition.Id;
    }

    public async Task<SearchResultDisplay[]> SearchAsync(Guid partitionId, string query)
    {
        var results = await _client.Files.SearchEmbeddingsAsync(partitionId, query);

        return results.Select(r => new SearchResultDisplay
        {
            FileName = r.Name,
            FileId = r.Id,
            Score = r.Score,
            MatchedText = r.Text,
            FileType = r.Type.ToString()
        }).ToArray();
    }

    private async Task WaitForProcessingAsync(Guid[] fileIds, int timeoutSeconds = 120)
    {
        var startTime = DateTime.UtcNow;
        var pendingFiles = new HashSet<Guid>(fileIds);

        while (pendingFiles.Count > 0)
        {
            if ((DateTime.UtcNow - startTime).TotalSeconds > timeoutSeconds)
            {
                throw new TimeoutException(
                    $"Timeout waiting for {pendingFiles.Count} files to process"
                );
            }

            foreach (var fileId in pendingFiles.ToArray())
            {
                var details = await _client.Files.GetDetailsAsync(fileId);
                if (details?.IsVectorGenerated == true)
                {
                    Console.WriteLine($"  Processed: {details.Name}");
                    pendingFiles.Remove(fileId);
                }
            }

            if (pendingFiles.Count > 0)
            {
                await Task.Delay(2000);
            }
        }
    }

    private static string GetContentType(string fileName)
    {
        var ext = Path.GetExtension(fileName).ToLowerInvariant();
        return ext switch
        {
            ".pdf" => "application/pdf",
            ".docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            ".txt" => "text/plain",
            ".md" => "text/markdown",
            ".html" => "text/html",
            _ => "application/octet-stream"
        };
    }
}

public class SearchResultDisplay
{
    public string FileName { get; set; } = string.Empty;
    public Guid FileId { get; set; }
    public double Score { get; set; }
    public string MatchedText { get; set; } = string.Empty;
    public string FileType { get; set; } = string.Empty;

    public override string ToString()
    {
        var preview = MatchedText.Length > 150
            ? MatchedText.Substring(0, 150) + "..."
            : MatchedText;
        return $"{FileName} ({FileType}) - Score: {Score:F3}\n  {preview}";
    }
}

// Usage
public class Program
{
    public static async Task Main(string[] args)
    {
        var app = new DocumentSearchApplication(
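            // Replace with your client ID; it must be a GUID because the
            // constructor above passes it to Guid.Parse
            clientId: "00000000-0000-0000-0000-000000000000",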
            clientSecret: "your-client-secret",
            baseUrl: "https://api.maitento.com"
        );

        // Setup knowledge base
        var partitionId = await app.SetupKnowledgeBaseAsync(
            "Product Documentation",
            "./docs"
        );

        // Interactive search
        while (true)
        {
            Console.Write("\nSearch query (or 'quit'): ");
            var query = Console.ReadLine();

            if (string.IsNullOrEmpty(query) || query.ToLower() == "quit")
                break;

            var results = await app.SearchAsync(partitionId, query);

            Console.WriteLine($"\nFound {results.Length} results:\n");
            foreach (var result in results)
            {
                Console.WriteLine(result);
                Console.WriteLine();
            }
        }
    }
}

API: cURL Examples

#!/bin/bash
# Complete API workflow using cURL

BASE_URL="https://api.maitento.com"
TOKEN="your-jwt-token"

# 1. Create a partition
PARTITION_RESPONSE=$(curl -s -X PUT "$BASE_URL/partitions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Product Documentation"}')

PARTITION_ID=$(echo "$PARTITION_RESPONSE" | jq -r '.id')
echo "Created partition: $PARTITION_ID"

# 2. Upload a file and capture its ID from the response
FILE_ID=$(curl -s -X PUT "$BASE_URL/files/partitions/$PARTITION_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@./installation-guide.pdf" | jq -r '.id')
echo "Uploaded file: $FILE_ID"

# 3. Poll processing status (give up after 30 attempts, about 60 seconds)
for attempt in $(seq 1 30); do
  STATUS=$(curl -s "$BASE_URL/files/by-id/$FILE_ID/details" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.isVectorGenerated')

  if [ "$STATUS" = "true" ]; then
    echo "File ready for search!"
    break
  fi

  echo "Processing... (attempt $attempt/30)"
  sleep 2
done

# 4. Perform semantic search
curl -X PUT "$BASE_URL/files/embeddings/search" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"query\": \"How do I install on Windows?\",
    \"partitionId\": \"$PARTITION_ID\"
  }" | jq '.'

Best Practices

Partition Organization

  • Group related documents: Keep documents that should be searched together in the same partition
  • Use descriptive names: “Customer Support KB” is better than “Partition 1”
  • Consider access patterns: Create separate partitions for different user groups or use cases

File Preparation

  • Keep files under 10 MB: Split very large documents if needed
  • Use text-rich formats: PDF, Markdown, and plain text work best
  • Include metadata: Descriptive filenames help with result interpretation

Search Optimization

  • Use natural language queries: “How do I configure authentication?” works better than “authentication configuration”
  • Filter by score: Results with scores below 0.5 are often not relevant
  • Limit result count: Top 3-5 results usually contain the best matches

RAG Integration

  • Always cite sources: Include document names in generated responses
  • Set confidence thresholds: Don’t present low-confidence answers as authoritative
  • Handle no-results gracefully: Tell users when documentation doesn’t cover their question
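
The last two points can be turned into a small presentation policy. The sketch below is hypothetical: `ConfidencePolicy` is not part of the SDK, and the 0.5 and 0.7 cut-offs simply reuse the score thresholds mentioned earlier in this tutorial. Tune them against your own documents.

```csharp
using System;

Console.WriteLine(ConfidencePolicy.Classify(0.62));
// prints: Tentative
Console.WriteLine(ConfidencePolicy.Present("Windows 10 or later.", 0.85));
// prints: Windows 10 or later.

public enum AnswerTier { NoAnswer, Tentative, Confident }

// Hypothetical helper for illustration only.
public static class ConfidencePolicy
{
    // Thresholds are illustrative assumptions, not documented values.
    public static AnswerTier Classify(double confidence, double low = 0.5, double high = 0.7)
    {
        if (confidence < low) return AnswerTier.NoAnswer;
        return confidence < high ? AnswerTier.Tentative : AnswerTier.Confident;
    }

    // Chooses how to present an answer based on its confidence tier.
    public static string Present(string answer, double confidence) =>
        Classify(confidence) switch
        {
            AnswerTier.Confident => answer,
            AnswerTier.Tentative => "This may help, but please verify against the sources: " + answer,
            _ => "The documentation does not appear to cover this question."
        };
}
```

With the `RAGResponse` type from Step 5, you could pass `response.Answer` and `response.Confidence` to `Present` before showing anything to the user.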

Troubleshooting

Files Not Becoming Searchable

Symptoms: isVectorGenerated stays false

Solutions:

  1. Check file format is supported
  2. Verify file is under 10 MB size limit
  3. Check the file contains extractable text (not scanned images)
  4. Wait longer for large documents (up to 20 seconds for very large files)

Low Search Relevance

Symptoms: Search returns unrelated documents or low scores

Solutions:

  1. Rephrase queries using natural language
  2. Ensure documents contain the information you’re searching for
  3. Check that documents have finished processing (isVectorGenerated: true)
  4. Try more specific queries

Upload Errors

Symptoms: 400 Bad Request or validation errors

Solutions:

  1. Verify partition ID is valid and exists
  2. Check file format is in the supported list
  3. Ensure file is under 10 MB
  4. Confirm Content-Type header matches file format

Authentication Issues

Symptoms: 401 Unauthorized or 403 Forbidden

Solutions:

  1. Verify API key or token is valid and not expired
  2. Check you have the required role (Tenant.Admin or Tenant.FileIngestion)
  3. Ensure the partition belongs to your tenant
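
To rule out an expired token quickly, you can inspect it locally. The sketch below assumes the bearer token is a standard three-part JWT carrying an `exp` claim in Unix seconds; that is an assumption about the token's contents, not something this tutorial documents. `TokenInspector` is a hypothetical helper, and it does not verify the signature, so use it for diagnostics only.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Build a sample token with only a payload that matters: {"exp":1710513000}
string payloadJson = "{\"exp\":1710513000}";
string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(payloadJson))
    .TrimEnd('=').Replace('+', '-').Replace('/', '_');
string sampleToken = "header." + encoded + ".signature";

DateTimeOffset? expiry = TokenInspector.GetExpiry(sampleToken);
Console.WriteLine(expiry?.ToUnixTimeSeconds());
// prints: 1710513000
Console.WriteLine(expiry < DateTimeOffset.UtcNow ? "expired" : "still valid");

// Hypothetical helper for illustration only.
public static class TokenInspector
{
    // Returns the expiry from a JWT's "exp" claim, or null if it is absent.
    // The signature is NOT verified.
    public static DateTimeOffset? GetExpiry(string jwt)
    {
        string[] parts = jwt.Split('.');
        if (parts.Length != 3)
            throw new FormatException("Not a three-part JWT");

        // Convert base64url to standard Base64 and restore padding.
        string payload = parts[1].Replace('-', '+').Replace('_', '/');
        payload = payload.PadRight(payload.Length + (4 - payload.Length % 4) % 4, '=');

        using JsonDocument doc = JsonDocument.Parse(Convert.FromBase64String(payload));
        return doc.RootElement.TryGetProperty("exp", out JsonElement exp)
            ? DateTimeOffset.FromUnixTimeSeconds(exp.GetInt64())
            : (DateTimeOffset?)null;
    }
}
```

If the token has expired, authenticate again and retry; if it is still valid, a 403 more likely points to a missing role or a partition in another tenant.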