Document Search with RAG
This tutorial shows how to build a semantic document search system using Maitento’s RAG (Retrieval-Augmented Generation) pipeline. You’ll learn how to create partitions, upload documents, wait for embeddings, perform semantic search, and use the results in agent interactions.
Table of Contents
- Overview
- Prerequisites
- Step 1: Create a Partition
- Step 2: Upload Documents
- Step 3: Wait for Ingestion
- Step 4: Perform Semantic Search
- Step 5: Use Results in Agent Interactions
- Complete Examples
- Best Practices
- Troubleshooting
Overview
Maitento’s RAG pipeline provides a complete solution for semantic document search:
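File Upload -> Text Extraction -> Chunking -> Vector Generation -> Hybrid Search
| Stage | Component |
|---|---|
| File Upload | WebAPI |
| Text Extraction | MarkItDown/Pandoc/etc |
| Chunking | TextChunker (1750 chars) |
| Vector Generation | OpenAI Embeddings |
| Hybrid Search | MongoDB Vector Search |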
Supported File Types
| Type | Extensions | Parser |
|---|---|---|
| PDF | .pdf | MarkItDown (Python) |
| Word | .docx | Pandoc |
| Excel | .xlsx | MarkItDown |
| PowerPoint | .pptx | MarkItDown |
| Markdown | .md | Plain Text |
| Plain Text | .txt | Plain Text |
| HTML | .html, .htm | ReverseMarkdown |
| CSV | .csv | MarkItDown |
| JSON | .json | MarkItDown |
| XML | .xml | MarkItDown |
File Size Limit: 10 MB
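Because unsupported or oversized files will not become searchable, it can help to check a file locally before uploading it. The following is a minimal sketch and not part of the Maitento SDK: the helper name `CheckUploadable` is hypothetical, the extension list is copied from the table above (plus .pdf, which this tutorial uploads throughout), and it assumes "10 MB" means 10 × 1024 × 1024 bytes, which the text does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
const long MaxFileSizeBytes = 10L * 1024 * 1024;

// Extensions taken from the Supported File Types table (plus .pdf).
var supportedExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".pdf", ".docx", ".xlsx", ".pptx", ".md", ".txt",
    ".html", ".htm", ".csv", ".json", ".xml"
};

// Hypothetical helper: returns null when the file looks uploadable,
// otherwise a short reason why it would likely be rejected.
string? CheckUploadable(string fileName, long sizeBytes)
{
    string ext = Path.GetExtension(fileName);
    if (!supportedExtensions.Contains(ext))
        return $"Unsupported file type: '{ext}'";
    if (sizeBytes > MaxFileSizeBytes)
        return $"File is {sizeBytes} bytes; the limit is {MaxFileSizeBytes} bytes";
    return null;
}

Console.WriteLine(CheckUploadable("guide.pdf", 524288) ?? "OK"); // OK
Console.WriteLine(CheckUploadable("video.mp4", 1000) ?? "OK");   // Unsupported file type: '.mp4'
```

Running a check like this before each upload avoids a round trip for files that would be rejected anyway.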
Processing Timeline
| File Size | Time to Searchable |
|---|---|
| Small (<750 words) | 1-3 seconds |
| Medium (750-5000 words) | 3-8 seconds |
| Large (20+ chunks) | 10-20 seconds |
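The tiers above depend on how many chunks a document produces. As a rough illustration only, the sketch below estimates the chunk count from the length of the extracted text, assuming fixed 1750-character chunks with no overlap. The actual TextChunker may split on sentence or paragraph boundaries or overlap chunks, so treat the result as an approximation. `EstimateChunkCount` is a hypothetical name.

```csharp
using System;

const int ChunkSizeChars = 1750; // chunk size quoted in the pipeline overview

// Hypothetical helper: ceiling division of the text length by the chunk size.
int EstimateChunkCount(int textLength)
{
    if (textLength <= 0) return 0;
    return (textLength + ChunkSizeChars - 1) / ChunkSizeChars;
}

Console.WriteLine(EstimateChunkCount(1000));  // 1
Console.WriteLine(EstimateChunkCount(1750));  // 1
Console.WriteLine(EstimateChunkCount(35001)); // 21
```

Under these assumptions, a document needs a little over 35,000 characters of extracted text to land in the "20+ chunks" tier.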
Prerequisites
Before starting, ensure you have:
- A Maitento account with API access
- API credentials (API key or username/password)
- Documents to upload (PDF, markdown, etc.)
Authentication Setup
Shell:
api-url-set https://api.maitento.com
api-login
SDK (C#):
// Using API Key
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, "https://api.maitento.com");
// Or using credentials
var client = MaitentoClient.CreateWithCredentials("user@example.com", "password", "https://api.maitento.com");
API:
PUT /auth HTTP/1.1
Host: api.maitento.com
Content-Type: application/json
{
"email": "user@example.com",
"password": "your-password"
}
Step 1: Create a Partition
Partitions provide logical grouping for your documents. All files in a partition are searchable together, allowing you to organize documents by purpose (e.g., “Product Docs”, “Support KB”, “Legal”).
Shell
# Create a partition for product documentation
partition-create "Product Documentation"
# Output:
# Id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Name: Product Documentation
# Created: 2024-03-15T10:30:00Z
# List all partitions
partition-list
SDK (C#)
using Maitento.Sdk;
using Maitento.Entities.Partitions;
// Create the client
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
// Create a partition
Partition partition = await client.Partitions.CreateAsync("Product Documentation");
Console.WriteLine($"Created partition: {partition.Id}");
Console.WriteLine($"Name: {partition.Name}");
// List all partitions
Partition[] partitions = await client.Partitions.ListAsync();
foreach (var p in partitions)
{
Console.WriteLine($"- {p.Name} ({p.Id})");
}
API
Create Partition:
PUT /partitions HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
Content-Type: application/json
{
"name": "Product Documentation"
}
Response:
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"dateCreated": "2024-03-15T10:30:00Z",
"dateUpdated": "2024-03-15T10:30:00Z",
"tenantId": "98765432-abcd-4def-8123-456789abcdef",
"userCreatedById": "12345678-abcd-4def-8123-456789abcdef",
"name": "Product Documentation"
}
List Partitions:
GET /partitions HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
Step 2: Upload Documents
Upload your documents to the partition. Maitento automatically extracts text, generates chunks for large documents, and creates vector embeddings.
Shell
# Upload a single PDF
file-upload ./installation-guide.pdf --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Output:
# File uploaded successfully
# Id: f1e2d3c4-b5a6-9087-6543-210fedcba987
# Name: installation-guide.pdf
# Type: Pdf
# Size: 524288 bytes
# Status: Processing
# Upload multiple files
file-upload ./user-manual.pdf --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
file-upload ./faq.md --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
file-upload ./release-notes.txt --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
SDK (C#)
using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
Guid partitionId = Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890");
// Upload a PDF file
byte[] pdfContent = await File.ReadAllBytesAsync("installation-guide.pdf");
IngestedFile pdfFile = await client.Files.IngestFileAsync(
partitionId,
pdfContent,
"installation-guide.pdf",
"application/pdf"
);
Console.WriteLine($"Uploaded: {pdfFile.Name} (ID: {pdfFile.Id})");
// Upload a Markdown file
byte[] mdContent = await File.ReadAllBytesAsync("faq.md");
IngestedFile mdFile = await client.Files.IngestFileAsync(
partitionId,
mdContent,
"faq.md",
"text/markdown"
);
Console.WriteLine($"Uploaded: {mdFile.Name} (ID: {mdFile.Id})");
// Upload multiple files from a directory
string[] supportedExtensions = { ".pdf", ".md", ".txt", ".docx" };
foreach (string filePath in Directory.GetFiles("./docs"))
{
string ext = Path.GetExtension(filePath).ToLowerInvariant();
if (!supportedExtensions.Contains(ext)) continue;
byte[] content = await File.ReadAllBytesAsync(filePath);
string fileName = Path.GetFileName(filePath);
string contentType = GetContentType(fileName);
IngestedFile file = await client.Files.IngestFileAsync(
partitionId,
content,
fileName,
contentType
);
Console.WriteLine($"Uploaded: {file.Name}");
}
static string GetContentType(string fileName)
{
var extension = Path.GetExtension(fileName).ToLowerInvariant();
return extension switch
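{
".pdf" => "application/pdf",
".docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xlsx" => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
".pptx" => "application/vnd.openxmlformats-officedocument.presentationml.presentation",
".txt" => "text/plain",
".md" => "text/markdown",
".html" or ".htm" => "text/html",
".json" => "application/json",
".csv" => "text/csv",
".xml" => "application/xml",
// Fallback for extensions outside the supported list
_ => "application/octet-stream"
};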
}
API
Upload File:
PUT /files/partitions/a1b2c3d4-e5f6-7890-abcd-ef1234567890 HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
Content-Type: multipart/form-data; boundary=----FormBoundary
------FormBoundary
Content-Disposition: form-data; name="file"; filename="installation-guide.pdf"
Content-Type: application/pdf
<binary file content>
------FormBoundary--
Response:
{
"id": "f1e2d3c4-b5a6-9087-6543-210fedcba987",
"dateCreated": "2024-03-15T14:30:00Z",
"dateUpdated": "2024-03-15T14:30:00Z",
"name": "installation-guide.pdf",
"contentType": "application/pdf",
"size": 524288,
"type": "Pdf",
"text": "Installation Guide\n\nChapter 1: Getting Started...",
"isVectorGenerated": false,
"vectorGenerationTime": null,
"isChunkingRequired": true,
"chunkGenerationTime": null,
"partitionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
Check for Duplicates Before Upload
To avoid uploading duplicate files, check by checksum first:
SDK (C#):
using System.Security.Cryptography;
// Calculate file checksum
byte[] fileBytes = await File.ReadAllBytesAsync("document.pdf");
string checksum;
using (var sha256 = SHA256.Create())
{
byte[] hash = sha256.ComputeHash(fileBytes);
checksum = Convert.ToBase64String(hash);
}
// Check if file exists
IngestedFileSummary? existing = await client.Files.GetByChecksumAsync(checksum);
if (existing != null)
{
Console.WriteLine($"File already exists: {existing.Name}");
}
else
{
// Proceed with upload
await client.Files.IngestFileAsync(partitionId, fileBytes, "document.pdf", "application/pdf");
}
API:
GET /files/by-checksum/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
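Note that the SDK example above encodes the SHA-256 digest as Base64, while the URL in the API example carries a 64-character hexadecimal string. The text does not state which encoding the service expects, so confirm this against the Files API reference. The sketch below only shows how to produce both forms with standard .NET calls; the helper names `Sha256Hex` and `Sha256Base64` are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// SHA-256 digest as lowercase hex (the form that appears in the API URL example).
string Sha256Hex(byte[] data) =>
    Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant();

// SHA-256 digest as Base64 (the form computed in the SDK example).
string Sha256Base64(byte[] data) =>
    Convert.ToBase64String(SHA256.HashData(data));

byte[] empty = Array.Empty<byte>();
Console.WriteLine(Sha256Hex(empty));
// e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Console.WriteLine(Sha256Base64(empty));
// 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
```

The hex value printed for empty input matches the checksum in the API example, so that URL is a placeholder for the hash of an empty file rather than a real document.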
Step 3: Wait for Ingestion
After uploading, documents go through text extraction, chunking, and vector generation. You need to wait for isVectorGenerated to become true before searching.
Shell
# Check file status
file-get f1e2d3c4-b5a6-9087-6543-210fedcba987
# Output:
# Id: f1e2d3c4-b5a6-9087-6543-210fedcba987
# Name: installation-guide.pdf
# Type: Pdf
# Size: 524288 bytes
# Vectors Generated: true
# Vector Generation Time: 2024-03-15T14:35:00Z
# Subscribe to file notifications for real-time updates
notifications file
SDK (C#)
Polling Approach:
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
Guid fileId = Guid.Parse("f1e2d3c4-b5a6-9087-6543-210fedcba987");
// Poll until vectors are ready
IngestedFile? fileDetails;
int maxAttempts = 30;
int attempt = 0;
do
{
fileDetails = await client.Files.GetDetailsAsync(fileId);
if (fileDetails == null)
{
throw new Exception("File not found");
}
if (fileDetails.IsVectorGenerated)
{
Console.WriteLine("Vectors ready! File is searchable.");
break;
}
Console.WriteLine($"Processing... (attempt {++attempt}/{maxAttempts})");
await Task.Delay(1000); // Wait 1 second
} while (attempt < maxAttempts);
if (!fileDetails.IsVectorGenerated)
{
Console.WriteLine("Warning: Vector generation took longer than expected");
}
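The loop above polls at a fixed one-second interval. When many files are processed at once, a growing delay reduces the number of requests. The following is a sketch of a generic wait helper with exponential backoff. `WaitUntilAsync` is a hypothetical name and not an SDK method, and the readiness check is a stand-in for a call such as `GetDetailsAsync` followed by a test of `IsVectorGenerated`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: polls `isReady` until it returns true or the next wait
// would pass the deadline, doubling the delay after each attempt up to `maxDelay`.
async Task<bool> WaitUntilAsync(
    Func<Task<bool>> isReady,
    TimeSpan timeout,
    TimeSpan initialDelay,
    TimeSpan maxDelay)
{
    DateTime deadline = DateTime.UtcNow + timeout;
    TimeSpan delay = initialDelay;
    while (true)
    {
        if (await isReady()) return true;
        if (DateTime.UtcNow + delay > deadline) return false;
        await Task.Delay(delay);
        delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
    }
}

// Stand-in for a real readiness check; reports ready on the third call.
int calls = 0;
Func<Task<bool>> fakeCheck = () => Task.FromResult(++calls >= 3);

bool ready = await WaitUntilAsync(
    fakeCheck,
    timeout: TimeSpan.FromSeconds(5),
    initialDelay: TimeSpan.FromMilliseconds(10),
    maxDelay: TimeSpan.FromMilliseconds(100));
Console.WriteLine($"Ready: {ready} after {calls} checks"); // Ready: True after 3 checks
```

Returning `false` on timeout, rather than throwing, leaves it to the caller to decide whether a slow file is an error.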
Real-Time Notifications Approach:
using Maitento.Sdk;
using Maitento.Sdk.Notifications;
using Maitento.Entities.IngestedFiles;
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
// Start notification service
await client.Notifications.StartAsync();
// Upload the file first so that its ID is known
Guid partitionId = Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890");
byte[] content = await File.ReadAllBytesAsync("document.pdf");
var file = await client.Files.IngestFileAsync(partitionId, content, "document.pdf", "application/pdf");
// Create a handler and subscribe to updates for the uploaded file
var handler = new FileUpdateHandler();
var subscription = client.Notifications.Subscribe(handler, file.Id);
// Guard against a race: processing may have finished before the subscription became active
var current = await client.Files.GetDetailsAsync(file.Id);
if (current?.IsVectorGenerated != true)
{
// Wait for completion (the handler signals this)
await handler.WaitForVectorGeneration(TimeSpan.FromSeconds(60));
}
// Cleanup
client.Notifications.Unsubscribe(subscription);
// Handler implementation
public class FileUpdateHandler : INotificationClientHandlerIngestedFileUpdated
{
private readonly TaskCompletionSource<bool> _tcs = new();
public Task OnIngestedFileUpdated(IngestedFile file)
{
Console.WriteLine($"File update: {file.Name} - Vectors: {file.IsVectorGenerated}");
if (file.IsVectorGenerated)
{
_tcs.TrySetResult(true);
}
return Task.CompletedTask;
}
public async Task WaitForVectorGeneration(TimeSpan timeout)
{
var timeoutTask = Task.Delay(timeout);
var completedTask = await Task.WhenAny(_tcs.Task, timeoutTask);
if (completedTask == timeoutTask)
{
throw new TimeoutException("Vector generation timed out");
}
}
}
API
Get File Details:
GET /files/by-id/f1e2d3c4-b5a6-9087-6543-210fedcba987/details HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
Response (Processing):
{
"id": "f1e2d3c4-b5a6-9087-6543-210fedcba987",
"name": "installation-guide.pdf",
"isVectorGenerated": false,
"vectorGenerationTime": null,
"isChunkingRequired": true,
"chunkGenerationTime": null
}
Response (Complete):
{
"id": "f1e2d3c4-b5a6-9087-6543-210fedcba987",
"name": "installation-guide.pdf",
"isVectorGenerated": true,
"vectorGenerationTime": "2024-03-15T14:35:00Z",
"isChunkingRequired": true,
"chunkGenerationTime": "2024-03-15T14:32:00Z"
}
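The two sample responses suggest how the status fields can be read together. The sketch below maps them to a human-readable stage. The mapping is inferred from these samples and from the pipeline order (chunking before vector generation), not from a documented contract, and `DescribeStage` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: maps the status fields of the file details response
// to a stage name. The mapping is inferred from the sample responses.
string DescribeStage(bool isVectorGenerated, bool isChunkingRequired, DateTime? chunkGenerationTime)
{
    if (isVectorGenerated) return "Searchable";
    if (isChunkingRequired && chunkGenerationTime == null) return "Waiting for chunking";
    return "Waiting for vector generation";
}

var chunkedAt = new DateTime(2024, 3, 15, 14, 32, 0, DateTimeKind.Utc);

Console.WriteLine(DescribeStage(false, true, null));      // Waiting for chunking
Console.WriteLine(DescribeStage(false, true, chunkedAt)); // Waiting for vector generation
Console.WriteLine(DescribeStage(true, true, chunkedAt));  // Searchable
```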
Step 4: Perform Semantic Search
Once embeddings are generated, use semantic search to find relevant documents. The search uses hybrid retrieval combining vector similarity (70%) and full-text search (30%).
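To illustrate what a 70/30 weighting means for ranking, here is a worked sketch. It assumes both signals are normalized to the range 0 to 1 and combined as a simple weighted sum. The text gives only the weights, so the exact fusion formula used by the service may differ, and `HybridScore` is a hypothetical name.

```csharp
using System;
using System.Globalization;

const double VectorWeight = 0.7;   // weight stated in the text
const double FullTextWeight = 0.3; // weight stated in the text

// Assumption: both inputs are already normalized to [0, 1]
// and the two signals are combined as a weighted sum.
double HybridScore(double vectorScore, double fullTextScore) =>
    VectorWeight * vectorScore + FullTextWeight * fullTextScore;

// A chunk that is semantically close but shares few keywords with the query:
Console.WriteLine(HybridScore(0.90, 0.50).ToString("F2", CultureInfo.InvariantCulture)); // 0.78
// A chunk that matches every keyword but is semantically distant:
Console.WriteLine(HybridScore(0.40, 1.00).ToString("F2", CultureInfo.InvariantCulture)); // 0.58
```

Under this assumption, a strong semantic match outranks a perfect keyword match, which is consistent with the advice to phrase queries in natural language.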
Shell
# Search for documents about installation
file-search "How do I install the product on Windows?" --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Output:
# Found 3 relevant documents
#
# 1. installation-guide.pdf (Score: 0.892)
# Type: Pdf | Size: 524288 bytes
# Match: "Windows Installation\n\nStep 1: Download the installer..."
#
# 2. quick-start.md (Score: 0.756)
# Type: Markdown | Size: 8192 bytes
# Match: "# Quick Start Guide\n\n## Installation\n\nFor Windows users..."
#
# 3. faq.md (Score: 0.634)
# Type: Markdown | Size: 4096 bytes
# Match: "## Q: What are the system requirements?\n\nA: Windows 10..."
SDK (C#)
using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
Guid partitionId = Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890");
// Perform semantic search
string query = "How do I install the product on Windows?";
EmbeddingsSearchResult[] results = await client.Files.SearchEmbeddingsAsync(partitionId, query);
Console.WriteLine($"Found {results.Length} relevant documents\n");
foreach (var result in results)
{
Console.WriteLine($"--- {result.Name} (Score: {result.Score:F4}) ---");
Console.WriteLine($"Type: {result.Type}");
Console.WriteLine($"Size: {result.Size} bytes");
Console.WriteLine($"Content-Type: {result.ContentType}");
// Show first 200 characters of matched text
string preview = result.Text.Length > 200
? result.Text.Substring(0, 200) + "..."
: result.Text;
Console.WriteLine($"Match: {preview}\n");
}
// Filter results by score threshold
var highQualityResults = results.Where(r => r.Score > 0.7).ToArray();
Console.WriteLine($"{highQualityResults.Length} high-quality matches (score > 0.7)");
API
Search Embeddings:
PUT /files/embeddings/search HTTP/1.1
Host: api.maitento.com
Authorization: Bearer <token>
Content-Type: application/json
{
"query": "How do I install the product on Windows?",
"partitionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
Response:
[
{
"id": "f1e2d3c4-b5a6-9087-6543-210fedcba987",
"name": "installation-guide.pdf",
"contentType": "application/pdf",
"size": 524288,
"type": "Pdf",
"text": "Windows Installation\n\nStep 1: Download the installer from our website.\nStep 2: Run the installer as Administrator.\nStep 3: Follow the setup wizard...",
"score": 0.892
},
{
"id": "a2b3c4d5-e6f7-8901-abcd-ef2345678901",
"name": "quick-start.md",
"contentType": "text/markdown",
"size": 8192,
"type": "Markdown",
"text": "# Quick Start Guide\n\n## Installation\n\nFor Windows users, download the MSI installer and run it with administrator privileges...",
"score": 0.756
}
]
Step 5: Use Results in Agent Interactions
Combine search results with AI agents to create powerful question-answering systems. The search results provide context for the agent to generate accurate, grounded responses.
Shell
# First, search for relevant documents
file-search "installation requirements" --partition-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Then use the results in an interaction with your agent
interaction-run support-agent --question "What are the system requirements for installation?" --context "..."
SDK (C#)
using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
Guid partitionId = Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890");
// User question
string userQuestion = "What are the minimum system requirements?";
// Step 1: Search for relevant documents
var searchResults = await client.Files.SearchEmbeddingsAsync(partitionId, userQuestion);
// Step 2: Build context from top results
var topResults = searchResults.Take(3).ToArray();
string context = string.Join("\n\n---\n\n", topResults.Select(r =>
$"Source: {r.Name}\n{r.Text}"
));
// Step 3: Create prompt with context
string systemPrompt = @"You are a helpful product support assistant.
Answer questions based ONLY on the provided context.
If the context doesn't contain the answer, say so.
Always cite your sources.";
string userPrompt = $@"Context:
{context}
Question: {userQuestion}
Please provide a helpful answer based on the documentation above.";
// Step 4: Run interaction with the context-enriched prompt
// (Assumes you have a support-agent interaction defined)
var result = await client.Interactions.RunAsync(
"support-agent",
new Dictionary<string, object>
{
["system_context"] = systemPrompt,
["user_query"] = userPrompt
}
);
Console.WriteLine("Assistant Response:");
Console.WriteLine(result.Output);
// Include source citations
Console.WriteLine("\nSources:");
foreach (var source in topResults)
{
Console.WriteLine($"- {source.Name} (relevance: {source.Score:P0})");
}
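Taking a fixed number of results can still produce a very long prompt when the matched texts are large. The following sketch caps the context at a character budget instead. `BuildBoundedContext` is a hypothetical helper, the tuples stand in for the `Name` and `Text` fields of the search results, and the budget value is arbitrary; choose one that suits your agent's model.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: appends (name, text) excerpts in ranked order and stops
// at the first excerpt that would push the combined length past maxChars.
string BuildBoundedContext(IEnumerable<(string Name, string Text)> excerpts, int maxChars)
{
    var sb = new StringBuilder();
    foreach (var (name, text) in excerpts)
    {
        string section = $"Source: {name}\n{text}\n\n---\n\n";
        if (sb.Length + section.Length > maxChars) break;
        sb.Append(section);
    }
    return sb.ToString();
}

// Sample excerpts, highest score first.
var ranked = new List<(string Name, string Text)>
{
    ("installation-guide.pdf", "Windows Installation: download the installer..."),
    ("quick-start.md", "For Windows users, download the MSI installer..."),
    ("faq.md", "Q: What are the system requirements?"),
};

string boundedContext = BuildBoundedContext(ranked, maxChars: 120);
Console.WriteLine(boundedContext);
```

With a 120-character budget only the first excerpt fits, so the lower-ranked sources are dropped rather than truncated mid-sentence.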
Complete RAG Service Example
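using System;
using System.Collections.Generic;
using System.Linq;
using System.Text; // StringBuilder
using System.Threading.Tasks;
using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;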
public class DocumentRAGService
{
private readonly IMaitentoClient _client;
private readonly Guid _partitionId;
private readonly double _scoreThreshold;
private readonly int _maxResults;
public DocumentRAGService(
IMaitentoClient client,
Guid partitionId,
double scoreThreshold = 0.5,
int maxResults = 5)
{
_client = client;
_partitionId = partitionId;
_scoreThreshold = scoreThreshold;
_maxResults = maxResults;
}
public async Task<RAGResponse> AskAsync(string question)
{
// Search for relevant documents
var searchResults = await _client.Files.SearchEmbeddingsAsync(_partitionId, question);
// Filter by score threshold and limit results
var relevantDocs = searchResults
.Where(r => r.Score >= _scoreThreshold)
.Take(_maxResults)
.ToArray();
if (relevantDocs.Length == 0)
{
return new RAGResponse
{
Answer = "I couldn't find any relevant information in the documentation to answer your question.",
Sources = Array.Empty<DocumentSource>(),
Confidence = 0
};
}
// Build context
string context = BuildContext(relevantDocs);
// Run the interaction
var interactionResult = await _client.Interactions.RunAsync(
"rag-qa-agent",
new Dictionary<string, object>
{
["context"] = context,
["question"] = question
}
);
return new RAGResponse
{
Answer = interactionResult.Output?.ToString() ?? "No response generated",
Sources = relevantDocs.Select(r => new DocumentSource
{
Name = r.Name,
Id = r.Id,
Score = r.Score,
Excerpt = r.Text.Length > 200 ? r.Text.Substring(0, 200) + "..." : r.Text
}).ToArray(),
Confidence = relevantDocs.Average(r => r.Score)
};
}
private string BuildContext(EmbeddingsSearchResult[] results)
{
var sb = new StringBuilder();
foreach (var result in results)
{
sb.AppendLine($"### Document: {result.Name}");
sb.AppendLine(result.Text);
sb.AppendLine();
sb.AppendLine("---");
sb.AppendLine();
}
return sb.ToString();
}
}
public class RAGResponse
{
public string Answer { get; set; } = string.Empty;
public DocumentSource[] Sources { get; set; } = Array.Empty<DocumentSource>();
public double Confidence { get; set; }
}
public class DocumentSource
{
public string Name { get; set; } = string.Empty;
public Guid Id { get; set; }
public double Score { get; set; }
public string Excerpt { get; set; } = string.Empty;
}
Usage:
var client = MaitentoClient.CreateWithApiKey(clientId, clientSecret, baseUrl);
var ragService = new DocumentRAGService(
client,
partitionId: Guid.Parse("a1b2c3d4-e5f6-7890-abcd-ef1234567890"),
scoreThreshold: 0.6,
maxResults: 3
);
// Ask questions
var response = await ragService.AskAsync("What are the system requirements?");
Console.WriteLine($"Answer: {response.Answer}");
Console.WriteLine($"Confidence: {response.Confidence:P0}");
Console.WriteLine("\nSources:");
foreach (var source in response.Sources)
{
Console.WriteLine($"- {source.Name} (score: {source.Score:F2})");
}
Complete Examples
Shell: End-to-End Workflow
#!/bin/bash
# Complete RAG setup and search workflow
# 1. Configure and authenticate
api-url-set https://api.maitento.com
api-login
# 2. Create a partition for documentation
partition-create "Knowledge Base"
# Note the partition ID from output
PARTITION_ID="a1b2c3d4-e5f6-7890-abcd-ef1234567890"
# 3. Upload all documentation files
for file in ./docs/*.pdf ./docs/*.md ./docs/*.txt; do
if [ -f "$file" ]; then
echo "Uploading: $file"
file-upload "$file" --partition-id $PARTITION_ID
fi
done
# 4. Wait for processing (monitor notifications)
echo "Waiting for files to be processed..."
sleep 10 # Simple wait, or use: notifications file
# 5. Search the knowledge base
file-search "How do I configure authentication?" --partition-id $PARTITION_ID
file-search "What are the API rate limits?" --partition-id $PARTITION_ID
file-search "troubleshooting common errors" --partition-id $PARTITION_ID
SDK (C#): Full Application
using System;
using System.Collections.Generic; // List<T>, HashSet<T>
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Maitento.Sdk;
using Maitento.Entities.IngestedFiles;
using Maitento.Entities.Partitions;
public class DocumentSearchApplication
{
private readonly IMaitentoClient _client;
public DocumentSearchApplication(string clientId, string clientSecret, string baseUrl)
{
_client = MaitentoClient.CreateWithApiKey(
Guid.Parse(clientId),
clientSecret,
baseUrl
);
}
public async Task<Guid> SetupKnowledgeBaseAsync(string name, string docsDirectory)
{
// Create partition
Console.WriteLine($"Creating partition: {name}");
var partition = await _client.Partitions.CreateAsync(name);
Console.WriteLine($"Created partition: {partition.Id}");
// Upload all supported files
var supportedExtensions = new[] { ".pdf", ".md", ".txt", ".docx", ".html" };
var files = Directory.GetFiles(docsDirectory)
.Where(f => supportedExtensions.Contains(Path.GetExtension(f).ToLower()));
var uploadedFiles = new List<IngestedFile>();
foreach (var filePath in files)
{
Console.WriteLine($"Uploading: {Path.GetFileName(filePath)}");
var content = await File.ReadAllBytesAsync(filePath);
var fileName = Path.GetFileName(filePath);
var contentType = GetContentType(fileName);
var file = await _client.Files.IngestFileAsync(
partition.Id,
content,
fileName,
contentType
);
uploadedFiles.Add(file);
}
// Wait for all files to be processed
Console.WriteLine("Waiting for vector generation...");
await WaitForProcessingAsync(uploadedFiles.Select(f => f.Id).ToArray());
Console.WriteLine("Knowledge base ready!");
return partition.Id;
}
public async Task<SearchResultDisplay[]> SearchAsync(Guid partitionId, string query)
{
var results = await _client.Files.SearchEmbeddingsAsync(partitionId, query);
return results.Select(r => new SearchResultDisplay
{
FileName = r.Name,
FileId = r.Id,
Score = r.Score,
MatchedText = r.Text,
FileType = r.Type.ToString()
}).ToArray();
}
private async Task WaitForProcessingAsync(Guid[] fileIds, int timeoutSeconds = 120)
{
var startTime = DateTime.UtcNow;
var pendingFiles = new HashSet<Guid>(fileIds);
while (pendingFiles.Count > 0)
{
if ((DateTime.UtcNow - startTime).TotalSeconds > timeoutSeconds)
{
throw new TimeoutException(
$"Timeout waiting for {pendingFiles.Count} files to process"
);
}
foreach (var fileId in pendingFiles.ToArray())
{
var details = await _client.Files.GetDetailsAsync(fileId);
if (details?.IsVectorGenerated == true)
{
Console.WriteLine($" Processed: {details.Name}");
pendingFiles.Remove(fileId);
}
}
if (pendingFiles.Count > 0)
{
await Task.Delay(2000);
}
}
}
private static string GetContentType(string fileName)
{
var ext = Path.GetExtension(fileName).ToLowerInvariant();
return ext switch
{
".pdf" => "application/pdf",
".docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".txt" => "text/plain",
".md" => "text/markdown",
".html" => "text/html",
_ => "application/octet-stream"
};
}
}
public class SearchResultDisplay
{
public string FileName { get; set; } = string.Empty;
public Guid FileId { get; set; }
public double Score { get; set; }
public string MatchedText { get; set; } = string.Empty;
public string FileType { get; set; } = string.Empty;
public override string ToString()
{
var preview = MatchedText.Length > 150
? MatchedText.Substring(0, 150) + "..."
: MatchedText;
return $"{FileName} ({FileType}) - Score: {Score:F3}\n {preview}";
}
}
// Usage
public class Program
{
public static async Task Main(string[] args)
{
var app = new DocumentSearchApplication(
clientId: "your-client-id", // must be a GUID string; the constructor passes it to Guid.Parse
clientSecret: "your-client-secret",
baseUrl: "https://api.maitento.com"
);
// Setup knowledge base
var partitionId = await app.SetupKnowledgeBaseAsync(
"Product Documentation",
"./docs"
);
// Interactive search
while (true)
{
Console.Write("\nSearch query (or 'quit'): ");
var query = Console.ReadLine();
if (string.IsNullOrEmpty(query) || query.ToLower() == "quit")
break;
var results = await app.SearchAsync(partitionId, query);
Console.WriteLine($"\nFound {results.Length} results:\n");
foreach (var result in results)
{
Console.WriteLine(result);
Console.WriteLine();
}
}
}
}
API: cURL Examples
#!/bin/bash
# Complete API workflow using cURL
BASE_URL="https://api.maitento.com"
TOKEN="your-jwt-token"
# 1. Create a partition
PARTITION_RESPONSE=$(curl -s -X PUT "$BASE_URL/partitions" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "Product Documentation"}')
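PARTITION_ID=$(echo "$PARTITION_RESPONSE" | jq -r '.id')
echo "Created partition: $PARTITION_ID"
# 2. Upload a file and capture its ID from the response
UPLOAD_RESPONSE=$(curl -s -X PUT "$BASE_URL/files/partitions/$PARTITION_ID" \
-H "Authorization: Bearer $TOKEN" \
-F "file=@./installation-guide.pdf;type=application/pdf")
FILE_ID=$(echo "$UPLOAD_RESPONSE" | jq -r '.id')
echo "Uploaded file: $FILE_ID"
# 3. Poll processing status (give up after 60 attempts, about 2 minutes)
for attempt in $(seq 1 60); do
STATUS=$(curl -s "$BASE_URL/files/by-id/$FILE_ID/details" \
-H "Authorization: Bearer $TOKEN" | jq -r '.isVectorGenerated')
if [ "$STATUS" = "true" ]; then
echo "File ready for search!"
break
fi
echo "Processing... (attempt $attempt/60)"
sleep 2
done
if [ "$STATUS" != "true" ]; then
echo "Timed out waiting for vector generation" >&2
exit 1
fi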
# 4. Perform semantic search
curl -X PUT "$BASE_URL/files/embeddings/search" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"query\": \"How do I install on Windows?\",
\"partitionId\": \"$PARTITION_ID\"
}" | jq '.'
Best Practices
Partition Organization
- Group related documents: Keep documents that should be searched together in the same partition
- Use descriptive names: “Customer Support KB” is better than “Partition 1”
- Consider access patterns: Create separate partitions for different user groups or use cases
File Preparation
- Keep files under 10 MB: Split very large documents if needed
- Use text-rich formats: PDF, Markdown, and plain text work best
- Include metadata: Descriptive filenames help with result interpretation
Search Optimization
- Use natural language queries: “How do I configure authentication?” works better than “authentication configuration”
- Filter by score: Results with scores below 0.5 are often not relevant
- Limit result count: Top 3-5 results usually contain the best matches
RAG Integration
- Always cite sources: Include document names in generated responses
- Set confidence thresholds: Don’t present low-confidence answers as authoritative
- Handle no-results gracefully: Tell users when documentation doesn’t cover their question
Troubleshooting
Files Not Becoming Searchable
Symptoms: isVectorGenerated stays false
Solutions:
- Check file format is supported
- Verify file is under 10 MB size limit
- Check the file contains extractable text (not scanned images)
- Wait longer for large documents (up to 20 seconds for very large files)
Low Search Relevance
Symptoms: Search returns unrelated documents or low scores
Solutions:
- Rephrase queries using natural language
- Ensure documents contain the information you’re searching for
- Check that documents have finished processing (isVectorGenerated: true)
- Try more specific queries
Upload Errors
Symptoms: 400 Bad Request or validation errors
Solutions:
- Verify partition ID is valid and exists
- Check file format is in the supported list
- Ensure file is under 10 MB
- Confirm Content-Type header matches file format
Authentication Issues
Symptoms: 401 Unauthorized or 403 Forbidden
Solutions:
- Verify API key or token is valid and not expired
- Check you have the required role (Tenant.Admin or Tenant.FileIngestion)
- Ensure the partition belongs to your tenant
Related Documentation
- Files API Reference - Complete API documentation
- SDK Reference - Full SDK documentation
- Shell Reference - Shell commands for file management
- Core Concepts - RAG Pipeline - Architecture details