Knowledge Base Indexer (Azure AI Search)
When a document is added or updated in a SharePoint knowledge library, the flow extracts its text, builds a search document (title, content, tags, metadata), and pushes it to an Azure AI Search index via the index API. It also deletes the index entry when the source file is removed, keeping an enterprise search index continuously in sync with SharePoint.
Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.
Overview
This solution keeps an Azure AI Search index continuously in sync with a SharePoint knowledge library. It ships as two cloud flows and one Dataverse sync ledger in a single solution:
1. Knowledge Base Indexer (event-driven): when a document is created or modified in the library, it extracts the text, builds a search document, and upserts it into the index (IndexDocuments, @search.action = mergeOrUpload), recording the document in the ledger.
2. Knowledge Base Index Reconciliation Sweep (scheduled): every 6 hours it compares the library against the ledger and deletes orphaned documents from the index (DeleteDocuments) when their source file has been removed.
Why two flows? SharePoint has no reliable "file deleted" trigger, so create/update is handled in real time by the event flow while deletion is handled by a periodic reconciliation sweep. Together they give an always-current index that can power an app, bot, or RAG pipeline. Both flows are connector-first and ship Off.
Implementation note: the connector used for index writes is shared_azureaisearch (Azure AI Search). The index-write operations (IndexDocuments / DeleteDocuments) are used here for the first time in FlowLibs.
Use Case
A team building a chatbot or enterprise search experience over their document corpus needs the AI Search index updated as soon as a document changes in SharePoint — not on the next built-in indexer cron — and needs deletions to flow through so stale results never surface.
Flow Architecture
When a file is created or modified
SharePoint: GetOnUpdatedFileItems. Properties-only trigger, polled every 5 minutes; splitOn starts one run per changed file.
Initialize correlation id + search key
Initialize Variable. Mint @guid() for tracing and build the stable index key sp-{SharePoint ID}.
Get File Content + extract text
SharePoint: GetFileContent + Compose. Download the document bytes, decode them with base64ToString, and cap the text at 32,000 characters.
Compose Search Document
Compose. Map to the index schema with @search.action = mergeOrUpload.
Index Document
Azure AI Search: IndexDocuments. Upsert the document into the index.
Upsert Ledger Row
Microsoft Dataverse: ListRecords + CreateRecord/UpdateRecord. Look up the document in the sync ledger and create or update the ledger row.
Reconciliation Sweep (Recurrence, 6h)
Recurrence. Second flow: every 6 hours, initialize a correlation id and a deleted count.
List Current Files + Indexed Ledger
SharePoint: GetFileItems + Microsoft Dataverse: ListRecords. List every file in the library and load ledger rows where status = Indexed.
Delete Orphaned Documents
Azure AI Search: DeleteDocuments + Microsoft Dataverse: UpdateRecord. For each ledger row whose file is gone, delete the index document, mark the ledger row Deleted, and tally the removal.
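The Upsert Ledger Row step follows a look-up-then-create-or-update pattern. As a rough illustration only, the sketch below models the ledger as an in-memory dictionary rather than a Dataverse table. The `LedgerRow` fields and the `upsert_ledger_row` helper are hypothetical names invented for this example; only the `Indexed` status value comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class LedgerRow:
    # Hypothetical columns; the real Dataverse table schema is not shown in this document.
    search_key: str
    status: str
    last_indexed: datetime


def upsert_ledger_row(
    ledger: Dict[str, LedgerRow],
    search_key: str,
    now: Optional[datetime] = None,
) -> str:
    """Create the ledger row if it is missing, otherwise refresh it.

    Returns "created" or "updated" so the caller can log which branch ran.
    """
    now = now or datetime.now(timezone.utc)
    existing = ledger.get(search_key)  # stands in for the ListRecords lookup
    if existing is None:
        ledger[search_key] = LedgerRow(search_key, "Indexed", now)  # CreateRecord branch
        return "created"
    existing.status = "Indexed"  # UpdateRecord branch
    existing.last_indexed = now
    return "updated"
```

Calling the helper twice with the same key leaves a single row, which is what keeps the ledger usable as the source of truth for the sweep.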
Environment Variables
| Schema name | Type | Default | Description |
|---|---|---|---|
| flowlibs_SharePointSiteURL | String | https://your-tenant.sharepoint.com | Knowledge-library site (reused). |
| flowlibs_AzureSearchIndexName | String | kb-index | Target AI Search index name — set per environment. |
| flowlibs_KbLibraryName | String | Knowledge Base | Document library that holds the KB articles. |
Connectors & Connections
| Connector | API name | Actions used |
|---|---|---|
| Azure AI Search | shared_azureaisearch | IndexDocuments, DeleteDocuments |
| SharePoint | shared_sharepointonline | GetOnUpdatedFileItems, GetFileItems, GetFileContent |
| Microsoft Dataverse | shared_commondataserviceforapps | ListRecords, CreateRecord, UpdateRecord |
Note — All connections are referenced as solution connection references; the flow is portable between environments as long as a connection is mapped at import time.
Customization Guide
Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.
- Vector / RAG search: Add an Azure OpenAI embeddings call before Index Document and push the vector into a Collection(Edm.Single) field for semantic search.
- Binary documents: base64ToString extracts text from text/markdown/HTML. For PDF/Word, insert a text-extraction step (Azure AI Document Intelligence prebuilt-read / prebuilt-layout) before Compose Content.
- Chunking: Split large documents into chunks and index each as a separate doc with a parent id (extend the search key, e.g. sp-42-chunk-3).
- Indexer frequency: Tune the event trigger's polling interval (5 min) and the sweep's recurrence (6 h) to your freshness vs. cost needs.
- Real-time deletes: If your environment exposes a reliable delete signal, replace the sweep with an event-driven delete flow keyed on the same ledger.
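The chunking variant can be sketched as follows. This is a minimal illustration, assuming fixed-size character chunks, zero-based chunk numbering, and index fields named `id`, `parent_id`, and `content`. None of those choices are specified by the solution, and `chunk_document` is a hypothetical helper.

```python
from typing import Dict, List


def chunk_document(parent_key: str, text: str, chunk_size: int = 2000) -> List[Dict[str, str]]:
    """Split text into fixed-size chunks, each with a key derived from the parent key."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for index, start in enumerate(range(0, len(text), chunk_size)):
        chunks.append(
            {
                "id": f"{parent_key}-chunk-{index}",  # extended search key, e.g. sp-42-chunk-3
                "parent_id": parent_key,              # lets the sweep find every chunk of a file
                "content": text[start:start + chunk_size],
            }
        )
    return chunks
```

With chunking in place, the delete path would need to remove every chunk that shares a parent id, not just a single document.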
Key Expressions
The flows are intentionally light on Workflow Definition Language (WDL) expressions. The ones that matter build the index key, cap the extracted text, shape the upsert and delete batches, and test whether a file is still present. They are listed below in the order they appear in the flows.
EXPR.01 Stable index key
Build a stable per-document index key.
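The flow's own expression is not reproduced here, so the following is a Python sketch of the same idea, not the WDL used in the flow. It assumes the SharePoint item ID is an integer and uses a hypothetical `build_search_key` helper. The validation reflects the Azure AI Search rule that document keys may contain only letters, digits, underscores, dashes, and equal signs.

```python
import re

# Characters Azure AI Search accepts in a document key.
_VALID_KEY = re.compile(r"^[A-Za-z0-9_\-=]+$")


def build_search_key(sharepoint_item_id: int, prefix: str = "sp") -> str:
    """Return the stable index key for a SharePoint item, for example 'sp-42'."""
    key = f"{prefix}-{sharepoint_item_id}"
    if not _VALID_KEY.match(key):
        raise ValueError(f"invalid search key: {key!r}")
    return key
```

Because the key depends only on the item ID, re-indexing a modified file overwrites the same index document instead of creating a duplicate.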
EXPR.02 Text extraction (cap 32k)
Cap the extracted text to 32,000 chars for the search field.
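As an illustration of the decode-and-cap step, here is a Python sketch under the assumption that the file content arrives as a base64 string of UTF-8 text. `extract_text` is a hypothetical name, and the flow itself uses base64ToString rather than this code.

```python
import base64

MAX_CONTENT_CHARS = 32_000  # cap stated in the flow description


def extract_text(content_b64: str, limit: int = MAX_CONTENT_CHARS) -> str:
    """Decode base64 file content and keep at most `limit` characters."""
    raw = base64.b64decode(content_b64)
    # Replace undecodable bytes instead of failing the whole run on one bad character.
    text = raw.decode("utf-8", errors="replace")
    return text[:limit]
```

This only yields readable text for plain-text formats; binary formats such as PDF or Word need a real extraction step first.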
EXPR.03 Upsert batch (documentToIndex)
One-item upsert batch for IndexDocuments.
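A one-item upsert batch might look like the sketch below. The `value` array and the `@search.action` property follow the Azure AI Search REST payload shape; the connector may wrap the payload differently. The field names `id`, `title`, `content`, and `tags` are assumptions about the index schema, and `build_upsert_batch` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def build_upsert_batch(key: str, title: str, content: str, tags: Iterable[str]) -> Dict[str, Any]:
    """Build a one-document batch that merges into an existing doc or uploads a new one."""
    document = {
        "@search.action": "mergeOrUpload",  # upsert semantics
        "id": key,                          # assumed name of the index key field
        "title": title,
        "content": content,
        "tags": list(tags),
    }
    return {"value": [document]}
```

mergeOrUpload keeps the action idempotent: the same payload works whether or not the document already exists in the index.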
EXPR.04 Delete batch (documentsToDelete)
One-item delete batch for DeleteDocuments.
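The delete batch is smaller, since a delete action needs only the document key. The sketch below assumes the key field is called `id` and uses the REST-style `value` envelope, which the connector may shape differently. `build_delete_batch` is a hypothetical helper.

```python
from typing import Any, Dict


def build_delete_batch(key: str, key_field: str = "id") -> Dict[str, Any]:
    """Build a one-document batch that removes the document with the given key."""
    return {"value": [{"@search.action": "delete", key_field: key}]}
```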
EXPR.05 Still-present test (sweep)
Filter over the current-files list; an empty result means the file was deleted.
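The still-present test amounts to filtering the current file list by the ledger row's item ID and checking whether anything matches. The sketch below assumes file items expose an `ID` property and ledger rows carry hypothetical `sharepoint_id`, `search_key`, and `status` fields. Both helper names are invented for this example.

```python
from typing import Any, Dict, Iterable, List


def is_still_present(current_files: Iterable[Dict[str, Any]], sharepoint_id: int) -> bool:
    """Filter the current files by ID; an empty result means the file was deleted."""
    matches = [f for f in current_files if f.get("ID") == sharepoint_id]
    return len(matches) > 0


def find_orphans(
    current_files: Iterable[Dict[str, Any]],
    ledger_rows: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return Indexed ledger rows whose source file no longer exists."""
    files = list(current_files)  # materialize once so each row can rescan it
    return [
        row
        for row in ledger_rows
        if row.get("status") == "Indexed" and not is_still_present(files, row["sharepoint_id"])
    ]
```

The test is only as good as the file listing: if the listing is truncated by paging, files that still exist would look like orphans, so it is worth confirming the list is complete before deleting anything.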