Advanced

Knowledge Base Indexer (Azure AI Search)

When a document is added or updated in a SharePoint knowledge library, the flow extracts its text, builds a search document (title, content, tags, metadata), and pushes it to an Azure AI Search index through the connector's IndexDocuments action. A scheduled reconciliation sweep deletes the index entry once the source file is removed, keeping the enterprise search index in sync with SharePoint.

Azure AI Search, SharePoint, Microsoft Dataverse

Unique name

FlowLibsKnowledgeBaseIndexerAzureAISearch

Publisher

FlowLibs (flowlibs)

Version

1.0.0.0

Components

3 env vars + 2 cloud flows + 1 Dataverse sync ledger


Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.


What it does

Overview

This solution keeps an Azure AI Search index continuously in sync with a SharePoint knowledge library. It ships as two cloud flows plus one Dataverse sync ledger in a single solution:

1. Knowledge Base Indexer (event-driven): when a document is created or modified in the library, it extracts the text, builds a search document, and upserts it into the index (IndexDocuments, @search.action = mergeOrUpload), recording the document in the ledger.
2. Knowledge Base Index Reconciliation Sweep (scheduled): every 6 hours it compares the library against the ledger and deletes orphaned documents from the index (DeleteDocuments) when their source file has been removed.

Why two flows? SharePoint has no reliable "file deleted" trigger, so creates and updates are handled by the event-driven flow while deletions are handled by a periodic reconciliation sweep. Together they keep the index current (changes land within the trigger's polling interval, deletions within one sweep cycle), which is enough to power an app, bot, or RAG pipeline. Both flows are connector-first and ship in the Off state.

Implementation note: the flows use the shared_azureaisearch (Azure AI Search) connector. This is the first FlowLibs solution to use its index-write operations (IndexDocuments / DeleteDocuments).

Why you'd use it

Use Case

A team building a chatbot or enterprise search experience over their document corpus needs the AI Search index updated as soon as a document changes in SharePoint — not on the next built-in indexer cron — and needs deletions to flow through so stale results never surface.

Step-by-step

Flow Architecture

When a file is created or modified

SharePoint — GetOnUpdatedFileItems

Properties-only trigger, polled every 5 minutes, with splitOn enabled so each changed file gets its own run.

Initialize correlation id + search key

Initialize Variable

Mint @guid() for tracing and build the stable index key sp-{SharePoint ID}.
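A minimal Python sketch of this step, assuming the SharePoint item ID is an integer. `new_correlation_id` and `search_key` are hypothetical helper names, and `uuid.uuid4()` stands in for WDL's `guid()`.

```python
import uuid


def new_correlation_id() -> str:
    # Random GUID in the 8-4-4-4-12 form, playing the role of guid() in the flow.
    return str(uuid.uuid4())


def search_key(sharepoint_item_id: int) -> str:
    # Deterministic key: the same SharePoint item always yields the same key,
    # so re-indexing an edited file overwrites its existing search document.
    return f"sp-{sharepoint_item_id}"
```

search_key(42) gives "sp-42". Deriving the key from the item ID rather than the file name means a rename does not create a second search document, and the prefix plus digits stay within the characters Azure AI Search accepts in document keys (letters, digits, underscores, dashes, equal signs).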

Get File Content + extract text

SharePoint — GetFileContent + Compose

Download the document bytes, decode them to text with base64ToString, and cap the text at 32,000 characters.
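Outside Power Automate, the decode step can be sketched in Python as below. It assumes the file-content body has the usual connector shape, {"$content-type": ..., "$content": <base64>}, and that the file is UTF-8 text. `file_content_to_text` is a hypothetical helper, and replacing undecodable bytes is this sketch's own choice, not necessarily what base64ToString does.

```python
import base64


def file_content_to_text(file_body: dict) -> str:
    # Binary file content arrives base64-encoded under "$content".
    raw = base64.b64decode(file_body["$content"])
    # Interpret the bytes as UTF-8 text; bytes that are not valid UTF-8
    # (typical of PDF or .docx files) become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")
```

The replacement characters are a quick signal that a binary format slipped through and needs a real text-extraction step.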

Compose Search Document

Compose

Map to the index schema with @search.action = mergeOrUpload.
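A Python sketch of the mapping, using the field names from the upsert batch in EXPR.03. The trigger property names read here (ID, Title, {Name}, {Link}, Modified) are assumptions about the SharePoint trigger output, and `compose_search_document` is a hypothetical helper.

```python
from typing import Any


def compose_search_document(item: dict[str, Any], content: str, library: str) -> dict[str, Any]:
    # One entry of the IndexDocuments batch; mergeOrUpload updates the document
    # if the key already exists and uploads it otherwise.
    return {
        "@search.action": "mergeOrUpload",
        "id": f"sp-{item['ID']}",
        # This sketch falls back to the file name when Title is empty.
        "title": item.get("Title") or item.get("{Name}", ""),
        "content": content,
        "url": item.get("{Link}"),
        "library": library,
        "modified": item.get("Modified"),
        "tags": [library],  # EXPR.03 tags each document with the library name
    }
```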

Index Document

Azure AI Search — IndexDocuments

Upsert the document into the index.
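To show what an upsert looks like on the wire, here is a Python sketch that builds (but does not send) the equivalent Azure AI Search REST call, Documents - Index (POST /indexes/{index}/docs/index). The service name, API key, and api-version value are placeholders to adjust, and `build_index_request` is a hypothetical helper; the flow itself goes through the connector's IndexDocuments action rather than raw HTTP.

```python
import json
import urllib.request


def build_index_request(service: str, index: str, api_key: str, actions: list[dict]) -> urllib.request.Request:
    # Batch endpoint: every entry in "value" carries its own @search.action.
    url = f"https://{service}.search.windows.net/indexes/{index}/docs/index?api-version=2023-11-01"
    body = json.dumps({"value": actions}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) returns a per-document result list in the response, so a batch can partially succeed; check each entry rather than only the HTTP status.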

Upsert Ledger Row

Microsoft Dataverse — ListRecords + CreateRecord/UpdateRecord

Look up the document in the sync ledger and create or update the ledger row.
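The list-then-create-or-update pattern can be sketched as follows, with a plain dict standing in for the Dataverse ledger table. flowlibs_name and flowlibs_sharepointitemid are the column names used elsewhere on this page; status, last_indexed, and correlation_id are hypothetical column names for illustration.

```python
from datetime import datetime, timezone


def upsert_ledger_row(ledger: dict[str, dict], key: str, item_id: int, correlation_id: str) -> str:
    # `ledger` is keyed by the search key (flowlibs_name); returns the branch taken.
    fields = {
        "flowlibs_name": key,
        "flowlibs_sharepointitemid": str(item_id),  # stored as text for the sweep
        "status": "Indexed",
        "last_indexed": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,
    }
    row = ledger.get(key)  # ListRecords lookup
    if row is None:
        ledger[key] = fields  # CreateRecord branch
        return "created"
    row.update(fields)  # UpdateRecord branch
    return "updated"
```

Because the lookup and the create are separate calls, two overlapping runs for the same file could still insert duplicate rows; a Dataverse alternate key on the name column is one way to close that gap.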

Reconciliation Sweep (Recurrence, 6h)

Recurrence

Second flow: runs every 6 hours and initializes a correlation id and a deleted-documents counter.

List Current Files + Indexed Ledger

SharePoint — GetFileItems + Microsoft Dataverse — ListRecords

List every file in the library and load ledger rows where status = Indexed.

Delete Orphaned Documents

Azure AI Search — DeleteDocuments + Microsoft Dataverse — UpdateRecord

For each ledger row whose file is gone, delete the index document, mark the ledger row Deleted, and tally the removal.
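The sweep's logic, sketched in Python under the assumption that ledger rows carry the flowlibs_name and flowlibs_sharepointitemid columns used in EXPR.04 and EXPR.05 plus a status field. `reconcile` is a hypothetical helper. Unlike the flow, which calls DeleteDocuments once per orphaned row, it collects every orphan into a single batch, and the empty-listing guard is an extra safety check added by this sketch.

```python
def reconcile(current_item_ids: list[int], ledger_rows: list[dict]) -> tuple[list[dict], int]:
    # Safety guard (added in this sketch): an empty listing more likely means a
    # failed or mis-scoped query than an empty library, so delete nothing.
    if not current_item_ids:
        return [], 0
    # Set lookup keeps the presence check O(1) per ledger row.
    present = {str(item_id) for item_id in current_item_ids}
    delete_batch: list[dict] = []
    deleted = 0
    for row in ledger_rows:
        if row.get("status") != "Indexed":
            continue  # the flow only loads rows with status = Indexed
        if row["flowlibs_sharepointitemid"] not in present:
            delete_batch.append({"id": row["flowlibs_name"]})  # DeleteDocuments entry
            row["status"] = "Deleted"  # mirrors UpdateRecord on the ledger row
            deleted += 1
    return delete_batch, deleted
```

The sweep treats "not listed" as "deleted", so the file listing must be complete: if the listing action returns only its first page (for example, when pagination is not enabled), files beyond that page would look orphaned and be removed from the index.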

Solution config

Environment Variables

Schema name	Type	Default	Description
flowlibs_SharePointSiteURL	String	https://your-tenant.sharepoint.com	Knowledge-library site (reused).
flowlibs_AzureSearchIndexName	String	kb-index	Target AI Search index name — set per environment.
flowlibs_KbLibraryName	String	Knowledge Base	Document library that holds the KB articles.

Auth dependencies

Connectors & Connections

Connector	API name	Actions used
Azure AI Search	shared_azureaisearch	IndexDocuments, DeleteDocuments
SharePoint	shared_sharepointonline	GetOnUpdatedFileItems, GetFileItems, GetFileContent
Microsoft Dataverse	shared_commondataserviceforapps	ListRecords, CreateRecord, UpdateRecord

Note: All connections are referenced as solution connection references, so both flows are portable between environments as long as each connection reference is mapped to a connection at import time.

Tweaks & variations

Customization Guide

Retargeting the solution (site, library, index) only requires changing environment variable values. The variations below go further and require edits inside the flow definitions.

Vector / RAG search: Add an Azure OpenAI embeddings call before Index Document and push the vector into a Collection(Edm.Single) field configured as a vector field in the index, so the index supports vector (or hybrid) queries.
Binary documents: base64ToString only yields readable text for plain-text formats such as .txt, Markdown, and HTML (markup included). For PDF/Word, insert a text-extraction step (Azure AI Document Intelligence prebuilt-read / prebuilt-layout) before Compose Content.
Chunking: Split large documents into chunks and index each as a separate doc with a parent id (extend the search key, e.g. sp-42-chunk-3).
Indexer frequency: Tune the event trigger's polling interval (5 min) and the sweep's recurrence (6 h) to your freshness vs. cost needs.
Real-time deletes: If your environment exposes a reliable delete signal, replace the sweep with an event-driven delete flow keyed on the same ledger.
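As one possible starting point for the chunking variation, the Python sketch below splits text into overlapping fixed-size character windows and derives chunk keys from the parent key, following the sp-42-chunk-3 pattern above. The window size, overlap, and parent_id field are assumptions (parent_id would need to exist in your index schema), `chunk_documents` is a hypothetical helper, and chunk numbering starts at 0.

```python
def chunk_documents(key: str, text: str, size: int = 4000, overlap: int = 200) -> list[dict]:
    # Each chunk becomes its own search document keyed <key>-chunk-<n>.
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    docs = []
    for n, start in enumerate(range(0, max(len(text), 1), step)):
        docs.append({
            "@search.action": "mergeOrUpload",
            "id": f"{key}-chunk-{n}",
            "parent_id": key,  # lets queries group chunks back to one file
            "content": text[start:start + size],
        })
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return docs
```

If an edit shortens a document, higher-numbered chunks from the previous version stay in the index unless the ledger records every chunk key so the sweep can delete the leftovers.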

Helpers & literals

Key Expressions

The flows are intentionally light on WDL gymnastics; the heaviest expressions are the 32,000-character text cap and the sweep's still-present test. They are listed below in the order they appear in the flows.

EXPR.01 Stable index key

Build a stable per-document index key.

workflow definition language

@concat('sp-', string(triggerOutputs()?['body/ID']))

EXPR.02 Text extraction (cap 32k)

Cap the extracted text to 32,000 chars for the search field.

workflow definition language

@if(greater(length(string(outputs('Compose_Raw_Text'))),32000),substring(string(outputs('Compose_Raw_Text')),0,32000),string(outputs('Compose_Raw_Text')))
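In Python the same cap is a single slice, because slicing past the end of a string simply returns the whole string; the WDL version needs the if/greater guard because substring fails when the requested length runs past the end. `cap_text` and MAX_CHARS are hypothetical names.

```python
MAX_CHARS = 32_000


def cap_text(text: str, limit: int = MAX_CHARS) -> str:
    # Equivalent to if(greater(length(s), limit), substring(s, 0, limit), s).
    return text[:limit]
```

In WDL itself, substring(s, 0, min(length(s), 32000)) is a shorter equivalent. Either way the limit counts characters, not bytes.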

EXPR.03 Upsert batch (documentToIndex)

One-item upsert batch for IndexDocuments.

json

[{ "@search.action":"mergeOrUpload", "id": ..., "title": ..., "content": ..., "url": ..., "library": ..., "modified": ..., "tags": "@createArray(parameters('EnvVar_KbLibraryName'))" }]

EXPR.04 Delete batch (documentsToDelete)

One-item delete batch for DeleteDocuments.

json

[{ "id": "@{items('For_Each_Ledger_Row')?['flowlibs_name']}" }]
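If you bypass the connector and call the Azure AI Search REST API directly, a delete is just another index batch whose entries carry @search.action = delete and the document key. The Python sketch below builds that body for one or more keys; it assumes the key field is named id (as in the batches shown on this page), and `delete_batch_body` is a hypothetical helper.

```python
import json


def delete_batch_body(keys: list[str], key_field: str = "id") -> str:
    # Each entry needs only the delete action and the document key.
    actions = [{"@search.action": "delete", key_field: key} for key in keys]
    return json.dumps({"value": actions})
```

The resulting body is posted to the same docs/index endpoint used for uploads, so one batch can mix upserts and deletes if needed.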

EXPR.05 Still-present test (sweep)

Filter condition applied to the current-files list for each ledger row; an empty result means the source file was deleted.

workflow definition language

@equals(string(item()?['ID']), items('For_Each_Ledger_Row')?['flowlibs_sharepointitemid'])
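The same test, sketched in Python to make the type handling explicit: string() turns the numeric SharePoint ID into text so it can match the ledger column, which means the ledger must store the ID in exactly that text form. `still_present` is a hypothetical helper, and the file dicts mimic the listing's ID property.

```python
def still_present(current_files: list[dict], ledger_item_id: str) -> bool:
    # Filter the listing the way the expression does, then test for a non-empty result.
    matches = [f for f in current_files if str(f.get("ID")) == ledger_item_id]
    return len(matches) > 0
```

A ledger value such as "42.0" or " 42" would never match, and the sweep would treat that file as deleted.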

Make it yours

Customize & download

Generate a ready-to-import copy of this solution with your environment-variable values baked in — available on Base, Pro, or Team.

