How To Search Text In WPF FlowDocument?

|

This blog article is a reply to the recent WPF MSDN forum thread on how to efficiently search text in FlowDocument. The thread starter needs to have the same performance as the search feature in Visual Studio text editor. I don't know how Visual Studio IDE implements the search feature, but in terms of search in FlowDocument, because FlowDocument enables much richer content model, It's presumably much harder to achieve the same search performance as Visual Studio text editor.

I have to say that the code I posted in that thread apparently has a serious performance flaw, it introduces a lot of unnecessary iterations. After digging into this issue at the weekend, I finally come up with a method which can achieve the perceived performance, and I think this should be enough at most circumstance. Based on this method, I mocked up a sample code which shows how to perform find and replace feature in FlowDocument, because find and replace is a common feature every text editing tool should provide, this might help others who need this similar feature. The following shows the core code which perform the search:

/// <summary>
///
Find the corresponding<see cref="TextRange"/> instance
/// representing the input string given a specified text pointer position.
/// </summary>
///
<param name="position">the current text position</param>
///
<param name="textToFind">input text</param>
///
<param name="findOptions">the search option</param>
///
<returns>An<see cref="TextRange"/> instance represeneting the matching string withing the text container.</returns>
public TextRange GetTextRangeFromPosition(ref TextPointer position, String input, FindOptions findOptions)
{
    Boolean matchCase = (findOptions & FindOptions.MatchCase) == FindOptions.MatchCase;
    Boolean matchWholeWord = (findOptions & FindOptions.MatchWholeWord) == FindOptions.MatchWholeWord;

    TextRange textRange = null;

    while (position != null)
    {
        if (position.CompareTo(inputDocument.ContentEnd) == 0)
        {
            break;
        }

        if (position.GetPointerContext(LogicalDirection.Forward) == TextPointerContext.Text)
        {
            String textRun = position.GetTextInRun(LogicalDirection.Forward);
            StringComparison stringComparison = matchCase ? StringComparison.CurrentCulture : StringComparison.CurrentCultureIgnoreCase;
            Int32 indexInRun = textRun.IndexOf(input, stringComparison);

            if (indexInRun >= 0)
            {
                position = position.GetPositionAtOffset(indexInRun);
                TextPointer nextPointer = position.GetPositionAtOffset(input.Length);
                textRange = new TextRange(position, nextPointer);

                if (matchWholeWord)
                {
                    if (IsWholeWord(textRange)) // Test if the "textRange" represents a word.
                    {
                        // If a WholeWord match is found, directly terminate the loop.
                        break;
                    }
                    else
                    {
                        // If a WholeWord match is not found, go to next recursion to find it.
                        position = position.GetPositionAtOffset(input.Length);
                        return GetTextRangeFromPosition(ref position, input, findOptions);
                    }
                }
                else
                {
                    // If a none-WholeWord match is found, directly terminate the loop.
                    position = position.GetPositionAtOffset(input.Length);
                    break;
                }
            }
            else
            {
                // If a match is not found, go over to the next context position after the "textRun".
                position = position.GetPositionAtOffset(textRun.Length);
            }
        }
        else
        {
            //If the current position doesn't represent a text context position, go to the next context position.
            // This can effectively ignore the formatting or embedded element symbols.
            position = position.GetNextContextPosition(LogicalDirection.Forward);
        }
    }

    return textRange;
}

The code above is part of my FindAndReplaceManager helper class implementation, you can refer to the attachment for the complete source code. The code should be pretty straightforward as I've commentted it. The FindAndReplaceManager can support search options such as FindOptions.MatchCase and FindOptions.MatchWholeWord, aka two commonly used search options. For simplicity, I don't implement reverse search, since this should be really straightforward, instead of using LogicalDirection.Forward, you could use LogicalDirection.Backward.

As I've said, the FindAndReplaceManager should be able to achieve perceived performance at most situation, if you need hard best performance. You'd better choose a more sophisticated search algorithm instead of the bare-bones "start-to-end" search algorithm as is illustrated in the code above.

Another alternative you could choose is the internal undocumented search API provided by WPF. The System.Windows.Documents.TextFindEngine class has a static "Find" method, this method is widely used in build-in document readers and viewers such as FlowDocumentReader, FlowDocumentPageViewer, and FlowDocumentScrollViewer. Because TextFindEngine has a much better understanding of the underlying document content structure, it should provide the hard performance benefit you expect. The following helper method shows how to use this method:

using System;
using System.Windows;
using System.Reflection;
using System.Globalization;
using System.Windows.Documents;

namespace Sheva.Windows.Documents
{
    [Flags]
    public enum FindFlags
    {
        FindInReverse = 2,
        FindWholeWordsOnly = 4,
        MatchAlefHamza = 0x20,
        MatchCase = 1,
        MatchDiacritics = 8,
        MatchKashida = 0x10,
        None = 0
    }

    public static class DocumentHelper
    {
        private static MethodInfo findMethod = null;

        public static TextRange FindText(TextPointer findContainerStartPosition,TextPointer findContainerEndPosition, String input, FindFlags flags, CultureInfo cultureInfo)
        {
            TextRange textRange = null;
            if (findContainerStartPosition.CompareTo(findContainerEndPosition) < 0)
            {
                try
                {
                    if (findMethod == null)
                    {
                        findMethod = typeof(FrameworkElement).Assembly.GetType("System.Windows.Documents.TextFindEngine").
                               GetMethod("Find", BindingFlags.Static | BindingFlags.Public);
                    }
                    Object result = findMethod.Invoke(null, new Object[] { findContainerStartPosition,
                    findContainerEndPosition,
                    input, flags, CultureInfo.CurrentCulture });
                    textRange = result as TextRange;
                }
                catch (ApplicationException)
                {
                    textRange = null;
                }
            }

            return textRange;
        }
    }
}

Because TextFindEngine.Find() is a non-public API, we should use a bit of reflection code to call it. If you are working on pesonal project, feel free to use it as an alternative, but never ever use this method in production code.

WPF should provide a much better built-in public API to perform search operation in FlowDocument. I don't know what type of future plan WPF team has, but from my educated guess, WPF should have a much better support on this in the near future.

Attachment: SearchInFlowDocumentDemo.zip

10 comments:

Kira said...

Great! Good technology article! 收藏!

Igor said...

Very useful

Sam said...

God bless you!!!
If I had only seen this article 6 hours ago!

Gert said...

I Can't access the attachment. A login page is displayed when clicking the attachment link.

smehrozalam said...

Thanks for this great work. Really helpful.

Anonymous said...

Would not work if text is divided across multiple runs.
Example:
You wouldn't be able to find the word Microsoft in the following structure.

Microsoft

Anonymous said...

Would not work if text is divided across multiple runs.
Example:
You wouldn't be able to find the word Microsoft in the following structure.

<Run>Micro</Run><Run>soft</Run>

Renuka said...

Really its a grt article....can you please show how to search previous text here?

JP Chow said...

WTF. Why isn't the TextFindEngine just a normal publicly accessible API? It's so damn fast and useful?!

Jim Wright said...

Our open source project uses a similar technique, if you need a filled out solution, take a look https://github.com/keyoti/RapidFindReplaceWPF