How To Search Text In WPF FlowDocument?


This article is a reply to a recent WPF MSDN forum thread on how to search text efficiently in a FlowDocument. The thread starter wants the same performance as the search feature in the Visual Studio text editor. I don't know how the Visual Studio IDE implements its search, but because FlowDocument supports a much richer content model, it is presumably much harder to match that performance when searching a FlowDocument.

I have to say that the code I posted in that thread has a serious performance flaw: it introduces a lot of unnecessary iterations. After digging into the issue over the weekend, I came up with a method that achieves acceptable perceived performance, which I think should be enough in most circumstances. Based on this method, I put together a sample that shows how to implement find and replace in a FlowDocument. Find and replace is a feature every text editing tool should provide, so this might help others who need something similar. The following is the core code that performs the search:

/// <summary>
/// Finds the <see cref="TextRange"/> instance representing the input string,
/// starting from the specified text pointer position.
/// </summary>
/// <param name="position">The current text position.</param>
/// <param name="input">The text to find.</param>
/// <param name="findOptions">The search options.</param>
/// <returns>A <see cref="TextRange"/> instance representing the matching string within the text container, or null if no match is found.</returns>
public TextRange GetTextRangeFromPosition(ref TextPointer position, String input, FindOptions findOptions)
{
    Boolean matchCase = (findOptions & FindOptions.MatchCase) == FindOptions.MatchCase;
    Boolean matchWholeWord = (findOptions & FindOptions.MatchWholeWord) == FindOptions.MatchWholeWord;

    TextRange textRange = null;

    while (position != null)
    {
        if (position.CompareTo(inputDocument.ContentEnd) == 0)
        {
            break;
        }

        if (position.GetPointerContext(LogicalDirection.Forward) == TextPointerContext.Text)
        {
            String textRun = position.GetTextInRun(LogicalDirection.Forward);
            StringComparison stringComparison = matchCase ? StringComparison.CurrentCulture : StringComparison.CurrentCultureIgnoreCase;
            Int32 indexInRun = textRun.IndexOf(input, stringComparison);

            if (indexInRun >= 0)
            {
                position = position.GetPositionAtOffset(indexInRun);
                TextPointer nextPointer = position.GetPositionAtOffset(input.Length);
                textRange = new TextRange(position, nextPointer);

                if (matchWholeWord)
                {
                    if (IsWholeWord(textRange)) // Test if the "textRange" represents a word.
                    {
                        // If a WholeWord match is found, directly terminate the loop.
                        break;
                    }
                    else
                    {
                        // If the match is not a whole word, skip past it and recurse to keep searching.
                        position = position.GetPositionAtOffset(input.Length);
                        return GetTextRangeFromPosition(ref position, input, findOptions);
                    }
                }
                else
                {
                    // If whole-word matching is off, any match is accepted: move past it and terminate the loop.
                    position = position.GetPositionAtOffset(input.Length);
                    break;
                }
            }
            else
            {
                // If a match is not found, go over to the next context position after the "textRun".
                position = position.GetPositionAtOffset(textRun.Length);
            }
        }
        else
        {
            // If the current position doesn't represent a text context position, go to the next context position.
            // This can effectively ignore the formatting or embedded element symbols.
            position = position.GetNextContextPosition(LogicalDirection.Forward);
        }
    }

    return textRange;
}

The code above is part of my FindAndReplaceManager helper class; you can refer to the attachment for the complete source code. The code should be fairly straightforward, as I've commented it. Note that it looks for matches inside one text run at a time. FindAndReplaceManager supports the two most commonly used search options, FindOptions.MatchCase and FindOptions.MatchWholeWord. For simplicity, I didn't implement reverse search, since it should be straightforward: instead of using LogicalDirection.Forward, you could use LogicalDirection.Backward.
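The IsWholeWord helper called in the listing above ships only in the attachment. As a rough illustration of the idea, here is a minimal sketch that works on plain strings rather than on a TextRange. The names WholeWordSketch, IsWholeWordMatch and IndexOfWholeWord are hypothetical and are not taken from FindAndReplaceManager, and treating letters, digits and the underscore as "word" characters is an assumption. A real implementation would have to inspect the characters adjacent to the TextRange, which may live in neighbouring runs.

```csharp
using System;

// Hypothetical helper: a plain-string stand-in for the whole-word check.
public static class WholeWordSketch
{
    // Assumption: letters, digits and '_' count as word characters.
    private static bool IsWordChar(char c)
    {
        return char.IsLetterOrDigit(c) || c == '_';
    }

    // True when text[index .. index + length) has no word character
    // immediately before or after it.
    public static bool IsWholeWordMatch(string text, int index, int length)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (index < 0 || length <= 0 || index + length > text.Length) return false;

        int end = index + length;
        bool startsAtBoundary = index == 0 || !IsWordChar(text[index - 1]);
        bool endsAtBoundary = end == text.Length || !IsWordChar(text[end]);
        return startsAtBoundary && endsAtBoundary;
    }

    // Index of the first whole-word occurrence at or after startIndex, or -1.
    // Iterates instead of recursing, so many rejected matches cannot grow the stack.
    public static int IndexOfWholeWord(string text, string word, int startIndex, StringComparison comparison)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (string.IsNullOrEmpty(word)) return -1;

        int index = text.IndexOf(word, startIndex, comparison);
        while (index >= 0)
        {
            if (IsWholeWordMatch(text, index, word.Length)) return index;
            // Rejected: resume one character later (index + 1 <= text.Length here).
            index = text.IndexOf(word, index + 1, comparison);
        }
        return -1;
    }
}
```

The loop in IndexOfWholeWord plays the same role as the recursive call in GetTextRangeFromPosition; in a long run with many partial matches, iterating may be the safer choice.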

As I've said, FindAndReplaceManager should achieve acceptable perceived performance in most situations. If you need the best possible raw performance, you'd better choose a more sophisticated search algorithm instead of the bare-bones "start-to-end" scan illustrated in the code above.

Another alternative is the internal, undocumented search API provided by WPF. The System.Windows.Documents.TextFindEngine class has a static "Find" method, which is widely used by built-in document readers and viewers such as FlowDocumentReader, FlowDocumentPageViewer, and FlowDocumentScrollViewer. Because TextFindEngine has a much better understanding of the underlying document content structure, it should provide the raw performance benefit you expect. The following helper method shows how to call it:

using System;
using System.Windows;
using System.Reflection;
using System.Globalization;
using System.Windows.Documents;

namespace Sheva.Windows.Documents
{
    [Flags]
    public enum FindFlags
    {
        FindInReverse = 2,
        FindWholeWordsOnly = 4,
        MatchAlefHamza = 0x20,
        MatchCase = 1,
        MatchDiacritics = 8,
        MatchKashida = 0x10,
        None = 0
    }

    public static class DocumentHelper
    {
        private static MethodInfo findMethod = null;

        public static TextRange FindText(TextPointer findContainerStartPosition, TextPointer findContainerEndPosition, String input, FindFlags flags, CultureInfo cultureInfo)
        {
            TextRange textRange = null;
            if (findContainerStartPosition.CompareTo(findContainerEndPosition) < 0)
            {
                try
                {
                    if (findMethod == null)
                    {
                        findMethod = typeof(FrameworkElement).Assembly.GetType("System.Windows.Documents.TextFindEngine").
                               GetMethod("Find", BindingFlags.Static | BindingFlags.Public);
                    }
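                    // Pass the caller's culture through; fall back to the current culture when none is given.
                    Object result = findMethod.Invoke(null, new Object[] { findContainerStartPosition,
                    findContainerEndPosition,
                    input, flags, cultureInfo ?? CultureInfo.CurrentCulture });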
                    textRange = result as TextRange;
                }
                catch (ApplicationException)
                {
                    textRange = null;
                }
            }

            return textRange;
        }
    }
}

Because TextFindEngine.Find() is a non-public API, we have to use a bit of reflection to call it. If you are working on a personal project, feel free to use it as an alternative, but never use it in production code: an internal API can change or disappear in any release.

WPF should provide a much better built-in public API for searching a FlowDocument. I don't know what plans the WPF team has, but my educated guess is that WPF will get much better support for this in the near future.

Attachment: SearchInFlowDocumentDemo.zip

DataErrorValidationRule - New Way To Invalidate Data In WPF


WPF 3.5 introduced a new data validation API, DataErrorValidationRule, which you can specify on a Binding object when the data source implements the IDataErrorInfo interface. This feature enables some scenarios that a custom ValidationRule cannot.

One scenario that a custom ValidationRule cannot support is data binding on the properties of the custom ValidationRule class itself. Because ValidationRule is not a DependencyObject, you cannot define dependency properties on it to enable data binding. And because ValidationRule is not part of the element tree, any Binding expression that relies on RelativeSource, ElementName, or anything else that needs to walk up the element tree to find the binding source cannot be evaluated successfully. Some community members have come up with hacks to work around this limitation, such as Josh Smith's trick that he calls "Virtual Branches", or our beloved Dr. WPF's ObjectReference custom markup extension, which he described in this MSDN WPF thread.

DataErrorValidationRule makes those "dirty tricks" obsolete. You no longer need to bind values to a custom ValidationRule object as validation input parameters, because when you implement IDataErrorInfo the validation logic lives in the source object itself. The following example shows how to use the DataErrorValidationRule API to support the type of scenario that "Virtual Branches" is trying to enable.

First off, the data source class should implement IDataErrorInfo, and also INotifyPropertyChanged if you want to enable two-way data binding, as follows:

using System;
using System.ComponentModel;

public class Person : IDataErrorInfo, INotifyPropertyChanged
{
    private int age;
    private int min = 0;
    private int max = 150;

    public int Age
    {
        get { return age; }
        set
        {
            age = value;
            RaisePropertyChanged("Age");
        }
    }

    public string Error
    {
        get
        {
            return null;
        }
    }

    public int Min
    {
        get
        {
            return min;
        }
        set
        {
            min = value;
            RaisePropertyChanged("Min");
        }
    }

    public int Max
    {
        get
        {
            return max;
        }
        set
        {
            max = value;
            RaisePropertyChanged("Max");
        }
    }

    public string this[string name]
    {
        get
        {
            string result = null;

            if (name == "Age")
            {
                if (this.age < this.Min || this.age > this.Max)
                {
                    result = String.Format("Age must not be less than {0} or greater than {1}.", this.Min, this.Max);
                }
            }
            return result;
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;

    private void RaisePropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}

You can see that the Min and Max properties need to be specified by the user, so we bind those two properties to the corresponding UI elements, as the following XAML snippet shows:

<Window x:Class="BusinessLayerValidation.Window1"
       xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
       xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
       Title="WPF IDataErrorInfo Sample"
       Width="450" Height="170"
       xmlns:src="clr-namespace:BusinessLayerValidation">

  <Window.Resources>
    <src:Person x:Key="data"/>
    <Style x:Key="textBoxInError" TargetType="TextBox">
      <Style.Triggers>
        <Trigger Property="Validation.HasError" Value="true">
          <Setter Property="ToolTip"
                 Value="{Binding RelativeSource={x:Static RelativeSource.Self},
                        Path=(Validation.Errors)[0].ErrorContent}"/>
        </Trigger>
      </Style.Triggers>
    </Style>
  </Window.Resources>

  <StackPanel Margin="20" DataContext="{Binding Source={StaticResource data}}">
    <StackPanel Orientation="Horizontal">
      <TextBlock Width="60">
        Min:(<TextBlock Text="{Binding Path=Value, ElementName=minSlider}"/>)
      </TextBlock>
      <Slider Margin="10, 0, 0, 0"
              Name="minSlider"
              Width="300"
              Orientation="Horizontal"
              IsSnapToTickEnabled="True"
              HorizontalAlignment="Right"
              TickPlacement="BottomRight"
              AutoToolTipPlacement="BottomRight"
              Value="{Binding Path=Min, Mode=TwoWay}"
              Minimum="0"
              Maximum="150"
              TickFrequency="10"/>
    </StackPanel>
    <StackPanel Orientation="Horizontal">
      <TextBlock Width="60">
        Max:(<TextBlock Text="{Binding Path=Value, ElementName=maxSlider}"/>)
      </TextBlock>
      <Slider Margin="10, 0, 0, 0"
              Name="maxSlider"
              Width="300"
              Orientation="Horizontal"
              IsSnapToTickEnabled="True"
              HorizontalAlignment="Right"
              TickPlacement="BottomRight"
              AutoToolTipPlacement="BottomRight"
              Value="{Binding Path=Max, Mode=TwoWay}"
              Minimum="0"
              Maximum="150"
              TickFrequency="10"/>
    </StackPanel>
    <TextBlock>Enter your age:</TextBlock>
    <TextBox Style="{StaticResource textBoxInError}" Name="textBox">
      <TextBox.Text>
        <Binding Path="Age"
                 ValidatesOnDataErrors="True"
                 UpdateSourceTrigger="PropertyChanged">
          <Binding.ValidationRules>
            <ExceptionValidationRule/>
          </Binding.ValidationRules>
        </Binding>
      </TextBox.Text>
    </TextBox>
  </StackPanel>
</Window>

You can see from the XAML above that DataErrorValidationRule actually provides greater flexibility when validating data (setting ValidatesOnDataErrors="True" on the Binding is the shorthand for adding a DataErrorValidationRule to its ValidationRules). For a detailed introduction to DataErrorValidationRule and its role in the WPF data validation model, you can refer to this WPF SDK blog article.
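To make the role of the IDataErrorInfo indexer more concrete, here is a minimal sketch that exercises the same range check outside of WPF. PersonSketch is a hypothetical, condensed variant of the Person class above, not the class from the attachment. It also raises PropertyChanged for "Age" whenever Min or Max changes; this is an assumption about what you would want, on the expectation that a binding with ValidatesOnDataErrors re-queries the indexer when the bound property is reported as changed, so the error state follows the sliders.

```csharp
using System;
using System.ComponentModel;

// Hypothetical, condensed variant of the Person class shown earlier.
public class PersonSketch : IDataErrorInfo, INotifyPropertyChanged
{
    private int age;
    private int min = 0;
    private int max = 150;

    public int Age
    {
        get { return age; }
        set { age = value; RaisePropertyChanged("Age"); }
    }

    public int Min
    {
        get { return min; }
        // Also report "Age": its validity depends on the new bound.
        set { min = value; RaisePropertyChanged("Min"); RaisePropertyChanged("Age"); }
    }

    public int Max
    {
        get { return max; }
        set { max = value; RaisePropertyChanged("Max"); RaisePropertyChanged("Age"); }
    }

    // Object-level error; unused here, as in the original class.
    public string Error
    {
        get { return null; }
    }

    // Per-property error: null means "valid".
    public string this[string name]
    {
        get
        {
            if (name == "Age" && (age < min || age > max))
            {
                return String.Format("Age must not be less than {0} or greater than {1}.", min, max);
            }
            return null;
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;

    private void RaisePropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
```

With this sketch, setting Age to 30 leaves `person["Age"]` null, and then raising Min to 40 makes the same indexer return an error message without Age itself being touched, which is the situation the two sliders create.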

For completeness, I've attached the full sample project for further reference.

Attachment: WPFDataValidation.zip

WCF Trip - What Happens To BeginInvoke


Recently I came across Nicholas Allen's blog post about how BeginInvoke breaks when used against proxies generated by the WCF client runtime, specifically by ChannelFactory. His conclusion about the misbehaviour of BeginInvoke is the following (quoted from the original article):

The problem is that BeginInvoke knows about and only works with specific types of proxy objects, which do not include the proxy objects generated by ChannelFactory.

Nicholas Allen's statement is technically correct but lacks a detailed explanation. If you write something like the following, nothing suggests that you are doing anything wrong:

String uri = "net.tcp://localhost:2222/Services";
ChannelFactory<IEchoService> factory = new ChannelFactory<IEchoService>(new NetTcpBinding(), uri);
IEchoService proxy = factory.CreateChannel();
EchoDelegate d = new EchoDelegate(proxy.Echo);
IAsyncResult result = d.BeginInvoke("foo", new AsyncCallback(Callback), null);

So what really happens here?

Let's first add a few lines to the original test code to check some of my assumptions about the proxy generated by ChannelFactory:

Console.WriteLine(System.Runtime.Remoting.RemotingServices.IsTransparentProxy(proxy));
Console.WriteLine(System.Runtime.Remoting.RemotingServices.GetRealProxy(proxy).GetType());
Console.WriteLine(result.IsCompleted);

If you run the modified code, you will notice some interesting things:

  1. The proxy generated by the WCF client runtime (ChannelFactory) is actually a TransparentProxy;
  2. The RealProxy paired with this TransparentProxy is a System.ServiceModel.Channels.ServiceChannelProxy;
  3. When you call BeginInvoke against a proxy generated by ChannelFactory, the call is performed as an ordinary synchronous method invocation.

So what's happening here? How does WCF relate to remoting concepts such as TransparentProxy and RealProxy? And, specifically, why does BeginInvoke break here?

To answer those questions, let's take the first one and dig into it. I spent several hours examining the implementation of WCF's ChannelFactory, and the most important discovery is that, before kicking off the channel stack to process a service request, the WCF client runtime intercepts every service call by injecting a TransparentProxy and a ServiceChannelProxy between the call site and the underlying channel stack. One reason WCF implements the client runtime this way is that it needs to distinguish ordinary method invocations from service invocations on proxy objects. For example, if you call GetType() on the proxy generated by ChannelFactory, what type do you expect to get back? If you write code to test it, you may be stunned to find that GetType() returns IEchoService. How can GetType() return an interface type rather than a concrete type? WCF intercepts the call to GetType() and rewrites it to return the proxied type, thus hiding the real proxy implementation behind IEchoService.

Another reason WCF intercepts every method call is to distinguish synchronous service calls from asynchronous ones. Imagine you have a WCF service contract like this:

[ServiceContract]
public interface IEchoService
{
    [OperationContract]
    String Echo(String text);
}

If you run the svcutil.exe tool to generate a client-side proxy with asynchronous operations, as Nicholas Allen suggested in his original article:

svcutil /language:C# /config:App.config /async net.tcp://localhost:2222/Services

you will get something like this:

[System.CodeDom.Compiler.GeneratedCodeAttribute("System.ServiceModel", "3.0.0.0")]
[System.ServiceModel.ServiceContractAttribute(ConfigurationName="IEchoService")]
public interface IEchoService
{
   
    [System.ServiceModel.OperationContract]
    string Echo(string text);
   
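    // AsyncPattern=true marks BeginEcho/EndEcho as the asynchronous form of the Echo operation.
    [System.ServiceModel.OperationContractAttribute(AsyncPattern=true)]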
    System.IAsyncResult BeginEcho(string text, System.AsyncCallback callback, object asyncState);
   
    string EndEcho(System.IAsyncResult result);
}

[System.CodeDom.Compiler.GeneratedCodeAttribute("System.ServiceModel", "3.0.0.0")]
public interface IEchoServiceChannel : IEchoService, System.ServiceModel.IClientChannel
{
}

[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.CodeDom.Compiler.GeneratedCodeAttribute("System.ServiceModel", "3.0.0.0")]
public partial class EchoServiceClient : System.ServiceModel.ClientBase<IEchoService>, IEchoService
{
   
    public EchoServiceClient()
    {
    }
   
    public EchoServiceClient(string endpointConfigurationName) : base(endpointConfigurationName)
    {
    }
   
    public EchoServiceClient(string endpointConfigurationName, string remoteAddress) :
            base(endpointConfigurationName, remoteAddress)
    {
    }
   
    public EchoServiceClient(string endpointConfigurationName, System.ServiceModel.EndpointAddress remoteAddress) : base(endpointConfigurationName, remoteAddress)
    {
    }
   
    public EchoServiceClient(System.ServiceModel.Channels.Binding binding, System.ServiceModel.EndpointAddress    remoteAddress) : base(binding, remoteAddress)
    {
    }
   
    public string Echo(string text)
    {
        return base.Channel.Echo(text);
    }
   
    public System.IAsyncResult BeginEcho(string text, System.AsyncCallback callback, object asyncState)
    {
        return base.Channel.BeginEcho(text, callback, asyncState);
    }
   
    public string EndEcho(System.IAsyncResult result)
    {
        return base.Channel.EndEcho(result);
    }
}

From the code above, we can see that WCF follows .NET's asynchronous method invocation pattern quite closely by pairing each service operation XX with BeginXX and EndXX methods. The code is clear and standard, but how does it actually work? How is a BeginXX call performed asynchronously? And how does EndXX kick in to complete the asynchronous service invocation?

For BeginXX and EndXX to work as their signatures indicate, WCF needs to know which service invocation is being performed through a BeginXX or EndXX call. To put it another way, WCF needs to know when BeginXX or EndXX has been called so that it can do its underlying plumbing. To get that invocation information, WCF needs fine-grained control over the invocation of the service operations exposed by the service contract. Since the TransparentProxy and RealProxy mechanism, which is heavily used by .NET remoting, already has this capability built directly into the CLR, WCF can leverage that infrastructure to intercept method calls and perform its asynchronous plumbing at the channel level according to the method you are invoking.
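The interception idea described above can be sketched with a small custom RealProxy. This is only an illustration of the mechanism on the .NET Framework (the remoting proxy types are not available on .NET Core), not WCF's actual ServiceChannelProxy. The names IEcho, EchoImpl and InterceptingProxy are hypothetical, and the GetType() branch merely mimics the behaviour described earlier in this article.

```csharp
using System;
using System.Reflection;
using System.Runtime.Remoting.Messaging;
using System.Runtime.Remoting.Proxies;

// Hypothetical contract and local implementation used only for this sketch.
public interface IEcho
{
    string Echo(string text);
}

public sealed class EchoImpl : IEcho
{
    public string Echo(string text) { return text; }
}

// Every call made through the transparent proxy arrives in Invoke as a message,
// so the proxy can decide per method how to handle it.
public sealed class InterceptingProxy : RealProxy
{
    private readonly Type contractType;
    private readonly object target;

    public string LastMethodName { get; private set; }

    public InterceptingProxy(Type contractType, object target) : base(contractType)
    {
        this.contractType = contractType;
        this.target = target;
    }

    public override IMessage Invoke(IMessage msg)
    {
        IMethodCallMessage call = (IMethodCallMessage)msg;
        LastMethodName = call.MethodName;

        // Mimics the behaviour described in the article: answer GetType()
        // with the proxied interface instead of a concrete type.
        if (call.MethodName == "GetType" && call.ArgCount == 0)
        {
            return new ReturnMessage(contractType, null, 0, call.LogicalCallContext, call);
        }

        try
        {
            // A real client runtime would dispatch to its channel stack here,
            // choosing a sync or async path based on the method being invoked.
            object result = call.MethodBase.Invoke(target, call.Args);
            return new ReturnMessage(result, null, 0, call.LogicalCallContext, call);
        }
        catch (TargetInvocationException ex)
        {
            // Surface the target's exception to the caller of the proxy.
            return new ReturnMessage(ex.InnerException, call);
        }
    }

    public static IEcho Create(IEcho target, out InterceptingProxy realProxy)
    {
        realProxy = new InterceptingProxy(typeof(IEcho), target);
        return (IEcho)realProxy.GetTransparentProxy();
    }
}
```

Calling `InterceptingProxy.Create(new EchoImpl(), out realProxy)` returns an object that the caller sees only as IEcho, while each call such as `Echo("foo")` passes through Invoke first. That single choke point is what lets a client runtime tell an XX call from a BeginXX or EndXX call.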

That nearly answers the first question: how does WCF relate to remoting concepts such as TransparentProxy and RealProxy? But what about the second question: why does BeginInvoke break when all the TP and RP plumbing is in place? This question is trickier than it seems. After some research into the default implementation of the TP and RP mechanism used by .NET remoting, using both .NET Reflector and the Rotor 2.0 implementation of the CLR, I figured out that the current TP and RP implementation only supports asynchronous calls when the default RemotingProxy is in place. Since ServiceChannelProxy is WCF's own implementation, it is ignored by the BeginInvoke mechanism, and a BeginInvoke call against a WCF proxy is performed synchronously.

At this point, all the puzzles have been solved. BeginInvoke is probably one of the most confusing APIs in the .NET Framework, as this article and my previous article demonstrate :)