Dissecting Razor, part 1: Parts of the framework

Razor involves two distinct components: The Razor engine and the WebPages framework.

The Razor engine, in System.Web.Razor.dll, parses CSHTML (and VBHTML) files into CodeDOM trees.  Except for the word Web in the project name, the engine has nothing to do with ASP.Net; it doesn’t even reference System.Web.dll.  In fact, it targets the .Net Client Profile, and only references mscorlib and System.dll.

The Razor engine is aware of all of Razor’s syntax-level features (code nuggets, sections, helpers), but is not aware of what they mean; it blindly transforms them into function calls.

The Razor engine can be used without ASP.Net for any kind of templating system.  This is done by inheriting the RazorEngineHost class to provide default code-generation settings (such as base class and method name), then passing the host to a RazorTemplateParser. 

The standard syntax will be annoying when generating non-XML-like content.  To avoid this, one can write a custom MarkupParser to define a Razor syntax for a different markup language (such as CSS).


The WebPages framework, in System.Web.WebPages.dll, is a set of classes to use with the Razor parser.  It contains the WebPage class which standard Razor pages inherit.  This class defines the methods which are blindly called by the Razor parser, such as DefineSection and WriteLiteral.  This framework also handles _PageStart and _AppStart pages and contains the HtmlHelper infrastructure.

The WebPages framework is not directly connected to the Razor parser.  It could theoretically be used with a different template engine, as long as the template engine emits the correct method calls and class definitions.  (more on this later)

The WebPages framework also contains two sets of utility methods.  System.Web.Helpers.dll contains miscellaneous utility classes, such as Crypto and WebMail wrappers, plus grid and chart implementations.  Microsoft.Web.Helpers.dll contains HTML helper classes which integrate with various third-party services, including Twitter, ReCaptcha, Google Analytics, and more.  Most of these helpers can also be used in ordinary ASPX pages.

The source code for all of these projects is available here.

Next time: Gluing it all together.

Modifying HTML strings using jQuery

jQuery makes it very easy to modify a DOM tree.  For example, to strip all hyperlinks (<a> tags) from an element, we can write (demo):

$(...).find('a[href]')
      .replaceWith(function() { return this.childNodes });

After getting used to this, one might want to use jQuery to modify HTML contained in a string.  Here, however, the naïve approach does not work:

var htmlSource = ...;
$(htmlSource).find('a[href]')
      .replaceWith(function() { return this.childNodes });

This code tries to remove all <a> tags from the HTML contained in the htmlSource string.  However, what it actually does is create a detached DOM tree containing the new elements, strip all <a> tags in those elements, and throw the whole thing away.  It doesn’t modify the original string.  In fact, since the  $ function only takes a reference to an immutable string, this approach cannot modify the original string.

Instead, you need to retrieve the source from the DOM tree after modifying it, then assign that source back to the variable. 

There is an additional subtlety with this approach.  jQuery cannot return the complete HTML source for a collection of elements.  Therefore, it is also necessary to wrap the HTML in a dummy element (typically a <div>).  One can then call .html() to get the innerHTML of the dummy element, which will contain exactly the desired content.

This also eliminates the distinction between root-level elements and nested elements.  If the original HTML string contains root-level <a> elements (which aren’t nested in other tags), writing $(htmlSource).find('a') won’t find them, since .find() only searches the descendants of the elements in the jQuery object.  By wrapping the HTML in a dummy element, all of the elements in the original content become descendants, and can be returned by .find().

Here, therefore, is the correct way to modify an HTML string using jQuery:

var htmlSource = ...;
var tree = $("<div>" + htmlSource + "</div>");

tree.find('a[href]')
    .replaceWith(function() { return this.childNodes });

htmlSource = tree.html();

Generic Base Classes in ASP.Net MVC

Last time, we saw that there are severe limitations in creating ASPX pages which inherit generic base classes.  Many readers were probably wondering how ASP.Net MVC works around this limitation.  In ASP.Net MVC views, people write pages like this all the time:

<%@ Page Language="C#" 
    Inherits="ViewPage<IEnumerable<DataLayer.Product>>" %>

ASP.Net MVC includes its own workaround for these limitations.  The Web.config file in the Views folder of an ASP.Net MVC project registers a PageParserFilter:

<pages
    validateRequest="false"
    pageParserFilterType="System.Web.Mvc.ViewTypeParserFilter, System.Web.Mvc, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
    ...>
    ...
</pages>

PageParserFilter is one of ASP.Net’s lesser-known extensibility points.  It can intercept different parts of the parsing process for an ASPX page and modify the page.  The MVC framework’s ViewTypeParserFilter checks whether the page’s inherits="" attribute contains a ( or < character; these characters can only appear in C# or VB.Net generic types, not in the CLR’s native syntax.

If the inherits="" attribute contains these characters, it will save the attribute’s original value, then replace it with ViewPage (or ViewMasterPage or ViewUserControl, as appropriate).  This way, the rest of the built-in ASP.Net parser will see a normal type name that it knows how to parse.  After the page finishes parsing, an internal ControlBuilder registered on MVC’s base types (ViewPage, ViewMasterPage or ViewUserControl) will replace the base type in the generated CodeDOM tree with the original value of the inherits="" attribute.

The one problem with the hack is that it leaves the ASPX parsing engine unaware of the page’s actual base type.  Therefore, if you make a page that inherits a generic base class with additional properties, you won’t be able to set those properties in the <%@ Page declaration (since the ASPX parser won’t know about them).  If you inherit a non-generic type, this mechanism will not kick in and page properties will work fine.

Generic Base Classes in ASP.Net

ASP.Net pages can inherit from custom classes (as long as they inherit System.Web.UI.Page).  This can be useful to add utility functions or shared (code-behind) behaviors to your pages.  (Note that you could also use extension methods or HTTP modules.)

However, if you try to inherit a generic base class, it won’t work:

public class DataPage<T> : Page {
    public T Data { get; set; }
}

<%@ Page Language="C#" Inherits="DataPage<string>" %>

This code results in a yellow screen of death with the parser error: Could not load type 'DataPage<string>'.

This happens because the ASP.Net page parser is unaware of C# generics syntax.  The familiar generics syntax (eg, List<int>) is actually a C# innovation and is not used at all in the actual framework.  The “native” generics syntax, which is used by Reflection, is markedly different: List`1[Int32] (namespaces omitted for brevity).  A fuller form of this name, including namespaces and assembly details, is returned by the Type.AssemblyQualifiedName property.
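To make the difference between the two syntaxes concrete, here is a small sketch that can be run outside ASP.Net.  The GenericNames class and its method names are made up for illustration; Type.ToString() and Type.GetType are the real reflection APIs.

```csharp
using System;
using System.Collections.Generic;

public static class GenericNames
{
    // Type.ToString() returns the CLR-style name, with namespaces
    // but without assembly details.
    public static string ClrName(Type type)
    {
        return type.ToString();
    }

    // Type.GetType understands the CLR syntax only.  A name that is not
    // assembly-qualified is looked up in the calling assembly and then
    // in the core library.  Returns null if the type is not found.
    public static Type Parse(string typeName)
    {
        return Type.GetType(typeName);
    }
}
```

Calling GenericNames.ClrName(typeof(List<int>)) returns System.Collections.Generic.List`1[System.Int32], and passing that string to Parse gives back typeof(List<int>).  Passing the C#-style name System.Collections.Generic.List<int> to Parse returns null.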

Since ASP.Net uses reflection APIs to load types, we need to specify generic types using CLR syntax (and with full namespaces).   Therefore, the following page will work:

<%@ Page Language="C#" 
    Inherits="TestSite.DataPage`1[[System.String, mscorlib]]" %>

However, it’s not so simple.  ASP.Net does not call Type.GetType to parse these strings; instead, it loops over every referenced assembly and calls Assembly.GetType on each one.  This is why you don’t need to include the assembly name whenever using the Inherits attribute (which would have been necessary for Type.GetType).  Ordinarily, this is very useful, but here, it comes back to bite you.
It is not possible to parse a type from one assembly with a generic parameter from a different assembly using Assembly.GetType, unless the generic parameter is in mscorlib.

Therefore, for example, it is not possible to create an ASPX page that inherits DataPage<DataLayer.Product> if DataLayer.Product is in a different assembly than DataPage.  As a workaround, one can create a non-generic class which inherits DataPage<DataLayer.Product>, then make the ASPX page inherit this temporary class.
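A minimal sketch of that workaround, assuming a Web Forms project that references System.Web.  The ProductPage name is hypothetical, and DataLayer.Product is declared inline only to keep the sketch self-contained; in practice it would live in its own assembly.

```csharp
using System.Web.UI;

namespace DataLayer
{
    // Stand-in for an entity class from a separate data-layer assembly.
    public class Product
    {
        public string Name { get; set; }
    }
}

namespace TestSite
{
    public class DataPage<T> : Page
    {
        public T Data { get; set; }
    }

    // Non-generic bridge class (hypothetical name).  The C# compiler
    // resolves the generic argument here, so the ASPX parser only ever
    // sees a plain type name that Assembly.GetType can load.
    public class ProductPage : DataPage<DataLayer.Product>
    {
    }
}
```

The page directive can then use Inherits="TestSite.ProductPage", which contains no generic syntax at all.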

Next time: MVC magic

Building Connection Strings in .Net

.Net developers frequently need to build connection strings, especially when connecting to Access or Excel files using OleDB.
Code like the following has been written countless times:

//Bad code! Do not use!
string conn = "Data Source=" + openFileDialog1.FileName + "; "
            + "Provider=Microsoft.Jet.OLEDB.4.0;"
            + "Extended Properties=\"Excel 8.0\"";

This code looks innocuous at first glance, but will not work for all filenames.  If the filename contains characters like ', ", ;, or =, this code will create an invalid connection string and throw an exception.

The correct way to build connection strings is to use one of the DbConnectionStringBuilder classes.  This class implements a dictionary of key-value pairs in the connection string.  It has a ConnectionString property which assembles the instance’s contents into a usable connection string.  Unlike the string concatenation shown above, this property will correctly escape all values.

In addition, each of the database clients included with the .Net framework (SQL, OleDB, ODBC, Oracle, and Entity Framework) has its own inherited ConnectionStringBuilder class in its respective namespace.  These classes add type-safe properties for the keys supported by their databases, and handle any special cases when generating the connection string.

Thus, the correct way to write the above code is:

var connBuilder = new OleDbConnectionStringBuilder {
    DataSource = openFileDialog1.FileName,
    Provider = "Microsoft.Jet.OLEDB.4.0"
};
connBuilder["Extended Properties"] = "Excel 8.0";

As an added bonus, these classes implement ICustomTypeDescriptor, so they can be bound to a PropertyGrid to allow the end-user to edit the connection string.  This can be seen in certain places in Visual Studio.
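
The escaping can be checked with a round trip through the base DbConnectionStringBuilder class, which needs no particular database provider.  This is only a sketch; the ConnectionStringDemo class and its method names are hypothetical.

```csharp
using System;
using System.Data.Common;

public static class ConnectionStringDemo
{
    // Builds a connection string from key-value pairs, letting the
    // builder quote any value that contains special characters.
    public static string Build(string fileName)
    {
        var builder = new DbConnectionStringBuilder();
        builder["Data Source"] = fileName;
        builder["Provider"] = "Microsoft.Jet.OLEDB.4.0";
        return builder.ConnectionString;
    }

    // Parses a connection string and reads the Data Source value back.
    public static string ReadDataSource(string connectionString)
    {
        var builder = new DbConnectionStringBuilder();
        builder.ConnectionString = connectionString;
        return (string)builder["Data Source"];
    }
}
```

For a file name such as C:\Reports\a;b=c.xls, ReadDataSource(Build(fileName)) returns the original file name, which the string concatenation shown earlier cannot guarantee.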

Don’t call Html.Encode in Razor Pages

One of the unique features of ASP.Net WebPages (formerly Razor) is automatic HTML encoding.  All strings printed by embedded code nuggets (@ blocks) are automatically HTML-encoded.

In addition to this feature, Razor also includes the Html.Encode method, probably copied from ASP.Net MVC.  Calling this method naively leads to a nasty surprise – the string will be double-encoded!
To see why, look more closely at a typical call: @Html.Encode("<text>").  This Razor markup will call Html.Encode, which returns the string "&lt;text&gt;".   Since it returns a string and not an IHtmlString, the Razor engine will encode it again, and render &amp;lt;text&amp;gt;.
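
The same effect is easy to reproduce outside a Razor page.  The sketch below uses System.Net.WebUtility.HtmlEncode as a stand-in for Html.Encode (an assumption made so that the code runs without ASP.Net); the EncodingDemo class is made up for illustration.

```csharp
using System.Net;

public static class EncodingDemo
{
    // What Html.Encode does on its own (stand-in implementation).
    public static string EncodeOnce(string text)
    {
        return WebUtility.HtmlEncode(text);
    }

    // What the browser receives when the already-encoded string is
    // encoded a second time on output.
    public static string EncodeTwice(string text)
    {
        return WebUtility.HtmlEncode(WebUtility.HtmlEncode(text));
    }
}
```

EncodeOnce("<text>") returns &lt;text&gt;, while EncodeTwice("<text>") returns &amp;lt;text&amp;gt;.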

Careful thought indicates that this behavior is probably correct.  The programmer (hopefully) knows that Razor will escape its output, so the call to Html.Encode should be an attempt to display encoded text.  In fact, this is the simplest way to display HTML-encoded text in a Razor view. 

However, even if it is correct, the behavior is unexpected and should not be relied upon.  The unambiguous way to display encoded text is to call Html.Raw:

@Html.Raw(Html.Encode(Html.Encode("Double-encoded <html> text!")))

Although it is long and clunky, this clearly shows that the text will be double-encoded.

Exercise for the reader: Why is it also necessary to call Html.Raw?

Optional Parameters in C# < 4

C# 4.0 adds support for optional parameters.

The following code prints 4:

static void Main() {
    TestMethod();
}
static void TestMethod(int i = 4) {
    Console.WriteLine(i);
}

Optional parameters are a compiler feature.  The compiler will emit a normal method with the IL [opt] attribute and a .param declaration that includes a default value:

.method hidebysig static void 
        TestMethod([opt] int32 i) cil managed
{
    .param [1] = int32(4)
    .maxstack 8
    L_0001: ldarg.0 
    L_0002: call void [mscorlib]System.Console::WriteLine(int32)
    L_0007: ret 
}

Earlier versions of the C# compiler will ignore this metadata.  Therefore, you can call such methods in earlier versions of C#, but you will always need to pass the optional parameters.

You can also create methods with optional parameters in earlier versions of C#.  You can force the compiler to emit this metadata using the [Optional] and [DefaultParameterValue] attributes (in the System.Runtime.InteropServices namespace).

The following C# 1 code will compile identically to the above:

static void TestMethod(
    [Optional, DefaultParameterValue(4)] int i
) {
    Console.WriteLine(i);
}

Since the older compilers cannot consume optional parameters, you will always need to pass i when calling this method from C# < 4.  However, in C# 4, you can call this method without passing the parameter.

Unfortunately, the syntax parser used by VS2010 doesn’t recognize these attributes.  Therefore, if you attempt to call a method using these attributes, inside the project that defines the method, without specifying the parameter, the IDE will show a syntax error.  The compiler itself, however, will work correctly.  I reported this bug on Connect.

The [DefaultParameterValue] attribute is optional; omitting it will use the type’s default value (0, an empty struct, or null, as appropriate) as the default.

These attributes can also be used to create methods with optional parameters using CodeDOM, which cannot emit the new syntax.
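
Here is a minimal sketch of the attributes in use, assuming a C# 4 or later compiler for the calls that omit the argument.  The OptionalDemo class and its method names are hypothetical.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

public static class OptionalDemo
{
    // Optional parameter with an explicit default value of 4.
    public static int Echo([Optional, DefaultParameterValue(4)] int i)
    {
        return i;
    }

    // Optional parameter with no explicit default; the type's default
    // value (0 for int) is used instead.
    public static int EchoWithoutDefault([Optional] int i)
    {
        return i;
    }

    // These calls compile only in C# 4 or later.
    public static int CallEcho()
    {
        return Echo();
    }

    public static int CallEchoWithoutDefault()
    {
        return EchoWithoutDefault();
    }

    // Reads the metadata back through reflection.
    public static bool IsEchoParameterOptional()
    {
        ParameterInfo parameter =
            typeof(OptionalDemo).GetMethod("Echo").GetParameters()[0];
        return parameter.IsOptional;
    }
}
```

CallEcho() returns 4, CallEchoWithoutDefault() returns 0, and IsEchoParameterOptional() returns true.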

Writing output in Razor helpers using code

The new ASP.Net WebPages view engine (formerly Razor) allows you to create reusable parameterized blocks of HTML called helpers.

For example:

@helper Fibonacci(int count) {
    int current = 1, prev = 0;
    for (int i = 0; i < count; i++) {
        @:@current, 
        int t = current;
        current += prev;
        prev = t;
    }
}

This helper will write out the first count Fibonacci numbers.  It can be called by writing @Fibonacci(30) in the page that defines the helper.

Using Razor syntax in this code looks strange.  Razor syntax is designed to write HTML tags.  Since I’m printing plain text, I need to use the  @: escape (or the <text> tag) in order to output my text.  This small bit of code looks confusing and can get lost inside larger blocks of server-side code.

Instead, I can use an undocumented hack.  Razor helpers are implemented by compiling a lambda expression as an Action<TextWriter>.  The lambda expression receives a TextWriter parameter named __razor_helper_writer.  (You can see this by writing a Razor page with a compilation error and clicking Show Complete Compilation Source.)  There is nothing preventing me from using this parameter myself.  (It even shows up in IntelliSense!)

Therefore, I can rewrite the helper as follows:

@helper Fibonacci(int count) {
    int current = 1, prev = 0;
    for (int i = 0; i < count; i++) {
        __razor_helper_writer.Write(current + ", ");
        int t = current;
        current += prev;
        prev = t;
    }
}

Remember to correctly escape your text by calling Html.Encode.  Since this writes directly to the output stream, it doesn’t get the benefit of Razor’s automatic escaping.

Note: This relies on undocumented implementation details of the Razor compiler, and may change in future releases. I would not recommend doing this.
Instead, you can write a function:

@functions{ 
    string Fibonacci(int count) {
        var builder = new System.Text.StringBuilder();
        int current = 1, prev = 0;
        for (int i = 0; i < count; i++) {
            builder.Append(current).Append(", ");

            int t = current;
            current += prev;
            prev = t;
        }
        return builder.ToString();
    }
}

This can be called the same way as the helper.  Instead of using the special helper support, the call will just print a string, the same way you print any other string.  Therefore, the function’s output will be HTML-escaped.  To prevent that, you can change the function to return an HtmlString.

Binding to lists of DataRows

.Net DataTables can be very useful when writing data-driven applications.  However, they have one limitation: There is no obvious way to databind a grid (or other control) to an arbitrary list of datarows from a table.
You can bind to an entire table directly by setting a DataSource to the DataTable itself, and you can bind to a subset of a table by creating a DataView with a filter. 

In general, you cannot bind to an IEnumerable<T> (eg, a LINQ query); the databinding infrastructure can only handle an IList (non-generic) or an IListSource.  This is true for any kind of datasource.  Therefore, to bind to any LINQ query, you need to call .ToList().  (Or .ToArray())

However, when binding to a DataTable, you can’t even use a List<DataRow>.  If you try, you’ll get four columns (RowError, RowState, Table, and HasErrors) and no useful information.  This happens because the List<DataRow> doesn’t tell the databinding infrastructure about the special properties of the DataRows.  To understand the problem, some background is necessary.

Databinding is controlled by the ListBindingHelper and TypeDescriptor classes.  When you bind to a list, the ListBindingHelper.GetListItemProperties method is called to get the columns in the list.  If the list implements the ITypedList interface, its GetItemProperties  method is called.  Otherwise, it will use TypeDescriptor to get the properties of the first item in the list.  (this uses reflection)

The DataView class (which DataTable also binds through, using IListSource) implements ITypedList and returns DataColumnPropertyDescriptors that expose the columns in the table.  This is why you can bind to a DataView or DataTable and see columns.  However, when you bind to a List<DataRow>, there is no ITypedList that can return the columns as properties.  It therefore falls back on reflection and shows the physical properties of the DataRow class.
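
The two code paths can be compared directly, without a grid.  This sketch asks a DataView for its ITypedList properties, then asks TypeDescriptor what reflection finds on DataRow.  The BindingColumnsDemo class and the sample table are made up for illustration.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;

public static class BindingColumnsDemo
{
    public static DataTable CreateTable()
    {
        var table = new DataTable("Products");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Price", typeof(decimal));
        table.Rows.Add("Widget", 9.99m);
        return table;
    }

    // Properties reported by DataView's ITypedList implementation:
    // one descriptor per column in the table.
    public static List<string> ViewProperties(DataTable table)
    {
        ITypedList typedList = table.DefaultView;
        return Names(typedList.GetItemProperties(null));
    }

    // Properties found by reflection over the DataRow class itself,
    // which is what a plain List<DataRow> falls back on.
    public static List<string> RowProperties()
    {
        return Names(TypeDescriptor.GetProperties(typeof(DataRow)));
    }

    private static List<string> Names(PropertyDescriptorCollection properties)
    {
        var names = new List<string>();
        foreach (PropertyDescriptor property in properties)
            names.Add(property.Name);
        return names;
    }
}
```

For the sample table, ViewProperties returns Name and Price, while RowProperties returns members of the DataRow class such as RowState and RowError, none of which are column names.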

To solve this issue, you need to wrap the list in a DataView so that you can take advantage of its ITypedList implementation.  You can do that using the AsDataView() method.  This method is only available on the DataTable  and EnumerableRowCollection<T> classes; it cannot be called on an arbitrary LINQ query.  You can only get an EnumerableRowCollection<T> by calling special versions of the Cast, OrderBy, Where, and Select methods from a DataTable. 

Therefore, you can databind to a simple LINQ query by calling AsDataView() on the query.  To bind to a List<DataRow>, or to a more complicated query, you can use an ugly hack:

List<DataRow> list = ...;
grid.DataSource = table.AsEnumerable()
                       .Where(list.Contains)
                       .AsDataView();

The  AsEnumerable() call is not needed for typed datasets.

You can also call CopyToDataTable(), which works on an arbitrary IEnumerable<DataRow>.  However, it makes deep copies of the rows, so it isn’t helpful if you want the user to update the data, or if you want the user to see changes made (in code) to the original datarows.
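
The difference between the two approaches shows up as soon as a row changes.  In this sketch (the RowViewDemo class and the sample data are hypothetical), the DataView hands back the original DataRow instances, while the copied table holds independent rows.  A lambda is used in place of the method group purely for readability.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowViewDemo
{
    public static DataTable CreateTable()
    {
        var table = new DataTable("Items");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "A");
        table.Rows.Add(2, "B");
        table.Rows.Add(3, "C");
        return table;
    }

    // A live view over the selected rows of the original table.
    public static DataView ViewOf(DataTable table, List<DataRow> rows)
    {
        return table.AsEnumerable()
                    .Where(row => rows.Contains(row))
                    .AsDataView();
    }

    // A new table holding copies of the selected rows.
    public static DataTable CopyOf(List<DataRow> rows)
    {
        return rows.CopyToDataTable();
    }
}
```

After selecting the first and third rows and then changing the first row’s Name in the original table, the view’s first entry shows the new value, while the first row of the copy still holds A.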