The Dark Side of Covariance

What’s wrong with the following code?

var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
...
if (names.Contains(sqlCommand.ExecuteScalar())

This  code is intended to check whether the result of a SQL query is contained in a case-insensitive collection of names.  However, if you run this code, the resulting check will be case-sensitive.  Why?

As you may have guessed from the title, this is caused by covariance.  In fact, this code will not compile at all against .Net 3.5. 

The problem is that ExecuteScalar() returns object, not string.  Therefore, it doesn’t call HashSet<string>.Contains(string), which is what it’s intending to call (and which uses the HashSet’s comparer).  Instead, on .Net 4.0, this calls the  Enumerable.Contains<object>(IEnumerable<object>, string) extension method, using the covariant conversion from IEnumerable<string> to IEnumerable<object>.  Covariance allows us to pass object to the Contains method of any strongly-typed collection (of reference types).

Still, why is it case-sensitive?  As Jon Skeet points out, the LINQ Contains() method is supposed to call any built-in Contains() method from ICollection<T>, so it should still use the HashSet’s case-insensitive Contains().

The reason is that although HashSet<String> implements ICollection<string>, it does not implement ICollection<object>.  Since we’re calling Enumerable.Contains<object>, it checks whether the sequence implements ICollection<object>, which it doesn’t.  (ICollection<T> is not covariant, since it allows write access)

Fortunately, there’s a simple fix: just cast the return value back to string (and add a comment explaining the problem).  This allows the compiler to call HashSet<string>.Contains(string), as was originally intended.

//Call HashSet<string>.Contains(string), not the
//covariant Enumerable.Contains(IEnumerable<object>, object)
//http://blog.slaks.net/2011/12/dark-side-of-covariance.html
if (names.Contains((string)sqlCommand.ExecuteScalar())
(I discovered this issue in my StringListConstraint for ASP.Net MVC)

CAPTCHAs do not mitigate XSS worms

One common misconception about web security is that protecting important actions with CAPTCHAs can prevent XSS attacks from doing real damage.  By preventing malicious code from scripting critical tasks, the idea goes, XSS injections won’t be able to accomplish much.

This idea is dangerously wrong. 

First of all, this should not even be considered except as a defense-in-depth mechanism.  Regardless of whether the actions you care about are protected by CAPTCHAs, XSS attacks can create arbitrary UI on your pages, and can thus make “perfect” phishing attacks.

Also, even with CAPTCHAs, an XSS injection can wait until the user performs the critical action, then change the submitted data to the attacker’s whim.

For example, if Twitter took this approach to prevent XSS injections from sending spammy tweets, the attacker could simply wait until the user sends a real tweet, then silently append advertising to the tweet as the user submits it and fills out the CAPTCHA.

However, there is also a more fundamental issue.  Since the injected Javascript is running in the user’s browser, it simply display the CAPTCHA to the user and block all page functionality until the user solves the CAPTCHA.  The attacker can even put his own text around the CAPTCHA to look like a legitimate security precaution, so that the (typical) user will not realize that the site has been compromised.  (that could be prevented by integrating a description of the action being performed into the CAPTCHA itself in a way that the attacker can’t hide)

I haven’t even mentioned the inconvenience of forcing all legitimate, uncompromised users to fill out CAPTCHAs every time they do anything significant.

In summary, CAPTCHAs should only be used to prevent programs from automatically performing actions (eg, bulk-registering Google accounts), and as a rate-limiter if a user sends too many requests too quickly (eg, getting a password wrong too many times in a row).

XSS can only be stopped by properly encoding all user-generated content that gets concatenated into markup (whether HTML, Javascript, or CSS)

About Concurrent Collections

One of the most useful additions to the .Net 4.0 base class library is the System.Collections.Concurrent namespace, which contains an all-new set of lock-free thread.

However, these collections are noticeably different from their classical counterparts.  There is no simple ConcurrentList<T> that you can drop into your code so that it will become thread-safe.  Instead, the new namespace has a queue, a stack, and some new thing called a bag, as well as ConcurrentDictionary<TKey, TValue> that largely resembles classical dictionaries.  It also has a BlockingCollection<T> class that wraps a concurrent collection and blocks until operations can succeed.

Many people have complained that Microsoft chose to provide an entirely new set of classes, rather than adding synchronized versions of the existing collections. 

In truth, however, creating this new set of classes is the correct – and, in fact, only – choice.  Ordinary synchronized are rarely useful and will not make a program thread-safe.  In general, slapping locks everywhere does not make a program thread-safe!  Rather, that will either not help (if there aren’t enough locks) or, if there are enough locks, result in deadlocks or a program that never runs more than one thread at a time.  (depending on how many different objects get locked)

Collections in particular have two fundamental issues when used on multiple threads.

The first, and simpler, issue is that the collection classes themselves are not thread-safe.  If two threads add to a List<T> at the same exact time, one thread is likely to overwrite the other thread’s value.  A synchronized version of List<T> with a lock around every method would solve this problem.

The bigger issue is that any code that uses a List<T> is unlikely to be thread-safe, even if the list itself is thread-safe.  For example. you can never enumerate over a multi-threaded list, because another thread may change the list at any time, invalidating the enumerator.  This issue could be solved by taking a read lock for the lifetime of the enumerator.  However, that is also a bad idea, since if any client code forgets to dispose the enumerator, the collection will deadlock when written to.

You also cannot use indices.  It is never safe to get, set, or remove an item at an index, because another thread might remove that item or clear the entire collection between your index check and the operation.

To solve all of these problems, you need thread-safe collections that provide atomic operations.  This is why all of the concurrent collections have such strange methods, including TryPop, AddOrUpdate, and TryTake.  These methods perform their operations atomically, and return false if the collection was empty (as appropriate; consult the documentation for actual detail).  Thus, the new concurrent collections can be used reliably in actual multi-threaded code without a separate layer of locks.

Beware of Response.RedirectToRoute in MVC 3.0

ASP.Net MVC uses the new (to ASP.Net 3.5) Http*Base wrapper classes (HttpContextBase, HttpRequestBase, HttpResponseBase, etc) instead of the original Http* classes.  This allows you to create mock implementations that inherit the Http*Base classes without an actual HTTP request.  This is useful for unit testing, and for overriding standard behaviors (such as route checking).

In ordinary MVC code, the HttpContext, Request, and Response properties will return Http*Wrapper instances that directly wrap the original Http* classes (eg, HttpContextWrapper, which wraps HttpContext).  Most MVC developers use the HttpContext and related properties without being aware of any of this redirection.

Until you call Response.RedirectToRoute.  This method, which is new to .Net 4.0, redirects the browser to a URL for a route in the new ASP.Net routing engine.  Like other HttpResponse methods, HttpResponseBase has its own version of this method for derived classes to override.

However, in .Net 4.0, Microsoft forgot to override this method in the standard HttpResponseWrapper.  Therefore, if you call Response.RedirectToRoute in an MVC application (where Response is actually an HttpResponseWrapper), you’ll get a NotImplementedException.

You can see this oversight in the methods list for HttpResponseWrapper.  Every method except for RedirectToRoute and RedirectToRoutePermanent are list as (Overrides HttpResponseBase.MethodName().); these methods are listed as (Inherited from HttpResponseBase.MethodName().)

To work around this issue, you can either use the original HttpResponse by writing HttpContext.Current.Response.RedirectToRoute(…) or by calling Response.Redirect instead.

Note that most MVC applications should not call Response.Redirect or Response.RedirectToRoute at all; instead, they should return ActionResults by calling helper methods like return Redirect(…); or return RedirectToAction(…);

In the upcoming ASP.Net 4.5 release, these methods have been properly overridden.

Caller Info Attributes vs. Stack Walking

People sometimes wonder why C# 5 needs to add caller info attributes, when this information is already available by using the StackTrace class.  In reality, caller info attributes behave rather differently from the StackTrace class, for a number of reasons.

Advantages to Caller Info Attributes

The primary reason to use caller info attributes is that they’re much faster.  Stack walking is one of the slowest internal (as opposed to network IO) things you can do in .Net (disclaimer: I haven’t measured).  By contrast, caller info attributes have exactly 0 performance penalty.  Caller info is resolved at compile-time; the callsite is compiled to pass a string or int literal that was determined by the compiler.  Incidentally, this is why C# 5 doesn’t have [CallerType] or [CallerMemberInfo] attributes; the compiler team wasn’t happy with the performance of the result IL and didn’t have time to implement proper caching to make it faster.

Caller info attributes are also more reliable than stack walking.  Stack walking will give incorrect results if the JITter decides to inline either your method or its caller.  Since caller info attributes are resolved at compile-time, they are immune to inlining.  Also, stack walking can only give line number and filename info if the calling assembly’s PDB file is present at runtime, whereas [CallerFileName] and [CallerLineNumber] will always work. 

Caller info attributes are also unaffected by compiler transformations.  If you do a stack walk from inside a lambda expression, iterator, or async method, you’ll see the compiler-generated method; there is no good way to retrieve the name of the original method.  Since caller info attributes are processed by the compiler, [CallerMemberName] should correctly pass the name of the containing method (although I cannot verify this).  Note that stack walking will retrieve the correct line number even inside lambda expressions (assuming there is a PDB file)

Advantages To Stack Walking

There are still a couple of reasons to use the StackTrace class.  Obviously, if you want to see more than one frame (if you want your caller’s caller), you need to use StackTrace.  Also, if you want to support clients in languages that don’t feature caller info attributes, such as C# 4 or F#, you should walk the stack instead.

Stack walking is the only way to find the caller’s type or assembly (or a MemberInfo object), since caller info attributes do not provide this information.

Stack walking is also the only choice for security-related scenarios.  It should be obvious that caller info attributes can trivially be spoofed; nothing prevents the caller from passing arbitrary values as the optional parameter.  Stack walking, by contrast, happens within the CLR and cannot be spoofed. 

Note that there are some other concerns with stack walking for security purposes.

  • Filename and line number information can be spoofed by providing a modified PDB file
  • Class and function names are meaningless; your enemy can name his function whatever he wants to.  All you should rely on is the assembly name.
  • There are various ways for an attacker to cause your code to be called indirectly (eg, DynamicInvoking a delegate) so that his assembly isn’t the direct caller.  In fact, if the attacker invokes your delegate on a UI thread (by calling BeginInvoke), his assembly won’t show up on the callstack at all.  The attacker can also compile a dynamic assembly that calls your function and call into that assembly.
  • As always, beware of inlining

Subtleties of C# 5’s new [CallerMemberName]

UPDATE: Now that the Visual Studio 11 beta has shipped with this feature implemented, I wrote a separate blog post exploring how it actually behaves in these corner cases.

Last time, I explored various pathological code samples in which the [CallerLineNumber] attribute does not have obvious behavior.  This time, I’ll cover the last of these new caller info attributes: [CallerMemberName].

The [CallerMemberName] attribute tells the compiler to insert the name of the containing member instead of a parameter’s default value.  Unlike [CallerLineNumber] and [CallerFileName], this has no equivalent in C++; since the C / C++ versions of these features are in the preprocessor, they cannot be aware of member names.

Most calls to methods with optional parameters take place within a named method, so the behavior of this attribute is usually obvious.  However, there are a couple of places where the exact method name is not so obvious.

If you call a [CallerMemberName] method inside a property or event accessor, what name should the compiler pass?  Common sense indicates that it should pass the name of the property or event, since that’s the name you actually see in source code.  That would also allow this attribute to be used for raising PropertyChanged events.  However, this option doesn’t pass enough information, since it would not be possible to determine whether it was called from the getter or the setter.  To expose the maximal amount of information, the compiler should pass the name of the actual method for the accessor – get_Name or set_Name

I would assume that the compiler only passes the property name, since that is what most people would probably expect.

A less-trivial question arises when such a method is called from a constructor or static constructor.  Should the compiler just pass the name of the class, since that’s what the member is named in source code? If so, there would be no way to distinguish between an instance constructor and a static constructor.  Should the compiler pass the actual names of the CLR methods (.ctor and .cctor)?  If so, there would be no way to tell the class name, which is worse.  Should it pass both (ClassName.ctor)? That would expose the maximal amount of information, but wouldn’t match the behavior in other members, which does not include the class name.

On a related note, what about calls to base or this constructors that take [CallerMemberName] arguments? Is that considered part of the class’ constructor, even though the call is lexically scoped outside the constructor?  If not, what should it pass?

A further related concern is field initializers. Since field initializers aren’t explicitly in any member, what should the compiler pass if you call a [CallerMemberName] method in a field initializer? I would assume that they’re treated like contructors (or static constructors for static field initializers)

I would assume that a call to a [CallerMemberName] method from within an anonymous method or LINQ query would use the name of the parent method.

The most interesting question concerns attributes.  What should happen if you declare your own custom attribute that takes a [CallerMemberName] parameter, then apply the attribute somewhere without specifying the parameter?

If you place the attribute on a parameter, return value, or method, it would make sense for the compiler to pass the name of the method that the attribute was applied to.  If you apply the attribute to a type, it might make sense to pass the name of that type.  However, there is no obvious choice for attributes applied to a module or assembly.

I suspect that they instead chose to not pass these caller info in default parameters for attribute declarations, and to instead pass the parameters’ declared default values.  If so, it would make sense to disallow caller info attributes in attribute constructor parameters.  However, this would also prevent them from being used for attributes that are explicitly instantiated in normal code (eg, for global filters in MVC).

Next Time: Caller Info Attributes vs. Stack Walking

Subtleties of C# 5’s new [CallerLineNumber]

UPDATE: Now that the Visual Studio 11 beta has shipped with this feature implemented, I wrote a separate blog post exploring how it actually behaves in these corner cases.

This is part 2 in a series about C# 5’s new caller info attributes; see the introduction.

The [CallerLineNumber] attribute tells the compiler to use the line number of the call site instead of the parameter’s default value.  This attribute has more corner cases than [CallerFileName].  In particular, unlike the C preprocessor’s __LINE__ macro, the C# compiler inserts the line number of a parsed method call.  Therefore, it is not always clear which line a method call expression maps to.

What should this call print:

static class Utils {
    int GetLine([CallerLineNumber] int line = 0) {
        return line;
    }
}

Console.WriteLine(
    Utils
        .
        GetLine
        (
        )
    );

Should it print the line number that the statement started? The line in which the call to GetLine started? The line containing the parentheses for GetLine? What if it’s in a multi-line lambda expression?

There are also a few cases in which methods are called implicitly by the compiler without appearing in source code.  What should this code print?

class Funny {
    public Funny Select(Func<object, object> d,
                        [CallerLineNumber]int line = 0) {
        Console.WriteLine(line + ": " + d(d));
        return this;
    }
}

var r = (from x in new Funny() 
         let y = 1 
         select 2);

This code contains two implicit calls to the Select method that don’t have a clear source line (it gets worse for more complicated LINQ queries)

In fact, it is possible to have an implicit method call with no corresponding source code at all.

Consider this code:

class Loggable {
    public Loggable([CallerLineNumber] int line = 0) { }
}
class SomeClass : Loggable { }

The compiler will implicitly generate a constructor for SomeClass that calls the base Loggable constructor with its default parameter value.  What line should it pass? In fact, if SomeClass is a partial class that is defined in multiple files, it isn’t even clear what [CallerFileName] should pass.

Also, what should happen in the unlikely case that a [CallerLineNumber] method is called on line 3 billion (which would overflow an int)? (This would be easier to test on Roslyn with a fake stream source) Should it give an integer overflow compile-time error?  If [CallerLineNumber] also supports byte and short parameters, this scenario will be more likely to happen in practice.

Next time: [CallerMemberName]

Subtleties of the new Caller Info Attributes in C# 5

UPDATE: Now that the Visual Studio 11 beta has shipped with this feature implemented, I wrote a separate blog post exploring how it actually behaves in these corner cases.

C# 5 is all about asynchronous programming.  However, in additional to the new async features, the C# team managed to slip in a much simpler feature: Caller Info Attributes.

Since C#’s inception, developers have asked for __LINE__ and __FILE__ macros like those in C and C++.  Since C# intentionally does not support macros, these requests have not been answered.  Until now.

C# 5 adds these features using attributes and optional parameters.  As Anders Hejlsberg presented at //Build/, you can write

public static void Log(string message,
    [CallerFilePath] string file = "",
    [CallerLineNumber] int line = 0,
    [CallerMemberName] string member = "")
{
    Console.WriteLine("{0}:{1} – {2}: {3}", 
                      file, line, member, message);

}

If this method is called in C# 5 without specifying the optional parameters, the compiler will insert the file name, line number, and containing member name instead of the default values.  If it’s called in older languages that don’t support optional parameters, those languages will pass the default values, like any other optional method.

These features look trivial at first glance.  However, like most features, they are actually more complicated to design in a solid and robust fashion.  Here are some of the less obvious issues that the C# team needed to deal with when creating this feature:

Disclaimer: I am basing these posts entirely on  logical deduction.  I do not have access to a specification or implementation of this feature; all I know is what Anders announced in his //Build/ presentation.  However, the C# team would have needed to somehow deal with each of thee issues.  Since these attributes are not yet supported by any public CTP, I can’t test my assumptions

To start with, they needed to create new compiler errors if the attribute is applied to an incorrectly-typed parameter, or a non-optional parameter.  Creating compiler errors is expensive; they need to be documented, tested, and localized into every language supported by C#.

The attributes should perhaps support nullable types, or parameter types with custom implicit conversions to int or string.  (especially long or short)

If a method with these attributes is called in an expression tree literal (a lambda expression converted to an Expression<TDelegate>), the compiler would need to insert ConstantExpressions with the actual line number or other info to pass as the parameter.

In addition to being supported in methods, the attributes should also be supported on parameters  for delegate types, allowing you to write

delegate void LinePrinter([CallerLineNumber] int line = 0);

LinePrinter printer = Console.WriteLine;
printer();

Each of the individual attributes also has subtle issues.

[CallerFilePath] seems fairly simple.  Any function call must happen in source code, in a source file that has a path.  However, it needs to take into account #line directives, so that, for example, it will work as expected in Razor views.  I don’t know what it does inside #line hidden, in which there isn’t a source file.

Next time: [CallerLineNumber]

Using a default controller in ASP.Net MVC

One common question about ASP.Net MVC is how to make “default” controller.

Most websites will have a Home controller with actions like About, FAQ, Privacy, or similar pages.  Ordinarily, these actions can only be accessed through URLs like ~/Home/About.  Most people would prefer to put these URLs directly off the root: ~/About, etc.

Unfortunately, there is no obvious way to do that in ASP.Net MVC without making a separate route or controller for each action.

You cannot simply create a route matching "/{action}" and map it to the Home controller, since such a route would match any URL with exactly one term, including URLs meant for other controllers.  Since the routing engine is not aware of MVC actions, it doesn’t know that this route should only match actions that actually exist on the controller.

To make it work, we can add a custom route constraint that forces this route to only match URLs that correspond to actual methods on the controller.

To this end, I wrote an extension method that scans a controller for all action methods and adds a route that matches actions in that controller. The code is available at gist.github.com/1225676.  It can be used like this:

routes.MapDefaultController<Controllers.HomeController>();

This maps the route "/{action}/{id}" (with id optional) to all actions defined in HomeController.   Note that this code ignores custom ActionNameSelectorAttributes (The built-in [ActionName(…)] is supported).

For additional flexibility, you can also create custom routes that will only match actions in a specific controller.  This is useful if you have a single controller with a number of actions that has special route requirements that differ from the rest of your site.

For example:

routes.MapControllerActions<UsersController>(
    name: "User routes",
    url:  "{userName}/{action}"
    defaults: new { action = "Index" }
);

(Note that this example will also match URLs intended for other controllers with the same actions; plan your routes carefully)

XRegExp breaks jQuery Animations

XRegExp is an open source JavaScript library that provides an augmented, extensible, cross-browser implementation of regular expressions, including support for additional syntax, flags, and methods.

It’s used by the popular SyntaxHighlighter script, which is in turn used by many websites (including this blog) to display syntax-highlighted source code on the client.  Thus, XRegExp has a rather wide usage base.

However, XRegExp conflicts with jQuery.  In IE, any page that includes XRegExp and runs a numeric jQuery animation will result in “TypeError: Object doesn't support this property or method”.

Demo (only fails in IE)

This bug is caused by an XRegExp fix for an IE bug in which the exec method doesn't consistently return undefined for nonparticipating capturing groups.  The fix, on line 271, assumes that the parameter passed to exec is a string.  This behavior violates the ECMAScript standard (section 15.10.6.2), which states that the parameter to exec should be converted to a string before proceeding.

jQuery relies on this behavior in its animate method, which parses a number using a regex to get the decimal portion.  (source)

Thus, calling animate with a number after loading XRegExp will fail in IE when XRegExp tries to call slice on a number.

Fortunately, it is very simple to fix XRegExp to convert its argument to a string first:

if (XRegExp) {
    var xExec = RegExp.prototype.exec;
    RegExp.prototype.exec = function(str) {
        if (!str.slice) 
            str = String(str);
        return xExec.call(this, str);
    };
}
Here is an updated demo that uses this fix and works even in IE.

Clarifying Boolean Parameters, part 2

Part 1 is here

Some languages have better ways to pass boolean parameters.  C# 4.0, and all versions of VB, allow parameters to be passed by name.  This allows us to write much clearer code:

//C# 4.0:
UpdateLayout(doFullLayout: false) 
'VB.Net:
UpdateLayout(doFullLayout:=False) 

Without requiring any changes to the function definition, this makes the meaning of the true / false abundantly clear at the call-site.

Javascript offers another interesting alternative.  In Javascript, booleans conditions actually check for “truthyness”.  The statement if(x) will trigger  not just if x is true, but also if x is any “truthy” value, including any object, non-empty string, or non-zero number. Similarly, the expression !x will return false if x is “truthy” and true if x “falsy”.

This means that we can actually use any non-empty string instead of true in Javascript.  Note that this will only work if the function checks the value for “truthyness”; it won’t work for code like if (x === true).

Thus, instead of passing true as a boolean, you can pass a string that describes what you’re actually indicating.

For example:

function updatePosition(animate) {
    //Calculate position
    if (animate)
        //...
    else
        //...
}

$(window).resize(function() {
    updatePosition();
});

updatePosition("With animation");

Although this results in much more readable code, it can be difficult to understand for people who aren’t familiar with this trick.  If the meaning of the parameter changes, you’ll need to hunt down every place that the function is called and change the string to reflect the new meaning.

Finally, unlike an enum, this does not scale to multiple options.  If you need to have more than two options, you should use global variables or objects to simulate an enum, not strings.

Clarifying Boolean Parameters, part 1

Have you ever written code like this:

public void UpdateLayout(bool doFullLayout) {
    //Code
    if (doFullLayout) {
        //Expensive code
    }
    //More code
}

This pattern is commonly used when some operation has a “cheap” mode and an “expensive” mode.  Other code will have calls like UpdateLayout(false) and UpdateLayout(true) scattered throughout.

The problem is that this isn’t very obvious for people who aren’t familiar with the codebase.  If you take a look at a file you’ve never seen before and see calls like UpdateLayout(false) and UpdateLayout(true) scattered, you’ll have no idea what the true / false means.

The simplest solution is to break it out into two methods: UpdateComplexLayout() and UpdateBasicLayout().  However,  if the two different layout modes have intertwined code paths (eg, the code before and after the if above), this either won’t be possible or will lead to ugly duplication of code.

One alternative is to use enums:

public enum LayoutUpdateType {
    Basic,
    Full
}

public void UpdateLayout(LayoutUpdateType type) {
    //Code
    if (type == LayoutUpdateType.Full) {
        //Expensive code
    }
    //More code
}

This way, the callsites are much more descriptive: UpdateLayout(LayoutUpdateType.Full).  This also makes it easy to add more update modes in the future should the need arise.  However, it makes the callsites much more verbose.  When used frequently, this pattern can lead a vast proliferation of enum types that are each only used by one method, polluting the namespace and making more important enums harder to notice.

Next time: Cleverer alternatives

C# is not type-safe

C# is usually touted as a type-safe language.  However, it is not actually fully type-safe!

To examine this claim, we must first provide a strict definition of type-safety  Wikipedia says:

In computer science, type safety is the extent to which a programming language discourages or prevents type errors. A type error is erroneous or undesirable program behavior caused by a discrepancy between differing data types.

To translate this to C#, full type-safety means that any expression that compiles is guaranteed to work at runtime, without causing any invalid cast errors.

Obviously, the cast (and as) operator is an escape hatch from type safety.  It tells the compiler that “I expect this value to actually be of this type, even though you can’t prove it.  If I’m wrong, I’ll live with that”.  Therefore, to be fully type-safe, it must be impossible to get an InvalidCastException at runtime in C# code that does not contain an explicit cast.

Note that parsing or conversion errors (such as any exception from the Convert class) don’t count.  Parsing errors aren’t actually invalid cast errors (instead, they come from unexpected strings), and conversion errors from from cast operations inside the Convert class.  Also, null reference exceptions aren’t cast errors. 

So, why isn’t C# type-safe?

MSDN says that InvalidCastException is thrown in two conditions:

  • For a conversion from a Single or a Double to a Decimal, the source value is infinity, Not-a-Number (NaN), or too large to be represented as the destination type.

  • A failure occurs during an explicit reference conversion.

Both of these conditions can only occur from a cast operation, so it looks like C# is in fact type safe.

Or is it?

IEnumerable numbers = new int[] { 1, 2, 3 };

foreach(string x in numbers) 
    ;

This code compiles (!). Running it results in

InvalidCastException: Unable to cast object of type 'System.Int32' to type 'System.String'.

On the foreach line.

Since we don’t have any explicit cast operations (The implicit conversion from int[] to IEnumerable is an implicit conversion, which is guaranteed to succeed) , this proves that C# is not type-safe.

What happened?

The foreach construct comes from C# 1.0, before generics existed.  It worked with untyped collections such as ArrayList or IEnumerable.  Therefore, the IEnumerator.Current property that gets assigned to the loop variable would usually be of type object.   (In fact, the foreach statement is duck-typed to allow the enumerator to provide a typed Current property, particularly to avoid boxing). 

Therefore, you would expect that almost all (non-generic) foreach loops would need to have the loop variable declared as object, since that’s the compile-time type of the items in the collection.  Since that would be extremely annoying, the compiler allows you to use any type you want, and will implicitly cast the Current values to the type you declared.  Thus, mis-declaring the type results in an InvalidCastException.

Note that if the foreach type isn’t compatible at all with the type of the Current property, you will get a compile-time error (just like (string)42 doesn’t compile).  Therefore, if you stick with generic collections, you’re won’t get these runtime errors (unless you declare the foreach as a subtype of the item type).

C# also isn’t type-safe because of array covariance.

string[] strings = new string[1];
object[] arr = strings;
arr[0] = 7;

This code compiles, but throws “ArrayTypeMismatchException: Attempted to access an element as a type incompatible with the array.” at run-time.

As Eric Lippert explains, this feature was added in order to be more compatible with Java.

Delegates vs. Function Pointers, Addendum: Multicast Delegates

Until now, I've been focusing on only one of the differences between delegates and function pointers; namely, associated state.
Delegates have one other capability that function pointers do not.  A single function pointer can only point to one function.  .Net, on the other hand, supports multicast delegates – delegates that point to multiple functions.  You can combine two existing delegates using the + operator (or by calling Delegate.Combine) to create a single new delegate instance that points two all of the methods in the original two delegates.  This new delegate stores all of the methods from the original two delegates in a private array of delegates called InvocationList (the delegates in this array are ordinary non-multicast delegates that each only point to a single method). 

Note that delegates, like strings, are immutable.  Adding two delegates together creates a third delegate containing the methods from the first two; the original delegate instances are not affected.  For example, writing delegateField += SomeMethod creates a new delegate instance containing the methods originally in delegateField as well as SomeMethod, then stores this new instance in delegateField.

Similarly, the - operator (or Delegate.Remove) will remove the second operand from the first one (again, returning a new delegate instance).  If the second operand has multiple methods, all of them will be removed from the final delegate.  If some of the methods in the second operand appear multiple times in the original delegate, only the last occurrence of each one will be removed (the one most recently added).  The RemoveAll method will remove all occurrences.  If all of the methods were removed, it will return null; there is no such thing as an empty delegate instance.

Multicast delegates are not intended to be used with delegates that return values.  If you call a non-void delegate that contains multiple methods, it will return the return value of the last method in the delegate.  If you want to see the return values of all of the methods, you’ll need to loop over GetInvocationList() and call each delegate individually.

Multicast delegates also don’t play well with the new covariant and contravariant generic delegates in .Net 4.0.  You cannot combine two delegates unless their types match exactly, including variant generic parameters.

Function pointers cannot easily be combined the way multicast delegates can.  The only way to combine function pointers without cooperation from the code that calls the pointer is to make a function that uses a closure to call all of the function pointers you want to call.

In Javascript, that would look like this:

function combine() {
    var methods = arguments;

    return function() { 
        var retVal;
        for(var i = 0; i < methods.length; i++) 
            retVal = methods[i].apply(this, arguments);
        return retVal;
    };
}

Tracking Event Handler Registrations

When working with large .Net applications, it can be useful to find out where event handlers are being registered, especially in an unfamiliar codebase.

In simple cases, you can do this by right-clicking the event definition and clicking Find All References (Shift+F12).  This will show you every line of code that adds or removes a handler from the event by name.  For field-like (ordinary) events, this will also show you every line of code that raises the event.

However, this isn’t always good enough.  Sometimes, event handlers are not added by name.  The .Net data-binding infrastructure, as well as the CompositeUI Event Broker service, will add and remove event handlers using reflection, so they won’t be found by Find All References.  Similarly, if an event handler is added by an external DLL, Find All References won’t find it.

For these scenarios, you can use a less-obvious trick.  As I described last time, adding or removing an event handler actually executes code inside of an accessor method. Like any other code, we can set a breakpoint to see where the code is executed.

For custom events, this is easy.  Just add a breakpoint in the add and/or remove accessors and run your program.  Whenever a handler is added or removed, the debugger will break into the accessor, and you can look at the callstack to determine where it’s coming from.

However, most events are field-like, and don’t have actual source code in their accessor methods.  To set a breakpoint in a field-like event, you need to use a lesser-known feature: function breakpoints (Unfortunately, this feature is not available in Visual Studio Express).  You can click Debug, New Breakpoint, Break at Function (Ctrl+D, N) to tell the debugger to pause whenever a specific managed function is executed.

To add a breakpoint at an event accessor, type Namespace.ClassName.add_EventName.  To ensure that you entered it correctly, open the Debug, Breakpoints window (Ctrl+D, B) and check that the new breakpoint says break always (currently 0) in the Hit Count column.  If it doesn’t say (currently 0), then either the assembly has not been loaded yet or you made a typo in the location (right-click the breakpoint and click Location).

About .Net Events

A .Net event actually consists of a pair of accessor methods named add_EventName and remove_EventName.  These functions each take a handler delegate, and are expected to add or remove that delegate from the list of event handlers. 

In C#, writing public event EventHandler EventName; creates a field-like event.  The compiler will automatically generate a private backing field (also a delegate), along with thread-safe accessor methods that add and remove handlers from the backing field (like an auto-implemented property).  Within the class that declared the event, EventName refers to this private backing field.  Thus, writing EventName(...) in the class calls this field and raises the event (if no handlers have been added, the field will be null).

You can also write custom event accessors to gain full control over how handlers are added to your events.   For example, this event will store and trigger handlers in reverse order:

void Main()
{
    ReversedEvent += delegate { Console.WriteLine(1); };
    ReversedEvent += delegate { Console.WriteLine(2); };
    ReversedEvent += delegate { Console.WriteLine(3); };

    OnReversedEvent();
}

protected void OnReversedEvent() {
    if (reversedEvent != null)
        reversedEvent(this, EventArgs.Empty);
}

private EventHandler reversedEvent;
public event EventHandler ReversedEvent {
    add {
        reversedEvent = value + reversedEvent;
    }
    remove {
        reversedEvent -= value;
    }
}

This add accessor uses the non-commutative delegate addition operator to prepend each new handler to the delegate field containing the existing handlers.  The raiser method simply calls the combined delegate in the private field. (which is null if there aren’t any handlers)

Note that this code is not thread-safe.  If two threads add a handler at the same time, both of them will read the original storage field, add their respective handlers to create a new delegate instance, then write this new delegate back to the field.  The thread that writes back to the field last will overwrite the changes made by the other thread, since it never saw the other thread’s handler (this is the same reason that x += y is not thread-safe).  The accessors generated by the compiler are threadsafe, either by using lock(this) (C# 3 or earlier) or a lock-free threadsafe implementation (C# 4).  For more details, see this series of blog posts.

This example is rather useless.  However, there are better reasons to create custom event accessors. WinForms controls store their events in a special EventHandlerList class to save memory.  WPF controls create events using the Routed Event system, and store handlers in special storage in DependencyObject.  Custom event accessors can also be used to perform validation or logging.

Creating Local Extension Methods

Sometimes, it can be useful to make an extension method specifically for a single block of code.  Unfortunately, since extension methods cannot appear in nested classes, there is no obvious way to do that.

Instead, you can create a child namespace containing the extension method.  In order to limit the extension method’s visibility to a single method, you can put that method in a separate namespace block.  This way, you can add a using statement to that namespace alone.

For example:

namespace Company.Project {
    partial class MyClass {
        ...
    }
}
namespace Company.Project {
    using MyClassExtensions;
    namespace MyClassExtensions {
        static class Extensions {
            public static string Name<T>(this T obj) {
                if (default(T) == null && Equals(obj, default(T)))
                    return "(null " + typeof(T) + ")";
                return obj.GetType() + ": " + obj.ToString() 
                     + "{declared as " + typeof(T) + "}";
            }
        }
    }
    partial class MyClass {
        void DoSomething() {
            object x = new DateTime();
            string name = x.Name();
        }
    }
}

Since the using MyClassExtensions statement appears inside the second namespace block, the extension methods are only visible within that block.  Code that uses these extension method can appear in this second block, while the rest of the class can go in the original namespace block without the extension methods.

This technique should be avoided where possible, since it leads to confusing and non-obvious code.  However, there are situations in which this can make some code much more readable.

Don’t modify other controls during a WPF layout pass

Unlike WinForms or native Win32 development, WPF provides a rich layout model which allows developers to easily create complicated UIs that resize to fit their contents or the parent window.

However, when developing custom controls, it can be necessary to layout child controls manually by overriding the MeasureOverride and ArrangeOverride methods.  To quote MSDN,

Measure allows a component to determine how much size it would like to take. This is a separate phase from Arrange because there are many situations where a parent element will ask a child to measure several times to determine its optimal position and size. The fact that parent elements ask child elements to measure demonstrates another key philosophy of WPF – size to content. All controls in WPF support the ability to size to the natural size of their content. This makes localization much easier, and allows for dynamic layout of elements as things resize. The Arrange phase allows a parent to position and determine the final size of each child.

Overriding these methods gives your custom control full power over the layout of its child element(s).

Be careful what you do when overriding these methods.  Any code in MeasureOverride or ArrangeOverride runs during the WPF layout passes.  in these methods, you should not modify any part of the visual tree outside of the control you’re overriding in.  If you do, you’ll be changing the visuals between Measure() and Arrange(), which will have unexpected results.

It is safe to modify your own child controls during the layout pass.  Before you call Measure() on a child control, its layout pass has not started.  Therefore, any changes will be seen by the child’s layout code.  Similarly, after you Arrange() a child control, its layout pass is finished, so it is safe to modify again (although you may end up triggering another layout pass to see the changes).

If you do need to modify an outside control during the layout pass, you should call Dispatcher.BeginInvoke() to run code asynchronously during the next message loop.  This way, your code will run after the layout pass finishes, and it will be able to safely modify whatever it wants.

Note that Measure() can be called multiple times during a single layout pass (if a parent needs to iteratively determine the best fit for a child).

Delegates vs. Function Pointers, part 5: Javascript

This is part 5 in a series about state and function pointers; part 1 is here.

Last time, we saw how C# 2 supports closures by compiling anonymous functions into member functions of a special class that holds local state from the outer function. 

Unlike the languages we’ve looked at before, Javascript has had closures baked in to the languages since its inception.  My standard example can be achieved very simply in Javascript:

var x = 2;
var numbers = [ 1, 2, 3, 4 ];
var hugeNumbers = numbers.filter(function(n) { return n > x; });

This code uses the Array.filter method, new to Javascript 1.6, to create a new array with those elements from the first array that pass a callback.  The function expression passed to filter captures the x variable for use inside the callback.

This looks extremely similar to the C# 2.0 version from last time.  However. under the covers, it’s rather different.

Like .Net managed instance methods, all Javascript functions take a hidden this parameter.  However, unlike .Net, Javascript does not have delegates.  There is no (intrinsic) way to bind an object to the this parameter the way a .Net closed delegate does.  Instead, the this parameter comes from the callsite, depending on how the function was called.  Therefore, we cannot pass state in the this parameter the way we did in C#.

Instead, all Javascript function expressions capture the variable environment of the scope that they are declared in as a hidden property of the function.  Therefore, a function can reference local variables from its declaring scope.  Unlike C#, which binds functions to their parent scopes using a field in a separate delegate object that points to the function, Javascript functions have their parent scopes baked in to the functions themselves. 

Javascript doesn’t have separate delegate objects that can hold a function and a this parameter.  Instead, the value of the this parameter is determined at the call-site, depending on how the function was called.  This is a common source of confusion to inexperienced Javascript developers.

To simulate closed delegates, we can make a method that takes a function as well as a target object to call it on, and returns a new function which calls the original function with this equal to the target parameter.  That sounds overwhelmingly complicated, but it’s actually not that hard:

function createDelegate(func, target) {
    return function() { 
        return func.apply(target, arguments);
    };
}

var myObject = { name: "Target!"};
function myMethod() {
    return this.name;
}

var delegate = createDelegate(myMethod, myObject);
alert(delegate());

This createDelegate method returns a function expression that captures the func and target parameters, and calls func in the context of target.  Instead of storing the target in a property of a Delegate object (like .Net does), this code stores it in the inner function expression’s closure.

Javascript 1.8.5 provides the Function.bind method, which is equivalent to this createDelegate method, with additional capabilities as well.  In Chrome, Firefox 4, and IE9, you can write

var myObject = { name: "Target!"};
function myMethod() {
    return this.name;
}

var delegate = myMethod.bind(myObject);
alert(delegate());
For more information, see the MDN documentation.

Delegates vs. Function Pointers, part 4: C# 2.0+

This is part 4 in a series about state and function pointers; part 1 is here.

Last time, we saw that it is possible to pass local state with a delegate in C#.  However, it involves lots of repetitive single-use classes, leading to ugly code.

To alleviate this tedious task, C# 2 supports anonymous methods, which allow you to embed a function inside another function.  This makes my standard example much simpler:

//C# 2.0
int x = 2;
int[] numbers = { 1, 2, 3, 4 };

int[] hugeNumbers = Array.FindAll(
    numbers, 
    delegate(int n) { return n > x; }
);



//C# 3.0
int x = 2;
int[] numbers = { 1, 2, 3, 4 };

IEnumerable<int> hugeNumbers = numbers.Where(n => n > x);

Clearly, this is much simpler than the C# 1.0 version from last time.  However, anonymous methods and lambda expressions are compile-time features; the CLR itself is not aware of them.  How does this code work? How can an anonymous method use a local variable from its parent scope?

This is an example of a closure – a function bundled together with external variables that the function uses.  The C# compiler handles this the same way that I did manually last time in C# 1: it generates a class to hold the function and the variables that it uses, then creates a delegate from the member function in the class.  Thus, the local state is passed as the delegate’s this parameter.

To see how the C# compiler implements closures, I’ll use ILSpy to decompile the more-familiar C# 3 version: (I simplified the compiler-generated names for readability)

[CompilerGenerated]
private sealed class ClosureClass {
    public int x;
    public bool Lambda(int n) {
        return n > this.x;
    }
}
private static void Main() {
    ClosureClass closure = new ClosureClass();
    closure.x = 2;
    int[] numbers = { 1, 2, 3, 4 };
    IEnumerable<int> hugeNumbers = numbers.Where(closure.Lambda);
}

The ClosureClass (which was actually named <>c__DisplayClass1) is equivalent to the GreaterThan class from my previous example.  It holds the local variables used in the lambda expression.  Note that this class replaces the variables – in the original method, instead a local variable named x, the compiler uses the public x field from the ClosureClass.  This means that any changes to the variable affect the lambda expression as well.

The lambda expression is compiled into the Lambda method (which was originally named <Main>b__0).  It uses the same field to access the local variable, sharing state between the original outer function and its lambda expression.

Next time: Javascript

Open Delegates vs. Closed Delegates

.Net supports two kinds of delegates: Open delegates and closed delegates.

When you create a delegate that points to an instance method, the instance that you created it from is stored in the delegate’s Target property.  This property is passed as the first parameter to the method that the delegate points to.  For instance methods, this is the implicit this parameter; for static methods, it's the method's first parameter.  These are called closed delegates, because they close over the first parameter and bring it into the delegate instance.

It is also possible to create open delegates which do not pass a first parameter.  Open delegates do not use the Target property; instead, all of the target method’s parameters are passed from the delegate’s formal parameter list, including the first parameter.  Therefore, an open delegate pointing to a given method must have one parameter more than a closed delegate pointing to the same method.  Open delegates are usually used to point to static methods.  When you make a delegate pointing to a static method, you (generally) don't want the delegate to hold a first parameter for the method. 

In addition to these two normal cases, it is also possible (in .Net 2.0 and later) to create open delegates for instance methods and to create closed delegates for static methods.  With one exception, C# doesn’t have any syntactical support for these unusual delegates, so they can only be created by calling the CreateDelegate method.

Open delegates by calling the CreateDelegate overload that doesn’t take a target parameter.  Before .Net 2.0, this function could only be called with a static method.   In .Net 2.0, you can call this function with an instance method to create an open delegate.  Such a delegate will call use its first parameter as this instead of its Target field. 

As a concrete example, consider the String.ToUpperInvariant() method.  Ordinarily, this method takes no parameters, and operates on the string it’s called on.  An open delegate pointing to this instance method would take a single string parameter, and call the method on that parameter.

For example:

Func<string> closed = new Func<string>("a".ToUpperInvariant);
Func<string, string> open = (Func<string, string>)
    Delegate.CreateDelegate(
        typeof(Func<string, string>),
        typeof(string).GetMethod("ToUpperInvariant")
    );

closed();     //Returns "A"
open("abc");  //Returns "ABC"

Closed delegates are created by calling the CreateDelegate overload that takes a target parameter.  In .Net 2.0, this can be called with a static method and an instance of that method’s first argument type to create a closed delegate that calls the method with the given target as its first parameter.  Closed delegates curry the first parameter from the target method.  For example:

Func<object, bool> deepThought = (Func<object, bool>)
    Delegate.CreateDelegate(
        typeof(Func<object, bool>),
        2,
        typeof(object).GetMethod("Equals", BindingFlags.Static | BindingFlags.Public)
    );

This code curries the static Object.Equals method to create a delegate that calls Equals with 2 and the delegate’s single parameter).  It’s equivalent to x => Object.Equals(2, x).  Note that since the method is (generally) not a member of the target object’s type, we need to pass an actual MethodInfo instance; a name alone isn’t good enough.

Note that you cannot create a closed delegate from a static method whose first parameter is a value type, because, unlike all instance methods, static delegates that take value types receive their parameters by value, not as a reference.   For more details, see here and here.

C# 3 added limited syntactical support for creating closed delegates.  You can create a delegate from an extension method as if it were an instance method on the type it extends.  For example:

var allNumbers = Enumerable.Range(1, Int32.MaxValue);
Func<int, IEnumerable<int>> countTo = allNumbers.Take;

This code creates an IEnumerable<int> containing all positive integers, then creates a closed delegate that curries this sequence into the static Enumerable.Take<T>(IEnumerable<T>) method.

Except for extension methods, open instance delegates and closed static delegates are rarely used in actual code.  However, it is important to understand how ordinary open and closed delegates work, and where the target object ends up for instance delegates.

Delegates vs. Function Pointers, part 3: C# 1.0

This is part 3 in a series about state and function pointers; part 1 is here.

Last time, we saw that it is impossible to bundle context along with a function pointer in C.

In C#, it is possible to fully achieve my standard example.  In order to explain how this works behind the scenes, I will limit this post to C# 1.0 and not use a lambda expression.  This also means no LINQ, generics, or extension methods, so I will, once again, need to write the filter method myself.

delegate bool IntFilter(int num);

static ArrayList Filter(IEnumerable source, IntFilter filter) {
    ArrayList retVal = new ArrayList();

    foreach(int i in source)
        if (filter(i))
            retVal.Add(i);
            
    return retVal;
}

class GreaterThan {
    public GreaterThan(int b) {
        this.bound = b;
    }
    int bound;
    public bool Passes(int num) {
        return num > bound;
    }
}

int x = 2;
int[] numbers = { 1, 2, 3, 4 };
Filter(numbers, new IntFilter(new GreaterThan(x).Passes));

Please excuse the ugly code; in order to be true to C# 1.0, I can’t just write an iterator, and I can’t implicitly create the delegate.

This code creates a class called GreaterThan that holds the Passes method and the state used by the method.  I create a delegate to pass to Filter out of the Passes method for an instance of the GreaterThan class from the local variable.

To understand how this works, we need to delve further into delegates.  .Net delegates are more than just type-safe function pointers. Unlike the function pointers we've looked at, delegates include state – the Target property. This property stores the object to pass as the hidden this parameter to the method (It's actually somewhat more complicated than that; I will describe open and closed delegates at a later date).  My example creates a delegate instance in which the Target property points to the GreaterThan instance.  When I call the delegate, the Passes method is called on the correct instance from the Target property, so it can read the bound field.

Next time, we'll see how the C# 2.0+ compilers generate all this boilerplate for you using anonymous methods and lambda expressions.

Delegates vs. Function Pointers, part 2: C

This is part 2 in a series about state and function pointers; part 1 is here.

Unlike most other languages, it is not possible to include any form of state in a function pointer in C.  Therefore, it is impossible to fully implement closures in C without the cooperation of the call-site and/or the compiler.

To illustrate what this means in practice, I will refer to my standard example, using a filter function to find all elements in an array that are greater than x.  Since C doesn’t have any such function, I’ll write one myself, inspired by qsort:

void* filter(void* base, 
             size_t count, 
             size_t elemSize, 
             int (*predicate)(const void *)) {
    //Magic...
}

With this function, one can easily find all elements that are greater than some constant:

int predicate(const void* item) {
        return *(int*)item > 2;
}

int arr[] = { 1, 2, 3, 4 };
filter(arr, 4, sizeof(int), &predicate);

However, if we want to replace the hard-coded 2 with a local variable, it becomes more complicated.  Since function pointers, unlike delegates, do not have state, there is no simple way to pass the local variable to the function.

The most obvious answer is to use a global variable:

static int minimum;
int predicate(const void* item) {
        return *(int*)item > minimum;
}

int arr[] = { 1, 2, 3, 4 };
minimum = x;
filter(arr, 4, sizeof(int), &predicate);

However, this code will fail horribly if it’s ever run on multiple threads.  In simple cases, you can work around that by storing the value in a global thread-local variable (or in fiber-local storage for code that uses fibers).   Because each thread will have its own global, the threads won’t conflict with each-other.  Obviously, this won’t work if the callback might be run on a different thread.

However, even if everything is running on one thread, this still isn’t enough if the code can be re-entrant, or if the callback might be called after the original function call finishes.  This technique assumes that exactly one callback (from a single call to the function that accepted the callback) will be used at any given time.  If the callback might be invoked later (eg, a timer, or a UI handler), this method will break down, since all of the callbacks will share the same global.

In these more complicated cases, one can only pass any state with the cooperation of the original function.  In this (overly simple) example, the filter method can be modified to take a context parameter and pass it to the predicate:

void* filter(void* base, 
             size_t count,
             size_t elemSize, 
             int (*predicate)(const void *, const void *), 
             const void* context) {
        //More magic...
}
int predicate(const void* item, const void* context) {
        return *(int*)item > *(int*)context;
}

int arr[] = { 1, 2, 3, 4 };
filter(arr, 4, sizeof(int), &predicate, &x);

To pass around more complicated state, one could pass a pointer to a struct as the context.

Next Time: C# 1.0

Delegates vs. Function Pointers, part 1

Most languages – with the unfortunate exception of Java – allow functions to be passed around as variables.  C has function pointers, .Net has delegates, and Javascript and most functional programming languages treat functions as first class objects.

There is a fundamental difference between C-style function pointers vs. delegates or function objects.  Pure function pointers cannot hold any state other than the function itself.  In contrast, delegates and function objects do store additional state that the function can use.

To illustrate this difference,  I will use a simple example.  Most programming environments have a filter function that takes a collection and a predicate (a function that takes an item from the collection and returns a boolean), and returns a new collection with those items from the original collection that match the predicate.  In C#, this is the LINQ Where method; in Javascript, it’s the Array.filter method (introduced by Javascript 1.6).

Given a list of numbers, you can use such a method to find all numbers in the list that are greater than a local variable x.  However, in order to do that, the predicate have access to x.  In subsequent posts, I’ll explore how this can be done in different languages.

Html.ForEach in Razor

Many people write ForEach extension methods for MVC WebForms views, which take a sequence and a delegate to turn each item in the sequence into HTML.

For example:

public static void ForEach<T>(this HtmlHelper html,
                              IEnumerable<T> set, 
                              Action<T> htmlWriter) {
    foreach (var item in set) {
        htmlWriter(item);
    }
}

(The unused html parameter allows it to be called as an extension method like other HTML helpers)

This code can be called like this:

<ul>
    <% Html.ForEach(
            Enumerable.Range(1, 10),
            item => { %> <li><%= item %></li> <% }
    ); %>
</ul>

This code creates a lambda expression that writes markup to the page’s response stream by ending the code block inside the lambda body. Neither the lambda expression nor the ForEach helper itself return anything; they both write directly to the response.

The List<T>.ForEach method can be called exactly the same way.

In Razor views, this method cannot easily be called directly, since Razor pages cannot put markup inside of expressions.  You can use a workaround by creating an inline helper and calling it immediately, but it would be better to rewrite the ForEach method to take an inline helper directly.

The naïve way to do that is like this:

public static IHtmlString ForEachSimple<T>(
        this HtmlHelper html,
        IEnumerable<T> set,
        Func<T, HelperResult> htmlCreator
    ) {
    return new HtmlString(String.Concat(set.Select(htmlCreator)));
}

The htmlCreator delegate, which can be passed as an inline helper, returns a HelperResult object containing the markup generated for an item.

This code uses LINQ to call htmlCreator on each item in the set (the Select call), then calls String.Concat to combine them all into one giant string.  (String.Concat will call ToString on each HelperResult, which will return the generated markup)  We could also call String.Join to put a separator, such as a newline, between every two items.

Finally, it returns an HtmlString to prevent Razor from escaping the returned HTML.

It’s equivalent to the following code using a StringBuilder (this is what String.Concat does internally)

var builder = new StringBuilder();
foreach (var item in set) {
    HelperResult result = htmlCreator(item);
    builder.Append(result.ToString());
}
return new HtmlString(builder.ToString());

This method can be called like this:

<ul>
    @Html.ForEachSimple(
        Enumerable.Range(1, 10),
        @<li>@item</li>
    );
</ul>

The problem with this approach is that it combines all of the content into a giant string.  If there are a large number of items, or if each item will have a large amount of markup, this can become (a little bit) slow.  It would be better to write each item directly to the caller’s response stream, without assembling any giant strings.  This is where HelperResult comes in.

The HelperResult class allows its caller to pass a TextWriter to the WriteTo method, and the helper delegate will write directly to this TextWriter.  I can take advantage of this to write a ForEach extension that doesn’t build any strings, by returning a HelperResult instead of a regular IHtmlString.

public static HelperResult ForEachFast<T>(
        this HtmlHelper html,
        IEnumerable<T> set,
        Func<T, HelperResult> htmlCreator
    ) {
    return new HelperResult(tw => {
        foreach (var item in set) {
            htmlCreator(item).WriteTo(tw);
        }
    });
}
This version creates a HelperResult with a delegate that writes each of its items in turn to the TextWriter.

Creating Markup Actions in Razor

Razor’s inline helpers allow you to create lambda expression that return markup (as a HelperResult).  However, there is no simple way to create a lambda expression that writes HTML directly to the page (instead of returning it).

In ASPX pages, one can simply put the beginning and end of a lambda expression in expression blocks, then put markup in the middle.

For example, this code creates a delegate that writes a <b> tag to the page:

<%
    Action pageWriter = () => {%><b>I'm from a lambda!</b><%};
    pageWriter();
    pageWriter();
    pageWriter();
%>

Calling the pageWriter delegate will write directly to the HTTP response stream.

By contrast, Razor inline expressions return their markup.  To do this in a Razor page, one would write

@{
    Func<object, HelperResult> htmlMaker
         = @<b>I'm from a lambda!</b>;
    @htmlMaker(null)    //Note @ sign
    @htmlMaker(null)    //Note @ sign
    @htmlMaker(null)    //Note @ sign
}

Calling htmlMaker without an @ sign will return HTML, but won’t write anything to the page.

When working with libraries designed for ASPX pages, it can be necessary to create ASPX-style inline helpers that write to the page instead of returning a value.  You can do that by creating an inline helper lambda and passing it to the Write method:

@{
    Action pageWriter = () => Write(new Func<object, HelperResult>(
         @<b>I'm from a lambda!</b>
    )(null));

    pageWriter();
    pageWriter();
    pageWriter();
}

Like the ASPX version, pageWriter now writes directly to the page and does not return anything.

This code can be made simpler by wrapping it in a separate method:

Action MakeAction(Func<object, HelperResult> inlineHelper) {
    return () => Write(inlineHelper(null));
}

This method takes an inline helper and returns an Action that writes the helper’s output to the page.  Since it needs to call the page’s Write method (to use the page’s output stream), this method must be defined in the WebPageBase instance, either in an @functions block or in a common base class.

Any code in the inline helper will execute each time the resulting Action is called.

It can be called like this:

@{
    Action pageWriter2 = MakeAction(@<b>I'm from a lambda!</b>);

    pageWriter();
    pageWriter();
    pageWriter();
}

This code is equivalent to the previous sample, but it’s much simpler.

One can also write write a version that takes a parameter:

Action<T> MakeAction<T>(Func<T, HelperResult> inlineHelper) {
    return param => Write(inlineHelper(param));
}
Note that the type parameter must be specified explicitly; C# does not infer type parameters from the method’s return type.

Dissecting Razor, part 9: Inline Helpers

In addition to normal and static helpers, Razor supports inline helpers, (also known as templates), which allow you to create lambda expressions that return markup.

An inline helper is created by writing @<tag>Content</tag> as an expression.  This creates a lambda expression that takes a parameter called item and returns a HelperResult object containing the markup.

Inline helpers are used to create functions that take markup as parameters.  For example, you might make an IfLoggedOn helper that displays content if there is a logged-in user, but shows a login link to anonymous users.  To pass the content to the helper, you can use an inline helper:

@helper IfLoggedOn(Func<MembershipUser, HelperResult> content) {
    var currentUser = Membership.GetUser();
    if (currentUser == null) {
        <i>
            To use this content, you must 
            <a href="@Href("~/Login")">log in</a>.
        </i>
    } else {
        @content(currentUser)
    }
}

You can call this method with an inline helper like this:

@IfLoggedOn(
    @<form action="@Href("Add-Comment")" method="post">
        @Html.TextArea("Comment", "Type a comment")
        <input type="submit" name="Submit" value="Add Comment" />
    </form>
)

The @content(currentUser) call in helper method is translated by the Razor compiler into a call to the overload of the Write method that takes a HelperResult (returned from the delegate).  This overload writes the content of the HelperResult to the page without escaping it (Just like a call to a normal helper method).  Functions that take inline helpers can also get the text rendered by the helper by calling ToString() on the HelperResult.

The call to the helper method is compiled to the following C# code:

Write(IfLoggedOn(
item => new System.Web.WebPages.HelperResult(__razor_template_writer => {

    WriteLiteralTo(@__razor_template_writer, "    ");
    WriteLiteralTo(@__razor_template_writer, "<form action=\"");

    WriteTo(@__razor_template_writer, Href("Add-Comment"));

    WriteLiteralTo(@__razor_template_writer, "\" method=\"post\">\r\n        ");

    WriteTo(@__razor_template_writer, Html.TextArea("Comment", "Type a comment"));

    WriteLiteralTo(@__razor_template_writer, "\r\n   ...  </form>\r\n");

})));

The IsLoggedOn method is passed a lambda expression that takes an item parameter and returns a new HelperResult.  The HelperResult is constructed exactly like a normal helper, except that it’s in a lambda expression instead of a normal method.

Like normal helpers, then markup inside inline helpers can contain arbitrary code blocks and code nuggets.  However, inline helpers cannot be nested (since that would create conflicting parameter names)

The item parameter is implicitly typed based on the delegate type that the helper is being passed as (just like normal lambda expressions).  This allows inline helpers to accept a parameter:

@IfLoggedOn(@<text>Welcome, @item.UserName!</text>)

(The special <text> tag is used to pass markup without creating an actual HTML tag)

Next Time: Sections

Dissecting Razor, part 8: Static Helpers

Razor helpers can be extremely useful, but they are designed to be used only by the page that created them.

To create reusable helpers, you can create CSHTML pages in the App_Code directory.  The WebRazorHostFactory will check for pages in App_Code and create a WebCodeRazorHost instead of the normal WebPageRazorHost.

This happens before the virtual CreateHost method; in order to change this behavior, one must create an inherited RazorBuildProvider and override the CreateHost method; for more details, see the source for more details.

The WebCodeRazorHost compiles helper methods as public static methods by setting the RazorEngineHost.StaticHelpers property to true.  It also overrides PostProcessGeneratedCode to remove the Execute() method and make the Application property static. 

Static helper pages inherit the HelperPage class, a very stripped-down base class which contains the static WriteTo methods used by the helpers and some static members that expose the Model, Html, and other members from the currently executing page (using the WebPageContext.Current.Page property).

Because the generated Execute() method is removed, all content outside helpers and @functions  blocks is not seen by the compiler.  It is seen by the Razor parser, so it must contain valid Razor syntax (eg, no stray @ characters).

Here is an example:

@helper Menu(params string[][] items) {
    <ul>
        @foreach (var pair in items) {
            <li><a href="@Href(pair[1])">@pair[0]</a></li>
        }
    </ul>
}

This text and @code is not compiled.

This helper can then be called from any other code by writing PageName.Menu(...), where PageName is the filename of the CHSTML page in App_Code.  In Razor pages, it can be called like any other helper.  In normal C# code, it returns a HelperResult instance; call ToString() or WriteTo(TextWriter) to get the HTML source.

For example:  (assuming that the helper is defined in ~/App_Code/Helpers.cshtml)

@Helpers.Menu(
    new[] { "Home",    "~/"        },
    new[] { "About",   "~/About"   },
    new[] { "Contact", "~/Contact" }
)

In order to see the source, I need to insert a #error directive inside the helper block; putting it in the page itself will have no effect, since the page is not compiled.  Since the contents of the helper block are code, not markup, I don’t need to wrap it in @{ ... }.

The above helper is transformed into the following C#: (As usual, @line directives have been removed)

public class Helpers : System.Web.WebPages.HelperPage {
    public static System.Web.WebPages.HelperResult Menu(params string[][] items) {
        return new System.Web.WebPages.HelperResult(__razor_helper_writer => {

            WriteLiteralTo(@__razor_helper_writer, "    <ul>\r\n");
            foreach (var pair in items) {
                WriteLiteralTo(@__razor_helper_writer, "            <li><a href=\"");
                WriteTo(@__razor_helper_writer, Href(pair[1]));
                WriteLiteralTo(@__razor_helper_writer, "\">");
                WriteTo(@__razor_helper_writer, pair[0]);
                WriteLiteralTo(@__razor_helper_writer, "</a></li>\r\n");
            }
            WriteLiteralTo(@__razor_helper_writer, "    </ul>\r\n");
#error
        });
    }
    public Helpers() {
    }
    protected static System.Web.HttpApplication ApplicationInstance {
        get {
            return ((System.Web.HttpApplication)(Context.ApplicationInstance));
        }
    }
}

Note that the page-level content does not show up anywhere.  The helper itself is compiled exactly like a normal helper, except that it’s static.

Although there aren’t any in this example, @functions blocks will also be emitted normally into the generated source.

Next Time: Inline Helpers

Boot Camp just got better

The Boot Camp drivers in the latest generation of MacBook Pros now expose more of the laptop’s hardware to Windows.

This means that Windows can now adjust the screen brightness in the Windows Mobility Center, and when you plug in or unplug the laptop. 

image

The new drivers also expose the laptop’s ambient light sensor via Windows 7’s sensor platform, allowing the screen brightness to be adjusted automatically, and allowing 3rd-party programs to read the brightness.

Dissecting Razor, part 7: Helpers

We’ll continue our trek into Razor’s class-level features with helpers.

Helpers are one of Razor’s unique features.  They encapsulate blocks of HTML and server-side logic into reusable page-level methods. 

You can define a helper by writing @helper MethodName(parameters) { ... }.  Inside the code block, you can put any markup or server-side code.  The contents of a helper are parsed as a code block (like the contents of a loop or if block), so any non-HTML-like markup must be surrounded by <text> tags or prefixed by the @: escape.

Here is a simple example:

<!DOCTYPE html>
<html>
    <body>
        @helper NumberRow(int num) {
            var square = num * num;
            <tr>
                <td>@num</td>
                <td>@square</td>
                <td>@(num % 2 == 0 ? "Even" : "Odd")</td>
            </tr>
        }
        <table>
            <thead>
                <tr>
                    <th>Number</th>
                    <th>Square</th>
                    <th>Eveness</th>
                </tr>
            </thead>
            <tbody>
                @for (int i = 0; i < 10; i++) {
                    @NumberRow(i)
                }
            </tbody>
        </table>
    </body>
</html>

Note that code statements (such as the square declaration) can go directly inside the helper body without being wrapped in code blocks – the direct contents of the helper is a code block, not markup.  Like any other code block, HTML-like markup is automatically treated as markup instead of code.

Like @functions blocks, helper methods can go anywhere in the source file; the physical location of the block is ignored.

Here is the generated source for the above example: (with blank lines and #line directives stripped for clarity)

public class _Page_Razor_Helpers_cshtml : System.Web.WebPages.WebPage {

    public System.Web.WebPages.HelperResult NumberRow(int num) {
        return new System.Web.WebPages.HelperResult(__razor_helper_writer => {

            var square = num * num;

            WriteLiteralTo(@__razor_helper_writer, "            <tr>\r\n                <td>");
            WriteTo(@__razor_helper_writer, num);

            WriteLiteralTo(@__razor_helper_writer, "</td>\r\n                <td>");
            WriteTo(@__razor_helper_writer, square);

            WriteLiteralTo(@__razor_helper_writer, "</td>\r\n                <td>");
            WriteTo(@__razor_helper_writer, num % 2 == 0 ? "Even" : "Odd");

            WriteLiteralTo(@__razor_helper_writer, "</td>\r\n            </tr>\r\n");

        });
    }
    public _Page_Razor_Helpers_cshtml() {
    }
    protected ASP.global_asax ApplicationInstance {
        get {
            return ((ASP.global_asax)(Context.ApplicationInstance));
        }
    }
    public override void Execute() {
        WriteLiteral("<!DOCTYPE html>\r\n<html>\r\n    <body>\r\n        ");
        WriteLiteral("\r\n        <table>\r\n            <thead>\r\n                <tr>\r\n                   " +
        " <th>Number</th>\r\n                    <th>Square</th>\r\n                    <th>E" +
        "veness</th>\r\n                </tr>\r\n            </thead>\r\n            <tbody>\r\n");

        for (int i = 0; i < 10; i++) {
            Write(NumberRow(i));
        }

        WriteLiteral("            </tbody>\r\n        </table>\r\n    </body>\r\n</html>\r\n");
#error

    }
}

Helpers are compiled as class-level methods that take a parameter set and return a System.Web.WebPages.HelperResult.  (This class name is configured by the RazorHostFactory)

Notice the the contents of the helper method are inside a lambda expression that takes a parameter named __razor_helper_writer.  This construction allows the helper to write directly to the HTTP response stream instead of assembling a giant string and then writing the string all at once.

The HelperResult constructor takes an Action<TextWriter> which contains the contents of the helper block.  The class implements IHtmlString and calls the action from the constructor to generate HTML.  However, under normal circumstances, this IHtmlString implementation is never called.

Calls to helper methods (@NumberRow(i)) are passed to the Write(HelperResult) overload.  This overload calls HelperResult.WriteTo(writer), which passes the page’s TextWriter directly to the helper’s lambda expression.  Thus, the lambda expression can write directly to the page’s output stream, without passing the output as a parameter to the helper method.

Looking inside the helper, we see that all content is passed to WriteTo and WriteLiteralTo methods, as opposed to the Write and WriteLiteral methods used by the rest of the page.

Helper methods cannot call the normal Write* methods since they aren’t necessarily writing to the current output (even though they usually do).  Therefore, they call these Write*To methods, which accept the TextWriter as a parameter.  These static methods are inherited from the WebPageExecutingBase class; their names are also configured by the RazorHostFactory.  The @ in the parameter is a little-used C# syntactictal feature that allows keywords to be used as identifiers; it has nothing to do with Razor’s use of the @ character.

Since helpers are compiled as normal methods, they can do almost anything that a normal method can.  However, because their contents are compiled inside a lambda expression, they have some limitations.  For example, helpers cannot use ref or out parameters, since they cannot be used inside lambda expressions.  Helpers can take params arrays or optional parameters.

Also, Razor’s C# code parser doesn’t support generic helper methods, although there is no reason that they couldn’t be supported in a later release.

The VB.Net code parser also doesn’t explicitly support generic helpers.  However, because VB.Net generics use parentheses, a VB.Net helper can declare a generic type parameter instead of a normal parameter list.

For example:

<!DOCTYPE html>
<html>

    <body>
        @Helper PrintType(Of T)
            @GetType(T)
        End Helper
        @PrintType(Of Dictionary(Of Integer, List(Of String)))()
    </body>

</html>

This trick is not very useful.

Next Time: Static Helpers