Showing posts with label C#. Show all posts

Exploring Caller Info Attributes

Last year, Microsoft announced a simple new feature in C# 5: Caller Info Attributes.  These attributes let you create methods with optional parameters and tell the compiler to pass the caller’s file path, line number, or member name instead of the parameter’s default value.  This allows you to create logging methods that automatically know where they’re being called from.
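
As a minimal sketch of such a logging method (the Log class and its Format method are hypothetical names chosen for illustration; the three attributes are the ones in System.Runtime.CompilerServices):

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;

static class Log {
    // The caller never passes the last three arguments;
    // the compiler fills them in at each call site.
    public static string Format(string message,
            [CallerMemberName] string member = "",
            [CallerFilePath] string file = "",
            [CallerLineNumber] int line = 0) {
        return Path.GetFileName(file) + ":" + line + " (" + member + ") " + message;
    }
}

static class Program {
    static void Main() {
        // Prints the file name, line number, and member name of this call.
        Console.WriteLine(Log.Format("Starting up"));
    }
}
```

If a caller passes the optional arguments explicitly, the explicit values win; the attributes only affect calls that omit them.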

When the feature was announced, I wrote a couple of blog posts that delved into some of the corner cases of the new feature.  At the time, there was no public implementation, so they were pure conjecture.

This morning, Microsoft released the beta of Visual Studio 11, which is the first public build supporting these attributes.  Now, I can finally test my theories.  Here are the results:

Although these classes are new to the .Net Framework 4.5, you can still use this feature against older framework versions by creating your own classes in the System.Runtime.CompilerServices namespace.  However, the feature will only work if the code calling the method is compiled with the C# 5 compiler; older compilers will ignore the attributes and simply pass the parameters’ default values.
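
A sketch of what those do-it-yourself classes might look like.  This assumes a project targeting a framework older than 4.5 and compiled with the C# 5 compiler; on 4.5 and later the types already exist, so declaring them again should be unnecessary.

```csharp
// Only needed when targeting a framework older than .Net 4.5.
// The compiler recognizes these attributes by namespace and name.
namespace System.Runtime.CompilerServices {
    [AttributeUsage(AttributeTargets.Parameter, Inherited = false)]
    public sealed class CallerMemberNameAttribute : Attribute { }

    [AttributeUsage(AttributeTargets.Parameter, Inherited = false)]
    public sealed class CallerFilePathAttribute : Attribute { }

    [AttributeUsage(AttributeTargets.Parameter, Inherited = false)]
    public sealed class CallerLineNumberAttribute : Attribute { }
}
```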

All of the attributes can only be applied to parameters of types that have standard (not custom) implicit conversions from int or string.  This rules out narrower types such as byte or short, so it isn’t practical to overflow [CallerLineNumber] (the compiler ran out of memory first), and I can’t test how that behaves.

Using [CallerMemberName] on field initializers passes the field name, and on static or instance constructors passes the string ".cctor" or ".ctor" (as documented).  In indexers, it passes "Item".
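
These cases can be reproduced with a small test class.  The Sample class and its Who helper are hypothetical names; going by the results above, the program should print the field name, then ".ctor", then "Item".

```csharp
using System;
using System.Runtime.CompilerServices;

class Sample {
    // Hypothetical helper: returns whatever member name the compiler passes in.
    static string Who([CallerMemberName] string member = "") { return member; }

    public readonly string FromField = Who();        // called from a field initializer
    public readonly string FromCtor;

    public Sample() { FromCtor = Who(); }            // called from an instance constructor

    public string this[int index] {                  // called from an indexer
        get { return Who(); }
    }
}

static class Program {
    static void Main() {
        var sample = new Sample();
        Console.WriteLine(sample.FromField);
        Console.WriteLine(sample.FromCtor);
        Console.WriteLine(sample[0]);
    }
}
```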

If a class has a constructor that takes only caller info attribute parameters, and you create another class that inherits it and does not declare a constructor (thus implicitly passing optional parameters), it passes the line number and file name of the class keyword in the derived class, but leaves the declared default for the member name (I suspect that’s a bug). 

If you do declare a constructor, it passes the string ".ctor" as the member name for the implicit base() call (just like a normal method call from inside a constructor) and the line number of the beginning of the constructor declaration.  If you actually write a base() call, it passes the line number of the base keyword.

If a call spans multiple lines, [CallerLineNumber] passes the line containing the opening parenthesis.

Delegates are fully supported; if you call a delegate that has a parameter annotated with a caller info attribute, the compiler will insert the correct value, regardless of the method you’re actually calling (which the compiler doesn’t even know).
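
A minimal sketch of the delegate case (LineGetter and Echo are hypothetical names).  The attribute sits on the delegate type’s parameter, so the target method needs neither the attribute nor a default value:

```csharp
using System;
using System.Runtime.CompilerServices;

// The caller info attribute is declared on the delegate's parameter.
delegate int LineGetter([CallerLineNumber] int line = 0);

static class Program {
    // An ordinary method with no attribute and no default value.
    static int Echo(int line) { return line; }

    static void Main() {
        LineGetter getter = Echo;
        // The compiler substitutes the line number of this call
        // based on the delegate type alone.
        Console.WriteLine(getter());
    }
}
```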

LINQ query comprehension syntax is not supported at all; if you create a (for example) Select() method that contains a caller info attribute, then call it from a LINQ query (not lambda syntax), the compiler will crash (!).  (they will fix that)

Expression trees do not support optional parameters at all, so that corner case is irrelevant.

Attributes are the most interesting story.  What should happen if you declare a custom attribute that takes parameters with caller info attributes, then apply that attribute in various cases?  This could potentially be very useful, since there is currently no way for an attribute to know what it’s being applied to. (I hadn’t thought of this usage when I wrote the original blog post)

The documentation says that this will work in all cases, and that [CallerMemberName] will pass whatever the attribute is being applied to.  However, in the beta build, this doesn’t always work.

Attributes applied to method parameters or return values do not pass any caller info at all.  Attributes applied to types or generic type parameters do not pass member names (this is very disappointing).

Hopefully, those will be fixed before release.

Protecting against CSRF attacks in ASP.Net MVC

CSRF attacks are one of the many security issues that web developers must defend against.  Fortunately, ASP.Net MVC makes it easy to defend against CSRF attacks.  Simply slap on [ValidateAntiForgeryToken] to every POST action and include @Html.AntiForgeryToken() in every form, and your forms will be secure against CSRF.

However, it is easy to forget to apply [ValidateAntiForgeryToken] to every action.  To prevent such mistakes, you can create a unit test that loops through all of your controller actions and makes sure that every [HttpPost] action also has [ValidateAntiForgeryToken]. 

Since there may be some POST actions that should not be protected against CSRF, you’ll probably also want a marker attribute to tell the test to ignore some actions.

This can be implemented like this:

First, define the marker attribute in the MVC web project.  This attribute can be applied to a single action, or to a controller to allow every action in the controller.

///<summary>Indicates that an action or controller deliberately 
/// allows CSRF attacks.</summary>
///<remarks>All [HttpPost] actions must have 
/// [ValidateAntiForgeryToken]; any deliberately unprotected 
/// actions must be marked with this attribute.
/// This rule is enforced by a unit test.</remarks>
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class AllowCsrfAttacksAttribute : Attribute { }

Then, add the following unit test:

[TestMethod]
public void CheckForCsrfProtection() {
    var controllers = typeof(MvcApplication).Assembly.GetTypes().Where(typeof(IController).IsAssignableFrom);
    foreach (var type in controllers.Where(t => !t.IsDefined(typeof(AllowCsrfAttacksAttribute), true))) {
        var postActions = type.GetMethods()
                                .Where(m => !m.ContainsGenericParameters)
                                .Where(m => !m.IsDefined(typeof(ChildActionOnlyAttribute), true))
                                .Where(m => !m.IsDefined(typeof(NonActionAttribute), true))
                                .Where(m => !m.GetParameters().Any(p => p.IsOut || p.ParameterType.IsByRef))
                                .Where(m => m.IsDefined(typeof(HttpPostAttribute), true));

        foreach (var action in postActions) {
            //CSRF XOR AntiForgery
            Assert.IsTrue(action.IsDefined(typeof(AllowCsrfAttacksAttribute), true) != action.IsDefined(typeof(ValidateAntiForgeryTokenAttribute), true),
                            action.Name + " is [HttpPost] but not [ValidateAntiForgeryToken]");
        }
    }
}
typeof(MvcApplication) must be any type in the assembly that contains your controllers.  If your controllers are defined in multiple assemblies, you’ll need to include those assemblies too.

The Dark Side of Covariance

What’s wrong with the following code?

var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
...
if (names.Contains(sqlCommand.ExecuteScalar()))

This code is intended to check whether the result of a SQL query is contained in a case-insensitive collection of names.  However, if you run this code, the resulting check will be case-sensitive.  Why?

As you may have guessed from the title, this is caused by covariance.  In fact, this code will not compile at all against .Net 3.5. 

The problem is that ExecuteScalar() returns object, not string.  Therefore, it doesn’t call HashSet<string>.Contains(string), which is what it’s intending to call (and which uses the HashSet’s comparer).  Instead, on .Net 4.0, this calls the Enumerable.Contains<object>(IEnumerable<object>, object) extension method, using the covariant conversion from IEnumerable<string> to IEnumerable<object>.  Covariance allows us to pass object to the Contains method of any strongly-typed collection (of reference types).

Still, why is it case-sensitive?  As Jon Skeet points out, the LINQ Contains() method is supposed to call any built-in Contains() method from ICollection<T>, so it should still use the HashSet’s case-insensitive Contains().

The reason is that although HashSet<string> implements ICollection<string>, it does not implement ICollection<object>.  Since we’re calling Enumerable.Contains<object>, it checks whether the sequence implements ICollection<object>, which it doesn’t.  (ICollection<T> is not covariant, since it allows write access.)

Fortunately, there’s a simple fix: just cast the return value back to string (and add a comment explaining the problem).  This allows the compiler to call HashSet<string>.Contains(string), as was originally intended.

//Call HashSet<string>.Contains(string), not the
//covariant Enumerable.Contains(IEnumerable<object>, object)
//http://blog.slaks.net/2011/12/dark-side-of-covariance.html
if (names.Contains((string)sqlCommand.ExecuteScalar()))
(I discovered this issue in my StringListConstraint for ASP.Net MVC)
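
The difference can be seen without a database.  In this sketch, an object-typed parameter stands in for the return value of ExecuteScalar(), and CovarianceDemo and its methods are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CovarianceDemo {
    // value is typed as object, so this binds to
    // Enumerable.Contains<object> and ignores the set's comparer.
    public static bool ViaLinq(HashSet<string> names, object value) {
        return names.Contains(value);
    }

    // The cast lets the compiler bind to HashSet<string>.Contains(string).
    public static bool ViaHashSet(HashSet<string> names, object value) {
        return names.Contains((string)value);
    }

    static void Main() {
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Alice" };
        Console.WriteLine(ViaLinq(names, "ALICE"));     // False
        Console.WriteLine(ViaHashSet(names, "ALICE"));  // True
    }
}
```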

Subtleties of C# 5’s new [CallerLineNumber]

UPDATE: Now that the Visual Studio 11 beta has shipped with this feature implemented, I wrote a separate blog post exploring how it actually behaves in these corner cases.

This is part 2 in a series about C# 5’s new caller info attributes; see the introduction.

The [CallerLineNumber] attribute tells the compiler to use the line number of the call site instead of the parameter’s default value.  This attribute has more corner cases than [CallerFilePath].  In particular, unlike the C preprocessor’s __LINE__ macro, the C# compiler inserts the line number of a parsed method call.  Therefore, it is not always clear which line a method call expression maps to.

What should this call print:

static class Utils {
    public static int GetLine([CallerLineNumber] int line = 0) {
        return line;
    }
}

Console.WriteLine(
    Utils
        .
        GetLine
        (
        )
    );

Should it print the line number on which the statement started? The line on which the call to GetLine started? The line containing the parentheses for GetLine? What if it’s in a multi-line lambda expression?

There are also a few cases in which methods are called implicitly by the compiler without appearing in source code.  What should this code print?

class Funny {
    public Funny Select(Func<object, object> d,
                        [CallerLineNumber]int line = 0) {
        Console.WriteLine(line + ": " + d(d));
        return this;
    }
}

var r = (from x in new Funny() 
         let y = 1 
         select 2);

This code contains two implicit calls to the Select method that don’t have a clear source line (it gets worse for more complicated LINQ queries).

In fact, it is possible to have an implicit method call with no corresponding source code at all.

Consider this code:

class Loggable {
    public Loggable([CallerLineNumber] int line = 0) { }
}
class SomeClass : Loggable { }

The compiler will implicitly generate a constructor for SomeClass that calls the base Loggable constructor with its default parameter value.  What line should it pass? In fact, if SomeClass is a partial class that is defined in multiple files, it isn’t even clear what [CallerFilePath] should pass.

Also, what should happen in the unlikely case that a [CallerLineNumber] method is called on line 3 billion (which would overflow an int)? (This would be easier to test on Roslyn with a fake stream source) Should it give an integer overflow compile-time error?  If [CallerLineNumber] also supports byte and short parameters, this scenario will be more likely to happen in practice.

Next time: [CallerMemberName]

Using a default controller in ASP.Net MVC

One common question about ASP.Net MVC is how to make a “default” controller.

Most websites will have a Home controller with actions like About, FAQ, Privacy, or similar pages.  Ordinarily, these actions can only be accessed through URLs like ~/Home/About.  Most people would prefer to put these URLs directly off the root: ~/About, etc.

Unfortunately, there is no obvious way to do that in ASP.Net MVC without making a separate route or controller for each action.

You cannot simply create a route matching "/{action}" and map it to the Home controller, since such a route would match any URL with exactly one term, including URLs meant for other controllers.  Since the routing engine is not aware of MVC actions, it doesn’t know that this route should only match actions that actually exist on the controller.

To make it work, we can add a custom route constraint that forces this route to only match URLs that correspond to actual methods on the controller.

To this end, I wrote an extension method that scans a controller for all action methods and adds a route that matches actions in that controller. The code is available at gist.github.com/1225676.  It can be used like this:

routes.MapDefaultController<Controllers.HomeController>();

This maps the route "/{action}/{id}" (with id optional) to all actions defined in HomeController.   Note that this code ignores custom ActionNameSelectorAttributes (The built-in [ActionName(…)] is supported).

For additional flexibility, you can also create custom routes that will only match actions in a specific controller.  This is useful if you have a single controller with a number of actions that has special route requirements that differ from the rest of your site.

For example:

routes.MapControllerActions<UsersController>(
    name: "User routes",
    url:  "{userName}/{action}",
    defaults: new { action = "Index" }
);

(Note that this example will also match URLs intended for other controllers with the same actions; plan your routes carefully)

Clarifying Boolean Parameters, part 2

Part 1 is here

Some languages have better ways to pass boolean parameters.  C# 4.0, and all versions of VB, allow parameters to be passed by name.  This allows us to write much clearer code:

//C# 4.0:
UpdateLayout(doFullLayout: false) 
'VB.Net:
UpdateLayout(doFullLayout:=False) 

Without requiring any changes to the function definition, this makes the meaning of the true / false abundantly clear at the call-site.

Javascript offers another interesting alternative.  In Javascript, boolean conditions actually check for “truthiness”.  The statement if(x) will trigger not just if x is true, but also if x is any “truthy” value, including any object, non-empty string, or non-zero number. Similarly, the expression !x will return false if x is “truthy” and true if x is “falsy”.

This means that we can actually use any non-empty string instead of true in Javascript.  Note that this will only work if the function checks the value for “truthiness”; it won’t work for code like if (x === true).

Thus, instead of passing true as a boolean, you can pass a string that describes what you’re actually indicating.

For example:

function updatePosition(animate) {
    //Calculate position
    if (animate) {
        //...
    } else {
        //...
    }
}

$(window).resize(function() {
    updatePosition();
});

updatePosition("With animation");

Although this results in much more readable code, it can be difficult to understand for people who aren’t familiar with this trick.  If the meaning of the parameter changes, you’ll need to hunt down every place that the function is called and change the string to reflect the new meaning.

Finally, unlike an enum, this does not scale to multiple options.  If you need to have more than two options, you should use global variables or objects to simulate an enum, not strings.

Clarifying Boolean Parameters, part 1

Have you ever written code like this:

public void UpdateLayout(bool doFullLayout) {
    //Code
    if (doFullLayout) {
        //Expensive code
    }
    //More code
}

This pattern is commonly used when some operation has a “cheap” mode and an “expensive” mode.  Other code will have calls like UpdateLayout(false) and UpdateLayout(true) scattered throughout.

The problem is that this isn’t very obvious for people who aren’t familiar with the codebase.  If you take a look at a file you’ve never seen before and see calls like UpdateLayout(false) and UpdateLayout(true) scattered, you’ll have no idea what the true / false means.

The simplest solution is to break it out into two methods: UpdateComplexLayout() and UpdateBasicLayout().  However,  if the two different layout modes have intertwined code paths (eg, the code before and after the if above), this either won’t be possible or will lead to ugly duplication of code.

One alternative is to use enums:

public enum LayoutUpdateType {
    Basic,
    Full
}

public void UpdateLayout(LayoutUpdateType type) {
    //Code
    if (type == LayoutUpdateType.Full) {
        //Expensive code
    }
    //More code
}

This way, the callsites are much more descriptive: UpdateLayout(LayoutUpdateType.Full).  This also makes it easy to add more update modes in the future should the need arise.  However, it makes the callsites much more verbose.  When used frequently, this pattern can lead to a vast proliferation of enum types that are each only used by one method, polluting the namespace and making more important enums harder to notice.

Next time: Cleverer alternatives

C# is not type-safe

C# is usually touted as a type-safe language.  However, it is not actually fully type-safe!

To examine this claim, we must first provide a strict definition of type-safety.  Wikipedia says:

In computer science, type safety is the extent to which a programming language discourages or prevents type errors. A type error is erroneous or undesirable program behavior caused by a discrepancy between differing data types.

To translate this to C#, full type-safety means that any expression that compiles is guaranteed to work at runtime, without causing any invalid cast errors.

Obviously, the cast (and as) operator is an escape hatch from type safety.  It tells the compiler that “I expect this value to actually be of this type, even though you can’t prove it.  If I’m wrong, I’ll live with that”.  Therefore, to be fully type-safe, it must be impossible to get an InvalidCastException at runtime in C# code that does not contain an explicit cast.

Note that parsing or conversion errors (such as any exception from the Convert class) don’t count.  Parsing errors aren’t actually invalid cast errors (instead, they come from unexpected strings), and conversion errors come from cast operations inside the Convert class.  Also, null reference exceptions aren’t cast errors.

So, why isn’t C# type-safe?

MSDN says that InvalidCastException is thrown in two conditions:

  • For a conversion from a Single or a Double to a Decimal, the source value is infinity, Not-a-Number (NaN), or too large to be represented as the destination type.

  • A failure occurs during an explicit reference conversion.

Both of these conditions can only occur from a cast operation, so it looks like C# is in fact type safe.

Or is it?

IEnumerable numbers = new int[] { 1, 2, 3 };

foreach(string x in numbers) 
    ;

This code compiles (!). Running it results in

InvalidCastException: Unable to cast object of type 'System.Int32' to type 'System.String'.

On the foreach line.

Since we don’t have any explicit cast operations (the conversion from int[] to IEnumerable is an implicit conversion, which is guaranteed to succeed), this proves that C# is not type-safe.

What happened?

The foreach construct comes from C# 1.0, before generics existed.  It worked with untyped collections such as ArrayList or IEnumerable.  Therefore, the IEnumerator.Current property that gets assigned to the loop variable would usually be of type object.   (In fact, the foreach statement is duck-typed to allow the enumerator to provide a typed Current property, particularly to avoid boxing). 

Therefore, you would expect that almost all (non-generic) foreach loops would need to have the loop variable declared as object, since that’s the compile-time type of the items in the collection.  Since that would be extremely annoying, the compiler allows you to use any type you want, and will implicitly cast the Current values to the type you declared.  Thus, mis-declaring the type results in an InvalidCastException.

Note that if the foreach type isn’t compatible at all with the type of the Current property, you will get a compile-time error (just like (string)42 doesn’t compile).  Therefore, if you stick with generic collections, you won’t get these runtime errors (unless you declare the foreach variable as a subtype of the item type).
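
The subtype exception applies even to generic collections.  In this sketch (ForeachDemo and CountBeforeFailure are hypothetical names), the loop variable is declared as string while the sequence holds objects, so the compiler still inserts a hidden cast:

```csharp
using System;
using System.Collections.Generic;

static class ForeachDemo {
    // Returns how many items were read before the hidden cast failed,
    // or -1 if every item really was a string.
    public static int CountBeforeFailure(IEnumerable<object> items) {
        int count = 0;
        try {
            foreach (string s in items)   // implicit (string) cast on each item
                count++;
        } catch (InvalidCastException) {
            return count;
        }
        return -1;
    }

    static void Main() {
        Console.WriteLine(CountBeforeFailure(new object[] { "a", "b", 3 }));  // 2
        Console.WriteLine(CountBeforeFailure(new object[] { "a", "b" }));     // -1
    }
}
```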

C# also isn’t type-safe because of array covariance.

string[] strings = new string[1];
object[] arr = strings;
arr[0] = 7;

This code compiles, but throws “ArrayTypeMismatchException: Attempted to access an element as a type incompatible with the array.” at run-time.

As Eric Lippert explains, this feature was added in order to be more compatible with Java.

Delegates vs. Function Pointers, Addendum: Multicast Delegates

Until now, I've been focusing on only one of the differences between delegates and function pointers; namely, associated state.

Delegates have one other capability that function pointers do not.  A single function pointer can only point to one function.  .Net, on the other hand, supports multicast delegates – delegates that point to multiple functions.  You can combine two existing delegates using the + operator (or by calling Delegate.Combine) to create a single new delegate instance that points to all of the methods in the original two delegates.  This new delegate stores all of the methods from the original two delegates in a private array of delegates called InvocationList (the delegates in this array are ordinary non-multicast delegates that each only point to a single method).

Note that delegates, like strings, are immutable.  Adding two delegates together creates a third delegate containing the methods from the first two; the original delegate instances are not affected.  For example, writing delegateField += SomeMethod creates a new delegate instance containing the methods originally in delegateField as well as SomeMethod, then stores this new instance in delegateField.

Similarly, the - operator (or Delegate.Remove) will remove the second operand from the first one (again, returning a new delegate instance).  If the second operand has multiple methods, all of them will be removed from the final delegate.  If some of the methods in the second operand appear multiple times in the original delegate, only the last occurrence of each one will be removed (the one most recently added).  The RemoveAll method will remove all occurrences.  If all of the methods were removed, it will return null; there is no such thing as an empty delegate instance.

Multicast delegates are not intended to be used with delegates that return values.  If you call a non-void delegate that contains multiple methods, it will return the return value of the last method in the delegate.  If you want to see the return values of all of the methods, you’ll need to loop over GetInvocationList() and call each delegate individually.
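
A short sketch of that loop, using Func<int> as the delegate type (MulticastDemo and InvokeAll are hypothetical names):

```csharp
using System;
using System.Linq;

static class MulticastDemo {
    // Calls every method in the delegate and collects each return value.
    public static int[] InvokeAll(Func<int> multicast) {
        return multicast.GetInvocationList()
                        .Cast<Func<int>>()
                        .Select(f => f())
                        .ToArray();
    }

    static void Main() {
        Func<int> first = () => 1;
        Func<int> second = () => 2;
        Func<int> both = first + second;

        Console.WriteLine(both());                              // 2 (last method only)
        Console.WriteLine(string.Join(", ", InvokeAll(both)));  // 1, 2
    }
}
```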

Multicast delegates also don’t play well with the new covariant and contravariant generic delegates in .Net 4.0.  You cannot combine two delegates unless their types match exactly, including variant generic parameters.

Function pointers cannot easily be combined the way multicast delegates can.  The only way to combine function pointers without cooperation from the code that calls the pointer is to make a function that uses a closure to call all of the function pointers you want to call.

In Javascript, that would look like this:

function combine() {
    var methods = arguments;

    return function() { 
        var retVal;
        for(var i = 0; i < methods.length; i++) 
            retVal = methods[i].apply(this, arguments);
        return retVal;
    };
}

Tracking Event Handler Registrations

When working with large .Net applications, it can be useful to find out where event handlers are being registered, especially in an unfamiliar codebase.

In simple cases, you can do this by right-clicking the event definition and clicking Find All References (Shift+F12).  This will show you every line of code that adds or removes a handler from the event by name.  For field-like (ordinary) events, this will also show you every line of code that raises the event.

However, this isn’t always good enough.  Sometimes, event handlers are not added by name.  The .Net data-binding infrastructure, as well as the CompositeUI Event Broker service, will add and remove event handlers using reflection, so they won’t be found by Find All References.  Similarly, if an event handler is added by an external DLL, Find All References won’t find it.

For these scenarios, you can use a less-obvious trick.  As I described last time, adding or removing an event handler actually executes code inside of an accessor method. Like any other code, we can set a breakpoint to see where the code is executed.

For custom events, this is easy.  Just add a breakpoint in the add and/or remove accessors and run your program.  Whenever a handler is added or removed, the debugger will break into the accessor, and you can look at the callstack to determine where it’s coming from.

However, most events are field-like, and don’t have actual source code in their accessor methods.  To set a breakpoint in a field-like event, you need to use a lesser-known feature: function breakpoints (Unfortunately, this feature is not available in Visual Studio Express).  You can click Debug, New Breakpoint, Break at Function (Ctrl+D, N) to tell the debugger to pause whenever a specific managed function is executed.

To add a breakpoint at an event accessor, type Namespace.ClassName.add_EventName.  To ensure that you entered it correctly, open the Debug, Breakpoints window (Ctrl+D, B) and check that the new breakpoint says break always (currently 0) in the Hit Count column.  If it doesn’t say (currently 0), then either the assembly has not been loaded yet or you made a typo in the location (right-click the breakpoint and click Location).

About .Net Events

A .Net event actually consists of a pair of accessor methods named add_EventName and remove_EventName.  These functions each take a handler delegate, and are expected to add or remove that delegate from the list of event handlers. 

In C#, writing public event EventHandler EventName; creates a field-like event.  The compiler will automatically generate a private backing field (also a delegate), along with thread-safe accessor methods that add and remove handlers from the backing field (like an auto-implemented property).  Within the class that declared the event, EventName refers to this private backing field.  Thus, writing EventName(...) in the class calls this field and raises the event (if no handlers have been added, the field will be null).
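
The accessor pair can be observed through reflection.  In this sketch, Publisher and its Changed event are hypothetical names; the raiser copies the backing field to a local before the null check:

```csharp
using System;
using System.Reflection;

class Publisher {
    // Field-like event: the compiler generates add_Changed and
    // remove_Changed, plus a private delegate field.
    public event EventHandler Changed;

    public void RaiseChanged() {
        EventHandler handler = Changed;   // inside the class, this reads the backing field
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

static class Program {
    static void Main() {
        MethodInfo add = typeof(Publisher).GetMethod("add_Changed");
        MethodInfo remove = typeof(Publisher).GetMethod("remove_Changed");
        Console.WriteLine(add != null && remove != null);   // True

        var publisher = new Publisher();
        publisher.RaiseChanged();   // no handlers yet: the field is null, nothing happens
        publisher.Changed += delegate { Console.WriteLine("Changed"); };
        publisher.RaiseChanged();   // prints "Changed"
    }
}
```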

You can also write custom event accessors to gain full control over how handlers are added to your events.   For example, this event will store and trigger handlers in reverse order:

void Main()
{
    ReversedEvent += delegate { Console.WriteLine(1); };
    ReversedEvent += delegate { Console.WriteLine(2); };
    ReversedEvent += delegate { Console.WriteLine(3); };

    OnReversedEvent();
}

protected void OnReversedEvent() {
    if (reversedEvent != null)
        reversedEvent(this, EventArgs.Empty);
}

private EventHandler reversedEvent;
public event EventHandler ReversedEvent {
    add {
        reversedEvent = value + reversedEvent;
    }
    remove {
        reversedEvent -= value;
    }
}

This add accessor uses the non-commutative delegate addition operator to prepend each new handler to the delegate field containing the existing handlers.  The raiser method simply calls the combined delegate in the private field. (which is null if there aren’t any handlers)

Note that this code is not thread-safe.  If two threads add a handler at the same time, both of them will read the original storage field, add their respective handlers to create a new delegate instance, then write this new delegate back to the field.  The thread that writes back to the field last will overwrite the changes made by the other thread, since it never saw the other thread’s handler (this is the same reason that x += y is not thread-safe).  The accessors generated by the compiler are threadsafe, either by using lock(this) (C# 3 or earlier) or a lock-free threadsafe implementation (C# 4).  For more details, see this series of blog posts.
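
One way to make the reversed event safe is to guard the field with a private lock.  This is only a sketch of the lock-based approach (ReversedPublisher and sync are hypothetical names), not the lock-free code that the C# 4 compiler generates:

```csharp
using System;

class ReversedPublisher {
    private readonly object sync = new object();
    private EventHandler reversedEvent;

    public event EventHandler ReversedEvent {
        // The lock makes the read-combine-write sequence atomic.
        add { lock (sync) { reversedEvent = value + reversedEvent; } }
        remove { lock (sync) { reversedEvent -= value; } }
    }

    public void OnReversedEvent() {
        EventHandler handler;
        lock (sync) { handler = reversedEvent; }   // take a snapshot under the lock
        if (handler != null)
            handler(this, EventArgs.Empty);        // invoke outside the lock
    }
}

static class Program {
    static void Main() {
        var publisher = new ReversedPublisher();
        publisher.ReversedEvent += delegate { Console.WriteLine(1); };
        publisher.ReversedEvent += delegate { Console.WriteLine(2); };
        publisher.ReversedEvent += delegate { Console.WriteLine(3); };
        publisher.OnReversedEvent();   // prints 3, 2, 1
    }
}
```

Locking on a private object rather than on this keeps outside code from taking the same lock.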

This example is rather useless.  However, there are better reasons to create custom event accessors. WinForms controls store their events in a special EventHandlerList class to save memory.  WPF controls create events using the Routed Event system, and store handlers in special storage in DependencyObject.  Custom event accessors can also be used to perform validation or logging.

Creating Local Extension Methods

Sometimes, it can be useful to make an extension method specifically for a single block of code.  Unfortunately, since extension methods cannot appear in nested classes, there is no obvious way to do that.

Instead, you can create a child namespace containing the extension method.  In order to limit the extension method’s visibility to a single method, you can put that method in a separate namespace block.  This way, you can add a using statement to that namespace alone.

For example:

namespace Company.Project {
    partial class MyClass {
        ...
    }
}
namespace Company.Project {
    using MyClassExtensions;
    namespace MyClassExtensions {
        static class Extensions {
            public static string Name<T>(this T obj) {
                if (default(T) == null && Equals(obj, default(T)))
                    return "(null " + typeof(T) + ")";
                return obj.GetType() + ": " + obj.ToString() 
                     + "{declared as " + typeof(T) + "}";
            }
        }
    }
    partial class MyClass {
        void DoSomething() {
            object x = new DateTime();
            string name = x.Name();
        }
    }
}

Since the using MyClassExtensions statement appears inside the second namespace block, the extension methods are only visible within that block.  Code that uses these extension methods can appear in this second block, while the rest of the class can go in the original namespace block without the extension methods.

This technique should be avoided where possible, since it leads to confusing and non-obvious code.  However, there are situations in which this can make some code much more readable.

Delegates vs. Function Pointers, part 4: C# 2.0+

This is part 4 in a series about state and function pointers; part 1 is here.

Last time, we saw that it is possible to pass local state with a delegate in C#.  However, it involves lots of repetitive single-use classes, leading to ugly code.

To alleviate this tedious task, C# 2 supports anonymous methods, which allow you to embed a function inside another function.  This makes my standard example much simpler:

//C# 2.0
int x = 2;
int[] numbers = { 1, 2, 3, 4 };

int[] hugeNumbers = Array.FindAll(
    numbers, 
    delegate(int n) { return n > x; }
);



//C# 3.0
int x = 2;
int[] numbers = { 1, 2, 3, 4 };

IEnumerable<int> hugeNumbers = numbers.Where(n => n > x);

Clearly, this is much simpler than the C# 1.0 version from last time.  However, anonymous methods and lambda expressions are compile-time features; the CLR itself is not aware of them.  How does this code work? How can an anonymous method use a local variable from its parent scope?

This is an example of a closure – a function bundled together with external variables that the function uses.  The C# compiler handles this the same way that I did manually last time in C# 1: it generates a class to hold the function and the variables that it uses, then creates a delegate from the member function in the class.  Thus, the local state is passed as the delegate’s this parameter.

To see how the C# compiler implements closures, I’ll use ILSpy to decompile the more-familiar C# 3 version: (I simplified the compiler-generated names for readability)

[CompilerGenerated]
private sealed class ClosureClass {
    public int x;
    public bool Lambda(int n) {
        return n > this.x;
    }
}
private static void Main() {
    ClosureClass closure = new ClosureClass();
    closure.x = 2;
    int[] numbers = { 1, 2, 3, 4 };
    IEnumerable<int> hugeNumbers = numbers.Where(closure.Lambda);
}

The ClosureClass (which was actually named <>c__DisplayClass1) is equivalent to the GreaterThan class from my previous example.  It holds the local variables used in the lambda expression.  Note that this class replaces the variables – in the original method, instead of a local variable named x, the compiler uses the public x field from the ClosureClass.  This means that any changes to the variable affect the lambda expression as well.

The lambda expression is compiled into the Lambda method (which was originally named <Main>b__0).  It uses the same field to access the local variable, sharing state between the original outer function and its lambda expression.
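
Because the outer method and the lambda share a single field, a change made on either side is visible to the other.  Here is a minimal sketch of that behavior; the ClosureDemo class and its method names are made up for illustration:

```csharp
using System;

static class ClosureDemo {
    // Returns the lambda's answer for 3 before and after the outer code changes x.
    public static bool[] OuterChangeIsSeenByLambda() {
        int x = 2;
        Func<int, bool> isHuge = n => n > x;   // captures x itself, not a copy of its value
        bool before = isHuge(3);               // 3 > 2
        x = 5;                                 // writes to the compiler-generated field
        bool after = isHuge(3);                // 3 > 5
        return new[] { before, after };
    }

    // Returns x after the lambda has incremented it twice.
    public static int LambdaChangeIsSeenByOuter() {
        int x = 2;
        Action bump = () => x++;
        bump();
        bump();
        return x;
    }

    static void Main() {
        bool[] results = OuterChangeIsSeenByLambda();
        Console.WriteLine(results[0]);                   // True
        Console.WriteLine(results[1]);                   // False
        Console.WriteLine(LambdaChangeIsSeenByOuter());  // 4
    }
}
```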

Next time: JavaScript

Open Delegates vs. Closed Delegates

.Net supports two kinds of delegates: Open delegates and closed delegates.

When you create a delegate that points to an instance method, the instance that you created it from is stored in the delegate’s Target property.  This object is passed as the first parameter to the method that the delegate points to.  For instance methods, this is the implicit this parameter; for static methods, it's the method's first parameter.  These are called closed delegates, because they close over the first parameter and bring it into the delegate instance.

It is also possible to create open delegates which do not pass a first parameter.  Open delegates do not use the Target property; instead, all of the target method’s parameters are passed from the delegate’s formal parameter list, including the first parameter.  Therefore, an open delegate pointing to a given method must have one parameter more than a closed delegate pointing to the same method.  Open delegates are usually used to point to static methods.  When you make a delegate pointing to a static method, you (generally) don't want the delegate to hold a first parameter for the method. 

In addition to these two normal cases, it is also possible (in .Net 2.0 and later) to create open delegates for instance methods and to create closed delegates for static methods.  With one exception, C# doesn’t have any syntactical support for these unusual delegates, so they can only be created by calling the CreateDelegate method.

Open delegates are created by calling the CreateDelegate overload that doesn’t take a target parameter.  Before .Net 2.0, this function could only be called with a static method.   In .Net 2.0, you can call this function with an instance method to create an open delegate.  Such a delegate will use its first parameter as this instead of its Target field. 

As a concrete example, consider the String.ToUpperInvariant() method.  Ordinarily, this method takes no parameters, and operates on the string it’s called on.  An open delegate pointing to this instance method would take a single string parameter, and call the method on that parameter.

For example:

Func<string> closed = new Func<string>("a".ToUpperInvariant);
Func<string, string> open = (Func<string, string>)
    Delegate.CreateDelegate(
        typeof(Func<string, string>),
        typeof(string).GetMethod("ToUpperInvariant")
    );

closed();     //Returns "A"
open("abc");  //Returns "ABC"

Closed delegates are created by calling the CreateDelegate overload that takes a target parameter.  In .Net 2.0, this can be called with a static method and an instance of that method’s first argument type to create a closed delegate that calls the method with the given target as its first parameter.  Closed delegates curry the first parameter from the target method.  For example:

Func<object, bool> deepThought = (Func<object, bool>)
    Delegate.CreateDelegate(
        typeof(Func<object, bool>),
        2,
        typeof(object).GetMethod("Equals", BindingFlags.Static | BindingFlags.Public)
    );

This code curries the static Object.Equals method to create a delegate that calls Equals with 2 and the delegate’s single parameter.  It’s equivalent to x => Object.Equals(2, x).  Note that since the method is (generally) not a member of the target object’s type, we need to pass an actual MethodInfo instance; a name alone isn’t good enough.
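
To check that the curried delegate and the lambda agree, here is a small runnable sketch.  The ClosedStaticDemo class and the MakeEqualsTwo helper are hypothetical names; the CreateDelegate call is the same one shown above.

```csharp
using System;
using System.Reflection;

static class ClosedStaticDemo {
    public static Func<object, bool> MakeEqualsTwo() {
        MethodInfo staticEquals = typeof(object).GetMethod(
            "Equals", BindingFlags.Static | BindingFlags.Public);

        // The boxed 2 is stored in the delegate and supplied as Equals' first argument.
        return (Func<object, bool>)Delegate.CreateDelegate(
            typeof(Func<object, bool>), 2, staticEquals);
    }

    static void Main() {
        Func<object, bool> curried = MakeEqualsTwo();
        Func<object, bool> lambda = x => Object.Equals(2, x);

        Console.WriteLine(curried(2));    // True
        Console.WriteLine(curried("2"));  // False
        Console.WriteLine(lambda(2));     // True
    }
}
```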

Note that you cannot create a closed delegate from a static method whose first parameter is a value type, because, unlike instance methods (which receive a value-type this by reference), static methods that take value types receive their parameters by value.   For more details, see here and here.

C# 3 added limited syntactical support for creating closed delegates.  You can create a delegate from an extension method as if it were an instance method on the type it extends.  For example:

var allNumbers = Enumerable.Range(1, Int32.MaxValue);
Func<int, IEnumerable<int>> countTo = allNumbers.Take;

This code creates an IEnumerable<int> containing all positive integers, then creates a closed delegate that curries this sequence into the static Enumerable.Take<T>(IEnumerable<T>, int) method.
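
A short sketch of the delegate in use; ExtensionDelegateDemo and MakeCountTo are illustrative names only.  It also confirms through reflection that the delegate really does point at a static method:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExtensionDelegateDemo {
    public static Func<int, IEnumerable<int>> MakeCountTo() {
        IEnumerable<int> allNumbers = Enumerable.Range(1, Int32.MaxValue);
        // Method group conversion: allNumbers becomes the bound first argument of Take.
        return allNumbers.Take;
    }

    static void Main() {
        Func<int, IEnumerable<int>> countTo = MakeCountTo();

        Console.WriteLine(string.Join(", ", countTo(3)));                       // 1, 2, 3
        Console.WriteLine(countTo.Method.IsStatic);                             // True
        Console.WriteLine(countTo.Method.DeclaringType == typeof(Enumerable));  // True
    }
}
```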

Except for extension methods, open instance delegates and closed static delegates are rarely used in actual code.  However, it is important to understand how ordinary open and closed delegates work, and where the target object ends up for instance delegates.

Delegates vs. Function Pointers, part 3: C# 1.0

This is part 3 in a series about state and function pointers; part 1 is here.

Last time, we saw that it is impossible to bundle context along with a function pointer in C.

In C#, it is possible to fully achieve my standard example.  In order to explain how this works behind the scenes, I will limit this post to C# 1.0 and not use a lambda expression.  This also means no LINQ, generics, or extension methods, so I will, once again, need to write the filter method myself.

delegate bool IntFilter(int num);

static ArrayList Filter(IEnumerable source, IntFilter filter) {
    ArrayList retVal = new ArrayList();

    foreach(int i in source)
        if (filter(i))
            retVal.Add(i);
            
    return retVal;
}

class GreaterThan {
    public GreaterThan(int b) {
        this.bound = b;
    }
    int bound;
    public bool Passes(int num) {
        return num > bound;
    }
}

int x = 2;
int[] numbers = { 1, 2, 3, 4 };
Filter(numbers, new IntFilter(new GreaterThan(x).Passes));

Please excuse the ugly code; in order to be true to C# 1.0, I can’t just write an iterator, and I can’t implicitly create the delegate.

This code creates a class called GreaterThan that holds the Passes method and the state used by the method.  I then construct a GreaterThan instance from the local variable and create a delegate from that instance's Passes method to pass to Filter.

To understand how this works, we need to delve further into delegates.  .Net delegates are more than just type-safe function pointers. Unlike the function pointers we've looked at, delegates include state – the Target property. This property stores the object to pass as the hidden this parameter to the method (It's actually somewhat more complicated than that; I will describe open and closed delegates at a later date).  My example creates a delegate instance in which the Target property points to the GreaterThan instance.  When I call the delegate, the Passes method is called on the correct instance from the Target property, so it can read the bound field.
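
The Target property can be inspected directly.  The following sketch reuses the GreaterThan class and IntFilter delegate from above; the TargetDemo class and its Describe method are hypothetical additions for illustration.

```csharp
using System;

delegate bool IntFilter(int num);

class GreaterThan {
    int bound;
    public GreaterThan(int b) { this.bound = b; }
    public bool Passes(int num) { return num > bound; }
}

static class TargetDemo {
    // Summarizes what the delegate holds: its target object, its method, and a sample call.
    public static string Describe() {
        GreaterThan state = new GreaterThan(2);
        IntFilter filter = new IntFilter(state.Passes);

        bool sameInstance = Object.ReferenceEquals(filter.Target, state);
        return sameInstance + " " + filter.Method.Name + " " + filter(3);
    }

    static void Main() {
        Console.WriteLine(Describe());  // True Passes True
    }
}
```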

Next time, we'll see how the C# 2.0+ compilers generate all this boilerplate for you using anonymous methods and lambda expressions.

Dissecting Razor, part 5: Use the Source, Luke

Last time, we saw how basic Razor constructs are translated into C#.

We can see the generated class by adding @{ #error } to the page source. This creates a compiler error in the Execute method, and the resulting Yellow Screen of Death contains a Show Complete Compilation Source: link which will show the generated C# class. 

Let’s start with a very simple page:

<!DOCTYPE html>
<html>
    <body>
        1 + 2 = @(1 + 2)<br />
        @{ var source = "<b>bold &amp; fancy</b>"; }
        <code>@source</code> is rendered as
        @(new HtmlString(source))
    </body>
</html>
@{ #error }

This page is rendered like this: (after removing @{ #error })

<!DOCTYPE html>
<html>
    <body>
        1 + 2 = 3<br />
        <code>&lt;b&gt;bold &amp;amp; fancy&lt;/b&gt;</code> is rendered as
        <b>bold &amp; fancy</b>
    </body>
</html>

As expected, the expression @source is automatically escaped.  Also notice that the newline and indentation around the code block (@{ var ... })  was not rendered – the Razor parser strips all whitespace surrounding code blocks.  This is a welcome improvement over the ASPX view engine.

Now let’s look at how this HTML is generated.  This page is transformed into the following C# source:

namespace ASP {
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Web;
    using System.Web.Helpers;
    using System.Web.Security;
    using System.Web.UI;
    using System.Web.WebPages;
    using System.Web.WebPages.Html;
    using WebMatrix.Data;
    using WebMatrix.WebData;
    public class _Page_Razor_SimplePage_cshtml : System.Web.WebPages.WebPage {

#line hidden
        public _Page_Razor_SimplePage_cshtml() {
        }

        protected ASP.global_asax ApplicationInstance {
            get {
                return ((ASP.global_asax)(Context.ApplicationInstance));
            }
        }

        public override void Execute() {
            WriteLiteral("<!DOCTYPE html>\r\n<html>\r\n    <body>\r\n        1 + 2 = ");

#line 4 "...\SimplePage.cshtml"
            Write(1 + 2);
#line default
#line hidden
            WriteLiteral("<br />\r\n");

#line 5 "...\SimplePage.cshtml"
            var source = "<b>bold &amp; fancy</b>";
#line default
#line hidden
            WriteLiteral("        <code>");

#line 6 "...\SimplePage.cshtml"
            Write(source);
#line default
#line hidden
            WriteLiteral("</code> is rendered as\r\n        ");

#line 7 "...\SimplePage.cshtml"
            Write(new HtmlString(source));
#line default
#line hidden
            WriteLiteral("\r\n    </body>\r\n</html>\r\n");

#line 10 "...\SimplePage.cshtml"
#error
#line default
#line hidden

        }
    }
}

The WebPageRazorEngineHost injects the ApplicationInstance property into the CodeDOM tree; this property allows code in the page to access any custom properties in Global.asax.

As mentioned earlier, the page source is compiled into the Execute() method.

It uses #line directives to pretend that its code is actually in the CSHTML page.  This means that code or line numbers appearing in error pages come from the original CSHTML source, making the code easier to find when debugging.  The #line hidden directives indicate generated source that did not come from actual code in the CSHTML.

As mentioned last time, literal HTML source is passed to the WriteLiteral method, which is inherited from the WebPageBase class.  This method writes its argument to the current output stream (which can vary when making sections).  These calls are wrapped in #line hidden because they come from literal text, not code.

The two code blocks (the variable declaration and the #error directive) are copied straight into  Execute(), wrapped in #line directives that map them to the actual code lines in the CSHTML.

The code nuggets are passed to the Write method, and are similarly wrapped in #line directives.
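
The effect of a #line directive can be observed outside of Razor as well.  The sketch below is not Razor output; it uses the C# 5 caller info attributes to report where the compiler thinks a call site is, and the LineDirectiveDemo class and its methods are invented for this example.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;

static class LineDirectiveDemo {
    // Reports the file name and line that the compiler records for the call site.
    public static string Here(
        [CallerFilePath] string file = "",
        [CallerLineNumber] int line = 0) {
        return Path.GetFileName(file) + ":" + line;
    }

    public static string CallFromMappedLine() {
        // The statement after the directive is treated as line 7 of SimplePage.cshtml.
#line 7 "SimplePage.cshtml"
        return Here();
#line default
    }

    static void Main() {
        Console.WriteLine(CallFromMappedLine());  // SimplePage.cshtml:7
    }
}
```

Compiler errors and debugger line information are remapped by the same mechanism, which is why the error pages point at the CSHTML file.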

Here is a more sophisticated Razor page:

<!DOCTYPE html>
<html>
    <body>
        @{ const int count = 10; }
        <table>
            @for (int i = 0; i < count; i++) {
                <tr>
                    <td>@i</td>
                    <td>@(i * i)</td>
                </tr>
            }
        </table>
    </body>
</html>
@{ #error }

The @for loop is a code block in the form of a control flow statement.  Razor’s C# parser is aware of C# control flow structures and parses them as code blocks.  (The VB parser does the same thing)

Here is the generated Execute() method, with #line directives removed for clarity:

public override void Execute() {
    WriteLiteral("<!DOCTYPE html>\r\n<html>\r\n    <body>\r\n");

    const int count = 10;
    WriteLiteral("        <table>\r\n");

    for (int i = 0; i < count; i++) {
        WriteLiteral("                <tr>\r\n                    <td>");

        Write(i);
        WriteLiteral("</td>\r\n                    <td>");

        Write(i * i);
        WriteLiteral("</td>\r\n                </tr>\r\n");

    }
    WriteLiteral("        </table>\r\n    </body>\r\n</html>\r\n");
#error
}

Here too, we see that all contiguous chunks of literal HTML are passed to WriteLiteral.

This example has two code blocks – the const declaration and the loop.  The for loop code block has HTML inside of it – any HTML markup inside a code block is parsed as normal HTML and passed to WriteLiteral.

Next Time: Function blocks

Dissecting Razor, part 4: Anatomy of a Razor Page

After looking at the various assemblies in the WebPages framework, we will drill into the inner workings of Razor pages.

Razor Side

An ordinary CSHTML page is transformed into a class which inherits the WebPage class.  The generated class overrides the abstract Execute() method from its base class to render the page to the HTTP response stream.

Except for class-level directives and constructs (which will be discussed later), all ordinary content in a Razor page ends up in the Execute method. 

There are three types of normal content: Literals, Code Blocks, and Code Nuggets.

Literals include any normal text. Razor compiles literal text into calls to the WriteLiteral method, with the text as a (correctly-escaped) string parameter.  Razor expects this method to write its parameter to the page.

Code Blocks include @{ ... } blocks, as well as control structures.  They’re Razor’s equivalent of <% ... %> blocks.  The contents of a code block are emitted as-is in the Execute method.  Code blocks must contain complete statements, or they’ll result in C# syntax errors.

VBHTML pages use @Code ... End Code blocks instead.

Code Nuggets are @-blocks.  They’re Razor’s equivalent of <%: ... %> blocks in an ASPX page.  Scott Guthrie describes how these blocks are tokenized.  The contents of a code nugget are passed to the Write method, which is expected to HTML-escape its parameter and print it.

WebPages Side

The WebPages framework’s Write method (which comes from the WebPageBase class) takes a parameter of type Object, allowing one to put any expression in a code nugget.  It passes its parameter to HttpUtility.HtmlEncode, which will call ToString() and HTML-escape the output.  If the parameter is an IHtmlString, HtmlEncode will instead return the result of its ToHtmlString() method without escaping.
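
The difference between the two methods can be sketched without System.Web.  This is a simplified stand-in, not the actual WebPages implementation: IRawHtml, RawHtml, and PageSketch are hypothetical types, and WebUtility.HtmlEncode is used in place of HttpUtility.HtmlEncode.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical stand-in for System.Web.IHtmlString.
interface IRawHtml { string ToHtmlString(); }

sealed class RawHtml : IRawHtml {
    readonly string html;
    public RawHtml(string html) { this.html = html; }
    public string ToHtmlString() { return html; }
}

static class PageSketch {
    // Literal markup is written untouched.
    public static void WriteLiteral(TextWriter output, string markup) {
        output.Write(markup);
    }

    // Code nugget values are HTML-encoded unless they declare themselves to be markup.
    public static void Write(TextWriter output, object value) {
        IRawHtml raw = value as IRawHtml;
        output.Write(raw != null
            ? raw.ToHtmlString()
            : WebUtility.HtmlEncode(Convert.ToString(value)));
    }

    public static string Render() {
        StringWriter output = new StringWriter();
        string source = "<b>bold &amp; fancy</b>";

        WriteLiteral(output, "<code>");
        Write(output, source);               // escaped
        WriteLiteral(output, "</code> is rendered as ");
        Write(output, new RawHtml(source));  // passed through as markup
        return output.ToString();
    }

    static void Main() {
        Console.WriteLine(Render());
        // <code>&lt;b&gt;bold &amp;amp; fancy&lt;/b&gt;</code> is rendered as <b>bold &amp; fancy</b>
    }
}
```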

The base class, method names, and default namespaces can be configured in the RazorEngineHost.  In addition, custom RazorEngineHosts can override the PostProcessGeneratedCode method to make arbitrary modifications to the generated code.

The WebRazorHostFactory in System.Web.WebPages.Razor.dll can also read default namespaces, a default base type, and a custom host from the <system.web.webPages.razor> section in Web.config:

<system.web.webPages.razor>
    <host factoryType="MyProject.MyWebPageRazorHost" />
    <pages pageBaseType="MyProject.CustomWebPage">
        <namespaces>
            <add namespace="MyProject.SomeNamespace" />
        </namespaces>
    </pages>
</system.web.webPages.razor>

Next Time: Looking at the generated page

Generic Base Classes in ASP.Net MVC

Last time, we saw that there are severe limitations in creating ASPX pages which inherit generic base classes.  Many readers were probably wondering how ASP.Net MVC works around this limitation.  In ASP.Net MVC views, people write pages like this all the time:

<%@ Page Language="C#" 
    Inherits="ViewPage<IEnumerable<DataLayer.Product>>" %>

ASP.Net MVC includes its own workaround for these limitations.  The Web.config file in the Views folder of an ASP.Net MVC project registers a PageParserFilter:

<pages
    validateRequest="false"
    pageParserFilterType="System.Web.Mvc.ViewTypeParserFilter, System.Web.Mvc, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
    ...>
    ...
</pages>

PageParserFilter is one of ASP.Net’s lesser-known extensibility points.  It can intercept different parts in the parsing process for an ASPX page and modify the page.  The MVC framework’s ViewTypeParserFilter will check whether the page’s inherits="" attribute contains a ( or < character; these characters can appear in C# or VB.Net generic type names, but never in the CLR’s native syntax.

If the inherits="" attribute contains these characters, it will save the attribute’s original value, then replace it with ViewPage (or ViewMasterPage or ViewUserControl, as appropriate).  This way, the rest of the built-in ASP.Net parser will see a normal type name that it knows how to parse. After the page finishes parsing, an internal ControlBuilder registered on MVC’s base types (ViewPage, ViewMasterPage or ViewUserControl) will replace the base type in the generated CodeDOM tree with the original value of the inherits="" attribute.

The one problem with the hack is that it leaves the ASPX parsing engine unaware of the page’s actual base type.  Therefore, if you make a page that inherits a generic base class with additional properties, you won’t be able to set those properties in the <%@ Page declaration (since the ASPX parser won’t know about them).  If you inherit a non-generic type, this mechanism will not kick in and page properties will work fine.

Generic Base Classes in ASP.Net

ASP.Net pages can inherit from custom classes (as long as they inherit System.Web.UI.Page).  This can be useful to add utility functions or shared (code-behind) behaviors to your pages.  (Note that you could also use Extension methods or HTTP modules)

However, if you try to inherit a generic base class, it won’t work:

public class DataPage<T> : Page {
    public T Data { get; set; }
}

<%@ Page Language="C#" Inherits="DataPage<string>" %>

This code results in a yellow screen of death, with the parser error "Could not load type 'DataPage<string>'".

This happens because the ASP.Net page parser is unaware of C# generics syntax.  The familiar generics syntax (e.g., List<int>) is actually a C# innovation and is not used at all in the actual framework.  The “native” generics syntax, which is used by Reflection, is markedly different: List`1[Int32] (namespaces omitted for brevity).  The full form of this name, including namespaces and assembly names, is returned by the Type.AssemblyQualifiedName property.
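
The two syntaxes are easy to compare from a console application.  In this sketch (ClrTypeNameDemo and its helpers are illustrative names), reflection accepts the CLR form and simply fails to find a type when handed the C# form:

```csharp
using System;
using System.Collections.Generic;

static class ClrTypeNameDemo {
    // The CLR-style name of List<int>, with namespaces.
    public static string ClrName() {
        return typeof(List<int>).ToString();
    }

    // Type.GetType returns null (rather than throwing) when no such type exists.
    public static Type Parse(string typeName) {
        return Type.GetType(typeName);
    }

    static void Main() {
        Console.WriteLine(typeof(List<int>).Name);                 // List`1
        Console.WriteLine(ClrName());                              // System.Collections.Generic.List`1[System.Int32]
        Console.WriteLine(Parse(ClrName()) == typeof(List<int>));  // True
        Console.WriteLine(Parse("System.Collections.Generic.List<int>") == null);  // True
    }
}
```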

Since ASP.Net uses reflection APIs to load types, we need to specify generic types using CLR syntax (and with full namespaces).   Therefore, the following page will work:

<%@ Page Language="C#" 
    Inherits="TestSite.DataPage`1[[System.String, mscorlib]]" %>

However, it’s not so simple.  ASP.Net does not call Type.GetType to parse these strings; instead, it loops over every referenced assembly and calls Assembly.GetType on each one.  This is why you don’t need to include the assembly name whenever using the Inherits attribute (which would have been necessary for Type.GetType).  Ordinarily, this is very useful, but here, it comes back to bite you. 
It is not possible to parse a type from one assembly with a generic parameter from a different assembly using Assembly.GetType, unless the generic parameter is in mscorlib.

Therefore, for example, it is not possible to create an ASPX page that inherits DataPage<DataLayer.Product> if DataLayer.Product is in a different assembly than DataPage.  As a workaround, one can create a non-generic class which inherits DataPage<DataLayer.Product>, then make the ASPX page inherit this temporary class.

Next time: MVC magic

Building Connection Strings in .Net

.Net developers frequently need to build connection strings, especially when connecting to Access or Excel files using OleDB.
Code like the following has been written countless times:

//Bad code! Do not use!
string conn = "Data Source=" + openFileDialog1.FileName + "; "
            + "Provider=Microsoft.Jet.OLEDB.4.0;"
            + "Extended Properties=\"Excel 8.0\"";

This code looks innocuous at first glance, but will not work for all filenames.  If the filename contains characters like ', ", ;, or =, this code will create an invalid connection string and throw an exception.

The correct way to build connection strings is to use one of the DbConnectionStringBuilder classes.  This class implements a dictionary of key-value pairs in the connection string.  It has a ConnectionString property which assembles the instance’s contents into a usable connection string.  Unlike the string concatenation shown above, this property will correctly escape all values.
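
As a quick check of the escaping, the following sketch builds a connection string from an awkward filename and parses it back.  It uses only the base DbConnectionStringBuilder class; the ConnectionStringDemo class, its helpers, and the sample path are made up for illustration.

```csharp
using System;
using System.Data.Common;

static class ConnectionStringDemo {
    public static string Build(string fileName) {
        DbConnectionStringBuilder builder = new DbConnectionStringBuilder();
        builder["Data Source"] = fileName;
        builder["Provider"] = "Microsoft.Jet.OLEDB.4.0";
        return builder.ConnectionString;  // values are quoted and escaped as needed
    }

    public static string ReadDataSource(string connectionString) {
        DbConnectionStringBuilder parsed = new DbConnectionStringBuilder();
        parsed.ConnectionString = connectionString;
        return (string)parsed["Data Source"];
    }

    static void Main() {
        // A (hypothetical) filename that would break naive string concatenation.
        string awkward = @"C:\Reports\Q1;final=true.xls";

        string connectionString = Build(awkward);
        Console.WriteLine(connectionString);
        Console.WriteLine(ReadDataSource(connectionString) == awkward);  // True
    }
}
```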

In addition, each of the database clients included with the .Net framework (SQL, OleDB, ODBC, Oracle, and Entity Framework) has its own inherited ConnectionStringBuilder class in its respective namespace.  These classes add type-safe properties for the keys supported by their databases, and handle any special cases when generating the connection string.

Thus, the correct way to write the above code is:

var connBuilder = new OleDbConnectionStringBuilder {
    DataSource = openFileDialog1.FileName,
    Provider = "Microsoft.Jet.OLEDB.4.0"
};
connBuilder["Extended Properties"] = "Excel 8.0";

As an added bonus, these classes implement ICustomTypeDescriptor, so they can be bound to a PropertyGrid to allow the end-user to edit the connection string.  This can be seen in certain places in Visual Studio.