Partial Type Inference in .Net

When designing fluent APIs, one issue that comes up is partial type inference. If a method has two type parameters, there is no way to call it while specifying only one of them explicitly and leaving the other to be inferred by the compiler.

For example, suppose we are creating a type-safe wrapper around a parameterized SqlCommand.
Ideally, it would be called like this:

using(DbConnection connection = ...) {
    var result = connection.ExecuteScalar<int>(
        "SELECT COUNT(*) FROM TableName WHERE Modified > someDate",
        new { someDate }
    );
}

Here, the generic parameter specifies the return type.

In order to implement this efficiently, one would create methods at runtime which add DbParameters for each property in the anonymous type, and store them in a static generic class.   
It would look something like this:

static class Extensions {
    public static void AddParams<TParam>(this IDbCommand command, 
                                         TParam parameters)
        where TParam : class { 
        if (parameters != null) 
            ParamAdders<TParam>.Adder(command, parameters);
    }
    static class ParamAdders<TParam> where TParam : class {
        public delegate void ParamAdder(IDbCommand command,
                                        TParam parameters);
        public static readonly ParamAdder Adder = CreateParamAdder();

        private static ParamAdder CreateParamAdder() { 
            return ...; 
        }
    }
}
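
The body of CreateParamAdder is elided above. As a rough sketch of the caching idea only, here is a database-free variant that reflects over TParam once per type and keeps the resulting delegate in a static field of a generic class. The names ParamReaders, Read, and ParamDemo are hypothetical, and the "@" prefix is an assumption; a real implementation would add DbParameters to the command, and might compile an expression tree instead of calling PropertyInfo.GetValue on every call.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical, database-free illustration of caching one delegate per TParam.
static class ParamReaders<TParam> where TParam : class {
    // Initialized once per closed generic type, i.e. once per anonymous type.
    public static readonly Func<TParam, Dictionary<string, object>> Read = CreateReader();

    private static Func<TParam, Dictionary<string, object>> CreateReader() {
        // The reflection lookup runs only once for each TParam.
        PropertyInfo[] props = typeof(TParam).GetProperties();
        return parameters => {
            var result = new Dictionary<string, object>();
            foreach (PropertyInfo prop in props)
                result.Add("@" + prop.Name, prop.GetValue(parameters, null));
            return result;
        };
    }
}

static class ParamDemo {
    // Type inference supplies the anonymous type as TParam.
    public static Dictionary<string, object> ReadParams<TParam>(TParam parameters)
        where TParam : class {
        return ParamReaders<TParam>.Read(parameters);
    }
}
```

Calling ParamDemo.ReadParams(new { someDate = 5 }) goes through the cached delegate for that anonymous type; the caller never has to name the type.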

However, that requires that the ExecuteScalar extension method take TParam as a generic parameter.  Since anonymous types can only be passed as generic parameters via type inference, this makes it impossible to pass the return type as a generic parameter.

To fix this issue, we can split the generic parameters across two methods.  We can change the extension method to take a single generic parameter, and return a generic class with a method that takes the other generic parameter.

For example:

static class BetterExtensions {
    public static SqlStatement<T> Sql<T>(this IDbConnection conn,
                                         string sqlText) { 
        return new SqlStatement<T>(conn, sqlText); 
    }
}
class SqlStatement<TReturn> : IHideObjectMembers {
    public SqlStatement(IDbConnection connection, string sql) { 
        Connection = connection; 
        Sql = sql;
    }
    public IDbConnection Connection { get; private set; }
    public string Sql { get; private set; }

    public TReturn Execute() { return Execute<object>(null); }
    public TReturn Execute<TParam>(TParam parameters)
         where TParam : class {
        return ...;
    }
}

I use an IHideObjectMembers interface to hide the methods inherited from Object from IntelliSense. Note that the interface must be defined in an assembly outside of your solution (or, to be more precise, in an assembly that isn’t a Project reference).

This version would be called like this:

using(IDbConnection connection = null) {
    var result = connection.Sql<int>(
        "SELECT COUNT(*) FROM TableName WHERE Modified > someDate"
    ).Execute(new { someDate });
}

The return type is specified explicitly in the call to Sql<T>(), and the parameter type is passed implicitly to Execute<TParam>().

Simplifying Value Comparison Semantics

A common chore in developing real-world C# applications is implementing value semantics for equality.  This involves implementing IEquatable<T>, overriding Equals() and GetHashCode(), and overloading the == and != operators.

Implementing these methods is a time-consuming and repetitive task, and is easy to get wrong, especially GetHashCode(). In particular, the best way to implement GetHashCode() is much more complicated than return x.GetHashCode() ^ y.GetHashCode().
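
To make the problem with XOR concrete, here is a small sketch comparing it with a multiply-and-add combination. The class and method names are made up for illustration. XOR is symmetric, so swapping two values gives the same hash, and it returns zero whenever both values are equal.

```csharp
using System;

static class HashDemo {
    // The naive combination: symmetric, and zero whenever x == y.
    public static int XorHash(int x, int y) {
        return x.GetHashCode() ^ y.GetHashCode();
    }

    // An order-sensitive combination: multiply by a small prime, then add.
    public static int CombinedHash(int x, int y) {
        unchecked {
            int hash = 17;
            hash = hash * 23 + x.GetHashCode();
            hash = hash * 23 + y.GetHashCode();
            return hash;
        }
    }
}
```

With these definitions, XorHash(1, 2) and XorHash(2, 1) are both 3, and XorHash(3, 3) is 0, while CombinedHash(1, 2) is 9018 and CombinedHash(2, 1) is 9040.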

To simplify this task, I created a ValueComparer class:

///<summary>
/// Contains all of the properties of a class that 
/// are used to provide value semantics.
///</summary>
///<remarks>
/// You can create a static readonly ValueComparer for your class,
/// then call into it from Equals, GetHashCode, and CompareTo.
///</remarks>
class ValueComparer<T> : IComparer<T>, IEqualityComparer<T> {
    public ValueComparer(params Func<T, object>[] props) {
        Properties = new ReadOnlyCollection<Func<T, object>>(props);
    }

    public ReadOnlyCollection<Func<T, object>> Properties
            { get; private set; }

    public bool Equals(T x, T y) {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        //Object.Equals handles strings and nulls correctly
        return Properties.All(f => Equals(f(x), f(y)));    
    }

    //http://stackoverflow.com/questions/263400/263416#263416
    public int GetHashCode(T obj) {
        if (obj == null) return -42;
        unchecked {
            int hash = 17;
            foreach (var prop in Properties) {
                object value = prop(obj);
                if (value == null)
                    hash = hash * 23 - 1;
                else
                    hash = hash * 23 + value.GetHashCode();
            }
            return hash;
        }
    }

    public int Compare(T x, T y) {
        foreach (var prop in Properties) {
            //The properties can be any type including null.
            var comp = Comparer.DefaultInvariant
                .Compare(prop(x), prop(y));    
            if (comp != 0)
                return comp;
        }
        return 0;
    }
}

This class implements an external comparer that compares two instances by an ordered list of properties.

ValueComparer can be used as a standalone IComparer<T> or IEqualityComparer<T> implementation.

It can also be used to implement value semantics within a type.
For example:

class Person : IComparable<Person>, IEquatable<Person>, IComparable {
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
    public string Phone { get; set; }
    public string Email { get; set; }

    public override int GetHashCode() { return Comparer.GetHashCode(this); }
    public int CompareTo(Person obj) { return Comparer.Compare(this, obj); }
    int IComparable.CompareTo(object obj) { return CompareTo(obj as Person); }
    public bool Equals(Person obj) { return Comparer.Equals(this, obj); }
    public override bool Equals(object obj) { return Equals(obj as Person); }
    static readonly ValueComparer<Person> Comparer = new ValueComparer<Person>(
        o => o.LastName,
        o => o.FirstName,
        o => o.Address,
        o => o.Phone,
        o => o.Email
    );
}

To make this boilerplate quicker to write, I created a code snippet:

<?xml version="1.0" encoding="utf-8" ?>
<CodeSnippets  xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
    <CodeSnippet Format="1.0.0">
        <Header>
            <Title>ValueComparer</Title>
            <Shortcut>vc</Shortcut>
            <Description>Code snippet for equality methods using ValueComparer</Description>
            <Author>SLaks</Author>
            <SnippetTypes>
                <SnippetType>Expansion</SnippetType>
            </SnippetTypes>
        </Header>
        <Snippet>
            <Declarations>
                <Literal Editable="false">
                    <ID>classname</ID>
                    <ToolTip>Class name</ToolTip>
                    <Default>ClassNamePlaceholder</Default>
                    <Function>ClassName()</Function>
                </Literal>
            </Declarations>
            <Code Language="csharp">
                <![CDATA[public override int GetHashCode() { return Comparer.GetHashCode(this); }
        public int CompareTo($classname$ obj) { return Comparer.Compare(this, obj); }
        int IComparable.CompareTo(object obj) { return CompareTo(obj as $classname$); }
        public bool Equals($classname$ obj) { return Comparer.Equals(this, obj); }
        public override bool Equals(object obj) { return Equals(obj as $classname$); }
        static readonly ValueComparer<$classname$> Comparer = new ValueComparer<$classname$>(
            o => o.$end$
        );]]>
            </Code>
        </Snippet>
    </CodeSnippet>
</CodeSnippets>
It can also be downloaded here; save it to My Documents\Visual Studio 2010\Code Snippets\Visual C#\My Code Snippets\

When shouldn’t you write ref this?

Last time, we saw that the this parameter to an instance method in a struct is passed by reference, allowing the method to re-assign this or pass it as a ref parameter.

Due to limitations in the CLR, the this parameter to an iterator method is not a reference to the caller’s struct, and is instead a copy of the value. Quoting the spec (§7.6.7):

  • When this is used in a primary-expression within an instance method or instance accessor of a struct, it is classified as a variable. The type of the variable is the instance type (§10.3.1) of the struct within which the usage occurs.
    • If the method or accessor is not an iterator (§10.14), the this variable represents the struct for which the method or accessor was invoked, and behaves exactly the same as a ref parameter of the struct type.
    • If the method or accessor is an iterator, the this variable represents a copy of the struct for which the method or accessor was invoked, and behaves exactly the same as a value parameter of the struct type.

To explain why, you’ll need to understand how iterators are compiled.  As Jon Skeet explains, an iterator method is compiled into a nested class that implements IEnumerable, with the original code transformed into a state machine.  This nested class has fields to store the method’s parameters (which includes this for instance methods) so that the iterator code can use them. 

This is why iterators cannot take ref parameters; the CLR cannot store a reference (as opposed to a reference type) in a field.  Therefore, the this parameter is passed to iterator methods by value, not by reference.
This is also why anonymous methods in structs cannot use this; anonymous methods are also compiled to methods in separate classes and so cannot inherit the ref parameter.

This means that if you change the Mutate method from before into an iterator, the code will still compile (!), but the calling method will not see the changes.

static void Main() {
    Mutable m = new Mutable();
    m.MutateWrong().ToArray();    //Force the iterator to execute
    Console.WriteLine("In Main(): " + m.Value);
}

struct Mutable {
    public int Value;
    
    public IEnumerable<int> MutateWrong() {
        this = new Mutable();
        MutateStruct(ref this); 
        Console.WriteLine("Inside MutateWrong(): " + Value);
        yield break;
    }
}
static void MutateStruct(ref Mutable m) { 
    m.Value++; 
}

This code prints

Inside MutateWrong(): 1
In Main(): 0

In summary, don’t mutate structs in iterators (or at all, if you can help it).
I don’t know why this isn’t a compiler error.
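
The restriction on anonymous methods mentioned earlier has a common workaround: copy this into a local variable and capture the copy. The sketch below uses a hypothetical Counter struct. Because the lambda holds a copy, later changes to the original struct are not visible through it.

```csharp
using System;

struct Counter {
    public int Value;

    public Func<int> MakeGetter() {
        // Writing "() => Value" here is rejected by the compiler,
        // because a lambda in a struct cannot capture this.
        Counter copy = this;        // explicit copy of the struct
        return () => copy.Value;    // the lambda captures the copy
    }
}
```

If a caller sets Value to 5, calls MakeGetter(), and then sets Value to 6, the returned delegate still yields 5.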

When can you write ref this?

Usually, you cannot pass ref this as a parameter, since this is not a writable variable. However, that’s not true for value types. Inside a value type, this is a writable variable.

To quote the spec (§5.1.5)

Within an instance method or instance accessor of a struct type, the this keyword behaves exactly as a reference parameter of the struct type (§7.6.7).

Therefore, the following code prints 1:

static void Main() {
    Mutable m = new Mutable();
    m.Mutate();
    Console.WriteLine(m.Value);
}

struct Mutable {
    public int Value;
    
    public void Mutate() {
        this = new Mutable(); 
        MutateStruct(ref this); 
    }
}
static void MutateStruct(ref Mutable m) { 
    m.Value++; 
}

In practice, this should never come up, since mutable structs are evil and should be avoided at all costs.

Next time: This doesn’t always work.

Nothing vs Null

VB.Net’s Nothing keyword is not the same as C#’s null.  MSDN states, “Assigning Nothing to a variable sets it to the default value for its declared type. If that type contains variable members, they are all set to their default value”.

In other words, the Nothing keyword is actually equivalent to C#’s default(T) expression, where T is the type that the expression is used as.

This can lead to nasty surprises with nullable types in conditional operators. 
In C#, the expression (...) ? null : 1 will not compile, since “there is no implicit conversion between '<null>' and 'int'”. Since null is an untyped expression, the type of the conditional is inferred to be int, resulting in an error because null cannot be converted to int.

In VB.Net, by contrast, the equivalent expression, If((...), Nothing, 1), will compile, but will have unexpected results.  Here too, Nothing is an untyped expression, so the type of the conditional is inferred to be Integer.  However, unlike null, Nothing can be converted to Integer, so this is compiled as If((...), 0, 1),  which is probably not what the programmer intended.

In both languages, the solution is to use an expression which is actually typed as int?, by writing new int?() (in C#) or New Integer?() (in VB.Net).
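
As a minimal C# sketch of the difference (the class and method names are illustrative only): the first method uses the typed null, so the conditional has type int?; the second mirrors what VB.Net does with an untyped Nothing by substituting default(int).

```csharp
using System;

static class NullableDemo {
    // Both branches convert to int?, so the conditional has type int?.
    public static int? Typed(bool flag) {
        return flag ? new int?() : 1;
    }

    // Mirrors VB.Net's If(flag, Nothing, 1): Nothing becomes default(int).
    public static int LikeVbNothing(bool flag) {
        return flag ? default(int) : 1;
    }
}
```

Typed(true) returns a null int?, while LikeVbNothing(true) returns 0.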

Animating Table Rows with jQuery

jQuery contains a powerful and flexible animation engine. However, it has some limitations, primarily due to underlying limitations of CSS-based layout.

For example, there is no simple way to slideUp() a table row (<tr> element).  The slideUp animation will animate the element’s height to zero.  However, a table row is always tall enough to show its elements, so the animation cannot actually shrink the element.

To work around this, we can wrap the contents of each cell in a <div> element, then slideUp() the <div> elements.  Doing this in the HTML would create ugly and non-semantic markup, so we can do it in jQuery instead.

For example: Demo

$('tr')
    .children('td, th')
    .animate({ padding: 0 })
    .wrapInner('<div />')
    .children()
    .slideUp(function() { $(this).closest('tr').remove(); });

Explanation:

  1. Get all of the cells in the row
  2. Animate away any padding in the cells
  3. Wrap all of the contents of each cell in one <div> element for each cell (calling wrapInner())
  4. Select the new <div> elements
  5. Slide up the <div>s, and remove the rows when finished.

If you don’t remove the rows, their borders will still be visible.  Therefore, if you want the rows to stay after the animation, call hide() instead of remove().

Requiring Inherited Types in Generic Constraints

A generic class can specify that its generic parameter must inherit a type.  However, there is no obvious way in general to prevent clients from passing the base type itself.

For example, take the following set of types:

abstract class Entity  { }

class Person : Entity { }
class Boat : Entity { }
class Car : Entity { }

class Repository<TEntity> where TEntity : Entity { }

This allows the type Repository<Entity>, which doesn’t make logical sense.

In this particular case, we could prevent that by changing the generic constraint to where TEntity : Entity, new().  Since the base Entity class is abstract, that would disallow a Repository<Entity>.  However, if the concrete entities also don’t have default constructors, this wouldn’t work.  Similarly, had the base type been an interface, we could add a : class constraint.

There is a (somewhat) ugly hack that can be used to prevent parameterizations of the base class in arbitrary cases (as long as you control every type involved):

abstract class Entity  { }
interface IConcreteEntity { }    //Marker interface

class Person : Entity, IConcreteEntity { }
class Boat : Entity, IConcreteEntity { }
class Car : Entity, IConcreteEntity { }

class Repository<TEntity> 
      where TEntity : Entity, IConcreteEntity { }

Specifically, we can add a marker interface which is implemented by all concrete implementations of the base type.  We can then constrain a generic parameter to inherit both the base type and the marker interface.  Since the base class itself does not implement the marker interface, it will not be valid as a parameter.
Note that the marker interface must be implemented (perhaps indirectly) by every single concrete implementation.
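
A compile error cannot be shown in running code, but the same constraint check can be observed at runtime through reflection. In the sketch below, ConstraintDemo and CanParameterize are hypothetical helpers; Type.MakeGenericType throws an ArgumentException when a type argument violates the constraints.

```csharp
using System;

abstract class Entity { }
interface IConcreteEntity { }    // Marker interface

class Person : Entity, IConcreteEntity { }

class Repository<TEntity>
      where TEntity : Entity, IConcreteEntity { }

static class ConstraintDemo {
    // Returns true if Repository<candidate> satisfies the constraints.
    public static bool CanParameterize(Type candidate) {
        try {
            typeof(Repository<>).MakeGenericType(candidate);
            return true;
        } catch (ArgumentException) {
            // Thrown when the candidate violates a generic constraint.
            return false;
        }
    }
}
```

CanParameterize(typeof(Person)) returns true, while CanParameterize(typeof(Entity)) returns false because Entity does not implement the marker interface.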

Nested Iterators, part 2

In part 1, we discussed the simple approach to making a nested iterator.  However, we fell short of a completely lazy nested iterator.

In simple cases, we can make a separate iterator method for the subsequence:

IEnumerable<IEnumerable<int>> FullyLazy() {
    for(int i = 0; i < 10; i++) 
        yield return Inner(i);
}
IEnumerable<int> Inner(int i) {
    for(int j = 0; j < 10; j++)
        yield return i * 10 + j;
}
Note that this is actually smaller than the single-method implementation!

This seems to work very well; the inner iterator code for a particular subsequence will not execute at all unless that subsequence is actually enumerated.

However, this approach falls short in practice.  To see why, consider a real-world example.
Here is a fully lazy implementation of a Partition method, which converts a sequence into a 2D “jagged IEnumerable” where each subsequence (partition) contains n elements.

class Ref<T> { 
    public Ref(T value) { Value = value; } 
    public T Value { get; set; } 
}

public static IEnumerable<IEnumerable<T>> Partition<T>
              (this IEnumerable<T> sequence, int size) {
    using(var enumerator = sequence.GetEnumerator()) {
        var isFinished = new Ref<bool>(false);
        while(!isFinished.Value) {
            yield return PartitionInner(
                 enumerator, size, isFinished
            );
        }
    }
}
static IEnumerable<T> PartitionInner<T>
      (IEnumerator<T> enumerator, int size, Ref<bool> isFinished) {
    while(size --> 0) {
        isFinished.Value = !enumerator.MoveNext();
        if (isFinished.Value) yield break;
        yield return enumerator.Current;
    }
}

This method is complicated by the need to communicate back from the inner iterator to the outer one.  Since iterators cannot have ref parameters, I need to make a “box” class that holds the bool and is passed by reference.  This allows the outer iterator to find out when the sequence finishes.

In addition, this implementation has a subtle bug: If the last partition is full, it will return an extra, empty, partition after it.
Fixing this issue would require that the outer method call MoveNext() before each call to the inner method.  This makes the code even more complicated; I won’t list it here (unless people really want me to).
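
For readers who do want it, here is one possible shape of that fix, offered as a sketch rather than a finished implementation. The names PartitionSketch and PartitionNoTrailingEmpty are made up. The outer iterator advances the shared enumerator before handing it to the inner one, so an empty trailing partition is never produced. It still assumes each inner sequence is enumerated exactly once, in order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PartitionSketch {
    public static IEnumerable<IEnumerable<T>> PartitionNoTrailingEmpty<T>
                  (this IEnumerable<T> sequence, int size) {
        using (var enumerator = sequence.GetEnumerator()) {
            // Only start a partition if there is at least one more item.
            while (enumerator.MoveNext())
                yield return Inner(enumerator, size);
        }
    }

    static IEnumerable<T> Inner<T>(IEnumerator<T> enumerator, int size) {
        // The enumerator is already positioned on the partition's first item.
        do {
            yield return enumerator.Current;
        } while (--size > 0 && enumerator.MoveNext());
    }
}
```

Because the outer loop now owns the first MoveNext() call of each partition, this version does not need the Ref<bool> box.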

This design has a more fundamental problem: The behavior of the outer method is determined by the inner ones.  It will only work if each inner IEnumerable<T> is enumerated exactly once, before the next MoveNext() call to the outer iterator.  If, for example, you iterate each inner iterator twice, it will behave unexpectedly:

Enumerable.Range(1, 8)
          .Partition(3)
          .Select(p => String.Join(", ", p.Concat(p)));
This code is intended to create strings that contain each partition twice.  It is supposed to return
1, 2, 3, 1, 2, 3
4, 5, 6, 4, 5, 6
7, 8, 7, 8

However, it actually creates the strings

1, 2, 3, 4, 5, 6
7, 8

Enumerating the inner iterator a second time will end up consuming extra items from the original sequence.

Thus, in order to produce enumerables that behave correctly, the inner enumerator needs to cache its items; it is impossible to make a fully lazy Partition method.

It gets worse.  In order to allow callers to write thingy.Partition(5).Skip(2), (which should skip the first two partitions) the outer enumerator needs to cache all items, because it cannot assume that the inner iterators will be called at all. 

Thus, the laziest possible Partition method must use the original approach:

public static IEnumerable<IEnumerable<T>> PartitionCorrect<T>
              (this IEnumerable<T> sequence, int size) {
    List<T> partition = new List<T>();
    foreach(var item in sequence) {
        partition.Add(item);
        if (partition.Count == size) {
            yield return partition;
            partition = new List<T>();
        }
    }
    if (partition.Count > 0)
        yield return partition;
}

It would be possible to write a slightly lazier version that combines these two approaches and passes a List<T> to the inner iterator, which would return whatever is in the list, then enumerate if necessary to fill the list. However, I’m too lazy to do it (unless people really want me to).

Nested Iterators, part 1

C# 2.0 introduced a powerful feature called an iterator, a method which returns an IEnumerable<T> or IEnumerator<T> using the new yield keyword.

Using an iterator, you can quickly and easily create a method which lazily returns a sequence of values.  However, lazily returning a sequence of sequences (IEnumerable<IEnumerable<T>>) is not so simple.

The obvious approach is to yield return a List<T>:

IEnumerable<IEnumerable<int>> SemiLazy() {
    for(int i = 0; i < 10; i++) {
        List<int> numbers = new List<int>();
        for(int j = 0; j < 10; j++)
            numbers.Add(i * 10 + j);
            
        yield return numbers;
    }
}

(This can be shortened to a single LINQ statement, but that’s beside the point of this post: Enumerable.Range(0, 10).Select(i => Enumerable.Range(10 * i, 10)) )

This approach is very simple, but isn’t very lazy; each subsequence will be computed in its entirety, whether it’s consumed or not.

This approach also has a subtle catch: the iterator must return a different List<T> instance every time.

If you “optimize” it to re-use the instance, you’ll break callers which don’t use the subsequences immediately:

IEnumerable<IEnumerable<int>> Wrong() {
    List<int> numbers = new List<int>();
    for(int i = 0; i < 10; i++) {
        numbers.Clear();
        for(int j = 0; j < 10; j++)
            numbers.Add(i * 10 + j);
            
        yield return numbers;
    }
}

Calling SemiLazy().ToArray()[0].First() will return 0 (the first element in the first subsequence); calling Wrong().ToArray()[0].First() will return 90 (since all subsequences refer to the same instance).

Next: How can we achieve full laziness?

On copy prevention in HTML, part 3

Migrated from my old blog; originally posted 4/16/2007

My previous post stretched the limit of simple copy prevention. Beyond this point, it gets very complicated. Before continuing, some thought is in order. Who are you trying to prevent from copying your text? Why shouldn't the text be copied? Unless you are trying to stop a hardcore developer, the previous methods should suffice. Also, what kind of copying are you trying to prevent? If you are trying to prevent the copier from copying into a web page, it is significantly harder, because he can copy your source and it will display normally.

I can think of two ways to prevent the copier from using a screenreader to copy your text.

  1. Put all of the text into a single, CAPTCHA-like image. This way, the screenreader will not be able to read the text. However, this will also make it more difficult for legitimate people to read your text. Also, the copier could simply insert the large image as-is into his document. This risk could be mitigated by watermarking it with your name.
  2. Break apart the text into many different images, each one somewhat smaller than a letter. This could be done in server-side code. Then, use a JavaScript timer to alternate the images so that the screenreader will never see all of the text at once. To prevent the copier from modifying the JavaScript to show all of the images, alternate them with other images. For example, write a server-side script that takes X and Y coordinates, and a timer index. This script would return either a white image or the chunk of text at the given coordinates, depending on the timer index. To prevent the copier from using GDI to OR-blit screenshots from different times (or from using Photoshop to make the white transparent, then pasting them together), the white could have random black patterns. However, this will make the text hard to read. If a JavaScript timer isn't fast enough to make the text legible, it could be done in Flash.

Preventing the copier from copying your content into HTML is more difficult. No matter how obscure your source is, the copier could use a tool like Firebug to copy your DOM source using the innerHTML property. Here too, there are several options.
  1. Use my Scrambler (see part 2), and put your name anywhere within the scrambled text. (for example, you could put in, in the middle, Written by Your Name Here; do not copy) It is virtually impossible for the copier to extricate the spans that form your name and then fill the resulting gap. This could also be done in an image. However, if you put your name at the end, the copier could position a white DIV to hide it. Or, if his name is similar in length, he could position a white DIV with his name to hide it.
  2. If you are only worried about part of your text, scramble the entire page. It would be extremely difficult for the copier to extract the sensitive part. For example, you could add a lengthy copyright header before the content, and scramble it with the content. However, the copier could position the containing DIV so that the copyright header is above the top of the DIV.
  3. Break the text into a large set of images, and use client-side JavaScript to execute a server-generated script that adds these images from another server-side script. For every request, require a single-use authorization token returned by the previous request. The initial page request would include the auth-token for the first script, and each image would be preceded by an AJAX request for auth-tokens for the image and for the next AJAX request. The first script would have an auth-token for the first AJAX request embedded within it. To make it more difficult for an attacker to get an auth-token from the initial script, you could encode the script before sending it, and decode on the client, then pass it to the eval() function. All responses would have the no-cache header to prevent the copier from taking the images out of the cache, and the images would be used by the background-image attribute on a DIV to prevent Save Image As (only necessary if Save Image As doesn't redownload the image from the server). The server would track auth-tokens in a database, and delete them when used. If you do all this, the only way the copier could get the images from the page would be Print Screen. To prevent that, make the images flicker as described earlier. The copier could, however, load your page with JavaScript disabled, download and decode the script, convert it to a full programming language (eg, C#), and use the script's embedded auth-token to send the "AJAX" requests over HTTP (for example, using .NET's HttpWebRequest class) and download the images. To prevent this, make the auth-tokens expire after about 30 seconds. If any auth-token is expired, all subsequent images should form something different. (maybe Service Unavailable, or random black pixels, or a different text) By the time the copier finishes writing his program, his "stolen" auth-token will have expired, and he will not know what he did wrong. 
To prevent him from trying again, you could blacklist his IP address after receiving an expired request, and embed the IP address in auth-tokens. You could also set and require some innocuous-looking cookies on the server for every request. The copier wouldn't notice these cookies, and when he requests the images without these cookies in his request, you could send him whatever you want. Please note that if your page requires a login, the copier probably will check cookies. And, if the copier is being paid by the hour (this is quite likely; otherwise, he'd give up), he might even thank you for doing all this.

On copy prevention in HTML, part 2

Migrated from my old blog; originally posted 4/16/2007

The methods discussed in my previous post are crude and ugly. Most of the time, they do work, but they do nothing to prevent the user from viewing the source and copying the text from there. Also, the user has a right to select text that should not be denied. For example, if one wants to show someone part of a large document, the easiest way to do that is to select the part.

ZSkTuKpBrLljyVW GmtoBbO MVocxRvoopy zKYtQahiDEsh LLtQexowSEtDnIg. NoyticDMe thiMaDVnZt, whGZenjEE ufapdeIPasBZxtgCeYWDd, iSt BlMNooks KPzRlkeeGifkshqdheodB tIVnoMNtEal nySouQnAqVsensegX. cUHHoNweqdcvecFGrU,PGZ pMibt rqcanrbKkn eHstilqTulOE beRPuv STwaQsyTePXvplRoCectxeKAjd jVpnXljoDYrDlrmaKlly.B IiLwokzofk at itsjw JXsCoIuuFhjrce:
<SPAN style="position: static;left:-9477px;position: absolute;">Z</SPAN>
<SPAN style="position: static;left:-9765px;position: absolute;">S</SPAN>
<SPAN style="position: static;left:-9586px;position: absolute;">k</SPAN>
<SPAN style="position: static;left:-9373px;position: absolutee;">T</SPAN>
<SPAN style="position: static;left:-9734px;position: absolute;">u</SPAN>
<SPAN style="position: static;left:-9872px;position: absolute;">K</SPAN>
<SPAN style="position: static;left:-9773px;position: absolute;">p</SPAN>
<SPAN style="position: static;left:-9326px;position: absolute;">B</SPAN>
<SPAN style="position: static;left:-9195px;position: absolutee;">r</SPAN>
<SPAN style="position: static;left:-9413px;position: absolute;">L</SPAN>
<SPAN style="position: static;left:-9196px;position: absolute;">l</SPAN>
<SPAN style="position: static;left:-9737px;position: absolute;">j</SPAN>
<SPAN style="position: static;left:-9897px;position: absolutee;">y</SPAN>
<SPAN style="position: static;left:-9014px;position: absolute;">V</SPAN>
<SPAN style="position: static;left:-9893px;position: absolute;">W</SPAN>
<SPAN style="position: static;left:-9103px;position: absolutee;"> </SPAN>
...
gicotxaoyenc:yehosrk,mutaryslts.tsitatkiisuoptcrebhttnihdoLfI
<span style="position: absolute; left: 241px; top: 0px;">g</span>
<span style="position: absolute; left: 263px; top: 0px;">i</span>
<span style="position: absolute; left: 117px; top: 0px;">c</span>
<span style="position: absolute; left: 17px; top: 20px;">o</span>
<span style="position: absolute; left: 276px; top: 0px;">t</span>
<span style="position: absolute; left: 287px; top: 0px;">x</span>
<span style="position: absolute; left: 129px; top: 0px;">a</span>
<span style="position: absolute; left: 9px; top: 20px;">o</span>
<span style="position: absolute; left: 13px; top: 0px;">y</span>
...
These texts were generated by a script (currently unavailable). It can do two things: Inflate (the first paragraph above) and Scramble (the second paragraph).

Using the Inflate option will add random characters in SPANs positioned absolutely between nine and ten thousand pixels to the left. When the user selects text, the random characters will also be selected, and when the text is copied, they will also be copied. It will randomly add up to five letters between every two characters, and it will ignore text in TEXTAREAs, SCRIPTs, and SELECTs. Such text could be "deflated" by copying the source, then removing all text that matches the regular expression /<span[a-z0-9 ;:="']*?>[A-Za-z]</span>/ , removing all SPAN tags that contain a letter and an attribute. Therefore, I put each original letter in a very similar SPAN tag, complete with a random location, and gave both types of spans position:absolute and position:static, in different orders. This could also be matched by a regular expression, but it would be much more complicated, and I will not list it here. It would also be possible to write a GreaseMonkey script that would loop through the SPANs and delete every one whose position style is absolute. However, it would probably be easier to retype the text manually.

Using the Scramble option will put each letter into a SPAN, position the SPANs absolutely at their correct locations, and randomize their order. Therefore, when the user selects the text, the selection will be scrambled, and when it is pasted, it will show up as nonsense. Spaces are rendered pointless by the procedure, and are therefore removed. Scrambled text will not select cleanly, but is nearly impossible to descramble. It would be possible to write a GreaseMonkey script that would sort the SPANs by top, then by left, but it would be much easier to retype the text manually.

When using this approach, remember to position the text's container, or the text will show up in unexpected places. In addition, this approach completely breaks word wrap, so the text must be placed in a container with a fixed width. This width should be entered in the Width textbox in the scrambler so that the text will flow correctly.

These methods will definitely prevent all but the most determined and technically skilled copiers. However, they do not prevent OCR screenreaders. This will be discussed in part 3.

On copy prevention in HTML, part 1

Migrated from my old blog; originally posted 4/8/2007

Many web developers like to prevent their viewers from copying their text. While I do not approve of this, there are cases where it is appropriate.

The simplest way to achieve this is to use the IE-only attribute UNSELECTABLE and the Firefox-only CSS style -moz-user-select. Such HTML looks like this:

<DIV unselectable="on"
style="-moz-user-select:none;">
You can't select me.
</DIV>
You can't select me.

To make the HTML and CSS validate, one could do this in JavaScript: Elem.unselectable = "on"; Elem.style.MozUserSelect = "none";

However, this method only works in IE and Firefox. In addition, in IE, it doesn't work very well, and if a user tries hard, he will end up selecting the text.

A slightly better way to do it is to handle the onselectstart event (for IE) and the onmousedown event (for everything else) and return false. This will prevent the browser from handling the events. This results in something like this:

<DIV
onselectstart="return false;"
onmousedown="return false;" >
You can't select me.
</DIV>
You can't select me.

The problem with these methods is that they do nothing to prevent a user from reading the HTML source. This is discussed in the next part.