Showing posts with label HTML. Show all posts

Modifying HTML strings using jQuery

jQuery makes it very easy to modify a DOM tree.  For example, to strip all hyperlinks (<a> tags) from an element while keeping the text inside them, we can write:

$(...).find('a[href]')
      .replaceWith(function() { return this.childNodes });

After getting used to this, one might want to use jQuery to modify HTML contained in a string.  Here, however, the naïve approach does not work:

var htmlSource = ...;
$(htmlSource).find('a[href]')
      .replaceWith(function() { return this.childNodes });

This code tries to remove all <a> tags from the HTML contained in the htmlSource string.  However, what it actually does is create a detached DOM tree containing the new elements, strip all <a> tags in those elements, and throw the whole thing away.  It doesn’t modify the original string.  In fact, since JavaScript strings are immutable and the $ function only receives the string’s value, this approach cannot modify the original string.

Instead, you need to retrieve the source from the DOM tree after modifying it, then assign that source back to the variable. 

There is an additional subtlety with this approach.  jQuery cannot return the complete HTML source for a collection of elements: .html() returns only the innerHTML of the first element in the collection.  Therefore, it is also necessary to wrap the HTML in a dummy element (typically a <div>).  One can then call .html() to get the innerHTML of the dummy element, which will contain exactly the desired content.

This also eliminates the distinction between root-level elements and nested elements.  If the original HTML string contains root-level <a> elements (which aren’t nested in other tags), writing $(htmlSource).find('a') won’t find them, since .find() only searches the descendants of the elements in the jQuery object.  By wrapping the HTML in a dummy element, all of the elements in the original content become descendants, and can be returned by .find().

Here, therefore, is the correct way to modify an HTML string using jQuery:

var htmlSource = ...;
var tree = $("<div>" + htmlSource + "</div>");

tree.find('a[href]')
    .replaceWith(function() { return this.childNodes });

htmlSource = tree.html();

Animating Table Rows with jQuery

jQuery contains a powerful and flexible animation engine.  However, it has some limitations, primarily due to underlying limitations of CSS-based layout.

For example, there is no simple way to slideUp() a table row (<tr> element).  The slideUp animation animates the element’s height to zero.  However, a table row is always tall enough to show its contents, so the animation cannot actually shrink it.

To work around this, we can wrap the contents of each cell in a <div> element, then slideUp() the <div> elements.  Doing this in the HTML would create ugly and non-semantic markup, so we can do it in jQuery instead.

For example:

$('tr')
    .children('td, th')
    .animate({ padding: 0 })
    .wrapInner('<div />')
    .children()
    .slideUp(function() { $(this).closest('tr').remove(); });

Explanation:

  1. Get all of the cells in the row
  2. Animate away any padding in the cells
  3. Wrap the contents of each cell in a new <div> element, one per cell (using wrapInner())
  4. Select the new <div> elements
  5. Slide up the <div>s, and remove the rows when finished.

If you don’t remove the rows, their borders will still be visible.  Therefore, if you want the rows to stay after the animation, call hide() instead of remove().

On copy prevention in HTML, part 3

Migrated from my old blog; originally posted 4/16/2007

My previous post stretched the limit of simple copy prevention. Beyond this point, it gets very complicated. Before continuing, some thought is in order. Who are you trying to prevent from copying your text? Why shouldn't the text be copied? Unless you are trying to stop a hardcore developer, the previous methods should suffice. Also, what kind of copying are you trying to prevent? If you are trying to prevent the copier from copying into a web page, it is significantly harder, because he can copy your source and it will display normally.

I can think of two ways to prevent the copier from using a screenreader to copy your text.

  1. Put all of the text into a single, CAPTCHA-like image. This way, the screenreader will not be able to read the text. However, this will also make it more difficult for legitimate readers to read your text. Also, the copier could simply insert the large image as-is into his document. This risk could be mitigated by watermarking it with your name.
  2. Break apart the text into many different images, each one somewhat smaller than a letter. This could be done in server-side code. Then, use a JavaScript timer to alternate the images so that the screenreader will never see all of the text at once. To prevent the copier from modifying the JavaScript to show all of the images, alternate them with other images. For example, write a server-side script that takes X and Y coordinates and a timer index. This script would return either a white image or the chunk of text at the given coordinates, depending on the timer index. To prevent the copier from using GDI to OR-blit screenshots from different times (or from using Photoshop to make the white transparent, then pasting them together), the white could have random black patterns. However, this will make the text hard to read. If a JavaScript timer isn't fast enough to make the text legible, it could be done in Flash.
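The alternating-image scheme in item 2 can be sketched as the scheduling logic such a server-side script might use. This is only an illustration under assumptions: the hypothetical `tileAt(x, y, tick)` uses a checkerboard that flips on every tick, so no single frame ever shows all of the text, and hidden tiles are "noise" tiles (white with random black specks, as suggested above). None of these names come from the original post.

```javascript
// Hypothetical tile scheduler for the flickering-image idea (all names are illustrative).
// x, y are the tile's grid coordinates and tick is the timer index; all are assumed
// to be non-negative integers. On a given tick, half of the tiles (a checkerboard)
// show their chunk of text and the other half show noise; the pattern flips each tick.
function tileAt(x, y, tick) {
  return (x + y + tick) % 2 === 0 ? "text" : "noise";
}

// Generates a noise tile as a grid of booleans (true = black pixel), so that the
// "white" tiles are not plain white and cannot simply be made transparent.
function noiseTile(width, height, density = 0.1, rng = Math.random) {
  const pixels = [];
  for (let row = 0; row < height; row++) {
    const line = [];
    for (let col = 0; col < width; col++) line.push(rng() < density);
    pixels.push(line);
  }
  return pixels;
}

// Lists the tiles of a cols x rows grid that show real text at the given tick.
function visibleTiles(cols, rows, tick) {
  const shown = [];
  for (let y = 0; y < rows; y++) {
    for (let x = 0; x < cols; x++) {
      if (tileAt(x, y, tick) === "text") shown.push([x, y]);
    }
  }
  return shown;
}
```

A client-side timer would advance `tick` and reload each tile. A checkerboard is easy to predict, so a real deployment would presumably want a less regular, server-side schedule, but the principle is the same.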
Preventing the copier from copying your content into HTML is more difficult. No matter how obscure your source is, the copier could use a tool like Firebug to copy your DOM source using the innerHTML property. Here too, there are several options.
  1. Use my Scrambler (see part 2), and put your name anywhere within the scrambled text (for example, you could insert "Written by Your Name Here; do not copy" in the middle). It is virtually impossible for the copier to extricate the spans that form your name and then fill the resulting gap. This could also be done in an image. However, if you put your name at the end, the copier could position a white DIV to hide it. Or, if his name is similar in length, he could position a white DIV containing his name over it.
  2. If you are only worried about part of your text, scramble the entire page. It would be extremely difficult for the copier to extract the sensitive part. For example, you could add a lengthy copyright header before the content, and scramble it with the content. However, the copier could position the containing DIV so that the copyright header is above the top of the DIV.
  3. Break the text into a large set of images, and use client-side JavaScript to execute a server-generated script that adds these images from another server-side script. For every request, require a single-use authorization token returned by the previous request. The initial page request would include the auth-token for the first script, and each image would be preceded by an AJAX request for auth-tokens for the image and for the next AJAX request. The first script would have an auth-token for the first AJAX request embedded within it. To make it more difficult for an attacker to get an auth-token from the initial script, you could encode the script before sending it, decode it on the client, and then pass it to the eval() function. All responses would have the no-cache header to prevent the copier from taking the images out of the cache, and the images would be displayed through the background-image CSS property of a DIV to prevent Save Image As (only necessary if Save Image As doesn't redownload the image from the server). The server would track auth-tokens in a database and delete them when used. If you do all this, the only way the copier could get the images from the page would be Print Screen. To prevent that, make the images flicker as described earlier. The copier could, however, load your page with JavaScript disabled, download and decode the script, convert it to a full programming language (e.g., C#), and use the script's embedded auth-token to send the "AJAX" requests over HTTP (for example, using .NET's HttpWebRequest class) and download the images. To prevent this, make the auth-tokens expire after about 30 seconds. If any auth-token is expired, all subsequent images should form something different (maybe Service Unavailable, random black pixels, or different text). By the time the copier finishes writing his program, his "stolen" auth-token will have expired, and he will not know what he did wrong.
To prevent him from trying again, you could blacklist his IP address after receiving an expired request, and embed the IP address in auth-tokens. You could also have the server set, and require on every request, some innocuous-looking cookies. The copier wouldn't notice these cookies, and when he requests the images without them, you could send him whatever you want. Please note that if your page requires a login, the copier probably will check cookies. And, if the copier is being paid by the hour (this is quite likely; otherwise, he'd give up), he might even thank you for doing all this.
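The single-use, expiring auth-token scheme in item 3 can be sketched as a small in-memory store. This is a hedged illustration, not the original implementation: the `TokenStore` class, its return codes, the `handleImageRequest` helper, and the Map standing in for the database are all assumptions. Only the single use, the roughly 30-second lifetime, the IP binding, and the blacklisting come from the text above.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory stand-in for the auth-token database described above.
class TokenStore {
  // ttlMs: token lifetime (about 30 seconds, as suggested above).
  // now: injectable clock, so expiry can be tested without waiting.
  constructor(ttlMs = 30000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.tokens = new Map();    // token -> { ip, expires }
    this.blacklist = new Set(); // IP addresses caught using expired tokens
  }

  // Issues a fresh, unguessable token bound to the requester's IP address.
  issue(ip) {
    const token = crypto.randomBytes(16).toString("hex");
    this.tokens.set(token, { ip, expires: this.now() + this.ttlMs });
    return token;
  }

  // Redeems a token. It is deleted on first use whatever the outcome, so it can
  // never be replayed. Returns "ok", "unknown", "blacklisted", "wrong-ip" or "expired".
  redeem(token, ip) {
    const entry = this.tokens.get(token);
    if (!entry) return "unknown";
    this.tokens.delete(token);
    if (this.blacklist.has(ip)) return "blacklisted";
    if (entry.ip !== ip) return "wrong-ip";
    if (this.now() > entry.expires) {
      this.blacklist.add(ip); // an expired token suggests an offline scraper
      return "expired";
    }
    return "ok";
  }
}

// Hypothetical request handler: anything other than "ok" gets a decoy image,
// and every response carries the token for the next request in the chain.
function handleImageRequest(store, token, ip, realImage, decoyImage) {
  const status = store.redeem(token, ip);
  return {
    image: status === "ok" ? realImage : decoyImage,
    nextToken: store.issue(ip),
  };
}
```

Because a blacklisted IP only ever receives decoys, every image after the first expired request "forms something different", which is the behavior described above.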

On copy prevention in HTML, part 2

Migrated from my old blog; originally posted 4/16/2007

The methods discussed in my previous post are crude and ugly. Most of the time they do work, but they do nothing to prevent the user from viewing the source and copying the text from there. Also, users have a legitimate right to select text, and that right should not be denied. For example, if one wants to show someone part of a large document, the easiest way to do that is to select the part.

ZSkTuKpBrLljyVW GmtoBbO MVocxRvoopy zKYtQahiDEsh LLtQexowSEtDnIg. NoyticDMe thiMaDVnZt, whGZenjEE ufapdeIPasBZxtgCeYWDd, iSt BlMNooks KPzRlkeeGifkshqdheodB tIVnoMNtEal nySouQnAqVsensegX. cUHHoNweqdcvecFGrU,PGZ pMibt rqcanrbKkn eHstilqTulOE beRPuv STwaQsyTePXvplRoCectxeKAjd jVpnXljoDYrDlrmaKlly.B IiLwokzofk at itsjw JXsCoIuuFhjrce:
<SPAN style="position: static;left:-9477px;position: absolute;">Z</SPAN>
<SPAN style="position: static;left:-9765px;position: absolute;">S</SPAN>
<SPAN style="position: static;left:-9586px;position: absolute;">k</SPAN>
<SPAN style="position: static;left:-9373px;position: absolutee;">T</SPAN>
<SPAN style="position: static;left:-9734px;position: absolute;">u</SPAN>
<SPAN style="position: static;left:-9872px;position: absolute;">K</SPAN>
<SPAN style="position: static;left:-9773px;position: absolute;">p</SPAN>
<SPAN style="position: static;left:-9326px;position: absolute;">B</SPAN>
<SPAN style="position: static;left:-9195px;position: absolutee;">r</SPAN>
<SPAN style="position: static;left:-9413px;position: absolute;">L</SPAN>
<SPAN style="position: static;left:-9196px;position: absolute;">l</SPAN>
<SPAN style="position: static;left:-9737px;position: absolute;">j</SPAN>
<SPAN style="position: static;left:-9897px;position: absolutee;">y</SPAN>
<SPAN style="position: static;left:-9014px;position: absolute;">V</SPAN>
<SPAN style="position: static;left:-9893px;position: absolute;">W</SPAN>
<SPAN style="position: static;left:-9103px;position: absolutee;"> </SPAN>
...
gicotxaoyenc:yehosrk,mutaryslts.tsitatkiisuoptcrebhttnihdoLfI
<span style="position: absolute; left: 241px; top: 0px;">g</span>
<span style="position: absolute; left: 263px; top: 0px;">i</span>
<span style="position: absolute; left: 117px; top: 0px;">c</span>
<span style="position: absolute; left: 17px; top: 20px;">o</span>
<span style="position: absolute; left: 276px; top: 0px;">t</span>
<span style="position: absolute; left: 287px; top: 0px;">x</span>
<span style="position: absolute; left: 129px; top: 0px;">a</span>
<span style="position: absolute; left: 9px; top: 20px;">o</span>
<span style="position: absolute; left: 13px; top: 0px;">y</span>
...
These texts were generated by a script (currently unavailable). It can do two things: Inflate (the first paragraph above) and Scramble (the second paragraph).

Using the Inflate option will add random characters in SPANs positioned absolutely between nine and ten thousand pixels to the left. When the user selects text, the random characters will also be selected, and when the text is copied, they will also be copied. It will randomly add up to five letters between every two characters, and it will ignore text in TEXTAREAs, SCRIPTs, and SELECTs. Such text could be "deflated" by copying the source, then removing all text that matches the regular expression /<span[a-z0-9 ;:="'-]*?>[A-Za-z]<\/span>/gi, which removes every SPAN tag that contains a single letter and has attributes. Therefore, I put each original letter in a very similar SPAN tag, complete with a random location, and gave both types of spans position:absolute and position:static, in different orders. (In the sample above, the real letters end with the invalid declaration position: absolutee, which browsers ignore, so position:static applies to them.) This could also be matched by a regular expression, but it would be much more complicated, and I will not list it here. It would also be possible to write a GreaseMonkey script that would loop through the SPANs and delete every one whose position style is absolute. However, it would probably be easier to retype the text manually.
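The naive version of Inflate, and the regular-expression "deflate" attack against it, can be sketched in JavaScript. This is an illustration under assumptions, not the original script: it leaves the real characters as plain text (the situation before the countermeasure described above), uses a simplified `style` attribute, and `inflate`, `deflate`, and the helper names are hypothetical.

```javascript
// Hypothetical sketch of the naive Inflate / deflate pair described above.
const DECOY_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Returns a random integer in [min, max] using the supplied RNG.
function randInt(rng, min, max) {
  return min + Math.floor(rng() * (max - min + 1));
}

// Escapes the characters that are special in HTML text content.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Inserts up to five random letters between every two characters, each in a SPAN
// positioned between nine and ten thousand pixels to the left (i.e. off-screen).
function inflate(text, rng = Math.random) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  let html = "";
  for (let i = 0; i < chars.length; i++) {
    html += escapeHtml(chars[i]);
    if (i === chars.length - 1) break; // decoys only go *between* characters
    const count = randInt(rng, 0, 5);
    for (let k = 0; k < count; k++) {
      const letter = DECOY_LETTERS[randInt(rng, 0, DECOY_LETTERS.length - 1)];
      const left = -randInt(rng, 9000, 9999);
      html += `<span style="position:absolute;left:${left}px;">${letter}</span>`;
    }
  }
  return html;
}

// The attack: delete every SPAN that has attributes and contains one letter,
// then undo the HTML escaping. The '-' in the class matches negative offsets.
const DECOY_SPAN = /<span[a-z0-9 ;:="'-]*?>[A-Za-z]<\/span>/gi;

function deflate(html) {
  return html
    .replace(DECOY_SPAN, "")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;"
}
```

Wrapping each real letter in a similar SPAN, as described above, is exactly what defeats this one-line regex: it would delete the real letters along with the decoys.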

Using the Scramble option will put each letter into a SPAN, position the SPANs absolutely at their correct locations, and randomize their order. Therefore, when the user selects the text, the selection will be scrambled, and when it is pasted, it will show up as nonsense. Spaces are rendered pointless by the procedure, and are therefore removed. Scrambled text will not select cleanly, but is nearly impossible to descramble by hand. It would be possible to write a GreaseMonkey script that would sort the SPANs by top, then by left, but it would be much easier to retype the text manually.

When using this approach, remember to position the text's container (for example, with position: relative), or the text will show up in unexpected places. In addition, this approach completely breaks word wrap, and the text must therefore be placed in a container with a fixed width. This width should be entered in the Width textbox in the scrambler so that the text will flow correctly.
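Scramble and the sort-by-top-then-left attack described above can likewise be sketched. The assumptions are significant: the hypothetical `scramble` lays text out on a fixed monospace grid (`CHAR_W` by `LINE_H` pixels) and wraps naively at the given width, unlike the sample above with its uneven offsets; `descramble` parses the generated markup with a regular expression instead of walking the DOM as a GreaseMonkey script would.

```javascript
// Hypothetical sketch of Scramble and of the sort-by-position attack.
// Assumption: a fixed grid where every character occupies CHAR_W x LINE_H pixels.
const CHAR_W = 10;
const LINE_H = 20;

// Minimal HTML escaping/unescaping for a single character of text content.
function escapeChar(c) {
  return c === "&" ? "&amp;" : c === "<" ? "&lt;" : c === ">" ? "&gt;" : c;
}
function unescapeChar(s) {
  return s === "&amp;" ? "&" : s === "&lt;" ? "<" : s === "&gt;" ? ">" : s;
}

// Positions each non-space character absolutely (wrapping at `width` pixels), then
// shuffles the SPANs so that source order, and therefore copy order, is random.
function scramble(text, width, rng = Math.random) {
  const perLine = Math.max(1, Math.floor(width / CHAR_W));
  const spans = [];
  Array.from(text).forEach((c, i) => {
    if (c === " ") return; // spaces carry no information once every letter is positioned
    const left = (i % perLine) * CHAR_W;
    const top = Math.floor(i / perLine) * LINE_H;
    spans.push(`<span style="position: absolute; left: ${left}px; top: ${top}px;">${escapeChar(c)}</span>`);
  });
  // Fisher-Yates shuffle.
  for (let i = spans.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [spans[i], spans[j]] = [spans[j], spans[i]];
  }
  return spans.join("\n");
}

// The attack: sort the SPANs by top, then by left, and read them back in order.
// Spaces were discarded, so the result is the original text without spaces.
function descramble(html) {
  const re = /<span style="position: absolute; left: (\d+)px; top: (\d+)px;">([^<]+)<\/span>/g;
  const cells = [...html.matchAll(re)].map((m) => ({ left: +m[1], top: +m[2], text: m[3] }));
  cells.sort((a, b) => a.top - b.top || a.left - b.left);
  return cells.map((c) => unescapeChar(c.text)).join("");
}
```

Sorting recovers the letters but not the spaces, which is why the result still has to be re-spaced by a human.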

These methods will definitely prevent all but the most determined and technically skilled copiers. However, they do not prevent OCR screenreaders. This will be discussed in part 3.

On copy prevention in HTML, part 1

Migrated from my old blog; originally posted 4/8/2007

Many web developers like to prevent their viewers from copying their text. While I do not approve of this, there are cases where it is appropriate.

The simplest way to achieve this is to use the IE-only attribute UNSELECTABLE and the Firefox-only CSS property -moz-user-select. Such HTML looks like this:

<DIV unselectable="on"
style="-moz-user-select:none;">
You can't select me.
</DIV>

To make the HTML and CSS validate, one could set these properties in JavaScript instead:

Elem.unselectable = "on";
Elem.style.MozUserSelect = "none";

However, this method only works in IE and Firefox. In addition, in IE, it doesn't work very well, and if a user tries hard, he will end up selecting the text.

A slightly better way to do it is to handle the onselectstart event (for IE) and the onmousedown event (for everything else) and return false. Returning false cancels the browser's default handling of these events, so no selection is started. This results in something like this:

<DIV
onselectstart="return false;"
onmousedown="return false;" >
You can't select me.
</DIV>

The problem with these methods is that they do nothing to prevent a user from reading the HTML source. This is discussed in the next part.