Writing A JavaScript Obfuscator

I recently posted about a bot-safe email link where I showed how you can use a series of obfuscated method calls through JavaScript to pop open a mail client with an address in the TO field, while still maintaining a decent sense of privacy. In this post we will dive into how the actual obfuscation script works and how it can be extended.

I show how I use it in the bot-safe email link, but the structure of the script is a good foundation for other lightweight JavaScript obfuscation needs.

Generating Function Names
The core of the system is the generation of random digits and letters, so we will use the following two helpers extensively:

/*
 * Generates a random letter, either upper or lower case.
 */
function randomLetter() {
	// Generate a letter between A-Z (ASCII 65-90) or a-z (ASCII 97-122). We
	// will randomly select between upper and lower case. Math.floor() over
	// 26 values keeps every letter equally likely; Math.round() would make
	// the first and last letters half as likely as the rest.
	if (Math.round(Math.random() * 1) == 1) {
		return String.fromCharCode(97 + Math.floor(Math.random() * 26));
	}

	return String.fromCharCode(65 + Math.floor(Math.random() * 26));
}

/*
 * Generates a random digit between 0 and 9, each equally likely.
 */
function randomNumber() {
	return Math.floor(Math.random() * 10);
}

These two functions are the building blocks of our function name generator:

/*
 * Generates a random function name that always begins with a letter and
 * is composed entirely of random numbers and letters and is of random
 * length.
 */
function randomFunctionName() {

	// Max function name length
	var maxLen = 25;

	// Generate a random number - this will be the length of our method. We
	// will subtract one right off the bat to account for the first letter
	// in the method.
	var len = Math.round(Math.random() * maxLen) - 1;

	// Generate a random letter - functions can't start with a number so we
	// will always start with a letter.
	var functionName = randomLetter();

	// Loop through and create the method name.
	for (var i = 0; i < len; i++) {
		// Randomly select between a letter and a number
		if (Math.round(Math.random() * 1) == 1) {
			functionName += randomLetter();
		} else {
			functionName += randomNumber();
		}
	}

	return functionName;
}

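Now a simple call to randomFunctionName() will give a randomized function name between 1 and 25 characters in length that always starts with a letter. The name is valid except in the rare case where it happens to spell a reserved word such as do or if. We will use this extensively throughout the rest of the script.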
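Two rare cases could break the generated script: two generated names could collide, and a short all-lowercase name could spell a reserved word. The sketch below shows one way to guard against both. It is not part of the original script: isUsableName(), uniqueFunctionName(), RELIED_ON, and the used map are hypothetical names introduced here, and the compact randomFunctionName() is only a stand-in for the version above.

```javascript
// Compact stand-in for the randomFunctionName() shown above.
function randomFunctionName() {
    var letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var all = letters + "0123456789";
    var len = 1 + Math.floor(Math.random() * 25);
    var name = letters.charAt(Math.floor(Math.random() * letters.length));
    while (name.length < len) {
        name += all.charAt(Math.floor(Math.random() * all.length));
    }
    return name;
}

// Names the generated script itself relies on; a generated function with
// one of these names would shadow the real thing.
var RELIED_ON = { "eval": true, "unescape": true, "emailMe": true };

// Hypothetical helper: true if the name is unused, is not relied on by the
// generated code, and parses as a function name (reserved words do not).
function isUsableName(name, used) {
    if (used[name] || RELIED_ON[name] === true) {
        return false;
    }
    try {
        new Function("function " + name + "() {}");
        return true;
    } catch (e) {
        return false;
    }
}

// Hypothetical helper: keeps drawing names until a usable one comes up,
// then records it in the used map (created with Object.create(null)).
function uniqueFunctionName(used) {
    var name;
    do {
        name = randomFunctionName();
    } while (!isUsableName(name, used));
    used[name] = true;
    return name;
}

var used = Object.create(null);
var first = uniqueFunctionName(used);
var second = uniqueFunctionName(used);
```

Swapping uniqueFunctionName(used) in wherever randomFunctionName() is called would be enough to apply the guard, at the cost of threading the used map through the generator.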

Generating Methods to Obfuscate the Email Address
Next we need a method that works through the mailto string piece by piece, splitting it into sections that can each be pushed into their own function. In addition, we want to vary the way each function is built. We will do this with a loop that splits a portion of the text out of the mail string, generates a function for it, and has that function return the portion of the string in one way or another.

This method is a bit involved, so we’ll just take a look at it then step through the parts of importance. Please note that I have removed comments, spaces, and other things so that it will render better. For the full code visit the actual script and view source.

function generate(addr) {
    var loc = "mailto:" + addr;

    // This will keep track of all the generated javascript code
    var functions = "";
    var methodCalls = "";

    var maxSplit = 3;

    var spos = 0;
    var epos = 0;
    var splitLen = 0;

    // This is the number of method aggregations we want - min of 1, max of 3.
    var aggregations = Math.round(Math.random() * 2) + 1;

    // This holds a set of function names that may or may not be aggregated
    // into a single call through another method.
    var functionSubset = "";

    // Split up the string into multiple parts in order to start breaking
    // it up into multiple methods. Each one of these will become a part
    // of the main method call to recompose the email address.
    var cnt = 0;
    do {
        cnt++;

        // If the length from the last split to the end of the string is
        // less than the max split then we will want to use that length
        // for the next split.
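        if (epos > -1 && loc.substring(epos).length <= maxSplit) {
            splitLen = loc.substring(epos).length;
        } else {
            splitLen = maxSplit;
        }

        // Pick a random length for this split, from 0 up to splitLen. A
        // length of 0 yields an empty string, which simply becomes a
        // function that contributes nothing to the address.
        splitLen = Math.round(Math.random() * splitLen);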

        // Grab the split
        spos = epos;
        epos = spos + splitLen;
        var str = loc.substring(spos, epos);

        // Create a function name and add it to the list of functions that
        // must be called by the mail method.
        var functionName = randomFunctionName();
        if (functionSubset.length > 0) {
            functionSubset += " + ";
        }
        functionSubset += functionName + "()";

        functions += "\nfunction " + functionName + functionContents(str);

        // Tracks whether we need to reset the aggregation variables or not
        var resetVars = false;

        // If there is only one aggregation then we will just add the method
        // to the emailMe() method call list. Also check to see if we have
        // aggregated enough to chop and create a method to hold the functions
        // we have created thus far.
        if (aggregations == 1 || epos >= loc.length) {
            // Add the function call to the emailMe() method.
            if (methodCalls.length > 0) {
                methodCalls += " + ";
            }
            methodCalls += functionSubset;
            resetVars = true;
        } else if (cnt == aggregations) {
            // Get a function name for the aggregate function.
            var aggregateFunctionName = randomFunctionName() + "()";

            // Add the aggregate function to the emailMe() method call.
            if (methodCalls.length > 0) {
                methodCalls += " + ";
            }
            methodCalls += aggregateFunctionName;

            // Add the actual function
            functions += "\nfunction " + aggregateFunctionName + " { return " + functionSubset + ";}";

            resetVars = true;
        }

        if (resetVars) {
            // Reset the cnt and functionSubset and figure a random aggregations
            // number
            cnt = 0;
            aggregations = Math.round(Math.random() * 2) + 1;
            functionSubset = "";
        }

    } while (epos >= 0 && epos < loc.length);

    functions = "// Remove this before placing code on page >>>\nemailMe();\n// <<< Remove this before placing code on page\n\n\n"
                + "function emailMe() { window.location = " + methodCalls + "; }" + functions;

    document.frm.code.value = functions;
}

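The methodCalls variable, declared near the top of generate(), simply accumulates the calls that will be made from the emailMe() method. That is, the generated emailMe() method will ultimately contain window.location = followed by the contents of methodCalls. The simplified examples below show the concatenation as a return value to keep them short.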

The maxSplit variable sets the maximum number of characters allowed in each substring. The number 3 was settled upon because it generally ensures that the entire email address cannot end up in the same substring, and it guarantees that the seven-character mailto: token will be broken into at least three parts.
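The splitting step can be looked at in isolation. The following is a minimal sketch rather than the code from generate(): splitRandomly() is a hypothetical name, and it uses Math.floor() to pick each length, but like the original it allows zero-length pieces and never lets a piece exceed maxSplit.

```javascript
// Hypothetical helper: cuts text into consecutive pieces whose lengths are
// chosen at random between 0 and maxSplit (or fewer, near the end).
function splitRandomly(text, maxSplit) {
    var parts = [];
    var pos = 0;
    while (pos < text.length) {
        var remaining = text.length - pos;
        var limit = Math.min(maxSplit, remaining);
        // 0..limit inclusive; a 0 produces an empty piece, which in the
        // generator becomes a function that returns an empty string.
        var len = Math.floor(Math.random() * (limit + 1));
        parts.push(text.substring(pos, pos + len));
        pos += len;
    }
    return parts;
}

var pieces = splitRandomly("mailto:abc@abc.com", 3);
// Joining the pieces back together always restores the original string.
var restored = pieces.join("");
```

Because no piece is longer than three characters, the 18-character string above always ends up in at least six non-empty pieces, plus however many empty ones the random lengths happen to produce.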

The aggregations variable signifies the number of functions that should be aggregated under a single function. If this were set to 1 on each iteration, the emailMe() method would call each function directly, adding the results together to produce the full mailto: string:

function emailMe() {
    return s1() + s2() + s3();
}

By having this evaluate to higher than one on some iterations, portions of the generated functions will be delegated to other method calls, making the emailMe() method need to call fewer methods directly:

function emailMe() {
    return s1() + a1();
}

function a1() {
    return s2() + s3();
}
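The grouping idea can be sketched on its own. This is a simplified illustration, not the logic from generate(): groupCalls() and the pickSize callback are hypothetical, aggregate names are numbered (a1, a2, ...) instead of random, and leftovers at the end of the list are handled slightly differently from the original loop.

```javascript
// Hypothetical helper: walks a list of function calls and wraps runs of
// two or more in an aggregate function. pickSize() returns how many calls
// the next group should hold.
function groupCalls(calls, pickSize) {
    var topLevel = [];
    var functions = [];
    var aggregateCount = 0;
    var i = 0;
    while (i < calls.length) {
        var size = Math.min(pickSize(), calls.length - i);
        var group = calls.slice(i, i + size);
        i += size;
        if (size === 1) {
            // A group of one is called directly from the top level.
            topLevel.push(group[0]);
        } else {
            var name = "a" + (++aggregateCount);
            functions.push("function " + name + "() { return " + group.join(" + ") + "; }");
            topLevel.push(name + "()");
        }
    }
    return { topLevel: topLevel.join(" + "), functions: functions };
}

// Deterministic sizes (1, then 2) reproduce the example above.
var sizes = [1, 2];
var next = 0;
var grouped = groupCalls(["s1()", "s2()", "s3()"], function () {
    return sizes[next++ % sizes.length];
});
// grouped.topLevel is "s1() + a1()"
// grouped.functions[0] is "function a1() { return s2() + s3(); }"
```

Passing function () { return Math.floor(Math.random() * 3) + 1; } as pickSize would give the random group sizes of 1 to 3 that the generator uses.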

The do/while loop that gathers substrings continues until the end of the string is reached. Within this loop the actual parsing and function generation takes place. The if/else block at the top of the loop, together with the Math.round() call that follows it, determines how long the next substring should be, taking into account the remaining length of the string. A random function name is then generated, and the function itself is built by the call to functionContents(), which we will go into in more detail shortly, and appended to the functions variable for storage. The generated function names are added to the functionSubset variable, which simply contains a list of function calls. These will either be placed into an aggregate method or in the emailMe() method, as discussed above. The if/else if block that follows decides whether to add the calls to the emailMe() method, wrap them in an aggregate method, or keep accumulating.

Finally, the last two statements of the function enable retrieval and testing of the generated code. The first prepends a call to emailMe() that resides outside any function, so that something actually happens when the block of code is sent through the eval() function. As the generated comments say, this call should be removed before the code is placed on a page. The second statement simply places the entire thing into the text area for viewing, editing, and copying.
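Outside the browser, the same check can be approximated by running the generated code against a stand-in window object. This is only a sketch: runGenerated() and fakeWindow are hypothetical names, and the generated string below is a small hand-written sample in the same shape as the generator's output, not real output.

```javascript
// Hand-written sample shaped like the generator's output: a bare call to
// emailMe() followed by the function declarations it depends on.
var generated = [
    "emailMe();",
    "function emailMe() { window.location = a1() + s3() + s4(); }",
    "function s1() {return \"mai\";}",
    "function s2() {return eval(\"if (true) 'lto';\")}",
    "function a1() { return s1() + s2();}",
    "function s3() {return unescape(\"%3Aabc@\");}",
    "function s4() {return \"abc.com\";}"
].join("\n");

// Hypothetical helper: runs the code inside a function whose "window"
// parameter is a plain object, then reports what location was set to.
// Function declarations are hoisted, so the leading emailMe() call works.
function runGenerated(code) {
    var fakeWindow = {};
    new Function("window", code)(fakeWindow);
    return fakeWindow.location;
}

var target = runGenerated(generated);   // "mailto:abc@abc.com"
```

Because the code never touches a real window, nothing navigates; the check only confirms that the pieces reassemble into the expected mailto: string.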

Generating Function Contents
Now that we have broken our mailto string up into many substrings and generated a function for each, the next step is to fill in the contents of each function. I chose to do this by having the loop itself call out to the functionContents(str) method, which then delegates to one of a number of different techniques for returning the string. The technique is chosen at random, so each call to functionContents(str) can return a different result.

Each of the methods is illustrative in nature and doesn’t really do anything spectacular, aside from making the end product harder to read and parse with the human mind.

Here is the code:

/*
 * Generates the contents of each function by using one of a few random
 * techniques for obfuscating the internal data. Every technique used
 * here must resolve back to a string if the function itself is called.
 */
function functionContents(str) {
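    // Pick one of the three techniques with equal probability.
    // (Math.round(Math.random() * 2) would pick case 1 half of the time.)
    var technique = Math.floor(Math.random() * 3);

    var f = "() {";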

    switch (technique) {
        case 0:
            f += escapeFunctionContents(str);
            break;
        case 1:
            f += staticFunctionContents(str);
            break;
        case 2:
            f += evalFunctionContents(str);
            break;
    }

    f += "}";
    return f;
}

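/*
 * Embeds the escaped form of the string and has the generated function
 * run it back through unescape(). Both escape() and unescape() are
 * deprecated, although browsers still support them.
 */
function escapeFunctionContents(str) {
    return "return unescape(\"" + escape(str) + "\");";
}

/*
 * Has the generated function return the string by way of eval(). Assumes
 * str contains no quotes or backslashes, which would break the literal.
 */
function evalFunctionContents(str) {
    return "return eval(\"if (true) '" + str + "';\")";
}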

/*
 * One of a few ways the return values of a function is formulated. This
 * method will simply return the passed string as a string.
 */
function staticFunctionContents(str) {
	return "return \"" + str + "\";";
}

The first technique, escapeFunctionContents(), escapes the string in our script and uses the unescape() method to return it to normal when the generated script is executed. This will typically not have much of an effect unless there are special characters in the string (the colon in mailto:, for example, is encoded as %3A). This method was used more for illustrative purposes than anything else.

The second technique, evalFunctionContents(), uses the eval() function to evaluate a string. In our case I simply have it evaluate an if statement whose condition is always true. eval() returns the value of the last statement it executed, which here is our substring.

The third technique, staticFunctionContents(), simply returns the substring itself.
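The short sketch below demonstrates the eval() behavior and also shows one way to harden the string-building against quotes in the input. The hardened variant is an assumption, not the author's code: safeStaticFunctionContents() is a hypothetical name, and it leans on JSON.stringify() to produce a properly quoted and escaped string literal.

```javascript
// eval() returns the value of the last statement it ran, so an if
// statement whose body is a bare string expression yields that string.
var viaEval = eval("if (true) 'ai';");   // "ai"

// Hypothetical variant of staticFunctionContents(): JSON.stringify() adds
// the surrounding quotes and escapes any quotes or backslashes inside.
function safeStaticFunctionContents(str) {
    return "return " + JSON.stringify(str) + ";";
}

// A substring with both kinds of quote would break plain concatenation,
// but survives the round trip here.
var tricky = "o'brien\"@";
var generatedBody = safeStaticFunctionContents(tricky);
var returned = new Function(generatedBody)();
```

Most email addresses never contain quotes, so the original concatenation is usually fine; the variant only matters for the unusual ones.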

The techniques for generating the function bodies can be extended by simply adding a new function then updating the functionContents(str) method to take it into account as an option.
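As a sketch of such an extension, here is a possible fourth technique that spells the substring out as character codes. charCodeFunctionContents() is a hypothetical name, not part of the original script. Wiring it in would mean adding a case 3 to the switch in functionContents(str) and widening the random technique selection to four values.

```javascript
// Hypothetical fourth technique: returns the substring by rebuilding it
// from its character codes with String.fromCharCode().
function charCodeFunctionContents(str) {
    var codes = [];
    for (var i = 0; i < str.length; i++) {
        codes.push(str.charCodeAt(i));
    }
    return "return String.fromCharCode(" + codes.join(",") + ");";
}

// Build a complete function from the generated body and call it to
// confirm that it resolves back to the original substring.
var body = charCodeFunctionContents("c@");   // "return String.fromCharCode(99,64);"
var roundTrip = new Function(body)();        // "c@"
```

Like the other techniques, it resolves back to a plain string, including for the empty substrings the splitter can produce, since String.fromCharCode() with no arguments returns an empty string.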

The GUI
We have the completed code and just need a way to launch and test it. I created a simple form that invokes the generate(str) method and places the output into a text area in a form that it can be executed through an eval() statement by clicking on the Test button.

<script> // All the stuff we went over above... </script>

<h1>McDonaldLand</h1>
<h3>Get all your info from McDonaldLand<br/><a href="http://www.mcdonaldland.info">www.McDonaldLand.info</a></h3>
<form name="frm">
	Enter the email address to be obfuscated:
    <input type="text" name="addr" value=""/><input type="button" value="Generate Code" onClick="generate(this.form.addr.value)"/><br/><br/>

    <textarea rows="20" cols="60" name="code"></textarea><br/>
    <input type="button" onClick="eval(this.form.code.value)" value="Test"/>
</form>

The Test
You can either follow along with the post (I didn’t test this one out) or you can copy the working script by viewing source (I tested this thoroughly). Either way the end result should be a page that will allow you to enter an email address, generate script from it, then test the script, all from the same page.


A Bot-Safe Email Link

The quest for bot-safe email is one of the many holy grails of web development. There are many different ways to achieve this, each with their own merits and shortcomings.

Some people suggest writing the address as ASCII character codes, in the form of HTML numeric character references, like this:

<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#97;&#98;&#99;&#64;&#97;&#98;&#99;&#46;&#99;&#111;&#109;">&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#97;&#98;&#99;&#64;&#97;&#98;&#99;&#46;&#99;&#111;&#109;</a>

Which, when rendered in the browser, gives the address ‘mailto:abc@abc.com’. The obvious shortcoming of this approach is that decoding such strings is trivial: browsers decode character references as part of parsing HTML, and any scraper can do the same in a line or two of code. Take, for example, the same string placed inside a javascript: link. The HTML parser decodes the references in the attribute value before the script ever runs, so the alert shows the plain address. (The unescape() call in the example is not what does the decoding; unescape() only handles %XX escape sequences.)

<a href="
 javascript:alert(
    unescape(
    '&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#97;&#98;&#99;&#64;&#97;&#98;&#99;&#46;&#99;&#111;&#109;'
    ));">Click me to show 'mailto:abc@abc.com'</a>
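To make the point concrete, here is roughly what a scraper would need in order to decode such a string without a browser. This is a minimal sketch: decodeNumericReferences() is a hypothetical name, and it handles only decimal references, which is all this technique uses.

```javascript
// Hypothetical helper: decodes decimal numeric character references such
// as &#109; (the trailing semicolon is treated as optional).
function decodeNumericReferences(html) {
    return html.replace(/&#(\d+);?/g, function (match, code) {
        return String.fromCharCode(parseInt(code, 10));
    });
}

var encoded = "&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#97;&#98;&#99;" +
              "&#64;&#97;&#98;&#99;&#46;&#99;&#111;&#109;";
var decoded = decodeNumericReferences(encoded);   // "mailto:abc@abc.com"

// unescape() only understands %XX sequences, so it leaves the references
// exactly as they were.
var untouched = unescape(encoded);
```

A single regular expression is all it takes, which is why this encoding offers so little protection.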

Others use images to make it harder for bots to read the addresses. We’ve all seen this: you see an email address and go to copy it, only to find that it will neither copy into a text form nor allow you to click on it. The end result is that the site makes the user type the email address in manually while looking at the web site. This may work fine for bill@microsoft.com; however, it is hardly suitable for mukesh.radadeshish@iborrcat.state.us.gov. If you are like me you would readily send Bill email, but Mukesh would often be left in the dark, because it’s just too much work to transcribe all that.

The method I have been using lately simply uses a series of obfuscated methods to place the ‘mailto:abc@abc.com’ in the window’s location bar, causing the browser to pop open the default client with the email address already implanted. For example, the abc@abc.com address would be represented by this:

function emailMe() { window.location = Y4312t7g1o6010685() + fl8CyEyI2jg() + D97ts() + P55j7BkgWM0O34281i2w() + Oo7j() + jRt208Km3G2aYM29GtMf() + BSCL2kW(); }
function nG7DOf6h448mG() {return "m";}
function JFb402B0GN33Ybie4() {return eval("if (true) 'ai';")}
function Y4312t7g1o6010685() { return nG7DOf6h448mG() + JFb402B0GN33Ybie4();}
function nk0t4313DtfU28b8K16U2() {return "lt";}
function Qi() {return "o:";}
function fl8CyEyI2jg() { return nk0t4313DtfU28b8K16U2() + Qi();}
function zdPUN0S37vx9x2() {return unescape("ab");}
function I6OD5jYkoOZF8fVLy8mELQ() {return unescape("");}
function D97ts() { return zdPUN0S37vx9x2() + I6OD5jYkoOZF8fVLy8mELQ();}
function r5KyhM() {return unescape("c@");}
function d53nO3EgQr2N() {return "a";}
function P55j7BkgWM0O34281i2w() { return r5KyhM() + d53nO3EgQr2N();}
function UR5() {return eval("if (true) '';")}
function i542HIf4g2q164245i() {return "bc";}
function Oo7j() { return UR5() + i542HIf4g2q164245i();}
function jRt208Km3G2aYM29GtMf() {return eval("if (true) '.';")}
function BSCL2kW() {return unescape("com");}

A call to the emailMe() method would then pop the mail client with abc@abc.com in the TO address field. I use a script to generate and test this for any email address. I’ll have another post in the future that outlines how I went about writing this generator.

The obvious benefit of this method is that it takes a lot more work to determine the result than it would to simply decode the ASCII string or to run a pre-packaged image reader to extract the address from an image. To read this technique automatically, a bot would have to load all the JavaScript on the page and then execute it. This complexity is compounded by the fact that the bot has no way of knowing what calling a particular function will do, meaning that executing it could have unintended side effects on the bot itself.


Email From Name

How many times have you seen this in your inbox?

It need not be so cryptic. Most of the time when you see this it is simply because whoever set up the site’s email does not know how to make a generic address show up any other way. By default, most modern mail clients will show only the first part of the email address (the part before the @) if nothing but an address is given in the from attribute on the email.

So this will show up just as you see in the image above:

FROM: support@mcdonaldland.info

However, by formatting the from address in an email friendly way:

FROM: McDonaldLand support <support@mcdonaldland.info>

We will now see this:

[Image: crystalclear.jpg]

This will work for any field that takes an email address (from, to, cc, etc.), although not all mail clients pay attention to it for anything but the from attribute. If you supply it and an email client doesn’t support this format for a particular field, you will typically just see the email address, so everything still works correctly. Because of this it is generally a good idea to supply addresses in this format whenever you can.
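If the addresses are assembled in code, a small helper can produce this format. The following is a minimal sketch under stated assumptions: formatAddress() is a hypothetical name, it wraps the display name in quotes when the name contains punctuation that has special meaning in address headers, and it does not attempt the extra encoding that non-ASCII names require.

```javascript
// Hypothetical helper: builds "Display Name <address>" for use in a from,
// to, or cc field. Falls back to the bare address when no name is given.
function formatAddress(displayName, email) {
    if (!displayName) {
        return email;
    }
    // Characters with special meaning in address headers force quoting.
    var needsQuotes = /[()<>\[\]:;@\\,."]/.test(displayName);
    var name = displayName;
    if (needsQuotes) {
        // Inside the quotes, backslashes and double quotes are escaped.
        name = '"' + displayName.replace(/(["\\])/g, "\\$1") + '"';
    }
    return name + " <" + email + ">";
}

var friendly = formatAddress("McDonaldLand support", "support@mcdonaldland.info");
// "McDonaldLand support <support@mcdonaldland.info>"

var quoted = formatAddress("Doe, Jane", "jane@example.com");
// "\"Doe, Jane\" <jane@example.com>"
```

The quoting matters mostly for names containing commas, which a mail client could otherwise read as a separator between two addresses.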