I recently posted about a bot-safe email link where I showed how you can use a series of obfuscated method calls through JavaScript to pop open a mail client with an address in the TO field, while still maintaining a decent sense of privacy. In this post we will dive into how the actual obfuscation script works and how it can be extended.
I use it for the bot-safe email link, but the structure of the script is a good foundation for other JavaScript obfuscation needs.
Generating Function Names
The core part of the system comes from the generation of random numbers and letters. Because of this we will use the following extensively:
/*
 * Generates a random letter, either upper or lower case.
 */
function randomLetter() {
    // Generate a letter between A-Z (Ascii 65-90) or a-z (Ascii 97-122). We
    // will randomly select between upper and lower case. Math.floor() is
    // used so that every letter is equally likely.
    if (Math.random() < 0.5) {
        return String.fromCharCode(97 + Math.floor(Math.random() * 26));
    }
    return String.fromCharCode(65 + Math.floor(Math.random() * 26));
}

/*
 * Generates a random number between 0 and 9.
 */
function randomNumber() {
    return Math.floor(Math.random() * 10);
}
These two functions are the enabling factor behind our method name generator:
/*
 * Generates a random function name that always begins with a letter and
 * is composed entirely of random numbers and letters and is of random
 * length.
 */
function randomFunctionName() {
    // Max function name length
    var maxLen = 25;

    // Generate a random number - this will be the length of our method. We
    // will subtract one right off the bat to account for the first letter
    // in the method.
    var len = Math.round(Math.random() * maxLen) - 1;

    // Generate a random letter - functions can't start with a number so we
    // will always start with a letter.
    var functionName = randomLetter();

    // Loop through and create the method name. The counter is declared
    // with var so it does not leak into the global scope.
    for (var i = 0; i < len; i++) {
        // Randomly select between a letter and a number
        if (Math.round(Math.random() * 1) == 1) {
            functionName += randomLetter();
        } else {
            functionName += randomNumber();
        }
    }

    return functionName;
}
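Now a simple call to randomFunctionName() will give a randomized function name between 1 and 25 characters in length. The name is almost always valid; the rare exceptions are short results that happen to spell a reserved word such as do, if, or in, and two calls can in principle return the same name. We will use this function extensively throughout the rest of the script.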
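As a rough sketch of how reserved words and duplicate names could be ruled out, the block below wraps a name generator in a retry loop. Everything here is hypothetical and not part of the original script: makeName() is a compact stand-in for randomFunctionName(), and safeFunctionName(), has(), and the deliberately incomplete RESERVED list are names made up for illustration.

```javascript
var LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
var ALNUM = LETTERS + "0123456789";

// Pick one character from the given string at random.
function pick(chars) {
    return chars.charAt(Math.floor(Math.random() * chars.length));
}

// Hypothetical stand-in for randomFunctionName(): a letter followed by
// random letters or digits, 1 to maxLen characters in total.
function makeName(maxLen) {
    var len = 1 + Math.floor(Math.random() * maxLen);
    var name = pick(LETTERS);
    while (name.length < len) {
        name += pick(ALNUM);
    }
    return name;
}

// Assumption: a short, incomplete list of reserved words for illustration.
var RESERVED = { "do": true, "if": true, "in": true, "for": true,
                 "new": true, "try": true, "var": true, "let": true };

// Own-property check that is safe even if a key shadows hasOwnProperty.
function has(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

// Keep generating until the name is neither reserved nor already used.
function safeFunctionName(used, maxLen) {
    var name;
    do {
        name = makeName(maxLen);
    } while (has(RESERVED, name) || has(used, name));
    used[name] = true;
    return name;
}

var used = {};
var a = safeFunctionName(used, 25);
var b = safeFunctionName(used, 25);
console.log(a, b); // two distinct identifiers, different on every run
```

The used object has to be shared across all calls made while generating one script, otherwise the duplicate check has nothing to compare against.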
Generating Methods to Obfuscate the Email Address
Next we need a function that walks through the email address and splits it into sections, each of which is pushed into its own function. In addition, we will vary the way each of those functions returns its section. We do this with a loop that repeatedly cuts a portion out of the mailto string, generates a function for it, and has the function return that portion in one way or another.
This method is a bit involved, so we’ll just take a look at it then step through the parts of importance. Please note that I have removed comments, spaces, and other things so that it will render better. For the full code visit the actual script and view source.
function generate(addr) {
    var loc = "mailto:" + addr;

    // This will keep track of all the generated javascript code
    var functions = "";
    var methodCalls = "";

    var maxSplit = 3;
    var spos = 0;
    var epos = 0;
    var splitLen = 0;

    // This is the number of method aggregations we want - min of 1, max of 3.
    var aggregations = Math.round(Math.random() * 2) + 1;

    // This holds a set of function names that may or may not be aggregated
    // into a single call through another method.
    var functionSubset = "";

    // Split up the string into multiple parts in order to start breaking
    // it up into multiple methods. Each one of these will become a part
    // of the main method call to recompose the email address.
    var cnt = 0;
    do {
        cnt++;

        // If the length from the last split to the end of the string is
        // less than the max split then we will want to use that length
        // for the next split.
        if (epos > -1 && loc.substring(epos).length <= maxSplit) {
            splitLen = loc.substring(epos).length;
        } else {
            splitLen = maxSplit;
        }

        // Determine the length for the first split
        splitLen = Math.round(Math.random() * splitLen);

        // Grab the split
        spos = epos;
        epos = spos + splitLen;
        var str = loc.substring(spos, epos);

        // Create a function name and add it to the list of functions that
        // must be called by the mail method.
        var functionName = randomFunctionName();
        if (functionSubset.length > 0) {
            functionSubset += " + ";
        }
        functionSubset += functionName + "()";

        functions += "\nfunction " + functionName + functionContents(str);

        // Tracks whether we need to reset the aggregation variables or not
        var resetVars = false;

        // If there is only one aggregation then we will just add the method
        // to the emailMe() method call list. Also check to see if we have
        // aggregated enough to chop and create a method to hold the functions
        // we have created thus far.
        if (aggregations == 1 || epos >= loc.length) {
            // Add the function call to the emailMe() method.
            if (methodCalls.length > 0) {
                methodCalls += " + ";
            }
            methodCalls += functionSubset;
            resetVars = true;
        } else if (cnt == aggregations) {
            // Get a function name for the aggregate function.
            var aggregateFunctionName = randomFunctionName() + "()";

            // Add the aggregate function to the emailMe() method call.
            if (methodCalls.length > 0) {
                methodCalls += " + ";
            }
            methodCalls += aggregateFunctionName;

            // Add the actual function
            functions += "\nfunction " + aggregateFunctionName +
                " { return " + functionSubset + ";}";
            resetVars = true;
        }

        if (resetVars) {
            // Reset the cnt and functionSubset and figure a random aggregations
            // number
            cnt = 0;
            aggregations = Math.round(Math.random() * 2) + 1;
            functionSubset = "";
        }
    } while (epos >= 0 && epos < loc.length);

    functions = "// Remove this before placing code on page >>>\nemailMe();\n// <<< Remove this before placing code on page\n\n\n" +
        "function emailMe() { window.location = " + methodCalls + "; }" +
        functions;
    document.frm.code.value = functions;
}
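The methodCalls variable simply accumulates the calls that will be made from the emailMe() method. That is, the generated emailMe() method ultimately assigns the concatenation held in methodCalls to window.location. The simplified examples below show it as a return statement instead, to keep the focus on how the calls are combined.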
The maxSplit variable sets the maximum number of characters allowed in each substring. The number 3 was settled upon because it generally ensures that the entire email address cannot end up in the same substring, and because the seven-character mailto: token is then always spread across at least three substrings.
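To make the splitting step concrete, here is a minimal sketch of the same idea in isolation. The helper name splitRandomly() and the address someone@example.com are placeholders invented for this example; the real generate() function does the splitting inline rather than through a helper.

```javascript
// Hypothetical helper: cut `text` into consecutive chunks of 0 to maxSplit
// characters, mirroring the random split lengths described above.
function splitRandomly(text, maxSplit) {
    var chunks = [];
    var pos = 0;
    while (pos < text.length) {
        // Never ask for more characters than are left in the string.
        var limit = Math.min(maxSplit, text.length - pos);
        // The length may come out as 0, which produces an empty chunk.
        var len = Math.round(Math.random() * limit);
        chunks.push(text.substring(pos, pos + len));
        pos += len;
    }
    return chunks;
}

var chunks = splitRandomly("mailto:someone@example.com", 3);
console.log(chunks);          // varies per run
console.log(chunks.join("")); // always the original string
```

Empty chunks are harmless here: they simply become functions that return an empty string, which adds a little extra noise to the output.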
The aggregations variable holds the number of functions that should be grouped under a single aggregate function. If it were 1 on every iteration, the emailMe() method would call each generated function directly and concatenate the results to rebuild the email address:
function emailMe() { return s1() + s2() + s3(); }
By having this evaluate to higher than one on some iterations, portions of the generated functions will be delegated to other method calls, making the emailMe() method need to call fewer methods directly:
function emailMe() { return s1() + a1(); }
function a1() { return s2() + s3(); }
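The two shapes can be compared side by side in a small runnable sketch. The function bodies and the address me@example.com are placeholders chosen for illustration, and the two entry points are renamed emailMeFlat() and emailMeAggregated() only so that both can live in the same file.

```javascript
// Leaf functions, each returning one piece of the mailto string.
function s1() { return "mai"; }
function s2() { return "lto"; }
function s3() { return ":me@example.com"; }

// Flat form: the entry point calls every piece directly.
function emailMeFlat() { return s1() + s2() + s3(); }

// Aggregated form: s2() and s3() are hidden behind a1().
function a1() { return s2() + s3(); }
function emailMeAggregated() { return s1() + a1(); }

console.log(emailMeFlat());                         // mailto:me@example.com
console.log(emailMeFlat() === emailMeAggregated()); // true
```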
The loop that gathers substrings is the do...while block, which continues until the end of the string is reached. Within this loop the actual parsing and function generation takes place. The if/else at the top of the loop, together with the Math.round() call that follows it, determines how long the next substring should be, taking into account the remaining length of the string. A random function name is then generated, and the function body is produced by a call to functionContents(), which we will go into in more detail shortly. The complete function is appended to the functions variable for storage. Each generated function name is added to the functionSubset variable, which simply holds a list of function calls joined by +. These will either be placed into an aggregate method or in the emailMe() method, as discussed above. The if/else if block that sets resetVars decides whether to add the calls to the emailMe() method, wrap them in an aggregate method, or keep accumulating.
Finally, the last two lines of the function enable retrieval and testing of the generated code. The first line creates a call to emailMe() that resides outside a function so when the block of code is sent through the eval() function something actually happens. The second line of code simply places the entire thing into the text area for viewing, editing, and copying.
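If you would rather check the generated code without actually launching a mail client, one option is to run it against a stand-in for window. The sketch below is an assumption on my part rather than something the original page does: runWithFakeWindow() is a hypothetical helper, and the generated string is a hand-shortened imitation of real output.

```javascript
// Hand-written imitation of generated output, shortened for illustration.
var generated =
    "function emailMe() { window.location = x1() + x2(); }\n" +
    "function x1() { return \"mailto:\"; }\n" +
    "function x2() { return unescape(\"me%40example.com\"); }\n" +
    "emailMe();";

// Hypothetical helper: run the code with a fake `window` object so that
// the assignment to window.location is captured instead of acted upon.
function runWithFakeWindow(code) {
    var fakeWindow = { location: null };
    // The generated code becomes the body of a function whose only
    // parameter is named `window`, which shadows the real global.
    new Function("window", code)(fakeWindow);
    return fakeWindow.location;
}

console.log(runWithFakeWindow(generated)); // mailto:me@example.com
```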
Generating Function Contents
Now that we have broken our mailto string up into many substrings and generated a function for each, the next step is to fill in the contents of each function. I chose to do this by having the loop itself call out to the functionContents(str) method, which then delegates to one of several techniques for returning the string. The technique is chosen at random, so each call to functionContents(str) can return a different result.
Each of the methods is illustrative in nature and doesn’t really do anything spectacular, aside from making the end product harder to read and parse with the human mind.
Here is the code:
/*
 * Generates the contents of each function by using one of a few random
 * techniques for obfuscating the internal data. Every technique used
 * here must resolve back to a string if the function itself is called.
 */
function functionContents(str) {
    // Pick 0, 1, or 2 with equal probability.
    var technique = Math.floor(Math.random() * 3);
    var f = "() {";

    switch (technique) {
        case 0:
            f += escapeFunctionContents(str);
            break;
        case 1:
            f += staticFunctionContents(str);
            break;
        case 2:
            f += evalFunctionContents(str);
            break;
    }

    f += "}";
    return f;
}

/*
 * Returns the string in escaped form, wrapped in a call to unescape().
 */
function escapeFunctionContents(str) {
    return "return unescape(\"" + escape(str) + "\");";
}

/*
 * Returns the string as the result of an eval() call.
 */
function evalFunctionContents(str) {
    return "return eval(\"if (true) '" + str + "';\");";
}

/*
 * One of a few ways the return values of a function is formulated. This
 * method will simply return the passed string as a string.
 */
function staticFunctionContents(str) {
    return "return \"" + str + "\";";
}
The first technique escapes the string in our script and uses the unescape() method to return it to normal when the generated script is executed. This will typically not have much of an effect unless there are special characters in the email address – this method was used more for illustrative purposes than anything else.
The second technique uses the eval() function to evaluate a string. In our case it evaluates an if statement whose condition is always true, so eval() returns the value of the statement in the if body, which is our substring.
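To see why both of these techniques resolve back to the original substring, here is a short standalone check. The sample strings are placeholders, and the variable names are only for this example.

```javascript
// eval() returns the completion value of the last statement it runs,
// so an if statement whose body is a bare string yields that string.
var viaEval = eval("if (true) 'lto';");
console.log(viaEval); // lto

// escape() leaves letters, digits, and characters such as @ and . alone
// and percent-encodes others (here the space); unescape() reverses it.
var encoded = escape("a b@example.com");
console.log(encoded);           // a%20b@example.com
console.log(unescape(encoded)); // a b@example.com
```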
The third technique simply returns the substring itself.
The techniques for generating the function bodies can be extended by simply adding a new function then updating the functionContents(str) method to take it into account as an option.
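As one possible extension, the sketch below adds a technique that emits the substring as a list of character codes. The function name charCodeFunctionContents() is hypothetical and does not exist in the original script; it is only meant to show the shape a new technique needs to have.

```javascript
// Hypothetical fourth technique: return the substring by rebuilding it
// from its character codes with String.fromCharCode().
function charCodeFunctionContents(str) {
    var codes = [];
    for (var i = 0; i < str.length; i++) {
        codes.push(str.charCodeAt(i));
    }
    return "return String.fromCharCode(" + codes.join(",") + ");";
}

// Build a complete function from the generated body and call it.
var body = charCodeFunctionContents("lto");
console.log(body);                 // return String.fromCharCode(108,116,111);
console.log(new Function(body)()); // lto
```

To wire it in, functionContents(str) would need one more case in its switch and a random technique number that can reach that case. An empty substring also works, because String.fromCharCode() with no arguments returns an empty string.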
The GUI
We have the completed code and just need a way to launch and test it. I created a simple form that invokes the generate(addr) method and places the output into a text area, where it can be executed through an eval() statement by clicking on the Test button.
<script>
// All the stuff we went over above...
</script>

<h1>McDonaldLand</h1>
<h3>Get all your info from McDonaldLand<br/><a href="http://www.mcdonaldland.info">www.McDonaldLand.info</a></h3>

<form name="frm">
    Enter the email address to be obfuscated:
    <input type="text" name="addr" value=""/><input type="button" value="Generate Code" onClick="generate(this.form.addr.value)"/><br/><br/>
    <textarea rows="20" cols="60" name="code"></textarea><br/>
    <input type="button" onClick="eval(this.form.code.value)" value="Test"/>
</form>
The Test
You can either follow along with the post (I didn’t test this one out) or you can copy the working script by viewing source (I tested this thoroughly). Either way the end result should be a page that will allow you to enter an email address, generate script from it, then test the script, all from the same page.
Fixed the link. Sorry about that.
Hi
Your link to the complete script code (http://www.mcdonaldland.info/emailobfuscator) does not work.
Would appreciate the complete code.
Regards
Peter
JavaScript obfuscation will not help against OCR harvesters, but the regular HTML-scanning bots would be troubled. Even so, that is equivalent to using any JavaScript-based encoding, since a bot scanning the code wouldn’t render it.
There’s a Mac OS X Dashboard widget called obfuscatr that uses this kind of logic. See the details at flash tekkie.
obfuscatr was also featured in MacWorld Italy of March 2008.