URL Rewriting in ASP.NET

URL rewriting is a technique in which an incoming request to a web server is intercepted and a different resource is returned than the one requested. It's usually done for one of two reasons:

  • Resources Have Moved: If web pages have been moved, but the old URLs are likely to be still used, either because users have bookmarked pages or because the pages have been indexed by search engines, then URL rewriting will allow the old URLs to still work.

  • Simplifying URLs: URL rewriting can create simplified URLs that are easier for users to use and remember, even though the underlying URLs might be more complex. Additionally, search engines used to skip indexing for web pages that accepted URL parameters, so URL rewriting allowed the use of URLs that incorporated key parameters into the name of the requested resource.

The most common use of URL rewriting probably appears in content management systems, as typified by the two URLs below:

  • http://www.mywebsite.com/article.aspx?aid=24

  • http://www.mywebsite.com/content/24.aspx

The top URL represents a web page that shows the content of articles, with the specific article shown being determined by a URL parameter. The second URL shows the same article, but there is no URL parameter.

In fact, the web page "24.aspx" doesn't even exist. The request for the web page is intercepted and translated into the first URL, so it looks to the users as if the requested web page actually exists. In the old days, when search engines didn't index pages that accepted URL parameters, this technique ensured that the article could be indexed. Today, it just provides a simpler user experience. In fact, it can provide an alternate navigation mechanism for a web site, with users hacking the URL to view different articles.

So, how do you do URL rewriting in ASP.NET? There are three ways to do it, each of which has different strengths and weaknesses. These three ways are summarized below:

  1. URL Mapping: ASP.NET provides functionality within the web.config file that allows one resource to be returned in place of a requested one. It's not overly useful because the URL mapping mechanism in ASP.NET 2.0 doesn't support wildcards.

  2. Custom HTTP Handler: Provide an HTTP handler factory that will intercept the request and substitute the new resource. This is pretty easy, but may not work for web sites hosted commercially, since the technique requires a security permission that shared hosting environments rarely grant.

  3. Custom HTTP Module: Provide an HTTP Module and have the module perform the resource substitution. This technique will work for hosted web sites, but is a little more complex than the HTTP Handler method. It can also have some ramifications for other ASP.NET features.

Each of these techniques is described in the following sections.

URL Mapping

Beginning with ASP.NET 2.0, Microsoft added a capability for URL mapping to the web.config file for ASP.NET web sites. An example configuration is shown below:

<system.web>
    <urlMappings enabled="true">
        <add url="~/widgets.aspx" mappedUrl="~/products/w/widgets.aspx" />
    </urlMappings>
</system.web>

This capability permits a URL to be mapped to another URL, which works fine if you have a finite list of URLs to be mapped. Since this technique doesn't support wildcards, it doesn't work at all for open-ended mappings, such as would be needed to map URLs like "24.aspx" to a web page that displays articles. Every time a new article was created, you'd have to add a new entry to the file.
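
To make the limitation concrete, here is a rough sketch of what the mapping section might look like for the article example from the introduction. The article IDs and the article.aspx page come from the sample URLs shown earlier; the specific entries are illustrative assumptions, not configuration from a real site.

```xml
<system.web>
    <urlMappings enabled="true">
        <!-- One hand-written entry per article; there is no wildcard form -->
        <add url="~/content/24.aspx" mappedUrl="~/article.aspx?aid=24" />
        <add url="~/content/25.aspx" mappedUrl="~/article.aspx?aid=25" />
        <add url="~/content/26.aspx" mappedUrl="~/article.aspx?aid=26" />
    </urlMappings>
</system.web>
```

Each new article would need another line like these, which is why the technique only suits a small, fixed set of URLs.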

Improvements to this mapping capability are planned to coincide with the advent of Windows Vista (formerly known as Longhorn), the next version of Windows. It's likely that ASP.NET 3.0 will include support for wildcards. If so, that may make the need for URL rewriting via HTTP handlers and HTTP modules go away.

Until then, the current edition of URL mapping is almost useless. Onwards to the other techniques!

Understanding How Requests Are Processed

To understand custom HTTP handlers, a little background on how ASP.NET processes web pages is needed. A request for a resource first goes to the IIS web server. Based on the extension, the request is forwarded to the proper ISAPI (Internet Server API) extension, which is responsible for returning the data necessary to answer the request. For pages ending with ".aspx", IIS is configured to dispatch the request to aspnet_isapi.dll.

Next, ASP.NET initializes a series of HTTP modules. The request is sent to all of the configured HTTP modules, each of which can manipulate the request as needed. Finally, the request will be sent to an HTTP handler or HTTP handler factory, ultimately causing the resource to be rendered and sent back to the requester.

This sequence of events provides two points where the request can be intercepted and a new resource substituted for the requested one: at the HTTP module level and the HTTP handler level.

Creating a Custom HTTP Handler

After initializing the HTTP modules, ASP.NET looks for the appropriate HTTP handler or HTTP handler factory to invoke to handle the request. To accomplish URL rewriting, simply create a custom HTTP handler factory by creating a class that implements the IHttpHandlerFactory interface. This interface requires that the class provide implementations for the GetHandler and ReleaseHandler methods.

The GetHandler method receives a number of parameters that are useful with regard to URL rewriting, including an HttpContext object that provides access to request-related information. In addition, the method also receives the requested URL, the physical path of the requested file, etc.

Ultimately, the GetHandler method can make changes to accomplish URL rewriting, as shown in Listing 1. Assuming an incoming URL with an embedded article ID, such as "24.aspx", the URL is parsed to get the article ID. The article ID is stored in the Items hash, which is a little-known mechanism for storing data that can be accessed throughout the duration of a request. The RewritePath method of the HttpContext object is used to change the resource that will ultimately be returned.

Finally, the method has to return an object that implements the IHttpHandler interface. To do this, it calls the PageParser.GetCompiledPageInstance method with the new URL and new path; this method conveniently returns an IHttpHandler instance for the desired resource.

Listing 1: The SR_HandlerFactory Class


Imports System
Imports System.IO
Imports System.Web
Imports System.Web.UI

Namespace SunriseXP.Core

Public Class SR_HandlerFactory : Implements IHttpHandlerFactory

   Public Overridable Function GetHandler(context As HttpContext, _
         requestType As String, url As String, pathTranslated As String) _
         As IHttpHandler Implements IHttpHandlerFactory.GetHandler

      '--- Save the base name of the requested page (e.g. "24" for
      '--- "24.aspx") so the destination page can read it later
      context.Items("fileName") = Path.GetFileNameWithoutExtension(url).ToLower()

      '--- The page that will actually service the request. The URL
      '--- assumes the application runs at the root of the web site.
      Dim strNewURL As String = "/content.aspx"
      Dim strNewPath As String = context.Server.MapPath("~/content.aspx")

      context.RewritePath(strNewURL)

      Return PageParser.GetCompiledPageInstance(strNewURL, strNewPath, context)
   End Function

   Public Overridable Sub ReleaseHandler(handler As IHttpHandler) _
         Implements IHttpHandlerFactory.ReleaseHandler
      '--- Does nothing
   End Sub

End Class

End Namespace

Of course, it's not enough to have a class that implements the IHttpHandlerFactory interface. You also have to tell ASP.NET to use the class. This is done by adding some configuration lines to the web.config file, as shown below:

<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
   <system.web>
      <httpHandlers>
         <add verb="*" 
              path="*/content/*.aspx" 
              type="SunriseXP.Core.SR_HandlerFactory, SunriseXP.Core" />
      </httpHandlers>
   </system.web>
</configuration>

To configure a new HTTP handler, add an <httpHandlers> section to the web.config file. Within that set of tags, define the URL pattern that identifies which web pages the handler should be used for and which handler should be used. In this case, the "verb" attribute signifies whether the handler will handle GET, POST or other types of requests for the resources. The "*" value indicates that the handler will always be called regardless of the type of request.

The "path" attribute defines the pattern that should be matched: in this case, every web page with a ".aspx" extension found in a folder named "content". Note that the handler could be configured to match any extension, but the handler will only be called if the IIS web server is configured to pass requests for that extension to ASP.NET.

Finally, the "type" attribute defines the class and the location of the class for the handler to be called for matching requests.

Now that the handler factory class has been created, and the configuration set up so that the class will be called, let's look at the sample destination web page shown in Listing 2. This page accesses the Items hash to retrieve the article ID that was stored by the GetHandler method. In real life, the web page would validate the article ID and then display either the requested article or an appropriate error message.

Listing 2: A Sample Content Web Page


<%@ Page Language="vb" %>
 
<script runat="server" language="vb">    
   SUB Page_Init()
      lblMessage.Text = CStr(HttpContext.Current.Items("fileName"))
   END SUB
</script>
 
<html>
<head>
   <title>Test Content Page</title>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   <link rel="stylesheet" type="text/css" href="/include/keenertech.css">
</head>
<body>
 
<p class=ws_text>The message was: <asp:Label runat="server" ID="lblMessage" />
 
</body>
</html>
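
Listing 2 simply echoes the stored value. As a minimal sketch of the validation step mentioned above, the helper below checks that the value is a positive whole number before it would be used to look up an article. The names ArticleIdValidation and TryGetArticleId are hypothetical and are not part of the SunriseXP code; the rule that IDs must be greater than zero is also an assumption.

```vb
Imports System

Module ArticleIdValidation

   '--- Hypothetical helper: returns True and sets intId when objValue
   '--- holds a string that parses as a positive integer.
   Function TryGetArticleId(ByVal objValue As Object, _
         ByRef intId As Integer) As Boolean
      Dim strValue As String = TryCast(objValue, String)
      If strValue Is Nothing Then
         intId = 0
         Return False
      End If

      '--- Integer.TryParse sets intId to 0 when parsing fails
      Return Integer.TryParse(strValue, intId) AndAlso intId > 0
   End Function

   Sub Main()
      Dim intId As Integer

      Console.WriteLine(TryGetArticleId("24", intId))      ' True
      Console.WriteLine(intId)                             ' 24
      Console.WriteLine(TryGetArticleId("about", intId))   ' False
      Console.WriteLine(TryGetArticleId(Nothing, intId))   ' False
   End Sub

End Module
```

In the content page, the value passed in would be HttpContext.Current.Items("fileName"), and a False result would lead to the error message rather than a database lookup.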

A custom HTTP handler factory is a simple and useful mechanism for implementing URL rewriting in ASP.NET. Unfortunately, it doesn't work in all environments. Specifically, if your web site is hosted by a commercial hosting service, I can assure you that this solution will not work. The solution requires a security permission that is not likely to be granted in a shared hosting environment. In a commercial hosting environment, you will receive an error message similar to the one below:

Output 1: What You Don't Want to See

Security Exception
Description: The application attempted to perform an operation not allowed by the security policy. To grant this application the required permission please contact your system administrator or change the application's trust level in the configuration file.

Exception Details: System.Security.SecurityException: Request for the permission of type 'System.Security.Permissions.SecurityPermission, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' failed.

Theoretically, this can be handled by setting up a custom trust level for your site, but that is something that would have to be set up by the hosting service. In my opinion, that's not likely, given the permission that's needed. So, if you are looking for a solution that will work with a commercial hosting service, look at the HTTP module solution described in the next section.

Creating a Custom HTTP Module

The third way to accomplish URL rewriting is to create a custom HTTP module. Like an HTTP handler, the module has to be configured in the web.config file, as shown below:

<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
   <system.web>
      <httpModules>
         <add type="SunriseXP.Core.SR_HttpModule, SunriseXP.Core" name="SR_HttpModule" />
      </httpModules>
   </system.web>
</configuration>

To create an HTTP module, a class must be created that implements the IHttpModule interface. The Init method of the module will be called when it is first initialized. This method can register the address of an event handler that will be called when certain events occur during the processing of a request. As with the HTTP handler, the event handler will have the opportunity to modify the URL in order to return a different resource than the one originally requested.

One big difference between the HTTP handler solution and the HTTP module solution is how the web page requests that need to be "adjusted" are identified. With an HTTP handler, a pattern was defined in the web.config file that identified which web pages the handler would be called to handle. With the HTTP module solution, the event handler will run for every page, so the event handler will have to contain logic to determine which page requests it needs to affect and which should be left alone to be processed normally.

There are also some other considerations. There are a variety of events that occur during the processing of a page request. A few of the most important ones are the BeginRequest, AuthenticateRequest and AuthorizeRequest events. Whichever event you elect to create an event handler for, the URL rewriting may impact other ASP.NET features such as ASP.NET's login authentication (if you're using it). Additionally, if your delivered web page contains a form, that form is going to post back to the real URL of the web page, not the one originally requested by the user.

With my own designs, I've side-stepped these issues by using the HTTP module solution to display content that's accessible to the public (i.e., no authentication required), such as articles and blogs. I don't typically include forms on such web pages; or if I do, they're typically forms that ultimately redirect users to another web page. As a result, the solution shown in Listing 3 sets up an event handler for the BeginRequest event, which is the first event that occurs during the processing of a request.

Listing 3: The SR_HttpModule Class


Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Web
Imports System.Web.UI

Namespace SunriseXP.Core

Public Class SR_HttpModule : Implements IHttpModule

   Public ReadOnly Property ModuleName() As String
      Get
         Return "SR_HttpModule"
      End Get
   End Property


   Public Sub Init(ByVal application As HttpApplication) _
         Implements IHttpModule.Init
      AddHandler application.BeginRequest, _
         AddressOf Me.Application_BeginRequest
   End Sub


   Private Sub Application_BeginRequest(ByVal source As Object, _
         ByVal e As EventArgs)
      '--- Get the HttpApplication and HttpContext objects to access
      '--- request and response properties.
      Dim application As HttpApplication = CType(source, HttpApplication)
      Dim context As HttpContext = application.Context

      '--- Check whether the requested URL matches the simplified URL
      '--- pattern that needs to be rewritten to the real content page

      Dim strRequestedURL As String = application.Request.Path
      Dim strNewURL As String = "/content.aspx"

      Dim objRegex As Regex = _
         New Regex(".*/content/(\d{1,})\.aspx", RegexOptions.IgnoreCase)
      If objRegex.IsMatch(strRequestedURL) Then
         Dim strFile As String = _
            Path.GetFileNameWithoutExtension(strRequestedURL)

         context.Items("fileName") = strFile.ToLower()

         context.RewritePath(strNewURL)
      End If
   End Sub


   Public Sub Dispose() Implements IHttpModule.Dispose
   End Sub

End Class

End Namespace

In the Application_BeginRequest method, a regular expression is used to match the URL of any web pages that should have their URL rewritten. The pattern used is shown below:

      .*/content/(\d{1,})\.aspx

This regular expression matches any URL that includes a directory named "content" which in turn contains an ASP.NET page with a numeric name before the ".aspx" extension. All other web pages will be ignored. For example, the pattern matches the following URL:

      http://www.keenertech.com/content/1234.aspx

The event handler stores the base name of the web page, minus the path and the file extension, in the Items hash where it can be accessed by the destination web page shown in Listing 2. The way the pattern matching has been set up, this will actually be an article ID that can be used by the destination page to retrieve the appropriate article. Finally, the RewritePath method is used to change the underlying URL to be returned by the request.
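
As a small, hedged illustration of how the pattern behaves, the console sketch below pulls the article ID straight out of the regular expression's capture group instead of using Path.GetFileNameWithoutExtension. The names ArticleUrlDemo and GetArticleId are hypothetical and do not appear in the listings above. The ^ and $ anchors are also an addition of this sketch: without them, IsMatch would also accept a path such as "/content/24.aspx.bak".

```vb
Imports System
Imports System.Text.RegularExpressions

Module ArticleUrlDemo

   '--- Hypothetical helper: returns the article ID embedded in the
   '--- path, or Nothing when the path does not match the pattern.
   Function GetArticleId(ByVal strPath As String) As String
      '--- Same pattern as Listing 3, anchored at both ends
      Dim objMatch As Match = Regex.Match(strPath, _
         "^.*/content/(\d{1,})\.aspx$", RegexOptions.IgnoreCase)

      If objMatch.Success Then
         '--- Groups(1) holds the digits captured by (\d{1,})
         Return objMatch.Groups(1).Value
      End If

      Return Nothing
   End Function

   Sub Main()
      Console.WriteLine(GetArticleId("/content/1234.aspx"))               ' 1234
      Console.WriteLine(GetArticleId("/Content/24.ASPX"))                 ' 24
      Console.WriteLine(GetArticleId("/content/about.aspx") Is Nothing)   ' True
      Console.WriteLine(GetArticleId("/content/24.aspx.bak") Is Nothing)  ' True
   End Sub

End Module
```

Reading the ID from the capture group keeps the matching and the extraction in one place, so the two cannot drift apart if the pattern changes later.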

Conclusion

URL rewriting can be a powerful technique for organizing the delivery of content to users. It can provide a simpler URL that is easier for users to use and remember, and even to hack as an alternative mechanism for systematically browsing the content of a web site. It can also provide a URL for database-driven content pages that is more effectively indexed by search engines, although search engines have gotten better over the years and this isn't anywhere near as big an issue as it used to be.

References

There are a number of good articles and references available online that are relevant to the topic of URL rewriting. A few of the best ones are listed below:

  • URL Rewriting in ASP.NET
    http://msdn2.microsoft.com/en-us/library/ms972974.aspx
    This article by Scott Mitchell, founder of 4GuysFromRolla.com, is probably the definitive article on the subject of URL rewriting. He also covers topics such as how to handle post-back issues and goes so far as to create a URL rewriting engine that matches using patterns defined in the web.config file. A source code download is available.

  • How to: Create Custom HTTP Modules
    http://msdn2.microsoft.com/en-us/library/ms227673.aspx
    Official Microsoft documentation on how to create custom HTTP modules. Does not otherwise touch on the subject of URL rewriting.

  • Rewrite.NET -- A URL Rewriting Engine for .NET
    http://www.15seconds.com/issue/030522.htm
    A nice article from Robert Chartier on URL rewriting. Like Scott Mitchell, his article features a full URL rewriting engine matching patterns defined in a config file.

  • Regular Expressions in ASP.NET
    http://msdn2.microsoft.com/en-us/library/ms972966.aspx
    A nice introductory article that describes how regular expressions work in ASP.NET. Invaluable if you're going to attempt any serious pattern matching activities.

  • Extending ASP.NET with HttpHandlers and HttpModules
    http://www.devx.com/dotnet/Article/6962/0/page/1
    An article from Bipin Joshi on how to implement HTTP handlers and HTTP modules.

  • URL as UI
    http://www.useit.com/alertbox/990321.html
    A column from Jakob Nielsen, renowned user interface guru, in support of "hackable" URLs, i.e., URLs that are simple enough that users can manipulate them as another way to navigate through a web site.


