URL rewriting is a technique in which an incoming request to a web server
is intercepted and a different resource is returned than the one requested. It's usually
done for one of two reasons:
Resources Have Moved: If web pages have been moved, but the
old URLs are likely to be still used, either because users have bookmarked pages or
because the pages have been indexed by search engines, then URL rewriting will allow
the old URLs to still work.
Simplifying URLs: URL rewriting can create simplified URLs
that are easier for users to use and remember, even though the underlying URLs might
be more complex. Additionally, search engines used to skip indexing for web pages
that accepted URL parameters, so URL rewriting allowed the use of URLs that
incorporated key parameters into the name of the requested resource.
The most common use of URL rewriting probably appears in content management
systems, as typified by the two URLs below:
The top URL represents a web page that shows the content of articles, with
the specific article shown being determined by a URL parameter. The second URL shows the
same article, but there is no URL parameter.
In fact, the web page "24.aspx" doesn't even exist. The request for the web
page is intercepted and translated into the first URL, so it looks to the users as if the
requested web page actually exists. In the old days, when search engines didn't index pages
that accepted URL parameters, this technique ensured that the article could be indexed.
Today, it just provides a simpler user experience. In fact, it can provide an alternate
navigation mechanism for a web site, with users hacking the URL to view different articles.
So, how do you do URL rewriting in ASP.NET? There are three ways to do it,
each of which has different strengths and weaknesses. These three ways are summarized below:
URL Mapping: ASP.NET provides functionality
within the web.config file that allows one resource to be returned in place of a
requested one. It's not overly useful because the URL mapping mechanism in ASP.NET 2.0
doesn't support wildcards.
Custom HTTP Handler: Provide an HTTP Handler that will
intercept the requested resource and substitute the new one. This is pretty easy, but
may not work for web sites hosted commercially, since the technique requires security
permissions.
Custom HTTP Module: Provide an HTTP Module and
have the module perform the resource substitution. This technique will
work for hosted web sites, but is a little more complex than the HTTP Handler
method. It can also have some ramifications for other ASP.NET features.
Each of these techniques is described in the following sections.
URL Mapping
Beginning with ASP.NET 2.0, Microsoft added a capability for URL mapping to the
web.config file for ASP.NET web sites. An example configuration is shown below:
<system.web>
<urlMappings enabled="true">
<add url="~/widgets.aspx" mappedUrl="~/products/w/widgets.aspx" />
</urlMappings>
</system.web>
This capability permits a URL to be mapped to another URL, which works fine
if you have a finite list of URLs to be mapped. Since this technique doesn't support wildcards,
it doesn't work at all for open-ended mappings, such as would be needed to map URLs like
"24.aspx" to a web page that displays articles. Every time a new article was created, you'd
have to add a new entry to the file.
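To illustrate the maintenance burden, here is a sketch of what the mapping section might look like for just three articles. The page name "article.aspx" and its "id" parameter are assumptions made for this example only; substitute whatever page actually displays your articles.

```xml
<system.web>
  <urlMappings enabled="true">
    <!-- Hypothetical entries: article.aspx and its id parameter are assumed names -->
    <add url="~/24.aspx" mappedUrl="~/article.aspx?id=24" />
    <add url="~/25.aspx" mappedUrl="~/article.aspx?id=25" />
    <add url="~/26.aspx" mappedUrl="~/article.aspx?id=26" />
  </urlMappings>
</system.web>
```

Each new article would need one more line like these, which is why the approach only suits a short, fixed list of URLs.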
Improvements to this mapping capability are planned to coincide with the advent
of Windows Vista (formerly known as Longhorn), the next version of Windows. It's likely that
ASP.NET 3.0 will include support for wildcards. If so, that may make the need for URL
rewriting via HTTP handlers and HTTP modules go away.
Until then, the current edition of URL mapping is almost useless. Onwards
to the other techniques!
Understanding How Requests Are Processed
To understand custom HTTP handlers, a little background on how ASP.NET
processes web pages is needed. A request for a resource first goes to the IIS web server.
Based on the extension, the request is forwarded to the proper ISAPI (Internet Server API)
extension, which is responsible for returning the data necessary to answer the request.
For pages ending with ".aspx", IIS is configured to dispatch the request to
aspnet_isapi.dll.
Next, ASP.NET initializes a series of HTTP modules. The request is sent
to all of the configured HTTP modules, each of which can manipulate the request as needed.
Finally, the request will be sent to an HTTP handler or HTTP handler factory, ultimately
causing the resource to be rendered and sent back to the requester.
This sequence of events provides two points where the request can be
intercepted and a new resource substituted for the requested one: at the HTTP module level
and the HTTP handler level.
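To make the handler end of that pipeline concrete, here is a minimal sketch of an HTTP handler. The class name HelloHandler is hypothetical and is not part of the listings in this article. The sketch only shows the two members that the IHttpHandler interface requires, and it performs no URL rewriting.

```vb
Imports System.Web

Namespace SunriseXP.Core

    '--- Hypothetical example class; the name is an assumption for illustration
    Public Class HelloHandler
        Implements IHttpHandler

        '--- Called by ASP.NET to produce the response for a matching request
        Public Sub ProcessRequest(ByVal context As HttpContext) _
                Implements IHttpHandler.ProcessRequest
            context.Response.ContentType = "text/plain"
            context.Response.Write("Requested path: " & context.Request.Path)
        End Sub

        '--- Returning True lets ASP.NET reuse one instance across requests
        Public ReadOnly Property IsReusable() As Boolean _
                Implements IHttpHandler.IsReusable
            Get
                Return True
            End Get
        End Property

    End Class

End Namespace
```

An HTTP handler factory, as used in the next section, is one step removed from this: rather than answering the request itself, it decides which IHttpHandler object should answer it.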
Creating a Custom HTTP Handler
After initializing the HTTP modules, ASP.NET looks for the appropriate
HTTP handler or HTTP handler factory to invoke to handle the request. To accomplish URL
rewriting, simply create a custom HTTP handler factory by creating a class that implements
the IHttpHandlerFactory interface. This interface requires that the class provide
implementations for the GetHandler and ReleaseHandler methods.
The GetHandler method receives a number of parameters that are
useful with regard to URL rewriting, including an HttpContext object that provides
access to request-related information. In addition, the method also receives the requested
URL, the physical path of the requested file, etc.
Ultimately, the GetHandler method can make changes to accomplish URL
rewriting, as shown in Listing 1. Assuming an incoming URL with an embedded article ID,
such as "24.aspx", the URL is parsed to get the article ID. The article ID is stored in the
Items hash, which is a little-known mechanism for storing data that can be accessed
throughout the duration of a request. The RewritePath method of the HttpContext
object is used to change the resource that will ultimately be returned.
Finally, the method has to return an object that implements the IHttpHandler
interface. To do this, it calls the GetCompiledPageInstance method with the new URL
and new path; this method conveniently returns an IHttpHandler instance for the
desired resource.
Listing 1: The SR_HandlerFactory Class
Imports System
Imports System.IO
Imports System.Web
Imports System.Web.UI

Namespace SunriseXP.Core

    Public Class SR_HandlerFactory
        Implements IHttpHandlerFactory

        Overridable Overloads Function GetHandler(ByVal context As HttpContext, _
                ByVal requestType As String, ByVal url As String, _
                ByVal pathTranslated As String) As IHttpHandler _
                Implements IHttpHandlerFactory.GetHandler

            '--- Save the page name (e.g. "24") so the destination page can read it
            context.Items("fileName") = Path.GetFileNameWithoutExtension(url).ToLower()

            '--- Point the request at the page that actually exists
            Dim strNewURL As String = "/content.aspx"
            Dim strNewPath As String = context.Server.MapPath("~/content.aspx")
            context.RewritePath(strNewURL)

            Return PageParser.GetCompiledPageInstance(strNewURL, strNewPath, context)
        End Function

        Overridable Overloads Sub ReleaseHandler(ByVal handler As IHttpHandler) _
                Implements IHttpHandlerFactory.ReleaseHandler
            '--- Does nothing
        End Sub

    End Class

End Namespace
Of course, it's not enough to have a class that implements the
IHttpHandlerFactory interface. You also have to tell ASP.NET to use the
class. This is done by adding some configuration lines to the web.config file, as
shown below:
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <system.web>
    <httpHandlers>
      <add verb="*"
           path="*/content/*.aspx"
           type="SunriseXP.Core.SR_HandlerFactory, SunriseXP.Core" />
    </httpHandlers>
  </system.web>
</configuration>
To configure a new HTTP handler, add an <httpHandlers>
section to the web.config file. Within that set of tags, define the URL pattern that
identifies which web pages the handler should be used for and which handler should be
used. In this case, the "verb" attribute signifies whether the handler will handle GET,
POST or other types of requests for the resources. The "*" value indicates that the
handler will always be called regardless of the type of request.
The "path" attribute defines the pattern that should be matched: in this case,
every web page with a ".aspx" extension found in a folder named "content". Note that
the handler could be configured to match any extension, but the handler will only be
called if the IIS web server is configured to pass requests for the extension to
ASP.NET.
Finally, the "type" attribute defines the class and the location
of the class for the handler to be called for matching requests.
Now that the handler factory class has been created, and the configuration
set up so that the class will be called, let's look at the sample destination web page
shown in Listing 2. This page accesses the Items hash to retrieve the
article ID that was stored by the GetHandler method. In real life, the web page
would validate the article ID and then display either the requested article or an
appropriate error message.
Listing 2: A Sample Content Web Page
<%@ Page Language="vb" %>
<script runat="server" language="vb">
SUB Page_Init()
lblMessage.Text = HttpContext.Current.Items("fileName")
END SUB
</script>
<html>
<head>
<title>Test Content Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="/include/keenertech.css">
</head>
<body>
<p class=ws_text>The message was: <asp:Label runat="server" ID="lblMessage" />
</body>
</html>
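The validation step mentioned above could look something like the following sketch. The module and function names (ArticleIdCheck, TryGetArticleId) are hypothetical, and the rule that an article ID must be a positive whole number is an assumption; the real rule depends on how your articles are stored.

```vb
Imports System

Module ArticleIdCheck

    '--- Returns True and sets articleId when rawValue is a positive whole number.
    '--- rawValue is typed As Object because that is what the Items hash returns.
    Function TryGetArticleId(ByVal rawValue As Object, _
            ByRef articleId As Integer) As Boolean
        articleId = 0
        If rawValue Is Nothing Then Return False

        Dim parsed As Integer
        If Integer.TryParse(rawValue.ToString(), parsed) AndAlso parsed > 0 Then
            articleId = parsed
            Return True
        End If
        Return False
    End Function

    Sub Main()
        Dim id As Integer
        Console.WriteLine(TryGetArticleId("24", id))      ' True
        Console.WriteLine(TryGetArticleId("widgets", id)) ' False
    End Sub

End Module
```

In the content page, the value passed in would be HttpContext.Current.Items("fileName"), and a False result would lead to the error message rather than a database lookup.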
This is a simple and useful mechanism for implementing URL rewriting in
ASP.NET. Unfortunately, it doesn't work in all environments. Specifically, if your web site
is hosted by a commercial hosting service, I can assure you that this solution will not
work. The solution requires a security permission that is not likely to be granted in
a shared hosting environment. In a commercial hosting environment, you will receive an
error message similar to the one below:
Output 1: What You Don't Want to See
Security Exception
Description: The application attempted to perform an operation not allowed by the security policy.
To grant this application the required permission please contact your system administrator or
change the application's trust level in the configuration file.
Exception Details: System.Security.SecurityException: Request for the permission
of type 'System.Security.Permissions.SecurityPermission, mscorlib,
Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' failed.
Theoretically, this can be handled by setting up a custom trust level for your
site, but that is something that would have to be set up by the hosting service. In my opinion,
that is not likely given the permission that's needed. So, if you are looking for a solution that
will work with a commercial hosting service, look at the HTTP module solution described in the
next section.
Creating a Custom HTTP Module
The third way to accomplish URL rewriting is to create a custom HTTP module.
Like an HTTP handler, the module has to be configured in the web.config file, as shown below:
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
<system.web>
<httpModules>
<add type="SunriseXP.Core.SR_HttpModule, SunriseXP.Core" name="SR_HttpModule" />
</httpModules>
</system.web>
</configuration>
To create an HTTP module, a class must be created that implements the
IHttpModule interface. The Init method of the module will be called when it is
first initialized. This method can register the address of an event handler that will be called
when certain events occur during the processing of a request. As with the HTTP handler, the
event handler will have the opportunity to modify the URL in order to return a different
resource than the one originally requested.
One big difference between the HTTP handler solution and the HTTP module
solution is how the web page requests that need to be "adjusted" are identified. With an HTTP
handler, a pattern was defined in the web.config file that identified which web pages
the handler would be called to handle. With the HTTP module solution, the event handler will run
for every page, so the event handler will have to contain logic to determine which page requests
it needs to affect and which should be left alone to be processed normally.
There are also some other considerations. There are a variety of events that
occur during the processing of a page request. A few of the most important ones are the
BeginRequest, AuthenticateRequest and AuthorizeRequest events. Whichever event
you elect to create an event handler for, the URL rewriting may impact other ASP.NET
features such as ASP.NET's login authentication (if you're using it). Additionally, if
your delivered web page contains a form, that form is going to post back to the real
URL of the web page, not the one originally requested by the user.
With my own designs, I've side-stepped these issues by using the HTTP module
solution to display content that's accessible to the public (i.e. - no authentication
required), such as articles and blogs. I don't typically include forms on such web pages; or
if I do, they're typically forms that ultimately redirect users to another web page. As a result,
the solution shown in Listing 3 sets up an event handler for the BeginRequest
event, which is the first event that occurs during the processing of a request.
Listing 3: The SR_HttpModule Class
Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Web
Imports System.Web.UI

Namespace SunriseXP.Core

    Public Class SR_HttpModule
        Implements IHttpModule

        Public ReadOnly Property ModuleName() As String
            Get
                Return "SR_HttpModule"
            End Get
        End Property

        Public Sub Init(ByVal application As HttpApplication) _
                Implements IHttpModule.Init
            AddHandler application.BeginRequest, _
                AddressOf Me.Application_BeginRequest
        End Sub

        Private Sub Application_BeginRequest(ByVal source As Object, _
                ByVal e As EventArgs)
            '--- Get the HttpApplication and HttpContext objects to access
            '--- request and response properties.
            Dim application As HttpApplication = CType(source, HttpApplication)
            Dim context As HttpContext = application.Context

            '--- Check if the requested URL matches the URL pattern that needs
            '--- to have its URL rewritten to the real content page
            Dim strRequestedURL As String = application.Request.Path
            Dim strNewURL As String = "/content.aspx"
            Dim objRegex As Regex = _
                New Regex(".*/content/(\d{1,})\.aspx", RegexOptions.IgnoreCase)

            If objRegex.IsMatch(strRequestedURL) Then
                Dim strFile As String = _
                    Path.GetFileNameWithoutExtension(strRequestedURL)
                context.Items("fileName") = strFile.ToLower()
                context.RewritePath(strNewURL)
            End If
        End Sub

        Public Sub Dispose() Implements IHttpModule.Dispose
            '--- Nothing to clean up
        End Sub

    End Class

End Namespace
In the Application_BeginRequest method, a regular expression
is used to match the URL of any web pages that should have their URL rewritten. The
pattern used is shown below:
.*/content/(\d{1,})\.aspx
This regular expression matches any URL that includes a directory named
"content" which in turn contains an ASP.NET page with a numeric name before the ".aspx"
extension. All other web pages will be ignored. For example, the pattern matches the
following URL:
http://www.keenertech.com/content/1234.aspx
The event handler stores the base name of the web page, minus the path
and the file extension, in the Items hash where it can be accessed by the destination
web page shown in Listing 2. The way the pattern matching has been set up, this
will actually be an article ID that can be used by the destination page to retrieve the
appropriate article. Finally, the RewritePath method is used to change the underlying
URL to be returned by the request.
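Since the pattern already wraps the numeric part in parentheses, the article ID can also be read straight from the regular expression match instead of from the file name. The following stand-alone sketch shows that variation; the module and function names (RewriteMatchDemo, GetArticleId) are hypothetical, and it is an alternative to, not a copy of, what Listing 3 does.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RewriteMatchDemo

    '--- Same pattern as Listing 3; the parentheses capture the numeric page name
    Private ReadOnly ContentPattern As New Regex( _
        ".*/content/(\d{1,})\.aspx", RegexOptions.IgnoreCase)

    '--- Returns the captured article ID, or Nothing when the path does not
    '--- match and the request should be processed normally
    Function GetArticleId(ByVal requestedPath As String) As String
        Dim m As Match = ContentPattern.Match(requestedPath)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(GetArticleId("/content/1234.aspx"))             ' 1234
        Console.WriteLine(GetArticleId("/content/about.aspx") Is Nothing) ' True
    End Sub

End Module
```

Note that the pattern is not anchored at the end, so a path with extra characters after ".aspx" would still match; appending "$" to the pattern is one way to tighten it if that matters for your site.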
Conclusion
URL rewriting can be a powerful technique for organizing the delivery of
content to users. It can provide a simpler URL that is easier for users to use and remember,
and even to hack as an alternative mechanism for systematically browsing the content of a
web site. It can also provide a URL for database-driven content pages that is more effectively
indexed by search engines, although search engines have gotten better over the years and
this isn't anywhere near as big an issue as it used to be.
References
There are a number of good articles and references available online that are
relevant to the topic of URL rewriting. A few of the best ones are listed below:
URL Rewriting in ASP.NET
http://msdn2.microsoft.com/en-us/library/ms972974.aspx
This article by Scott Mitchell, founder of
4GuysFromRolla.com, is probably the
definitive article on the subject of URL rewriting. He also covers topics such as how
to handle post-back issues and goes so far as to create a URL rewriting engine that matches
using patterns defined in the web.config file. A source code download is available.
How to: Create Custom HTTP Modules
http://msdn2.microsoft.com/en-us/library/ms227673.aspx
Official Microsoft documentation on how to create custom HTTP modules. Does not otherwise
touch on the subject of URL rewriting.
Rewrite.NET -- A URL Rewriting Engine for .NET
http://www.15seconds.com/issue/030522.htm
A nice article from Robert Chartier on URL rewriting. Like Scott Mitchell, his article
features a full URL rewriting engine matching patterns defined in a config file.
Regular Expressions in ASP.NET
http://msdn2.microsoft.com/en-us/library/ms972966.aspx
A nice introductory article that describes how regular expressions work in ASP.NET.
Invaluable if you're going to attempt any serious pattern matching activities.
Extending ASP.NET with HttpHandlers and HttpModules
http://www.devx.com/dotnet/Article/6962/0/page/1
An article from Bipin Joshi on how to implement HTTP handlers and HTTP modules.
URL as UI
http://www.useit.com/alertbox/990321.html
A column from Jakob Nielsen, renowned user interface guru, in support of "hackable" URLs,
i.e. URLs that are simple enough that users can manipulate them as another way to
navigate through a web site.