| Does capitalization matter in web
page URLs? I recently did some research on this
topic when I switched most of my websites from
Windows-based hosting to Linux-based. I figured
this would be a good time to impose a little more
order and uniformity on the way I named my URLs.
I was just beginning to embark on a massive binge
of URL "case standardization", when it
struck me that it might be a good idea to see how
this might affect search engines, bookmarks, and
other links to my pages. Well, it seems that if your
page is hosted on Windows, links will reach your
pages regardless of capitalization (or lack of
it). So if the search engine indexed your page
when it was www.example.com/page01.htm,
it will still find it even if you renamed it www.example.com/PAGE01.htm
(but read on, as there ARE ways that
capitalization might affect you).
In a Linux or Unix-based environment, things
are a little different. The good news is that the
base URL (www.example.com) will
resolve correctly regardless of capitalization.
The (potentially) bad news is that the other
pages will not. So if your site is hosted in a
Linux/Unix environment and you rename your page www.example.com/PopularPage.htm
to www.example.com/popularpage.htm,
you could suddenly find yourself facing a drop in
traffic as people clicking on the search engine
link get a 404 error telling them that the page
cannot be found! Of course, search engines do
crawl through the web and update their indices
from time to time, so your page would probably
get corrected in most search engines ...
eventually. In the meantime, however, you lose
traffic, potential revenue, and possibly links
from other sites, all because people just gave up
on your page when they got a 404 error. (And of
course any existing user bookmarks or links from
sites other than search engines will have the
same problem as the search engines and might take
longer ... or forever ... to get fixed.)
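
The difference between the host name and the rest of the URL can be sketched in a few lines of Python. This is only an illustration, not how any particular web server is implemented: the function name is made up, the http:// prefix is added so the standard library can pick out the host name, and it assumes a host that maps URL paths directly onto a case-sensitive file system.

```python
from urllib.parse import urlsplit


def same_page_on_case_sensitive_host(url_a, url_b):
    """Hypothetical helper: would two URLs reach the same page on a
    host that treats paths as case-sensitive (typical Linux/Unix)?"""
    a, b = urlsplit(url_a), urlsplit(url_b)
    # Host names are case-insensitive, so compare them lowercased.
    hosts_match = a.netloc.lower() == b.netloc.lower()
    # Paths are compared exactly: a difference in case is a different file.
    return hosts_match and a.path == b.path


# Only the host name differs in case: still the same page.
print(same_page_on_case_sensitive_host(
    "http://WWW.EXAMPLE.COM/page01.htm",
    "http://www.example.com/page01.htm"))

# The page name differs in case: a different (probably missing) page.
print(same_page_on_case_sensitive_host(
    "http://www.example.com/PopularPage.htm",
    "http://www.example.com/popularpage.htm"))
```

On a Windows host the second comparison would effectively succeed as well, which is exactly why the problem tends to stay hidden until a move to Linux/Unix.
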
So what's a webmaster to do? Well, ideally
decide on a "case convention" you can
stick with at the start. Think long and hard
before changing the case of a page, particularly
if it draws a lot of visitors who come directly
to it from search engines/links/bookmarks. If you
feel you must change the case of these pages, you
might want to make sure that you have a custom
404 error page with a link to enable visitors to
find your main index page and perhaps links to
several of your other important pages as well.
For a really important page, you might want to
create another page with the URL in the original
case, giving your visitors a link to the
"new" page, perhaps automatically
forwarding them there. (www.example.com/PAGE01.htm
forwards to www.example.com/page01.htm
and so on).
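
The forwarding idea can also be expressed as a lookup rather than as a separate stub page for every old name. The following is a minimal sketch, assuming you keep a list of the current ("canonical") page paths; the function names and the returned status/location pairs are hypothetical and would still need to be wired into whatever server or script actually answers the request.

```python
def build_case_index(canonical_paths):
    """Map each lowercased path to the one spelling that really exists."""
    index = {}
    for path in canonical_paths:
        # If two real paths differ only by case, the later one wins here.
        index[path.lower()] = path
    return index


def resolve(requested_path, index):
    """Return (status, location) for a requested path.

    200: the request already uses the canonical case.
    301: the page exists under a different case; forward the visitor.
    404: nothing matches, even ignoring case.
    """
    canonical = index.get(requested_path.lower())
    if canonical is None:
        return 404, None
    if canonical == requested_path:
        return 200, canonical
    return 301, canonical


index = build_case_index(["/page01.htm", "/URL/Information.htm"])
print(resolve("/PAGE01.htm", index))   # old-case link gets forwarded
print(resolve("/page01.htm", index))   # current name is served as-is
print(resolve("/missing.htm", index))  # falls through to the 404 page
```

A permanent (301) forward is used in the sketch because it tells visitors' browsers and search engines alike that the page has moved for good.
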
"But I'm hosted in a Windows
environment," you say. "How does this
affect me?" Ah, not at all ... as long as
you stay with Windows hosting ... FOREVER. A day
might come when you decide to move to Unix or
Linux in order to be able to utilize features
unavailable in a Windows hosting environment
(which was a primary factor in my move - that,
and some of the unique features available at FutureQuest). Or your current
hosting provider could make the decision for you
at some point by dropping Windows hosting and
forcing you to migrate to Unix/Linux or find a
new host. When and if that day comes, if you've
changed the case of pages in the past, you could
find yourself in a quandary. You might have one
set of search engines or links pointing to www.example.com/Page01.htm
and another to www.example.com/page01.htm.
It's a problem that is best avoided by careful
planning in the first place. If you've
intentionally (or unintentionally) changed the
case of a page name though, don't despair. If you
stick with the new name long enough, most search
engines will eventually update to the current
version. That way, you'll be ready when and if
the Linux/Unix move ever comes.
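
If you suspect that links to your site already exist in more than one case, a quick check along these lines might help before a move. It is a rough sketch only: the function name is made up, and it assumes you have already collected the linked URLs into a list yourself (from server logs or a link report, for example).

```python
from collections import defaultdict


def find_case_variants(urls):
    """Group URLs that are identical except for capitalization."""
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    # Keep only the groups that contain more than one spelling.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


links = [
    "www.example.com/Page01.htm",
    "www.example.com/page01.htm",
    "www.example.com/other.htm",
]
for variants in find_case_variants(links).values():
    print("Same page, different case:", variants)
```

Each group it reports is a page where only one spelling will keep working on a case-sensitive host unless the others are forwarded.
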
As long as you are consistent, it probably
does not matter much which particular convention
you adopt for naming your page URLs. There are
several schemes which have some appeal.
You could go with all lower case, the advantages
being that it is easy to be consistent and easy
for someone who types your page URL directly into
their browser to get it right (although in most
cases users will probably either bookmark a page
or just navigate to it from the main index of
your site). Lower case does have the disadvantage
of being harder to read on long URLs
(www.example.com/thisishardtoread.htm as opposed
to www.example.com/ThisIsHardToRead.htm), so you
might want to consider mixed case for that
reason. I'd avoid using all upper case except
where it is standing in for a short abbreviation
(www.example.com/URL/Information.htm).
There's really no great problem with mixing
the conventions either, as long as you don't
change a URL once it is in place. Aesthetically,
though, it irritates me if I find that I have not
been consistent with pages having similar names
(Page01.htm, Page02.htm, page03.htm).
Incidentally, inconsistency can sometimes make
managing your site harder too, as the pages
sometimes don't sort alphabetically in the order
you would expect.
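
The sorting surprise is easy to reproduce. The sketch below uses Python's default string ordering, which compares character codes and so puts every capital letter ahead of every lowercase one; whether your own file manager or FTP client sorts this way depends on the tool, so treat it as an illustration of the effect rather than a prediction. The file names are made-up examples.

```python
pages = ["page03.htm", "Page01.htm", "Page02.htm", "about.htm"]

# Character-code order: capitals sort before all lowercase letters,
# so "about.htm" lands in the middle and page03 is split from its siblings.
by_code = sorted(pages)
print(by_code)

# Case-insensitive order: what most people expect to see.
ignoring_case = sorted(pages, key=str.lower)
print(ignoring_case)
```

With consistent capitalization the two orderings agree, which is one more small argument for picking a convention and sticking with it.
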
One final point: most of my examples change the
case of a page name, but the principles apply
equally well to folders on your web site
(www.example.com/folder/pg01.htm
vs www.example.com/Folder/pg01.htm,
etc.). Indeed, an unwise renaming of a folder
could be far worse, as it would affect the URL
of every folder and page below it, and
that could be a problem with a capital P!
|