Tapirtype Blog: Index

The son of cruftless links

Ok, so I went and did the whole “cruftless links” thing, and no sooner had I finished writing the post than I found about a million little reasons why what I did and had just described didn’t quite work.

I’m still not quite sure what’s behind some of the behavior I’m seeing, but I’ve got things to a somewhat stable point now. This is what I’ve settled on:

Why the old way didn’t work

There were basically three problems with how I wanted to make my links cruftless. The first, and biggest, was that although the permalink tag was generating the correct link to “/year/month/base_name/”, trackbacks were going to “/year/month/base_name/index.php”. That means that if I ever changed to another file type for my indexes, I’d have to work out a redirect or links would break, defeating the whole point of archiving things as /base_name/index.extension. I had figured that since Movable Type was now offering this as a publishing option, they would have fixed their trackbacks as well. Even their “entry more” links correctly go to “/base_name/#more” rather than to “/base_name/index.php#more”, which makes it really puzzling that the trackbacks go to the wrong place.

The second, related problem is that for some reason the program that auto-detects trackback URIs now fails to find them. My guess is that the program sees the link as a link to a directory and fails to open the index inside, but I’m really just taking a wild guess here. I don’t mind this too much. It only affects internal trackbacks for me, and I have no idea what other people’s software will do.

The third problem is that the footnotes Textile writes link to /base_name/index.php#fn1, meaning that any click on a footnote will expose the “.php” extension. And this has gotten even more fragged by my fix for problem #1.

Fixing the trackbacks… sort of

So I figure that the only real, serious problem is the fact that Movable Type sends out the crufty trackback links. There are a couple of ways I can go about fixing this. First of all, one of the articles that I pointed to before talked about modifying the application code to fix the trackback links that are sent out. I’m really hesitant to do this, though. I want to be able to upgrade to newer versions with little headache, I really don’t know Perl very well, and I’m not interested in figuring out the logic of the Movable Type application code well enough to be confident that I know what I’m doing, especially since the instructions that I just pointed to are old. Second, I could just wait for Six Apart to release a future version that will hopefully fix this problem, and in the meantime redirect people to the right place.

Redirecting to the right place

So here’s a better solution: just redirect any incoming links. That’s essentially what Mark Pilgrim did with his extension-less archives. He used a RedirectMatch directive in his .htaccess file to send any requests for a file ending in .html to the same path without the .html ending.

That’s great for him, but I can’t use RedirectMatch with my solution without causing an infinite loop. That’s because of the way RedirectMatch works: it is triggered whenever the file you end up requesting matches the expression, whether or not the request itself matches. So while your regular expression might not match “path/to/file/”, it will still match “path/to/file/index.php”, since that is the actual file being requested. As far as I know there is no way to stop it from matching the index file when you requested the directory. (Incidentally, this is what it should do most times, because you want to catch the moved file no matter how it was requested.)
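The loop described above can be sketched as a toy model in Python. This is only an illustration of the behavior as described in this post, not of Apache’s internals: the pattern, the `served_file` helper and the sample paths are all made up for the example.

```python
import re

# Hypothetical rule: redirect anything ending in /index.<ext> to its directory.
PATTERN = re.compile(r"^(.*/)index\.[^/]+$")

def served_file(path):
    """Toy stand-in for the server: a directory request is answered by its index file."""
    return path + "index.php" if path.endswith("/") else path

def follow(path, max_hops=5):
    """Follow redirects, testing the rule against the file served, not the URL asked for."""
    hops = [path]
    for _ in range(max_hops):
        match = PATTERN.match(served_file(hops[-1]))
        if match is None:
            break  # nothing to redirect, the request settles here
        hops.append(match.group(1))  # redirect to the directory
    return hops

trail = follow("/2006/11/some_entry/index.php")
print(trail)
```

In this model the trail never settles: every hop lands on the directory, the directory is served by its index file, and the index file matches the rule again, so only the `max_hops` cap stops it.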

My solution… and the strangeness thereof

Update 11/11/06: Now that I’ve moved over to Textdrive the problem I talk about below is fixed. I can now use mod_rewrite as it was intended. I’m pretty sure the problem had something to do with how my old server was handling mod_rewrite in a virtual hosting / subdomain environment.

So if I’m going to get this to work I’m going to have to use mod_rewrite instead, which has more power. mod_rewrite usually just serves up the rewritten page without telling the browser to load a different location, but you can specify a redirect behavior. This is what I should have been able to do, but because of the peculiarities of my hosting environment, and because I’m blogging from a sub-domain, it would always redirect to the main (www.tapirtype.com) website instead of this one. There might have been a way for me to stop that, but I couldn’t figure it out. However, I stumbled onto some strange behavior that allows me to achieve the same thing. I’m nervous about it because I don’t entirely understand what is going on, so I don’t know whether I’m exploiting a bug or unknowingly triggering the correct behavior.

In short I found that when I used the following redirect code I got the response I wanted:

RewriteRule ^(.*)/index$ $1
RewriteRule ^(.*)/index\.(.*)$ $1

The first rule matches index with no extension and the second rule matches index.anything. Here $1 refers to the first parenthesized group, which is the path up to the index file: the directory I want to send the viewer to. The strange thing is that if I left the trailing slash off the $1, it would trigger a redirect to the correct website, exactly as I wanted, even though I didn’t specify a redirect with an [R] flag on the end.
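To see what the two patterns capture, here is a small Python sketch that runs them over a few sample paths. It assumes the paths arrive without a leading slash, as they would in a per-directory .htaccess file, and the sample entry name is made up. It only models the regular expressions, not the rest of what the server does.

```python
import re

# The two patterns from the rules above, in the order they appear.
RULES = [
    re.compile(r"^(.*)/index$"),
    re.compile(r"^(.*)/index\.(.*)$"),
]

def rewrite(path):
    """Apply each rule in turn, replacing the path with $1 whenever a rule matches."""
    for rule in RULES:
        match = rule.match(path)
        if match:
            path = match.group(1)  # no [L] flag, so later rules still get a look
    return path

samples = [
    "2006/11/some_entry/index.php",
    "2006/11/some_entry/index",
    "2006/11/some_entry/",
    "index.php",
]
for sample in samples:
    print(repr(sample), "->", repr(rewrite(sample)))
```

Two things stand out in the sketch. A bare “index.php” at the top level is left alone, because both patterns require a slash before “index”. And the rewritten path has no trailing slash; Apache’s mod_dir normally answers a slash-less request for a directory with a redirect to the slashed URL, which might be the redirect described above, though nothing here confirms that.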

So as it stands this will take care of any links that get sent directly to the indexes, freeing me to use any file ending I want in the future. I solve the problem of people going to the old “year/month/base_name.extension” location the same way I did before, though now I’ve moved the directive to the top level .htaccess file. As long as I’ve got rewrite rules there I might as well move one more in, and it seems that the lower level directory .htaccess files override the higher level ones. I didn’t want to have to remember to append any rules onto those files as well, so I just made this rule:

RewriteRule ^(2006/[\d][\d]/)([^/]+)\.(.*)$ $1$2 [L]

It matches only those files that existed before the change (ok, well, those as well as the rest of the entries from this year, which will be over soon anyway).
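A quick way to check which URLs that 2006 pattern catches is to try it in Python. This is a sketch of the regular expression only; the function name and sample paths are invented for the example, and the paths are again assumed to have no leading slash.

```python
import re

# Pattern copied from the rule for the old year/month/base_name.extension URLs.
OLD_ENTRY = re.compile(r"^(2006/[\d][\d]/)([^/]+)\.(.*)$")

def rewrite_old_entry(path):
    """Return $1$2 (the path minus its extension) if the pattern matches, else the path."""
    match = OLD_ENTRY.match(path)
    return match.group(1) + match.group(2) if match else path

for sample in ["2006/11/some_entry.php", "2006/11/some_entry/index.php", "2005/11/some_entry.php"]:
    print(repr(sample), "->", repr(rewrite_old_entry(sample)))
```

Because the second group is `[^/]+`, the pattern only fires on a file sitting directly inside the month directory, so the new-style “base_name/index.php” paths pass through untouched.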

The niggling problem of the footnotes

The remaining problem is that with this redirection the footnotes not only go to “index.php#fn” directly, but they get further hosed because that pattern matches my RewriteRule, punching you back to the top of the file. In theory I should be able to write the rule so that it passes the “#anything” on to the directory, but for some reason I couldn’t get it to work even though I wrote an expression that I knew should match. Clearly I was doing something wrong, but I had better things to do than beat my head against it when I’m not entirely happy with the footnote implementation to begin with… But it is unfortunate, because I do like footnotes. I’ll have to resurrect them one way or another.
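One likely reason no expression would match: the “#anything” part of a URL is a fragment identifier, which the browser keeps to itself and does not send to the server in the HTTP request, so a RewriteRule never gets to see it. A short Python sketch shows the split; the hostname and the `request_target` helper are made up for the illustration.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the path (plus query, if any) that a client puts on the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # the fragment is deliberately left out, as a browser would do

url = "http://blog.example.com/2006/11/some_entry/index.php#fn1"
print("fragment kept by the browser:", urlsplit(url).fragment)
print("what the server is asked for:", request_target(url))
```

Whether the footnote anchor survives the redirect is then up to the browser, not to anything written in .htaccess.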

Oh, and strangely an “index.php?something” request gets redirected the right way to “/?something” even though I don’t know why it should. Something strange is definitely going on and it may come back to bite me in the ass later, but for now it works.
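A possible explanation for the query string surviving: mod_rewrite tests a RewriteRule pattern against the path only, and by default it passes the original query string through unchanged when the substitution doesn’t contain one of its own. The Python sketch below models just that default, reusing the second rule’s pattern; the function name and sample request are made up.

```python
import re

# Pattern from the second rule; it is only ever tested against the path.
INDEX_RULE = re.compile(r"^(.*)/index\.(.*)$")

def rewrite_keeping_query(target):
    """Rewrite the path part of 'path?query' and re-attach the query string untouched."""
    path, sep, query = target.partition("?")
    match = INDEX_RULE.match(path)
    if match:
        path = match.group(1)
    return path + sep + query

print(rewrite_keeping_query("2006/11/some_entry/index.php?something"))
```

The sketch doesn’t add the trailing slash seen in “/?something”; that would come from whatever turns the slash-less directory path into a proper directory URL.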

You are visiting Tapirtype Blog. Unless otherwise noted, all content is © 2006-2008 by Sasha Kopf and Michael Boyle, some rights reserved. Site design by Michael Boyle modified from the standard Movable Type templates. I've made an attempt to generate standards compliant content which should look best in Safari or, otherwise, Firefox. Use of Internet Explorer may be harmful to your sanity and I've made little attempt to support it.

This page is published using Movable Type 4.1