Archive for 29th March 2010

Upgrading My Blog From Movable Type 2.65 to WordPress 2.9.2 While Maintaining PageRank

My blog has been running since 2003 using Movable Type 2.65 – with Lance, Zach, and Steve all suggesting I upgrade, this weekend turned out to be the time I decided to give it a try. I am also starting to use WordPress in my courses using IMS Basic LTI – so I figured I might as well find my way around it. My site has decent PageRank since I have been doing this for seven years now.

I had several goals in the conversion: (a) maintain my Google PageRank on the pages, (b) keep all my old posts and support all the old URLs, and (c) keep the page identifiers the same in my WordPress database.

I am not much of an expert on Google PageRank – but I did watch this excellent talk from Google I/O 2008 by Maile Ohye:

Google I/O 2008 – Search Friendly Development

Maile repeatedly talks about the need for permanent redirects when web sites are changed – so I took that to heart. I recommend the video to *anyone* who is interested in maintaining or increasing PageRank legitimately.

I found a few helpful blog posts – but I waited so long to convert that all the instructions were pretty much out of date. This blog post from Scott Yang was my inspiration – but I did have to adapt things to a newer version of WordPress:

http://scott.yang.id.au/2004/06/wordpress-migration-notes/

So the first thing to do is export from Movable Type and retain the post IDs. Here I followed Scott’s directions, slightly adapted to my version of Movable Type. This required editing the file ./lib/MT/App/CMS.pm, adding the ‘POSTID’ line at line 2970 of my file:


DATE: <$MTEntryDate format="%m/%d/%Y %I:%M:%S %p"$>
POSTID: <$MTEntryID$>
-----
BODY:

Then, also inspired by Scott’s post, I went into Movable Type’s user interface to export all entries, comments and trackbacks into a plain text file.

My old blog was installed at csev-blog so I initially installed WordPress at csev_blog (with an underscore). I later renamed it to csev-blog, as described below.

Then I made some changes to my WordPress installation. I edited the file ./wp-admin/import/mt.php at line 418:

                                }
                        } else if ( 0 === strpos($line, "POSTID:") ) {
                                $postid = trim( substr($line, strlen("POSTID:")) );
                                $post->import_id = $postid;
                        } else if ( 0 === strpos($line, "EMAIL:") ) {

It turns out that WordPress now understands the notion of import_id – so there was no need to change the SQL (per Scott’s post), and the insert is no longer in ./wp-admin/import/mt.php anyway. No further changes were necessary.

Then I copied the exported text file to ./wp-content/mt-export.txt and used the WordPress user interface to do the import without the upload. It would only import about 250 entries before hitting a run-time limit. I checked MySQL to make sure the ID values in the wp_posts table were really being taken from the MT import.

I then edited the file ./wp-content/mt-export.txt to delete the first 249 posts and re-ran the import. The WordPress import is smart enough to not double import – so I always left the last successfully imported post in the file to be sure I did not skip any. By deleting the first 249 posts and re-running the import over and over – after three imports, I had all 638 posts imported.
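The trimming step above can be scripted instead of done by hand. The following is only a sketch: it assumes that each entry in the Movable Type export ends with a line of exactly eight hyphens (the five-hyphen lines separate sections inside an entry), and the function names split_entries and drop_first are made up for illustration.

```python
# Assumption: every entry in the export ends with a line of eight hyphens.
# Five-hyphen lines ("-----") only separate sections inside one entry.
ENTRY_SEPARATOR = "--------"


def split_entries(text):
    """Split export text into entries, keeping each entry's closing separator."""
    entries, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip("\r\n") == ENTRY_SEPARATOR:
            entries.append("".join(current))
            current = []
    # Keep a trailing entry that lacks a final separator line, if any.
    if "".join(current).strip():
        entries.append("".join(current))
    return entries


def drop_first(text, count):
    """Return the export text with its first `count` entries removed."""
    return "".join(split_entries(text)[count:])
```

To mirror what I did by hand, read ./wp-content/mt-export.txt, call drop_first(text, 249), write the result back, and re-run the import. Keep a copy of the untouched export somewhere safe first.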

The next task was to edit my .htaccess to make my old URLs work. I needed to fix individual posts like 000749.html and monthly digests like 2009_12.html and map them to my new permalink structure. I used the permalink structure 2010/03/blog-title-post to make my PageRank be as cool as it could be.

Here is my .htaccess file.


# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /csev-blog/
RewriteRule ^([0-9]{4})_([0-9]{2})\.html$ /csev-blog/$1/$2/ [R=permanent,L]
RewriteRule ^([0-9]{6})\.html$ /csev-blog/mt-singlepost.php?p=$1 [L]
RewriteRule index\.rdf /csev-blog/feed/rdf/ [R=permanent,L]
RewriteRule index\.xml /csev-blog/feed/ [R=permanent,L]
RewriteRule atom\.xml /csev-blog/feed/atom/ [R=permanent,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /csev-blog/index.php [L]
</IfModule>
# END WordPress

The simplest rule was for the monthly digest files 2009_12.html which could be directly redirected to the new permalink structure of /2009/12/ – I wanted the redirects to be permanent and I wanted there to be one redirect to transfer PageRank as quickly and cleanly as possible – so make sure to have the trailing slash.

The three lines for the RSS feeds were similarly simple and done as permanent redirects – I wrote a bit of code that I never used that was designed to fake the RSS feeds forever called mt-feed.php – I almost got it working – but it was a bit flaky in some readers and I just decided to fall back to the redirect. I include the code for mt-feed.php at the end of the post – make sure to test everything carefully before using it. I did not think that Google cared too much about the RSS feeds w.r.t. PageRank so I took the easy way out with the redirects.
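To sanity-check the patterns before touching the server, the rewrite rules can be mirrored in a few lines of Python. This is a rough sketch and not how Apache evaluates rules internally: it only imitates "first matching rule wins", it writes the dots in the patterns as literal dots, and the names RULES and map_old_path are invented for the example.

```python
import re

BASE = "/csev-blog/"

# (pattern, target template, permanent redirect?) in the same order as the
# rules in the .htaccess file. The single-post rule is an internal rewrite,
# so it is marked False; the PHP script it points at issues the redirect.
RULES = [
    (r"^([0-9]{4})_([0-9]{2})\.html$", r"\1/\2/", True),
    (r"^([0-9]{6})\.html$", r"mt-singlepost.php?p=\1", False),
    (r"index\.rdf", "feed/rdf/", True),
    (r"index\.xml", "feed/", True),
    (r"atom\.xml", "feed/atom/", True),
]


def map_old_path(path):
    """Return (new_url, is_permanent_redirect) for the first matching rule, or None."""
    for pattern, template, permanent in RULES:
        match = re.search(pattern, path)
        if match:
            return BASE + match.expand(template), permanent
    return None
```

Feeding it a handful of old paths (2009_12.html, 000749.html, index.rdf) shows quickly whether each one lands where you expect, including the trailing slash on the monthly archives.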

The trickiest bit was to map the individual posts to the new location (000736.html). I could have taken the easy way out and made a rewrite rule to send them all to index.php?p=000736, similar to how Scott did it – but since my WordPress permalink structure was /year/month/title this would be two redirects. The first would be from 000736.html to index.php?p=000736 and the second would be from index.php?p=000736 to /2008/10/some-title and I wanted Google to have every chance to transfer my PageRank – so I wanted one redirect and I wanted it to be a permanent redirect.

So my rewrite rule transformed the individual posts to mt-singlepost.php?p=000736 and I wrote the following code.

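<?php
// Load WordPress so the query and permalink functions are available.
require('wp-blog-header.php');

// Cast to an integer so only a numeric post ID reaches the query (000736 becomes 736).
$posts = query_posts('p='.intval($_REQUEST['p']));
if ( have_posts() ) {
    while ( have_posts() ) {
        the_post();
        // One permanent redirect straight to the final permalink.
        header("HTTP/1.1 301 Moved Permanently");
        header('Location: '.get_permalink());
        exit;
    }
}
// No matching post: answer with a 404 instead of redirecting.
header("HTTP/1.1 404 Not Found");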

Again an adaptation of Scott’s pattern but using more modern calls from WordPress 2.9.2. This gave me my single, permanent (301) redirect so I can transfer PageRank efficiently.

By running both blogs side by side, with the original Movable Type blog on csev-blog and the new WordPress blog on csev_blog, I could test lots of URLs and be quite patient going back and forth. But once things worked – it was time to rename the folder on the server.

Important – make a copy of your .htaccess file before taking this step, because changing the folder in WordPress will rewrite the .htaccess file, wiping out all your precious changes. SAVE YOUR .htaccess FILE!!!!!

Go into the WordPress admin interface and, under Settings, change the blog’s URL from csev_blog to csev-blog. Then rename the folder on the server. Then immediately edit the .htaccess file, putting back your clever redirects – making sure to change csev_ to csev- in the rules.

Test all the old URLs – there should be exactly one redirect for each. Using Firebug you should be able to see the redirects in action and verify that things work. I found Chrome was the best way to test the RSS redirects – both Safari and Firefox get way too tricky when handling RSS feeds to even see what happened – thankfully my version of Chrome was clueless about RSS feeds so I could see what was really happening and verify proper operation. I am sure some new version of Chrome will get “smarter” and make it impossible to figure this out. Then I will write some Python code to do a urllib GET.
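That Python check could look roughly like the sketch below. It is not something I ran during the migration. It uses http.client rather than urllib, because urllib follows redirects on its own and would hide the individual hops. The function names and the pluggable fetch argument are made up for the example.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_once(url):
    """Issue one GET without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def redirect_chain(url, fetch=fetch_once, max_hops=5):
    """Return the (status, url) hops visited, stopping at the first non-redirect."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in (301, 302, 303, 307, 308) or not location:
            break
        url = urljoin(url, location)
    return hops


def is_single_permanent_redirect(hops):
    """True when the chain is exactly one 301 followed by a 200."""
    return [status for status, _ in hops] == [301, 200]
```

Call redirect_chain() on an old URL such as one ending in 000736.html on your own host and pass the result to is_single_permanent_redirect(). A chain of 301, 301, 200 means there is a double redirect somewhere, and a 302 means the redirect is not permanent.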

So things should now be OK.

As promised – here is the code for the RSS hack that I never deployed. Again this never worked perfectly for me – so test this a lot before you trust it. I called this file mt-feed.php:

<?php
// Load WordPress so get_bloginfo() is available.
require('wp-blog-header.php');

// Default to the RSS feed; switch only for a recognized feed type.
$thetype = isset($_REQUEST['type']) ? $_REQUEST['type'] : '';
$rssurl = get_bloginfo('rss_url');
if ( $thetype == 'rss2' || $thetype == 'atom' || $thetype == 'rdf' ) {
    $rssurl = get_bloginfo($thetype.'_url');
}

// Fetch the WordPress feed server-side and pass it through unchanged.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $rssurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
$content_type = curl_getinfo( $ch, CURLINFO_CONTENT_TYPE );
curl_close($ch);
header('Content-type: '.$content_type);
echo($output);

I hope you find this helpful. I love WordPress and the fact that my new blog can accept comments! I have moved forward in time nearly seven years in terms of blog software and it feels pretty good.

I want to thank Scott Yang for such a good blog post that showed me the way forward. With his patterns – all I needed to do was map things to the newer version of WordPress.