David Carr

Importing WordPress posts to another system using the exported XML file

I recently moved away from WordPress to my own system. One of the challenges I faced was moving my posts across, and I found that using the exported XML file was the easiest way to achieve this.

Upon first looking at the XML file I thought using SimpleXML would be the easiest approach. My first attempt enabled me to see the details of the post but the post content and excerpt proved to be a little harder to extract.

The reason is that WordPress uses namespaces within the XML export, so a simple lookup does not work. Namespaced elements such as the post content and excerpt (content:encoded and excerpt:encoded) have to be read through their namespace while looping through the posts.
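
To illustrate the problem, here is a minimal sketch using a made-up, trimmed-down export (the XML string and its values are illustrative only, not taken from a real site). A plain property lookup only sees elements without a namespace, so the content has to be fetched with children() and the namespace URI:

```php
<?php
// Hypothetical, trimmed-down export used only for illustration.
$source = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/">
  <channel>
    <item>
      <title>Hello world</title>
      <content:encoded><![CDATA[<p>Post body</p>]]></content:encoded>
      <excerpt:encoded><![CDATA[Short summary]]></excerpt:encoded>
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($source);
$item = $xml->channel->item[0];

// Elements without a namespace can be read directly.
$title = (string) $item->title;

// A direct lookup does not see namespaced elements, so this stays empty.
$direct = (string) $item->encoded;

// Asking for the children in the content namespace exposes <content:encoded>.
$content = $item->children('http://purl.org/rss/1.0/modules/content/');
$body    = (string) $content->encoded;
```

Casting to string is what pulls the text (including CDATA sections) out of the SimpleXML element.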

Searching Google, I found a brilliant class in a Gist by James King: https://gist.github.com/Jamesking56/4773838

<?php


/**
* WordPress class - Manages the WordPress XML file and gets all data from that.
*/
class Wordpress
{
    public static $wpXML;

    function __construct($wpXML)
    {
        $this->wpXML = $wpXML;
    }

    public function getPosts()
    {
        $xml = simplexml_load_file($this->wpXML);
        $posts = array();

        foreach($xml->channel->item as $item)
        {
            $categories = array();
            foreach($item->category as $category)
            {
                //echo $category['domain'];
                if($category['nicename'] != "uncategorized" && $category['domain'] == "category")
                {
                    //echo 'Yep';
                    $categories[] = $category['nicename'];
                }
            }

            $content = $item->children('http://purl.org/rss/1.0/modules/content/');
            
            $posts[] = array(
                "title"=>$item->title,
                "content"=>$content->encoded,
                "pubDate"=>$item->pubDate,
                "categories"=>implode(",", $categories),
                "slug"=>str_replace("/", "", str_replace("http://blog.jamesking56.co.uk/", "", $item->guid))
            );
        }

        return $posts;
    }
}

?>

This class is very well written and got me nearly everything I needed. The only thing missing was the post's excerpt, but luckily adding that was simple: I only needed to read one more namespace and store the result in a variable.

$excerpt = $item->children('http://wordpress.org/export/1.2/excerpt/');
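
As a minimal sketch of reading the excerpt this way, again with a made-up item: the namespace URI contains the export format version (1.2 here), and your file may declare a different one. Assuming that could be the case, the sketch looks the URI up from the document with getDocNamespaces() rather than hard-coding it:

```php
<?php
// Hypothetical, trimmed-down export used only for illustration.
$source = <<<XML
<rss version="2.0"
     xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Hello world</title>
      <content:encoded><![CDATA[<p>Post body</p>]]></content:encoded>
      <excerpt:encoded><![CDATA[Short summary]]></excerpt:encoded>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($source);

// Map of prefix => URI for the namespaces declared on the root element.
$namespaces = $xml->getDocNamespaces();

$item = $xml->channel->item[0];

// Same lookup as in the class, but with the URI taken from the file itself.
$excerpt = $item->children($namespaces['excerpt']);
$summary = (string) $excerpt->encoded;
```

The value ends up in $excerpt->encoded, in the same way the content does in $content->encoded.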

The class collects the posts, adds them to an array and returns the array. This is fine, but I wanted to add the posts to the database inside the class rather than getting an array and looping through it again.

Only a simple change was needed in the class: I pass in my database reference ($db), and as each post's array is created it is inserted into the database straight away.

$wp = new Wordpress('sitename.xml',$db);
$wp->getPosts(); // inserts each post into the database; nothing is returned
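
The $db object itself is not shown in this post. As a rough sketch, assuming it is a thin wrapper around PDO with an insert($table, $data) method, it could look something like the following. The SimpleDb class and the buildInsertSql() helper are hypothetical names for illustration, not my actual database code:

```php
<?php
/**
 * Hypothetical helper: build a parameterised INSERT statement from an
 * associative array of column => value pairs.
 */
function buildInsertSql(string $table, array $data): string
{
    $columns      = array_keys($data);
    $placeholders = array_map(fn ($column) => ':' . $column, $columns);

    return sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', $placeholders)
    );
}

/**
 * Hypothetical stand-in for the $db object passed to the Wordpress class.
 */
class SimpleDb
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function insert(string $table, array $data): bool
    {
        // Values are bound as parameters. Table and column names are not
        // escaped here, so they must come from trusted code, never user input.
        $statement = $this->pdo->prepare(buildInsertSql($table, $data));

        return $statement->execute($data);
    }
}

// Example of the SQL generated for a two-column row.
$sql = buildInsertSql('blog', array('postTitle' => 'Hello', 'postSlug' => 'hello'));
```

With a wrapper along these lines, each post array built inside getPosts() can be handed to insert() as soon as it is created.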

My final class looks like this:

class Wordpress
{
    public $wpXML;
    public $db;
 
    function __construct($wpXML,$db)
    {
        $this->wpXML = $wpXML;
        $this->db = $db;
    }

    private function _slug($text){ 

      // replace non letter or digits by -
      $text = preg_replace('~[^\pL\d]+~u', '-', $text);

      // trim
      $text = trim($text, '-');

      // transliterate
      $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

      // lowercase
      $text = strtolower($text);

      // remove unwanted characters
      $text = preg_replace('~[^-\w]+~', '', $text);

      if (empty($text))
      {
        return 'n-a';
      }

      return $text;
    }
 
    public function getPosts()
    {
        $xml = simplexml_load_file($this->wpXML);
        $posts = array();
 
        foreach($xml->channel->item as $item)
        {
            $categories = array();
            foreach($item->category as $category)
            {
                //echo $category['domain'];
                if($category['nicename'] != "uncategorized" && $category['domain'] == "category")
                {
                    //echo 'Yep';
                    $categories[] = $category['nicename'];
                }
            }
 
            $content = $item->children('http://purl.org/rss/1.0/modules/content/');
            $excerpt = $item->children('http://wordpress.org/export/1.2/excerpt/');

                

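            $post = array(
                // Cast the SimpleXML elements to plain strings before storing them
                "postTitle"=>(string) $item->title,
                "postSlug"=>$this->_slug((string) $item->title),
                "postCont"=>htmlentities((string) $content->encoded),
                "postDesc"=>htmlentities((string) $excerpt->encoded),
                // strftime() is deprecated as of PHP 8.1; date() produces the same format
                "postDate"=>date("Y-m-d H:i:s", strtotime((string) $item->pubDate))
            );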

            $this->db->insert("blog",$post);
        }
 
        //return $posts;
    }
}

$wp = new Wordpress('sitename.xml',$db);
$wp->getPosts(); // inserts each post into the database; nothing is returned

 


© 2009 - 2021 DC Blog. All code MIT license. All rights reserved.