Sunday, June 21, 2009

Net::Douban and XML Parsing in Perl

If anybody is interested, I am trying to write a Perl module for douban.com. The site provides its own API, and many people have already written client packages for it in Python, Ruby, Java, and PHP. You can find them at "http://www.douban.com/group/dbapi/" and the API documentation at "http://www.douban.com/group/dbapi/".
I have created a repository on GitHub at "http://github.com/woosley/Perl-Net-Douban/tree/master". Only a little code has been pushed so far, so please feel free to fork it and contribute.

What I really want to talk about here is that I could not find any module on "search.cpan.org" suitable for parsing Atom and GData-format XML. Java, Ruby, and Python have Google's GData client libraries, but Perl doesn't. XML::Feed, XML::FeedPP, or XML::TreePP might be usable for parsing Atom, but their documentation really sucks.

So I spent my day on "search.cpan.org" trying to find an XML module with XPath support. There are two choices:
XML::XPath
XML::LibXML

The documentation for XML::LibXML sucks too. However, most Perl hackers recommend it because it is more powerful, more efficient, and better maintained than XML::XPath.

Here is the code I used at first:

use XML::LibXML;
use Data::Dumper;

my $xml     = XML::LibXML->new->parse_string($string);
my @nodeset = $xml->findnodes('//entry');   # unprefixed name: finds nothing in an Atom feed
print Dumper \@nodeset;
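To see the failure concretely, here is a small self-contained sketch that runs the same unprefixed query against a tiny Atom document (the document and its ids are made up for illustration). As a cross-check it also runs a local-name() query, a namespace-agnostic XPath workaround that is not the approach this post goes on to use: it ignores namespaces entirely, so it could also match an "entry" element from some unrelated vocabulary.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical minimal Atom document: the default namespace is Atom's URI.
my $string = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:example:1</id></entry>
  <entry><id>urn:example:2</id></entry>
</feed>
XML

my $doc = XML::LibXML->new->parse_string($string);

# Unprefixed name test: only matches <entry> elements in the null namespace.
my @plain = $doc->findnodes('//entry');

# Workaround for comparison: match on the local name and ignore the namespace.
my @by_local = $doc->findnodes('//*[local-name()="entry"]');

printf "//entry: %d nodes\n", scalar @plain;          # prints "//entry: 0 nodes"
printf "local-name(): %d nodes\n", scalar @by_local;  # prints "local-name(): 2 nodes"
```

The two entries are clearly in the document; the plain //entry query simply cannot see them because they live in the Atom namespace.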

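This does not work. The reason is that you can't match on the default namespace in XPath: an element name without a prefix always matches the null namespace, never the default namespace, even when the document binds that default namespace to a URI (an Atom feed binds it to http://www.w3.org/2005/Atom). So you need another module, XML::LibXML::XPathContext, which lets you register a prefix of your own for the namespace URI and use that prefix in your queries. Here is the code: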

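use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc = XML::LibXML->new->parse_string($string);
my $xpc = XML::LibXML::XPathContext->new($doc);
# bind our own prefix 'atom' to the Atom namespace URI
$xpc->registerNs('atom', 'http://www.w3.org/2005/Atom');
my @nodeset = $xpc->findnodes('//atom:entry');
# evaluate the query relative to the first entry
print $xpc->findvalue('.//atom:id', $nodeset[0]);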
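Putting the pieces together, here is a fuller sketch that walks every entry and also reads a GData-style extension element by registering a second prefix. The sample feed is hypothetical: the gd:rating element only mimics the kind of GData extension (namespace http://schemas.google.com/g/2005) that an API like douban.com's may return, and the prefixes "atom" and "gd" are arbitrary choices. Only the namespace URIs have to match the document.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample feed; gd:rating imitates a GData extension element.
my $string = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <entry>
    <id>urn:example:1</id>
    <title>First entry</title>
    <gd:rating average="4.5" min="1" max="5"/>
  </entry>
  <entry>
    <id>urn:example:2</id>
    <title>Second entry</title>
  </entry>
</feed>
XML

my $doc = XML::LibXML->new->parse_string($string);
my $xpc = XML::LibXML::XPathContext->new($doc);

# Prefixes are our own choice; only the URIs must match the document.
$xpc->registerNs(atom => 'http://www.w3.org/2005/Atom');
$xpc->registerNs(gd   => 'http://schemas.google.com/g/2005');

my @entries;
for my $entry ($xpc->findnodes('//atom:entry')) {
    # Passing $entry as the second argument makes each query relative to it.
    push @entries, {
        id     => $xpc->findvalue('./atom:id', $entry),
        title  => $xpc->findvalue('./atom:title', $entry),
        rating => $xpc->findvalue('./gd:rating/@average', $entry),
    };
}

for my $e (@entries) {
    # findvalue returns an empty string when nothing matches.
    printf "%s | %s | %s\n", $e->{id}, $e->{title},
        length $e->{rating} ? $e->{rating} : 'no rating';
}
```

The second entry has no gd:rating, so findvalue returns an empty string for it, which is why the loop checks the length before printing. The output is:

urn:example:1 | First entry | 4.5
urn:example:2 | Second entry | no rating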
