Friday, March 11, 2011

Grabbing only Pictures out of an XML feed

Pulling Images from feeds with Google AJAX Feed API and Regular Expressions



Recently, I have been building a super aggregator and search engine website using various Google APIs. The search engine uses the Google Search API, and most of the RSS and XML aggregation is handled by the Google AJAX Feed API. However, to retrieve only the images from an XML feed, I had to add some extra processing on top of what the standard Google AJAX Feed API returns.

The XML feed had some images along with other links and text, but I only wanted the linkable pictures, so I wrote an extra function that uses a regular expression to discard the content I did not want. This piece of code may be useful to others who want to pull just the images out of an XML feed that contains more than pictures.
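One pitfall when writing such a pattern in JavaScript: if you build it with new RegExp() from a string, the string literal is parsed first, and "\s" is not a recognized string escape, so the backslash is silently dropped and the regex engine sees a plain "s". Doubling the backslash, or using a regex literal, avoids this. The patterns below are small made-up examples, not the one from the feed code:

```javascript
// Inside an ordinary string literal, "\s" is not a recognized escape,
// so JavaScript drops the backslash before RegExp ever sees it.
const lost = new RegExp("a\sb");   // the pattern is really "asb"
const kept = new RegExp("a\\sb");  // the pattern is "a\sb": whitespace between a and b
const literal = /a\sb/;            // regex literal: no extra escaping needed

console.log(lost.test("a b"));    // false
console.log(lost.test("asb"));    // true
console.log(kept.test("a b"));    // true
console.log(literal.test("a b")); // true
```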

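Here is the JavaScript excerpt. It assumes the page already loads Google's jsapi loader script (which provides google.load()) and contains an element with the id "feed" to hold the images: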

[sourcecode language="JavaScript"]
<script type="text/javascript">
  google.load("feeds", "1");
  // Returns the <img> tags found in an HTML string as an array, or null if there are none.
  function replaceit(content) {
    var pattern = /<img([^>]+)(\s*[^\/])>/g; // skips tags ending in "/>" (self-closing)
    return content.match(pattern);
  }
  // Loads the feed and shows only the images from each entry.
  function initialize() {
    var feed = new google.feeds.Feed("http://feeds.feedburner.com/TechCrunch?format=xml");
    feed.setNumEntries(20);
    // load() fetches the feed asynchronously and passes the result to this callback.
    feed.load(function(result) {
      if (!result.error) {
        var container = document.getElementById("feed");
        for (var i = 0; i < result.feed.entries.length; i++) {
          var entry = result.feed.entries[i];
          var div = document.createElement("div");
          var images = replaceit(entry.content);
          div.innerHTML = images ? images.join("") : ""; // join("") avoids commas between tags
          container.appendChild(div);
        }
      }
    });
  }
  google.setOnLoadCallback(initialize);
</script>
[/sourcecode]


Here is how the code works:

You define the feed on line 10, and on line 11 you set the number of entries to pull from the XML feed with the setNumEntries() function. If the feed loads without an error, line 16 loops through the returned entries (up to 20) and pulls the content out of each one. That content also contains text and links I need to get rid of, so line 19 runs it through the replaceit() function set up on line 4.
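The loop described above can be sketched without a browser. The hypothetical imagesPerEntry() helper below takes an object shaped like the result passed to the feed.load() callback in the excerpt (result.error, result.feed.entries, entry.content) and returns one string of image tags per entry instead of building divs. fakeResult is made-up sample data, not real feed output.

```javascript
// Same pattern as the post: <img ...> tags, skipping any that end in "/>".
const IMG_TAG = /<img([^>]+)(\s*[^\/])>/g;

// Hypothetical helper: takes an object shaped like the `result` passed to
// feed.load()'s callback and returns one HTML string of <img> tags per entry.
function imagesPerEntry(result) {
  if (result.error) {
    return []; // like the post, render nothing when the feed fails to load
  }
  return result.feed.entries.map(function (entry) {
    const tags = entry.content.match(IMG_TAG); // array of tags, or null
    return tags ? tags.join("") : "";
  });
}

// A hand-made stand-in for a loaded feed with two entries.
const fakeResult = {
  feed: {
    entries: [
      { content: '<p>Story one <a href="http://example.com/">link</a></p><img src="one.jpg">' },
      { content: "<p>Text only, no pictures.</p>" }
    ]
  }
};

// returns ['<img src="one.jpg">', '']
console.log(imagesPerEntry(fakeResult));
```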

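The function uses a regular expression to find the img tags and returns only those, as an array of matched tags (or null when an entry has no images). Line 20 puts the result into a newly created div, and line 21 uses appendChild() to add that div to the container element fetched on line 15, the element with the id "feed". Hopefully, this is useful to someone!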
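Here is a small sketch of what the function keeps, running the same pattern (written as a regex literal) over a made-up entry. As written, the pattern only matches tags whose last character before ">" is not a slash, so self-closing tags like <img ... /> are skipped. It also shows why the matches are best joined before being handed to innerHTML: converting an array to a string puts a comma between its elements.

```javascript
// The pattern from the post, written as a regex literal.
const pattern = /<img([^>]+)(\s*[^\/])>/g;

// Made-up entry content: text, a link, two plain img tags and one self-closing tag.
const content =
  'Intro text <a href="http://example.com/story">Read more</a>' +
  '<img src="photo1.jpg" alt="first">' +
  '<img src="photo2.jpg">' +
  '<img src="icon.gif" alt="icon"/>';

const tags = content.match(pattern);
console.log(tags.length); // 2: the tag ending in "/>" is not matched

// Turning the array into a string (as innerHTML = tags would) adds commas:
console.log(String(tags)); // <img src="photo1.jpg" alt="first">,<img src="photo2.jpg">
// join("") concatenates the tags with nothing in between:
console.log(tags.join("")); // <img src="photo1.jpg" alt="first"><img src="photo2.jpg">
```

If you want self-closing and upper-case tags as well, a looser pattern such as /<img\b[^>]*>/gi would match them too.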
