
General Musing

blaze your trail

Playing with Gnip (Proof of Concept)


I mentioned Gnip before, and this afternoon I was browsing the API definition and wanted to see if I could add my own publisher.

First I wanted to see how the messages are polled, so I crafted a wget command to retrieve some example data:

wget -nv --http-user="*username*" --http-password="*password*" \
  https://s.gnipcentral.com/publishers/digg/activity/current.xml


You need to create an account before you can use Gnip, and it’s important to remember to quote the username and password: your username is your email address, and some shells treat @ as a special character.
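The same poll can be sketched in Python, where no shell is involved and the @ in the username needs no quoting. This is only a minimal sketch: it assumes the endpoint accepts HTTP Basic authentication (the wget options suggest HTTP authentication, but the exact scheme is my assumption), and build_poll_request is a hypothetical helper name, not part of any Gnip library.

```python
import base64
import urllib.request

# URL taken from the wget command above.
POLL_URL = "https://s.gnipcentral.com/publishers/digg/activity/current.xml"


def build_poll_request(username, password, url=POLL_URL):
    """Build (but do not send) a request carrying HTTP Basic credentials.

    Assumption: the service accepts Basic authentication.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


# Sending the request needs network access and a valid account:
# with urllib.request.urlopen(build_poll_request("me@example.org", "secret")) as response:
#     xml_text = response.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the headers before touching the network.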

Gnip gave me an output file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<activities>
<activity guid="http://digg.com/odd_stuff/What_It_s_not_part_of_the_show_PIC" type="submission" uid="andrewcsayer" at="2008-07-10T09:50:31-04:00"/>
<activity guid="http://digg.com/tech_news/OSS_News_Review_Releases_OSS_Visibility_Index_for_June_2008" type="submission" uid="bhartzer" at="2008-07-10T09:50:30-04:00"/>
<activity guid="http://digg.com/educational/Karnataka_Diploma_CET_Results_2008_Check_Now" type="submission" uid="mastisearch1" at="2008-07-10T09:50:28-04:00"/>
...

As the schema is currently unavailable I had to take the documentation and figure out the requirements myself.

The root element is the <activities> tag. It has one child element, <activity>, which has four attributes that can contain any values the subscriber would like to be able to search on.

UID – The identifier for the owner of the message; that would be a username, mail address or name.
TYPE – The type of message; in the case of Twitter this would be called a tweet. (That is, if Twitter were supported.) It could contain an action, or it could also contain a subject.
GUID – In a number of the existing publishers this is the URL where more data can be retrieved.
AT – The date and time at which the message was created.
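To make the four attributes concrete, here is a minimal Python sketch that reads them from a response like the one above. The sample is trimmed from that output, with the closing </activities> tag added by me because the listing is truncated; parse_activities is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Two activities copied from the sample output; the closing tag is added
# here because the original listing is cut off.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<activities>
<activity guid="http://digg.com/odd_stuff/What_It_s_not_part_of_the_show_PIC" type="submission" uid="andrewcsayer" at="2008-07-10T09:50:31-04:00"/>
<activity guid="http://digg.com/tech_news/OSS_News_Review_Releases_OSS_Visibility_Index_for_June_2008" type="submission" uid="bhartzer" at="2008-07-10T09:50:30-04:00"/>
</activities>"""


def parse_activities(xml_text):
    """Return one dict per <activity>, keyed by the four attribute names."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        {name: activity.get(name) for name in ("uid", "type", "guid", "at")}
        for activity in root.findall("activity")
    ]


for activity in parse_activities(SAMPLE):
    print(activity["at"], activity["uid"], activity["type"], activity["guid"])
```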

What was your idea?

I hear you ask. I thought it would be handy to receive notifications from mailing lists, specifically the Linux Kernel Mailing List (lkml). It is such a high-volume list that you really don’t want to get it in your mailbox unless you participate.

So how would it be done?
First we set up a procmail filter which can process the mail:

:0 # process lkml
* ^(To|From|Cc):.*[email protected]*
|lkml2gnip.sh

I didn’t actually write lkml2gnip.sh, as I don’t really want to become just another storage point for lkml archives, but I can use this entry point to poll the lkml RSS feed at lkml.org or the one at kerneltrap.org.

A simple XSL Transform would be able to convert the messages from the RSS feed, so that’s what I’ll be playing with tonight. Here’s how the elements in RSS map to the Gnip activity attributes:

RSS item sub-element    Gnip activity attribute
title                   type
author                  uid
link                    guid
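The mapping above can be sketched in a few lines of Python rather than XSL, purely as an illustration of the idea. The RSS sample is made up for the example, rss_to_activities is a hypothetical helper name, and the at attribute is left out because the mapping does not cover it.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample; real feed content will differ.
RSS_SAMPLE = """<rss version="2.0"><channel>
<title>Example list feed</title>
<item><title>[PATCH] example subject</title><author>someone@example.org</author><link>http://example.org/msg/1</link></item>
<item><title>Re: example subject</title><author>other@example.org</author><link>http://example.org/msg/2</link></item>
</channel></rss>"""


def rss_to_activities(rss_text):
    """Convert RSS <item> elements into an <activities> document."""
    rss = ET.fromstring(rss_text)
    activities = ET.Element("activities")
    for item in rss.iter("item"):
        ET.SubElement(activities, "activity", {
            "type": item.findtext("title", default=""),   # title  -> type
            "uid": item.findtext("author", default=""),   # author -> uid
            "guid": item.findtext("link", default=""),    # link   -> guid
        })
    return ET.tostring(activities, encoding="unicode")


print(rss_to_activities(RSS_SAMPLE))
```

An XSL stylesheet would express the same three lookups as attribute value templates; the Python version just makes the mapping easy to try out.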

Out of interest, you can find a list of publishers here.



Written by Daniël W. Crompton (webhat)

July 10, 2008 at 7:55 pm

One Response


  1. hey there. nice breakdown of our schema. we did change the url to it, but i thought we caught all the dead links, can you please post back w/ where you found the broken one (apologies). the schema can be found at https://s.gnipcentral.com/schema/gnip.xsd

    Jud Valeski

    July 11, 2008 at 3:50 am

