Playing with Gnip (Proof of Concept)

By webhat

I mentioned Gnip before, and this afternoon I was browsing the API definition and wanted to see if I could add my own publisher.

First I wanted to see how the messages are polled, so I crafted a wget command to retrieve some example data:

wget -nv --http-user="*username*" --http-password="*password*" \
  https://s.gnipcentral.com/publishers/digg/activity/current.xml

You need to create an account before you can use Gnip, and it’s important to remember to quote the username and password: your username is your email address, and some shells treat @ as a special character.
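The same poll can be sketched in Python. This is only a rough equivalent of the wget call, assuming the endpoint accepts HTTP Basic authentication; the function names build_request and fetch_current are made up for this example, and fetch_current only works with a real Gnip account.

```python
import base64
import urllib.request

# URL taken from the wget command above.
GNIP_URL = "https://s.gnipcentral.com/publishers/digg/activity/current.xml"


def build_request(username, password, url=GNIP_URL):
    """Return a Request carrying an HTTP Basic Authorization header.

    Assumption: the service accepts Basic auth, as the wget call suggests.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_current(username, password):
    """Download the current activity document as bytes (needs an account)."""
    with urllib.request.urlopen(build_request(username, password)) as response:
        return response.read()
```

Because the credentials are passed as ordinary Python strings, the @ in the username needs no shell quoting here.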

Gnip gave me an output file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<activities>
<activity guid="http://digg.com/odd_stuff/What_It_s_not_part_of_the_show_PIC" type="submission" uid="andrewcsayer" at="2008-07-10T09:50:31-04:00"/>
<activity guid="http://digg.com/tech_news/OSS_News_Review_Releases_OSS_Visibility_Index_for_June_2008" type="submission" uid="bhartzer" at="2008-07-10T09:50:30-04:00"/>
<activity guid="http://digg.com/educational/Karnataka_Diploma_CET_Results_2008_Check_Now" type="submission" uid="mastisearch1" at="2008-07-10T09:50:28-04:00"/>
...

As the schema is currently unavailable, I had to work out the requirements from the documentation myself.

The root element is the <activities> tag. It has one child element, <activity>, which has four attributes. These can contain any values that the subscriber would like to be able to search on.

UID – The identifier for the owner of the message; this could be a username, an email address or a name.
TYPE – The type of message; in the case of Twitter this would be called a tweet. (That is, if Twitter were supported.) It could contain an action, or it could also contain a subject.
GUID – In a number of the existing publishers this is the URL where more data can be retrieved.
AT – The date and time at which the message was created.
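To make the four attributes concrete, here is a small sketch that reads them back out of an activity document with Python's standard ElementTree module. The sample is a shortened copy of the Digg output shown earlier; the helper name parse_activities is my own and not part of any Gnip library.

```python
import xml.etree.ElementTree as ET

# Two activities copied from the Digg output above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<activities>
<activity guid="http://digg.com/odd_stuff/What_It_s_not_part_of_the_show_PIC" type="submission" uid="andrewcsayer" at="2008-07-10T09:50:31-04:00"/>
<activity guid="http://digg.com/tech_news/OSS_News_Review_Releases_OSS_Visibility_Index_for_June_2008" type="submission" uid="bhartzer" at="2008-07-10T09:50:30-04:00"/>
</activities>"""

ATTRIBUTES = ("uid", "type", "guid", "at")


def parse_activities(xml_bytes):
    """Return one dict per <activity>, keyed by the four attribute names."""
    root = ET.fromstring(xml_bytes)
    return [
        {name: node.get(name) for name in ATTRIBUTES}
        for node in root.iter("activity")
    ]


if __name__ == "__main__":
    for activity in parse_activities(SAMPLE):
        print(activity["at"], activity["uid"], activity["type"], activity["guid"])
```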

What was your idea?

I hear you ask. I thought it would be handy to receive notifications from mailing lists, specifically the Linux Kernel Mailing List. It is such a high-volume mailing list that you really don’t want to get it in your mailbox unless you participate.

So how would it be done?
First we set up a procmail filter which can process the mail:

:0 # process lkml
* ^(To|From|Cc):.*linux-kernel@
|lkml2gnip.sh

I didn’t actually write lkml2gnip.sh, as I don’t really want to become just another storage point for lkml archives. Instead, I can use this entry point to poll the lkml RSS feed at lkml.org or the one at kerneltrap.org.

A simple XSL Transform would be able to convert the messages from the RSS feed, so that’s what I’ll be playing with tonight. Here’s how the elements in RSS map to the Gnip activity attributes:

RSS item sub-element    Gnip activity attribute
title                   type
author                  uid
link                    guid
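As a rough illustration of the mapping above, here is a sketch in Python rather than XSL (a deliberate swap, just to keep the example short and runnable). The RSS snippet is invented for the example and is not real lkml data; rss_to_activities is a hypothetical name. The at attribute is left out because the table does not map anything to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 snippet, made up for illustration only.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>example list</title>
  <item>
    <title>[PATCH] example subject</title>
    <author>someone@example.com</author>
    <link>http://example.com/lkml/1</link>
  </item>
</channel></rss>"""


def rss_to_activities(rss_text):
    """Map each RSS <item> to an <activity> element using the table above."""
    rss_root = ET.fromstring(rss_text)
    activities = ET.Element("activities")
    for item in rss_root.iter("item"):
        ET.SubElement(activities, "activity", {
            "type": item.findtext("title", default=""),   # title  -> type
            "uid": item.findtext("author", default=""),   # author -> uid
            "guid": item.findtext("link", default=""),    # link   -> guid
        })
    return ET.tostring(activities, encoding="unicode")


if __name__ == "__main__":
    print(rss_to_activities(SAMPLE_RSS))
```

An XSL stylesheet would express the same three attribute assignments in a template matching item.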

Out of interest, you can find a list of publishers here.


One Response to “Playing with Gnip (Proof of Concept)”

  1. Jud Valeski Says:

    hey there. nice breakdown of our schema. we did change the url to it, but i thought we caught all the dead links, can you please post back w/ where you found the broken one (apologies). the schema can be found at https://s.gnipcentral.com/schema/gnip.xsd
