Blue Screen Of Duds

Where the alter ego of codelust plays

Clustered river of news


RSS readers have, over time, become fairly full-featured software in their own right. Most now provide the standard set of features (OPML import/export, categories, river of news and search) irrespective of their avatar, online or offline, and I have grown used to depending on my reader of choice, Google Reader, to read my feeds.

That said, there is one feature I’d really love to have in my RSS reader: clustering of feed items as an additional way to categorise data, alongside the current methods of categories and tags. Think of it as a cross between your RSS reader and Google News/Techmeme. Would it not be nice to have your own little personal Google News or Techmeme, built from the sources that you have picked, rather than be led by whatever Gabe or the kind folks at Google News have seeded their websites with?

There are, though, a couple of problems that could make this impossible, and one approach that may sidestep them:

Processing: Any algorithm that finds similarities in text is computationally intensive, even when the data set is limited. Such algorithms can usually be scaled when the size of the data set is reasonably fixed, but RSS subscription lists vary widely in size, so it would be a royal pain to find an algorithm that scales effectively and efficiently across all of them.
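To make the cost concrete, here is a minimal sketch of the brute-force approach, assuming plain word counts and cosine similarity (my choice for illustration, not what any particular reader does). The function names are hypothetical. The point is the last function: every item is compared with every other item, so n items need n * (n - 1) / 2 comparisons.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_counts(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(items):
    """Score every pair of items: n * (n - 1) / 2 comparisons in total."""
    vectors = [term_counts(text) for text in items]
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(items)), 2)
    }
```

A list of 1,000 unread items already means 499,500 comparisons, and that is before anything smarter than word counting is attempted.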

Entropy: Traditional similarity-matching approaches work best when the text covers a single domain, so that “apple” reliably means the fruit rather than Apple the company. The entropy in the data set needs to be reasonably low for the algorithm to function well, and a subscription list that mixes many topics does not guarantee that. Learning systems also need training data, which may not be available in this case.
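A toy illustration of the apple problem, assuming a naive word-overlap (Jaccard) score; the headlines are made up and the helper names are hypothetical. The two posts have nothing to do with each other, yet they score above zero simply because they share one ambiguous word.

```python
import re


def tokens(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Shared words divided by total distinct words across both texts."""
    words_a, words_b = tokens(a), tokens(b)
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


# Made-up headlines, purely for illustration.
company = "Apple unveils a thinner laptop"
fruit = "Apple harvest falls short this year"
unrelated = "Heavy rain floods the city centre"

company_vs_fruit = jaccard(company, fruit)
company_vs_unrelated = jaccard(company, unrelated)
```

Here `company_vs_fruit` comes out above `company_vs_unrelated`, which is exactly the false signal a mixed-domain subscription list would produce.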

Link Match: What we are then left with is to attack the problem purely by tracking outgoing links: items that link to the same URL are probably talking about the same thing. This is thankfully far less computationally intensive than pure text analysis. The accuracy and utility of this approach may not be stunning, but it would certainly be good enough for the immediate purpose: a reasonable way of classifying what my subscription list is talking about.
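A rough sketch of how link matching could work, under a few assumptions of my own: each feed item is available as an HTML body keyed by some item id, URLs are compared after a crude normalisation (scheme, "www." and trailing slash dropped), and items that share any outgoing link are merged into one cluster with a simple union-find. All names here are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to host + path; returns '' for relative links."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/") if host else ""


class LinkExtractor(HTMLParser):
    """Collects the normalised href of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and normalise(value):
                    self.links.add(normalise(value))


def outgoing_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def cluster_by_links(items):
    """items maps item id -> HTML body; returns sets of ids sharing links."""
    parent = {item_id: item_id for item_id in items}

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # normalised link -> first item that used it
    for item_id, html in items.items():
        for link in outgoing_links(html):
            if link in first_seen:
                parent[find(item_id)] = find(first_seen[link])
            else:
                first_seen[link] = item_id

    clusters = defaultdict(set)
    for item_id in items:
        clusters[find(item_id)].add(item_id)
    # Items with no shared links are not clusters, so drop singletons.
    return [members for members in clusters.values() if len(members) > 1]
```

Each item is parsed once and each link is looked up once in a dictionary, so the work grows roughly with the number of links rather than with the number of item pairs. One caveat of this design: merging is transitive, so a chain of posts that each share a different link ends up in one big cluster, which may or may not be what you want.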


Written by shyam

January 15, 2008 at 9:00 pm

One Response

  1. this helps me understand why is so unbelievably bland and boring and lowest common denominator….

    there is simply a necessary averaging that has to go on, same with rss feeds, really, aggregators…

    it is like a software program that discontinues your insurance based only on an algorithm….

    we are a long way from intelligence in the system for anybody with a brain


    January 18, 2008 at 8:37 pm
