Blue Screen Of Duds

Where the alter ego of codelust plays

Posts Tagged ‘google reader’

Why not being number one in non-search areas is a-ok with Google

with one comment

A lot of people constantly harp on the fact that Google is not the market leader in any segment other than search and contextual advertising, and hold it up as proof that Google is a one-trick pony. While it is desirable for Google to lead the market in everything it gets into, market leadership is not the only thing Google is looking for when it kicks off a new product.

To understand why Google does things differently, you first need to understand how Google works differently as a company.

At its core, Google is one massive computing infrastructure. What the company excels at is building and maintaining applications on top of this infrastructure, only parts of which are known to us as Bigtable, Google File System and MapReduce. Almost every application (yes, this is speculation, sue me) is built atop this infrastructure, giving Google consistency across the storage, classification and categorization of any data that comes into its system. Other companies, like Yahoo! and Microsoft, have years and years of legacy sitting on different frameworks and infrastructure, giving Google amazing leverage over them.

For Google, the only real product is user experience and the value the user derives from using Google’s products. This, in turn, helps Google further refine and better its offerings across the board, creating an endlessly iterative and self-improving product ecosystem. The products by themselves are a means to bettering the end product of user experience.

For instance, not many would have much to say about Google’s “web history”, but not many know that it is used to drive recommendations in Google Reader, which also uses geographical data (I was recommended feeds related to Trivandrum after being there for a week) that Google collects in conjunction with ISPs (driven by Google Analytics) to further refine these recommendations.

In a similar manner, Google already tracks the clicks that originate from Gmail and I would not be surprised if they are already tracking and indexing the thousands of billions of messages that flow across Google Talk to better know and predict which link you are likely to click more on Tuesdays and Wednesdays (match data from the messages and your web history), compared to Friday or Sunday. And that is very much in line with their mission statement of being more useful to you, in a manner that borders on the eerie quite a few times.

And that is where the greatest challenge lies for companies that aim to compete with Google. Learning systems that improve themselves iteratively with time and usage are the hardest to beat, because they improve by using you against yourself (something like racing against your own best time in a racing game rather than against a pre-programmed computer run). And since Google has been around for such a long time, the amount of data it has about you is something that the competition can’t match unless a vast majority of Google’s users switch overnight to the competing services.

Which brings us back to the non-search problem. Google really does not need to be the number one in other areas (other than the silly acquisitions like Jaiku). It does not cost Google much to create new products (many Google projects like Reader and News were started as 20% time projects) and it does not cost them anything to run those either (they are written with the same framework that is maintained for their core offerings). So, even if all of them were to fail, it would not make a dent on Google, while the fact is that a lot of them don’t.

Now, add Google App Engine to the mix, which opens up the same infrastructure (leveraging the same Google Accounts identity system) to the wider web. With the App Engine, for the tiny cost of supporting the bootstrap process for free, Google now gets even more focussed and specific data regarding usage. In the hierarchy of usage quality, context is king: apps would have a locked-down context, taking out the guesswork for Google, and the data stored in such contexts would also be in a format that Google natively understands.

It would really be stupid to assume that all these processes and data collection are not already being used to improve the advertising business, which is where they earn their bread.

p.s: This post has been edited for clarity and a couple of grammatical snafus from its first version.

Written by shyam

April 22, 2008 at 4:29 pm

Posted in Google


Nine steps to becoming a Google Reader Ninja

with 3 comments

  • Do this only when you have a good bit of time to spare; don’t rush through it.
  • Mark all individual feeds that have more than 100 unread items as “all read.” Otherwise you are likely to spend a lot of time working through the backlog while getting very little real value out of it.
  • Reduce top level clutter: Keep as few folders on your top level as you can. My total boils down to nine, ordered in terms of reading frequency (daily, india-blogs, links, misc, music, news, private, technology, testing).
  • If you need to organize your feeds in a more granular manner, use sub-folders. Do this only if you are an organisation freak. What has worked best for me is the following method.

/root
  |-Top Level Category (By frequency first and by usage type later)
    |-Sub Category (By theme/topic: Technology, Business, Blogs)

  • Always read using the “River of News” view (Folder Level view) on Top Level folders
  • Find your comfort level in terms of number of items you can read in a fixed period of time and switch to List View if the items are above a fixed number (I keep it at 100).
  • Star lengthy items that need more reading time, catch up on them later.
  • Frequently prune your subscription lists: Check your reading trends regularly. Unsubscribe from feeds that are below a certain read percentage in subscription trends. Follow that up with the same treatment on the reading trends.
  • My average reading percentage is 30% for my Top 40. If you have similar numbers, it is a good idea to let go of the feeds that have less than a 30% reading percentage. Chances are you won’t miss them because you don’t read them much anyway.
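The pruning rule in the last two steps can be sketched in a few lines of Python. This is only an illustration: Google Reader shows these numbers in its trends pages, and the `FeedStats` class, the `feeds_to_prune` function and the sample feeds below are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class FeedStats:
    """Hypothetical per-feed counters, typed in by hand from the trends page."""
    title: str
    items_received: int
    items_read: int

    @property
    def read_percentage(self) -> float:
        # A feed that delivered nothing counts as 0% read.
        if self.items_received == 0:
            return 0.0
        return 100.0 * self.items_read / self.items_received


def feeds_to_prune(feeds, threshold=30.0):
    """Return titles of feeds whose read percentage is below the threshold."""
    return [f.title for f in feeds if f.read_percentage < threshold]


# Made-up sample data.
feeds = [
    FeedStats("daily-news", 200, 120),
    FeedStats("noisy-aggregator", 500, 40),
    FeedStats("friend-blog", 10, 3),
]

print(feeds_to_prune(feeds))
```

The 30% default mirrors my own average; pick whatever threshold matches your reading habits.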

Happy Reading!

Written by shyam

January 21, 2008 at 3:25 pm

Clustered river of news

with one comment

RSS readers have, over time, become pretty fully-featured software on their own. Most now provide the standard set of features (OPML import/export, categories, river of news and search) irrespective of their avatar, online or offline, and I have pretty much grown used to depending on my reader of choice, Google Reader, to satisfy the need to read my feeds.

That said, there is one feature I’d really love to have in my RSS reader – clustering on feeds as an additional way to categorise data, other than the current methods of categories and tags. Think of it as a cross between your RSS reader and Google News/Techmeme. Would it not be nice to have your own little personal Google News or Techmeme built from the sources that you have picked, rather than be led by what Gabe or the kind folks at Google News may have seeded their websites with?

There are, though, a couple of problems that could make this impossible:

Processing: Any algorithm that finds similarities in text is computationally intensive even in cases where the data set is limited. Scaling is often possible in such circumstances when the size of the data set is reasonably fixed, but with the variance in the size of different RSS subscription lists, it would be a royal pain to find the right algorithm that will scale effectively and efficiently.
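To get a feel for the cost, here is a rough Python sketch of the most naive text approach: compare every item with every other item using word-set (Jaccard) overlap. This is an assumption about what a simple similarity pass could look like, not how any existing reader does it; the `jaccard` and `similar_pairs` names, the 0.5 threshold and the sample headlines are all invented for the example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def similar_pairs(texts, threshold=0.5):
    """Compare every pair of texts; return matching index pairs and the comparison count."""
    pairs = []
    comparisons = 0
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        comparisons += 1
        if jaccard(a, b) >= threshold:
            pairs.append((i, j))
    return pairs, comparisons


# Made-up headlines.
texts = [
    "google launches app engine",
    "google launches app engine preview",
    "apple ships new macbook",
    "cricket score update",
]

pairs, comparisons = similar_pairs(texts)
print(pairs, comparisons)
```

With n items the loop runs n(n-1)/2 times, so four items need 6 comparisons while 1,000 unread items need 499,500, and that is before doing anything smarter than counting shared words.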

Entropy: Traditional similarity match approaches work best when they cover a similar domain so that an apple would mean apple the fruit rather than Apple the company. The entropy that is found in the data set needs to be reasonable for the algorithm to function reasonably well and learning systems also need to be taught with training data, which may not be possible in this case.

Link Match: What we are then left with is to hit the problem purely by tracking outgoing links. This would thankfully involve a far less computationally intensive approach than going via the pure text analysis approach. The degree of accuracy and the utility this approach may have may not be stunning, but it would certainly be good enough for the immediate purpose – a reasonable way of classifying what my subscription list is talking about.
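A minimal Python sketch of the link-match idea, assuming the outgoing links have already been extracted from each item: any two items that share a link end up in the same cluster. The `cluster_by_links` function, the union-find bookkeeping and the example URLs are my own illustration, not something an existing reader exposes.

```python
from collections import defaultdict


def cluster_by_links(items):
    """Group items that share at least one outgoing link.

    items: dict mapping an item title to the set of URLs it links to.
    Returns a list of clusters (sorted lists of titles), largest first.
    """
    # Union-find: every item starts out as its own cluster.
    parent = {title: title for title in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merge each item with the first item seen linking to the same URL.
    first_seen = {}
    for title, links in items.items():
        for url in links:
            if url in first_seen:
                union(first_seen[url], title)
            else:
                first_seen[url] = title

    clusters = defaultdict(list)
    for title in items:
        clusters[find(title)].append(title)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Made-up items and links.
items = {
    "Post A": {"http://example.com/launch"},
    "Post B": {"http://example.com/launch", "http://example.org/review"},
    "Post C": {"http://example.org/review"},
    "Post D": {"http://example.net/other"},
}

print(cluster_by_links(items))
```

Each link is looked at once, so the work grows roughly with the total number of links rather than with the number of item pairs. The trade-off is that items talking about the same thing without linking to the same place are never grouped.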

Related articles:

RSS Clustering: A Unique Approach for Managing Your RSS Feeds
A Novel Clustering-based RSS Aggregator
Nearest Neighbors and Similarity Search by Yury Lifshits

Written by shyam

January 15, 2008 at 9:00 pm