Blue Screen Of Duds

Where the alter ego of codelust plays

Archive for the ‘Technology’ Category

Clustered river of news

with one comment

RSS readers have, over time, become fully featured software in their own right. Most now provide the standard set of features (OPML import/export, categories, river of news and search) irrespective of their avatar, online or offline, and I have grown used to depending on my reader of choice, Google Reader, to satisfy the need to read my feeds.

That said, there is one feature I’d really love to have in my RSS reader – clustering on feeds as an additional way to categorise data, alongside the current methods of categories and tags. Think of it as a cross between your RSS reader and Google News/Techmeme. Would it not be nice to have your own little personal Google News or Techmeme, built from the sources that you have picked, rather than be led by what Gabe or the kind folks at Google News may have seeded their websites with?

There are, though, a couple of problems that could make this impossible:

Processing: Any algorithm that finds similarities in text is computationally intensive, even when the data set is small. Such algorithms usually scale only when the size of the data set is reasonably fixed. Given how much RSS subscription lists vary in size, it would be a royal pain to find an algorithm that scales effectively and efficiently.

Entropy: Traditional similarity-matching approaches work best when they cover a single domain, so that an apple would mean apple the fruit rather than Apple the company. The entropy in the data set needs to be reasonably low for the algorithm to function well. Learning systems also need to be taught with training data, which may not be possible in this case.

Link Match: What we are then left with is to attack the problem purely by tracking outgoing links: entries that link to the same URL are probably talking about the same thing. This is thankfully far less computationally intensive than pure text analysis. The accuracy and utility of this approach may not be stunning, but it would certainly be good enough for the immediate purpose – a reasonable way of classifying what my subscription list is talking about.
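
To make the link-match idea concrete, here is a minimal sketch in Python. It is only an illustration of the approach described above, not a tested clustering system: the function names, the crude URL normalisation and the "any shared link puts two entries in the same cluster" rule are all my own assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlsplit


def normalise(url):
    """Crude normalisation (an assumption): lower-case host, no scheme,
    no query string or fragment, no trailing slash."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


class LinkCollector(HTMLParser):
    """Collects the normalised href of every <a> tag in an entry body."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                link = normalise(value)
                if link:  # skip bare "#" style links
                    self.links.add(link)


def outgoing_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def cluster_by_links(entries):
    """entries maps an entry id to its HTML body.
    Returns a list of sets of entry ids that share at least one link,
    directly or through a chain of other entries (union-find)."""
    parent = {entry_id: entry_id for entry_id in entries}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # link -> first entry id that used it
    for entry_id, html in entries.items():
        for link in outgoing_links(html):
            if link in first_seen:
                parent[find(entry_id)] = find(first_seen[link])
            else:
                first_seen[link] = entry_id

    clusters = defaultdict(set)
    for entry_id in entries:
        clusters[find(entry_id)].add(entry_id)
    # Entries that share nothing with anyone are not interesting here.
    return [members for members in clusters.values() if len(members) > 1]
```

Each entry is parsed once and each link is looked up in a dictionary, so the work grows roughly with the number of links rather than with the number of entry pairs, which is the whole attraction over text similarity.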

Written by shyam

January 15, 2008 at 9:00 pm

Subvert and profit: Bush greenflags Chinese high tech transfer

without comments

This has to be one rather interesting and hilarious development. Apparently, the Bush administration has happily worked around a treaty that was meant to restrict high technology transfers to China and has allowed IBM to transfer some cutting edge technology to the nation.

It seems that the Bush White House has quietly relaxed the restrictions imposed by Wassenaar by saying there are some approved companies, operating in China, which can import technologies without a license.

So far the US has approved five companies to be in this select group, four of which are semiconductor-related: National Semiconductor’s Chinese facilities, Applied Materials’ Chinese facilities, the Shanghai Hua Hong NEC Electronics Company and SMIC.

The government and the corporations are strange bedfellows in a free market (if there was ever an oxymoron, free market will get the first spot each and every time) and this once again proves that costs are the only benchmark for corporations to follow and eventually all such costs work back into the system via the government.

There is a larger picture to this: the semiconductor firms are trying desperately to increase margins and cut costs, and China is one of the cheaper and more reliable places to put up fabs. SMIC is one of the largest fabs that do white-label manufacturing for other manufacturers like IBM.

To increase their margins, the fabs need to incorporate the latest and greatest technology, and agreements like Wassenaar stand in the way of that.

For us, in India, this is a welcome development. While we are a long way off from being able to provide the guaranteed infrastructure to support multiple fabs, at some point in the future we should see high technology manufacturing slowly making its way into India, and with a precedent like this one already in place, life should only be easier.

Now, if only someone would actually start building that infrastructure.

Written by shyam

January 3, 2008 at 8:31 pm

Google’s Opera Mini killer

with 4 comments

Peter Cranstone, while pondering how the Google phone will deliver ads to its users, says that Google will have to do something similar to what Opera does with Opera Mini — transcoding web pages — for the Google phone. He adds that once Google gets around to doing this it will beat the crap out of Opera Mini, which probably won’t find much agreement with Russell Beattie, who argues that someone should buy Opera just for the traffic that is now routed through Opera Mini.

What both gentlemen are probably not aware of is that Google already has a transcoder that converts pages into a mobile-formatted version on the fly. Rather strangely, the interface is not available anywhere as a start page, as far as I know. Google does serve you a mobile-specific Google.com page depending on your User Agent, but the links that are delivered in the results page do not use the transcoder.

The only place where you can see it is the mobile version of Google Reader. In the entry-level screen on Google Reader, there is a link that says “see original,” which can also be accessed by pressing ‘0’ on your mobile phone. To access any normal page on your desktop browser via this transcoder, all you have to do is append the URL you want to browse to the following URL: http://www.google.com/gwt/n?u=. For example, this blog can be accessed this way: http://www.google.com/gwt/n?u=http://fatalerror.wordpress.com.
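
If you want to build these URLs in a script rather than by hand, a small Python helper could look like the sketch below. The prefix is the one given above; the function name and the optional percent-encoding switch are my own additions, and whether the transcoder expects the target URL to be encoded is an assumption I have not verified (the example above passes it as-is).

```python
from urllib.parse import quote

# Prefix taken from the post; everything else here is illustrative.
GWT_PREFIX = "http://www.google.com/gwt/n?u="


def transcoder_url(target, encode=False):
    """Return the transcoder URL for `target`.

    With encode=False the target is appended verbatim, as in the post.
    With encode=True the target is percent-encoded first, which is the
    safer choice when the target has its own query string (assumption).
    """
    if encode:
        target = quote(target, safe="")
    return GWT_PREFIX + target


print(transcoder_url("http://fatalerror.wordpress.com"))
```

Appending verbatim works for simple addresses; a target that carries its own `?` or `&` is where the encoded form is likely to matter.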

Currently, the transcoder supports most standard HTML, including forms, which means that you get to access things like email on the go even on a very low-fi handset, and also that Google gets another bit of your personal information (did I hear the privacy paranoid let out a collective gasp there?) for it to index and profile. The good part of the story is that it refuses to transcode secure URLs, which I remember was not the case with Opera Mini.

Now, I also have to admit here that Opera Mini does a stellar job, but it has the problem that you need J2ME support to be able to use it. Besides, the Google transcoder seems to be considerably faster at transcoding and rendering pages. For all you know, Google may be licensing Opera’s technology to do this (imagine: Opera Mini kills Opera Mini. What a headline!), but from what I remember Opera is running a mightily hacked-up version of the Opera browser as middleware to make Opera Mini possible, while Google’s approach seems to be in line with the more standard HTML Tidy/HTML Cleaner/HTML Parser/Tagsoup approach to de-mucking web pages, albeit a monstrously hacked version of it.

Written by shyam

August 31, 2007 at 5:47 am

WordPress.com feature request: External syndication feed provider support

with 2 comments

A while ago, WordPress.com took down the ‘Feed Stats’ module that used to show users the traffic the blog’s feed was getting and the breakdown by client. Going by the responses (210 comments on that thread), it was missed by quite a few people, and it was a page I used to rely on heavily.

Ever since Google Reader started the practice of reporting subscription numbers in its User Agent header, a lot of other web-based clients have started doing the same (I checked this against our internal server logs and a majority of them do), and it is a good way of seeing actual and sustained usage of the feeds. While the Feed Stats module never reported subscription numbers, it still used to give us a fair idea of how much usage there was, even if it was not exactly uniques or absolute uniques.
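
For anyone with access to their own server logs, pulling those numbers out is not much work. The Python sketch below assumes the count appears in the User Agent as "N subscribers" (or "1 subscriber"); the sample strings in the usage note are made up to show the general shape and are not copied from real logs, and the function names are mine.

```python
import re

# Assumed pattern: a number followed by "subscriber" or "subscribers".
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def subscriber_count(user_agent):
    """Return the subscriber count reported in a User Agent, or None."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None


def subscribers_by_client(user_agents):
    """Map each client name to the highest count it reported.

    The client name is taken to be whatever precedes the first ';' or '('
    in the User Agent, which is a simplification.
    """
    totals = {}
    for user_agent in user_agents:
        count = subscriber_count(user_agent)
        if count is None:
            continue  # ordinary browsers and clients that report nothing
        client = re.split(r"[;(]", user_agent, maxsplit=1)[0].strip()
        totals[client] = max(count, totals.get(client, 0))
    return totals
```

Feeding it a list such as `["ExampleReader/1.0 (http://reader.example; 42 subscribers)", "Mozilla/5.0 (X11; Linux)"]` would give a dictionary with one entry for ExampleReader. Taking the maximum per client rather than the sum avoids counting the same subscribers once for every fetch.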

Now, it is entirely possible to use something like Feedburner to do it, but since the free WordPress.com offering does not allow template editing, you can’t point the autodiscovery links anywhere other than the WordPress.com feed. Moreover, even in cases where you can edit the templates, clients that are still polling the old feed URLs need to be served a 301 redirect to the new URL, which is not possible right now.

Matt has commented that the feature may return someday. It is very important for people who track their traffic closely to have this information, and making this feature available would make our lives considerably easier and WordPress.com that much better.

Written by shyam

August 26, 2007 at 8:39 am

Posted in Technology, Wordpress

End of India as an outsourcing destination

with 3 comments

Munjal Shah of Riya recently wrote about how the company is moving its base back to the valley, citing wage inflation as the primary reason for moving out. If the responses are anything to go by, it has not been a popular move. I think it is a bit unfair to judge companies on where they choose to operate from, whether by how well they toe the line of the latest *sourcing/*shoring trend or by whatever bits of nationalism you might want to apply to it.

We operate in a profit-driven scenario, where your operating costs can eat into whatever revenue you might have at any point in time. Unless you have the luxury of being in a situation where a considerable volume of cash that does not come from your revenues is available to meet your additional expenses, you always have to keep the opex/capex variable under control. From that point of view, I think it is totally a company’s prerogative to decide where they want to operate from.

India, that way, stands at a very strange juncture. In every sector, from BPOs to garments, we no longer occupy the excellent-value-for-money pedestal at the lower end of the value chain. That, in a way, is good, because it will force us to move up the chain, offering something better than ‘cheapness’ as a compelling value proposition. The companies that can see that and adapt right now are going to do well in the long run; the others will fade away with time.

I still remember my first job, with a web software development firm during the heights of the first bubble. They had a solid client base and decent products, but the owner steadfastly refused to move up the value chain, preferring to go for the easier bang-for-the-buck approach. The end result was that, once the bubble burst, they were on their knees. They were unable to retain any of their top talent because they were not doing the kind of work the good guys wanted to do. They also had a hard time keeping the mouths fed, because spending disappeared across the board and what was left was a couple of AMCs for the lower-end work, which did not bring in much money anyway.

In terms of development, we badly need to move out of the mindset that most companies have here. It is insanely hard to find professionally run firms here, beyond the top 10% of any sector. On the web front, most are still busy copying what’s already been done to death in the west, leading to developers thinking only in terms of copying or being ‘inspired,’ which diminishes the value they add to the business.

Eventually, wages, at least in technology, will go up to a level where in pure cost terms we won’t have any advantage over any market you can think of. People will then need other reasons to stick with us and, from what I’ve seen, we are still a long way from being able to come up with compelling ones.

Written by shyam

April 30, 2007 at 1:20 pm

Posted in India, Technology