Blue Screen Of Duds

Where the alter ego of codelust plays

Archive for the ‘Technology’ Category

Clustered river of news

with one comment

RSS readers have, over time, become fairly full-featured software in their own right. Most now provide the standard set of features (OPML import/export, categories, river of news and search) irrespective of their avatar, online or offline, and I have grown used to depending on my reader of choice, Google Reader, to read my feeds.

That said, there is one feature I’d really love to have in my RSS reader: clustering on feeds as an additional way to categorise data, alongside the current methods of categories and tags. Think of it as a cross between your RSS reader and Google News/Techmeme. Would it not be nice to have your own little personal Google News or Techmeme, built from the sources that you have picked, rather than be led by what Gabe or the kind folks at Google News may have seeded their websites with?

There are, though, a couple of problems that could make this very hard, and one possible way around them:

Processing: Any algorithm that finds similarities in text is computationally intensive, even when the data set is limited. Scaling is usually manageable when the size of the data set is reasonably fixed, but given how much the size of RSS subscription lists varies from user to user, it would be a royal pain to find an algorithm that scales effectively and efficiently.

Entropy: Traditional similarity-matching approaches work best when they cover a single domain, so that ‘apple’ means the fruit rather than Apple the company. The entropy in the data set needs to be reasonable for the algorithm to function reasonably well. Learning systems also need to be taught with training data, which may not be possible in this case.

Link Match: What we are then left with is to attack the problem purely by tracking outgoing links. This is thankfully far less computationally intensive than pure text analysis. The accuracy and utility of this approach may not be stunning, but it would certainly be good enough for the immediate purpose: a reasonable way of classifying what my subscription list is talking about.
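To make the link-match idea a little more concrete, here is a minimal sketch in Python. It assumes each feed item is available as an HTML fragment keyed by some item id (both hypothetical), pulls the outgoing links out of each item, and puts any two items that share a link into the same cluster using a small union-find. The function names and the trailing-slash normalisation are my own assumptions, not a description of how any existing reader works.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Crude normalisation (an assumption): ignore trailing slashes.
                    self.links.add(value.rstrip("/"))


def outgoing_links(html):
    """Return the set of links found in one feed item's HTML body."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def cluster_by_links(items):
    """Group item ids that share at least one outgoing link.

    `items` maps a hypothetical item id to the HTML body of the entry.
    Returns a list of sets of ids, largest first; singletons are dropped.
    """
    parent = {item_id: item_id for item_id in items}

    def find(x):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Invert the data: link -> ids of the items that point at it.
    by_link = defaultdict(list)
    for item_id, html in items.items():
        for link in outgoing_links(html):
            by_link[link].append(item_id)

    # Every item pointing at the same link joins the same cluster.
    for ids in by_link.values():
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    clusters = defaultdict(set)
    for item_id in items:
        clusters[find(item_id)].add(item_id)
    return sorted((c for c in clusters.values() if len(c) > 1),
                  key=len, reverse=True)
```

The work here is proportional to the number of links rather than to the amount of text, which is the whole attraction. The obvious weakness is that two posts on the same story that link to different sources will never meet.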

Related articles:

RSS Clustering: A Unique Approach for Managing Your RSS Feeds
A Novel Clustering-based RSS Aggregator
Nearest Neighbors and Similarity Search by Yury Lifshits

Written by shyam

January 15, 2008 at 9:00 pm

Subvert and profit: Bush greenflags Chinese high tech transfer

This has to be one rather interesting and hilarious development. Apparently, the Bush administration has happily worked around a treaty that was meant to restrict high technology transfers to China and has allowed IBM to transfer some cutting edge technology to the nation.

It seems that the Bush White House has quietly relaxed the restrictions imposed by Wassenaar by saying there are some approved companies, operating in China, which can import technologies without a license.

So far the US has approved five companies to be in this select group, four of which are semiconductor related companies, National Semiconductor’s Chinese facilities, Applied Materials’ Chinese facilities, the Shanghai Hua Hong NEC Electronics Company and SMIC.

The government and the corporations are strange bedfellows in a free market (if there was ever an oxymoron, free market will get the first spot each and every time) and this once again proves that costs are the only benchmark for corporations to follow and eventually all such costs work back into the system via the government.

There is a larger picture to this: the semiconductor firms are trying desperately to increase margins and cut costs, and China is one of the cheaper and more reliable places to put up fabs. SMIC is one of the largest fabs, and it basically does white-label manufacturing for other manufacturers like IBM.

To increase their margins, the fabs need to incorporate the latest and greatest technology, and agreements like Wassenaar stand in the way of that.

For us, in India, this is a welcome development. While we are a long way off from being able to provide the guaranteed infrastructure to support multiple fabs, at some point in the future we should see high technology manufacturing slowly making its way into India, and with a precedent like this one already in place, life should only be easier.

Now, if only someone would actually start building that infrastructure.

Written by shyam

January 3, 2008 at 8:31 pm

Google’s Opera Mini killer

with 4 comments

Peter Cranstone, while pondering how the Google phone will deliver ads to its users, says that Google will have to do something similar to what Opera does with Opera Mini — transcoding web pages — for the Google phone. He adds that once Google gets around to doing this it will beat the crap out of Opera Mini, which probably won’t find much agreement with Russell Beattie, who argues that someone should buy Opera just for the traffic that is now routed through Opera Mini.

What both gentlemen are probably not aware of is that Google already has a transcoder that converts pages into a mobile-friendly format on the fly. Rather strangely, the interface is not available anywhere as a start page, as far as I know. Google does serve you a mobile-specific Google.com page depending on your User Agent, but the links that are delivered in the results page do not use the transcoder.

The only place where you can see it is in the mobile version of Google Reader. In the entry-level screen on Google Reader, there is a link that says “see original,” which can also be accessed by pressing ‘0’ on your mobile phone. To access any normal page in your desktop browser via this transcoder, all you have to do is append the URL you want to browse to the following URL: http://www.google.com/gwt/n?u=. For example, this blog can be accessed this way: http://www.google.com/gwt/n?u=https://fatalerror.wordpress.com.
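If you want to build such URLs from a script, a small helper along these lines would do. This is only a sketch: the prefix is the one quoted above, the function name is made up, and percent-encoding the target is my own precaution (so that an ‘&’ or ‘#’ in the target is not read as part of the transcoder’s own query string) rather than anything I know the service required.

```python
from urllib.parse import quote

# Prefix quoted in the post; the target URL is appended as the value of `u`.
GWT_PREFIX = "http://www.google.com/gwt/n?u="


def transcoder_url(target):
    """Return the transcoder URL for `target` (hypothetical helper).

    The target is percent-encoded in full, which is an assumption about
    what the service tolerated, not documented behaviour.
    """
    return GWT_PREFIX + quote(target, safe="")


if __name__ == "__main__":
    print(transcoder_url("https://fatalerror.wordpress.com"))
```

For simple URLs without query strings, plain concatenation as shown in the example above amounts to the same thing.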

Currently, the transcoder supports most standard HTML, including forms, which means that you get to access things like email on the go even on a very low-fi handset, and also that Google gets another bit of your personal information (did I hear the privacy paranoid let out a collective gasp there?) for it to index and profile. The good part of the story is that it refuses to transcode secure URLs, which I remember was not the case with Opera Mini.

Now, I also have to admit here that Opera Mini does a stellar job, but it has the problem that you need J2ME support to be able to use it. Besides, the Google transcoder seems to be considerably faster at transcoding and rendering pages. For all you know, Google may be licensing Opera’s technology to do this (imagine: Opera Mini kills Opera Mini. What a headline!), but from what I remember, Opera is running a mightily hacked-up version of the Opera browser as middleware to make Opera Mini possible, while Google’s approach seems to be in line with the more standard HTML Tidy/HTML Cleaner/HTML Parser/Tagsoup approach to de-mucking web pages, albeit a monstrously hacked version of it.

Written by shyam

August 31, 2007 at 5:47 am

WordPress.com feature request: External syndication feed provider support

with 2 comments

A while ago, WordPress.com took down the ‘Feed Stats’ module that used to show users the traffic a blog’s feed was getting, broken down by client. Going by the responses (210 comments on that thread), it was missed by quite a few people, and it was a page I used to rely on heavily.

Ever since Google Reader started the practice of reporting subscription numbers in its User Agent header, a lot of other web-based clients have started doing the same (I checked this against our internal server logs, and a majority of them do), and it has been a good way of seeing actual and sustained usage of the feeds. While the Feed Stats module never reported subscription numbers, it still gave us a fair idea of how much usage there was, even if it was not exactly uniques or absolute uniques.
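For anyone with access to raw server logs, pulling these numbers out is straightforward. The sketch below assumes the client reports its count as “N subscribers” somewhere in the User Agent string, which is the pattern I have seen; the sample strings in the usage example are illustrative, not copied from real logs, and the function names are made up. Since a client polls the feed many times, it keeps the highest count seen per client.

```python
import re

# Assumed pattern: a number followed by "subscriber" or "subscribers".
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def subscriber_count(user_agent):
    """Return the subscriber count reported in a User Agent, or None."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None


def client_name(user_agent):
    """Use the text before the first ';', '/' or '(' as the client name."""
    return re.split(r"[;/(]", user_agent, maxsplit=1)[0].strip()


def subscribers_by_client(user_agents):
    """Map each reporting client to the highest count it has reported."""
    counts = {}
    for user_agent in user_agents:
        count = subscriber_count(user_agent)
        if count is None:
            continue  # ordinary browsers and clients that report nothing
        name = client_name(user_agent)
        counts[name] = max(count, counts.get(name, 0))
    return counts


if __name__ == "__main__":
    # Illustrative User Agent strings, not taken from real logs.
    sample = [
        "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 12 subscribers; feed-id=1)",
        "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 15 subscribers; feed-id=1)",
        "Bloglines/3.1 (http://www.bloglines.com; 4 subscribers)",
        "Mozilla/5.0 (Windows)",
    ]
    print(subscribers_by_client(sample))
```

Summing the values gives a rough lower bound on subscribers, since desktop readers poll individually and report nothing.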

Now, it is entirely possible to use something like Feedburner for this, but since the free WordPress.com offering does not allow template editing, you can’t point the autodiscovery links anywhere other than the WordPress.com feed. Moreover, even in cases where you can edit the templates, clients that are polling the old feed URLs need to be served a 301 redirect to the new URL, which is not possible right now.

Even though Matt has commented that the feature may return someday, it is very important for people who track their traffic closely to have this information and making this feature available would make our lives considerably easier and WordPress.com that much better.

Written by shyam

August 26, 2007 at 8:39 am

Posted in Technology, Wordpress

End of India as an outsourcing destination

with 3 comments

Munjal Shah of Riya recently wrote about how the company is moving its base back to the valley, citing wage inflation as the primary reason for moving out. If the responses are anything to go by, it has not been a popular move. I think it is a bit unfair to judge companies on where they choose to operate from based on whether they toe the line of the latest *sourcing/*shoring trend, or on whatever bits of nationalism you might want to apply to it.

We operate in a profit-driven scenario, where your operating costs can eat into whatever revenue you might have at any point in time. Unless you have the luxury of being in a situation where a considerable volume of cash, not coming from your revenues, is available to meet your additional expenses, you always have to keep the opex/capex variable under control. From that point of view, I think it is totally a company’s prerogative to decide where it wants to operate from.

India, that way, stands at a very strange juncture. In every sector, from BPOs to garments, we don’t occupy the excellent value-for-money pedestal at the lower end of the value chain anymore. That, in a way, is good, because it will force us to move up the chain and offer something better than ‘cheapness’ as a compelling value proposition. The companies that can see that and adapt right now are going to do well in the long run; the others will fade away with time.

I still remember my first job, with a web software development firm at the height of the first bubble. They had a solid client base and decent products, but the owner steadfastly refused to move up the value chain, preferring the easier bang-for-the-buck approach. The end result was that, once the bubble burst, they were on their knees. They were unable to retain any of their top talent because they were not doing the kind of work the good guys wanted to do. They also had a hard time keeping mouths fed, because spending disappeared across the board and what was left was a couple of AMCs for the lower-end work, which did not bring in much money anyway.

In terms of development, we badly need to move out of the mindset that most companies have here. It is insanely hard to find professionally run firms here, outside of about 10% of any sector. On the web front, most are still busy copying what’s already been done to death in the west, which leads developers to think only in terms of copying or being ‘inspired,’ and that in turn diminishes the value they add to the business.

Eventually, wages, at least in technology, will rise to a level where in pure cost terms we won’t have any advantage over any market you can think of. People will then need other reasons to stick with us and, from what I’ve seen, we are still a long way from being able to come up with compelling ones.

Written by shyam

April 30, 2007 at 1:20 pm

Posted in India, Technology

Farewell F2o

with one comment

I received an email in my mailbox today that pretty much marks the end of the naive years of the internet. Excerpt:

It’s with a great amount of reluctance and sadness that after five years of providing high quality, advertisement free hosting to thousands of
people around the world, I’m announcing the end of freedom2operate. — Daniel J. Cody

F2o was one of the last free hosting service providers to outlast the recent post-boom years on the internet. They were different in that they were a nicely supported and awesomely featured hosting service (Chillisoft ASP too!) that did not take anybody and everybody on board. The idea was to have a community on the platform, consisting of tinkerers and web developers, who were provided with features that most paid hosting accounts would hesitate to provide. As it turned out, the show could not go on forever, even with the addition of the paid hosting accounts.

I used to have an account on F2o, but I did not bother to ping DJC when they did a server migration that required existing users to opt in to move to the new boxes. I think I was among the unfortunate few who had problems with the migration that needed to be fixed manually, but I decided not to ask for that and let the account die. That was mostly because I’d come to the conclusion that quality services need to be supported with money. Bandwidth, rack space and the effort that goes into keeping something like this running are never free and should never be free.

I do not know who else is left in the space now. Evolt used to provide such a service, but I am no longer sure what exactly is going on there. I think most of this type of free hosting will remain a faint memory, other than fly-by-night operators who are looking to make a quick buck by injecting all hosted pages with pop-ups and Google Ads.

Written by shyam

April 26, 2007 at 8:15 am

Posted in /etc, Technology

Life in the browser

Okay, before I start, I need to say one thing. If we are looking to live our lives inside the browser, it has to manage memory (yes, I am glaring at you, Firefox) a zillion times better. My normal laptop usage is to never switch it off or restart it for days on end. During the daily commute it is set to hibernate and once I am home or in the office, I pick up from where I left off.

Now, I’ve had Firefox running for 2 days and it has eaten up a whopping 476 MB of physical memory. I really don’t give a damn whether it is the 20 extensions I have installed that are bleeding my laptop of these resources. If the extension model is one of the core value propositions of the Firefox platform, there needs to be a solution that fixes this problem. Asking me to ditch the extensions is not a solution. Honestly, 500 MB of RAM is what fairly graphics-intensive games take up on PCs these days; it is not something any self-respecting browser should ever have to consume.

Back to the title of the post. My switch to Google Reader has gone considerably better than I’d expected. One big positive from the switch is that it has saved me a lot of bandwidth. Leaving GreatNews on overnight would often cause me to pull around 100 MB worth of data (not including any podcasts), whether I ended up reading any of the items or not. Google Reader treats feeds pretty much like IMAP email: you get the ‘unread’ count from Google in the left pane, but you don’t download the items till you click on them. And for some strange reason I like the ‘river of news’ view in Google Reader better than the one in GreatNews.

More later.

Written by shyam

April 21, 2007 at 8:30 am

Posted in Blogs, Technology