Blue Screen Of Duds

Where the alter ego of codelust plays

Archive for the ‘Blogs’ Category

Decentralized Social Data Framework: A Modest Proposal

with 3 comments

Twitter being down is no longer funny, nor is it even news anymore, and the same is the case with Twitter-angst, where loyal users fret and fume about how often it is down. One of the interesting suggestions to have come out of this is to create a decentralized version of Twitter – much along the lines of IRC – to bring about much better uptime for the beleaguered child of Obvious Inc.

I would take the idea a lot further and argue that all social communication products should gradually turn into aggregation points. What I am proposing is a new social data framework, let us call it HyperID (since it would borrow heavily from the ideas and concepts behind OpenID), to which social media websites would subscribe and from which they would push and pull data.

Essentially, this would involve publishing the user's social graph as the universal starting point for services and websites to subscribe to, rather than the current approach where everyone struggles to aggregate disparate social graphs as the end point of all activities. Ergo, we are addressing the wrong problem in the wrong place.

The current crop of problems will only be addressed when we stop pulling data into aggregators and start pushing data into service and messaging buses. Additionally, since this data is replicated across all subscriber nodes, it should also provide us with much better redundancy.

Problem Domain 

Identity: Joe User on Twitter may not always be the same as Joe User on Facebook. This is a known problem that makes discovery of content, context and connections tricky and often downright inaccurate. Google's Social Graph API is a brave attempt at addressing this issue using XFN and FOAF, but it won't find much success because it is initiated at the wrong end, and also because it is an educated guess at best, and you don't make guesses with your personal data or connections.
 
Disparate services: Joe User may only want to blog and not use photo sharing on the same platform, unlike Jane User, who uses the entire gamut of services. In an even worse scenario, if Jane User wants to blog on one service provider (say, Windows Live Spaces) and share photos on another (Flickr, for instance), she will have to build and nurture different trust systems, contacts and reputation levels.

Data retention: Yes, service providers are now warming up to the possibility of allowing users to pull their data out, but it is often provided without the metadata accrued over time (comments, tags, categories and so on). Switching providers often leaves you having to do the same work all over again.

Security: Social information aggregators now collect and save information by asking you for passwords and usernames on other services. This is not a sane way to work (extremely high risk of phishing) and is downright illegal at times when it involves HTML scraping and unauthorized access.

Proposed solution

[Figure: HyperID layout]

Identity, identity, identity: Start using OpenID as the base of HyperID. Users will be uniquely addressable by means of URLs. Joe User can always be associated with his URL (http://www.joeuser.com/id/), independent of the services he has subscribed to. Connections made by Joe User will also resolve to other OpenIDs. In one sweep, you no longer have to scrape, crawl or guess to figure out your connections.
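
As a rough sketch of how that resolution could work, assuming OpenID 1.x-style delegation links on the profile page (the URL below is hypothetical), a subscriber could discover the provider behind Joe User's URL like this:

```python
# Sketch of OpenID 1.x-style delegation discovery: given a profile URL such
# as http://www.joeuser.com/id/ (hypothetical), find the provider endpoint
# that actually authenticates the user. Real libraries (e.g. python-openid)
# also handle Yadis/XRDS and OpenID 2.0; this only illustrates the idea.
from html.parser import HTMLParser
from urllib.request import urlopen


class OpenIDLinkFinder(HTMLParser):
    """Collects <link rel="openid.server"> and <link rel="openid.delegate">."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel, href = attrs.get("rel"), attrs.get("href")
        if rel in ("openid.server", "openid.delegate") and href:
            self.links[rel] = href


def discover(profile_url):
    """Return (provider endpoint, delegated identity) advertised by a profile URL."""
    finder = OpenIDLinkFinder()
    finder.feed(urlopen(profile_url).read().decode("utf-8", "replace"))
    return finder.links.get("openid.server"), finder.links.get("openid.delegate")


if __name__ == "__main__":
    print(discover("http://www.joeuser.com/id/"))  # hypothetical profile URL
```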
 
Formalize a social (meta)data vocabulary: Existing syndication formats like RSS and Atom are usually used to publish text content. There are extensions of these formats, like Media RSS from Yahoo!, but none of them addresses the social data domain.

Of the existing candidates, the Atom Publishing Protocol seems to be the most amenable to an extension like this to cover the most common social data requirements. Additional, site-specific extensions can be added by means of custom namespaces that define them.
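
To make that concrete, here is a sketch of what an extended Atom entry might look like; the social: namespace and element names below are invented for illustration, not an existing specification.

```python
# Sketch of an Atom entry extended with a hypothetical "social" namespace to
# describe one edge in Joe User's graph. The namespace URI and element names
# are invented; a real HyperID vocabulary would have to be formally specified.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
SOCIAL = "http://example.org/ns/hyperid"  # hypothetical namespace

ET.register_namespace("", ATOM)
ET.register_namespace("social", SOCIAL)

entry = ET.Element(f"{{{ATOM}}}entry")
ET.SubElement(entry, f"{{{ATOM}}}id").text = "http://www.joeuser.com/id/"
ET.SubElement(entry, f"{{{ATOM}}}title").text = "Joe User follows Jane User"
ET.SubElement(entry, f"{{{ATOM}}}updated").text = "2008-02-04T13:46:00Z"

connection = ET.SubElement(entry, f"{{{SOCIAL}}}connection")
ET.SubElement(connection, f"{{{SOCIAL}}}subject").text = "http://www.joeuser.com/id/"
ET.SubElement(connection, f"{{{SOCIAL}}}object").text = "http://www.janeuser.com/id/"
ET.SubElement(connection, f"{{{SOCIAL}}}rel").text = "follows"

print(ET.tostring(entry, encoding="unicode"))
```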

You host your own social graph: With a common vocabulary, pushing, pulling and subscribing to data across different providers and subscribers should become effortless. This would also mean that you can, if you want to, host your own social graph (http://www.janeuser.com/social) or leave it to service providers who will do it for you. I know that Six Apart already does this in part with the Action Streams plugin, but it is still a pull rather than a push service.

Moreover, we could extend the autodiscovery convention used for RSS and Atom feeds and use it to point to the location of the social graph, which is a considerably better and easier solution than the one proposed by Google's Social Graph API.
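
A rough sketch of what that could look like, with a made-up rel value and hypothetical URLs: a subscriber scans the page head for the advertised graph much as a feed reader does for feeds.

```python
# Sketch of the proposed autodiscovery extension: the same <link> convention
# feed readers already use for RSS/Atom, pointing at the social graph. The
# rel value "hyperid.graph" and the URLs are invented for illustration.
import re

head = """
<link rel="alternate" type="application/atom+xml" href="http://www.janeuser.com/feed/" />
<link rel="hyperid.graph" type="application/atom+xml" href="http://www.janeuser.com/social" />
"""

match = re.search(r'<link[^>]+rel="hyperid\.graph"[^>]+href="([^"]+)"', head)
print(match.group(1) if match else "no social graph advertised")
```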

Extend and embrace existing tech: Extend and leverage existing technologies like OpenID and Atom to authenticate users and to advertise the services available to them, depending on their access levels.
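
One way that advertisement could be done with plain AtomPub (RFC 5023) is a per-user service document listing the collections she can write to; the titles and URLs here are hypothetical.

```python
# Sketch of advertising available services with an AtomPub (RFC 5023)
# service document: each collection is a service Jane User has access to.
# The titles and URLs are hypothetical.
import xml.etree.ElementTree as ET

APP = "http://www.w3.org/2007/app"
ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("app", APP)
ET.register_namespace("atom", ATOM)

service = ET.Element(f"{{{APP}}}service")
workspace = ET.SubElement(service, f"{{{APP}}}workspace")
ET.SubElement(workspace, f"{{{ATOM}}}title").text = "Jane User's services"

for title, href in [("Blog", "http://www.janeuser.com/blog/"),
                    ("Photos", "http://www.janeuser.com/photos/"),
                    ("Social graph", "http://www.janeuser.com/social")]:
    collection = ET.SubElement(workspace, f"{{{APP}}}collection", href=href)
    ET.SubElement(collection, f"{{{ATOM}}}title").text = title

print(ET.tostring(service, encoding="unicode"))
```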

What this could mean

For companies: They have to change the way they look at usage, data and their own business models. Throwing away locked-in logins would be a scary thing to do, but you get better quality and better-profiled usage.

In the short run, you are looking at existing companies changing themselves into data buses. In the longer run, it should be business as usual.

Redundancy: Since your data is replicated across different subscribers, you can push updates out to different services and assign fallbacks (primary subscriber: Twitter, secondary: Pownce, and so on).

Subscriber applications can cache advertised fallback options and try known options if the primary ones are unavailable. 
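
A minimal sketch of that fallback behaviour, with hypothetical endpoint URLs and assuming updates are pushed as Atom entries over HTTP:

```python
# Rough sketch of the fallback idea: push an update to the primary
# subscriber and fall back to the next advertised one if it is unreachable.
# The endpoint URLs and the Atom-over-HTTP payload are hypothetical.
from urllib.error import URLError
from urllib.request import Request, urlopen

SUBSCRIBERS = [
    "http://twitter.example/hyperid/endpoint",  # primary (hypothetical)
    "http://pownce.example/hyperid/endpoint",   # secondary (hypothetical)
]


def push_update(atom_entry: bytes) -> str:
    """Try each known subscriber in order; return the endpoint that accepted."""
    for endpoint in SUBSCRIBERS:
        try:
            request = Request(endpoint, data=atom_entry,
                              headers={"Content-Type": "application/atom+xml"})
            urlopen(request, timeout=5)
            return endpoint
        except URLError:
            continue  # primary unreachable, try the next cached fallback
    raise RuntimeError("no subscriber accepted the update")
```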

For users: They will need to sign up with a HyperID provider or host one on their own if they are savvy enough to do that. On the surface, though, it should all be business as usual, since a well-executed API and vocabulary should do the heavy lifting behind the scenes.
 
The Opportunity

For someone like WordPress.com, diversifying into the HyperID space would be a natural extension. They could even call it Socialpress. The hypothetical service would have a dashboard-like interface to control your settings, subscriptions and trusted users, and an API endpoint specific to each user.

Risks

Complexity: Since data is replicated and pushed out to different subscribers, controls will have to be granular by default, and across different providers this could prove very cumbersome.

Security: Even though attacks against OpenID have not been a matter of much concern so far, extending it would bring with it the risk of opening up new fronts in what is essentially a simple identity verification mechanism.

Synchronization: Since there is data replication involved (bi-directional, as in any decent framework), there is the possibility of lag. Improperly implemented HyperID-compliant websites could, in theory, retain data that should have been deleted across all subscribed nodes.

Traction: Without widespread support from the major players the initiative just won’t go anywhere. This is even more troublesome because it involves bi-directional syncing and all the parties involved are expected to play nice. If they don’t, it just won’t work. We could probably get into certification, compliance and all that jazz, but that would make it insanely complicated.

Exceptions: We are assuming here that users would want to aggregate all of their things under a single identity. I am well aware that there are valid use cases where users may not want to do that. HyperID does not prevent them from doing so. In fact, you could use different HyperIDs, or even specify which services you don't want published at all.

Feedback

The comment space awaits you!
 
p.s: Apologies for the crappy graphic to go with the post. I am an absolute newbie on Omnigraffle and it shows! 

Written by shyam

February 4, 2008 at 1:46 pm

Three (good) Indian blogs that you probably don’t read

with 15 comments

Time to post something different for a change from the regular long-winded ones.

Shashikant's tiny world: He posts across a wide variety of topics covering technology, the economy and other aspects of current affairs. He has something that is desperately lacking in most of the Indian blogosphere: a perspective that is not a wannabe version of the popular western ones. If only he would change the circa-2001 Blogger template to something more contemporary, but I can hardly complain since I read the full feed in an RSS reader. And, no, I am not linking to him because he's linked to me.

Gopal Vijayaraghavan: Gopal works for Yahoo! and is one of the leads for the PHP APC cache. That does not mean he posts only about profiling PHP code and race conditions in it. He also writes about movies (from a very non-critic, normal viewer's point of view) and a lot of other non-tech things. The only minus point is that he does not allow comments on his blog.

Cleartrip blog/Hrush: I know this is a corporate blog, probably disqualifying it from being considered a normal blog. But most of the content on the blog is penned by Hrush Bhatt, Founder & Director, Product and Strategy for the company. Other than the fact that the blog is one of the best and most open blogs among Indian corporates, he also gets additional brownie points from me for quoting two bloggers in the web data sphere that I follow closely: Danny Ayers and Joe Gregorio. Minus point: not updated frequently enough.

While on the topic of blogging, I was wondering recently if the only major difference that blogging has brought to the medium is that being biased is no longer uncool. These days, I tend to switch off from any discussion that aims to figure out the biases of mainstream media. The fact of the matter is that everything and every human being is biased, and we are conditioned by our biases.

The only difference is that it used to be cool to claim that you were unbiased as a media entity. When you report from the field, you are supposed to stick to the facts and not colour them with your biases, which I think is quite badly misplaced. As a blogger, meanwhile, it is cool for you to be biased. In fact, you are encouraged to come clean on your biases rather than cover them with a veil of faux neutrality. Other than that, if you take out the scale and economics of the matter, there is hardly any differentiation: both sides have bad reporting, bland language and myopia to the obvious.

Interesting and hypothetical over-the-top question of the day: Can you imagine a newspaper filled with op-ed writers?


Written by shyam

August 29, 2007 at 9:26 am

Posted in Blogs, India, social media

Paging Messrs Page and Brin: Please shut down Orkut

with 3 comments

Two of Google's worst products in its line-up are Orkut and Blogger. There are various reasons why those two deserve that label, but when a company worth billions, with more PhDs on its rolls than anyone can count, puts up a notice that says, "Security tip: Never paste a URL or script into your browser while logged into orkut.com, no matter what it claims to do," it really does not get any worse than that. Google, please do yourself and your users a favour and shut the damn thing down till you fix it.

[Screenshot: the Orkut security warning]

Apparently there has been a spate of recent Google Account hijackings that don’t follow any particular pattern. There is a fairly high probability that the warning on Orkut has something to do with one of the twin curses of Web 2.0: a CSRF or an XSS attack. Orkut handles its authentication and cookies differently from the rest of the Google framework.

You can log into Orkut and also be logged into other Google products like Google Reader and Gmail without being prompted to authenticate yourself again when you browse to those products. Conversely, if you log into the other two and browse over to Orkut, you will be faced with the authentication prompt.

In all probability, Orkut is using another cookie of its own in addition to the Google account cookie, and somewhere in between a malicious script is hijacking the Google account cookie, using the cross-domain permissions granted to Orkut pages to do the initial authentication on the GLogin.aspx page. In any case, Google should have fixed the problems with Orkut rather than expect users not to paste a URL or a script into the browser while they are logged into the website.
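
For what it is worth, one standard mitigation against exactly this kind of script-based cookie theft (and I am not suggesting this is what Google eventually did) is to mark the session cookie HttpOnly, so page scripts cannot read it at all:

```python
# Sketch of the HttpOnly mitigation against script-based cookie theft: a
# cookie flagged HttpOnly is still sent with requests but is invisible to
# document.cookie, so an injected script cannot lift it. This is a general
# defence, not a description of how Google actually fixed Orkut.
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "opaque-session-token"
cookie["session"]["httponly"] = True
cookie["session"]["secure"] = True        # only ever send over HTTPS
cookie["session"]["domain"] = ".example.com"

# The Set-Cookie header a server would emit:
print(cookie.output())
```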

Google's greatest strength is its computing framework (one that even Microsoft, with its 'cloud' initiative, will take a lot of time to catch up with), where applications basically plug into Bigtable and GFS, requiring relatively small teams of developers to sustain and develop the less important products; Orkut and Blogger belong to that category. After all, since when does getting an Ajax button to post a comment, or having a product blog (OMG! We have a blog now, we are so 2005!), or having dynamic pages on a blog network represent a significant advance in the history of humanity?

The trouble is that the same strength works as Google's major weakness too. Since they don't need massive teams to deploy and sustain these applications, the products don't get the attention that's required and function mostly on autopilot. And unlike what most people think, Google does not really care much about being a segment leader as long as it can mine usage data, do behavioral analysis and use that to improve the advertising cash cow. But that does leave holes like these open, which is just not done, and I hope Google fixes them soon before someone figures out an Orkut-wide attack.

p.s: Get someone to fix the language in the warning. It almost sounds like they are urging users not to use Orkut irrespective of what the site claims to do.


Written by shyam

August 3, 2007 at 10:44 am

Posted in Blogs, Google, security

Dude, who stole my page views?

leave a comment »

Why do traditional media outlets, including the online ones, struggle so much against user-generated content (yes, you can hate the phrase on principle and for whatever else you like) and other non-traditional content? Beyond the fact that most of the YouTubes are full of content produced by the traditional monsters (note to traditional media: your content discovery model is toast, find a new one or have one shoved down your throat with excessive force), the nagging little problem for the oldies is the cost structure. It is just way too expensive to produce the content they produce, while the average Joe on the street with an N95 might end up producing something that will blow you out of the water on any given day.

Once upon a time, the only way to produce and distribute content online was to pump in a lot of money. Everything would cost a fortune: the wire copies required to pump in the non-unique stories, the reporters who would create the unique content and the editors who would package and publish it all online. Web hosting was expensive, technical support was expensive, there was no Blogger, WordPress or even Django, and to publish content online you would probably have to create and maintain a content management system of your own.

Before it became the favourite pond of sploggers, Blogger, along with a host of other publishing websites, decimated the high entry cost to publishing, personal or otherwise. Of course, the quality of content on the average blog was not quite on par with something that came from a media house's stable. Where they scored was in terms of width and variety: 200 bloggers on the same platform cover a much wider area than 20 of the best editors sitting together, and so the monopoly was broken.

All you need now to get published is a domain, a web hosting account and the ability to use a browser, totalling something less than $130 for a year. And if you go about it in a smart way, you can easily break even and even generate differing degrees of profit. This is why you have a slew of new online publications that don't do much end-to-end original content. It is much easier and cheaper to let the big guys do the heavy lifting; the new players just latch on to it and provide that little bit of insight and background which is mostly not allowed in normal reportage. There is really nothing wrong with that model; it is an opportunity that the traditional media model has brought about, and nobody should feel any shame in making a living off it.

The average mainstream online media publication (the ones that publish 24 hours or close to it) employs something in the region of 20 – 30 people in production to keep the show going. Mind you, that number does not guarantee any exclusive content; they can comfortably cover most of the day’s events, but creating exclusive content is an additional effort on top of that. Making things even more difficult for them are recent developments like Google’s refusal to index wire copies from publications anymore, while the agencies themselves charge you extra for online usage rights. It suddenly reduces the footprint these publications have on aggregators and search engines.

So, is it really the case that a paidContent or a Gawker Media will be the New York Times of 2010? I am afraid not. They do use a marginally leaner model of generating content, but they are still based on the flawed old model, which is too costly to run and just does not scale. This is an interim period where there is a bit of arbitrage in the cost-versus-revenue equation for the new guys compared to the old ones, but it is not going to last forever, and this is certainly not the future.


Written by shyam

July 29, 2007 at 11:33 am

Posted in Blogs, Media, social media

WordPress.com makes a million, blogs that is

with 6 comments

Congrats to the lads at Automattic on their anytime-now millionth sign-up on WordPress.com. The company is the dark horse of the entire blog hosting business, and they quietly go about doing their thing. And other than the long-forgotten misdemeanor of placing some icky ads on the WordPress homepage, Matt has hardly put a foot wrong in recent times. And for a 'virtual' company (there is no real office these guys work from), they have accomplished some amazing feats:

1) Use cheap hardware and mostly open source software to deliver performance and reliability that far outstrip any expensive solution out there. The list of software used reads something like this: Debian/Ubuntu, PHP, MySQL, LiteSpeed, Pound, Wackamole, Spread, Nagios, Munin, Monit, NFS, Postfix, MyDNS. How often have you read about an unplanned outage on WordPress.com? You can get an idea of their setup at Barry's blog.

2) Scaling out PHP and MySQL to support an operation of this scale. Okay, it is not quite the LAMP stack (LiteSpeed instead of Apache; read Matt's comments here on the subject), but it does make the case that PHP, when done right, can do incredibly well in terms of both scale and performance, which is only reinforced by this post by Steve Grimm on the memcached list about Facebook's architecture.
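
The pattern underneath most of that scaling is cache-aside reads in front of the database; a minimal sketch of the idea (illustrative only, not WordPress.com's or Facebook's actual code, which is PHP and far more involved):

```python
# Minimal cache-aside sketch of memcached-backed scaling: read from the
# cache first, fall back to the database on a miss, then populate the cache
# so the next reader never touches MySQL. The dict stands in for a real
# memcached client; the query is a placeholder.
import time

cache = {}          # stand-in for a memcached client
CACHE_TTL = 300     # seconds


def query_database(post_id):
    # placeholder for the expensive MySQL query
    return {"id": post_id, "title": "Hello world"}


def get_post(post_id):
    key = f"post:{post_id}"
    hit = cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                        # cache hit: no database work
    post = query_database(post_id)           # cache miss: hit MySQL once
    cache[key] = (post, time.time() + CACHE_TTL)
    return post
```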

3) WordPress.com also has what could arguably be called the best user-contributed text content repository online at this moment. There is such a wide variety of content — from technology to adult — that is of pretty good quality. Excellent and proactive policing of illegal content has also ensured that there is little spam within the network, a problem that keeps growing on Blogspot.

With 7.5 million daily page views and 45 million unique visitors, they look more and more like an acquisition target for any of the big boys, who would be drowning in a pool of envious drool after seeing those numbers. Any guesses as to who they might be talking to, or have already turned down?

In any case, way to go guys and keep the good stuff pouring in!

Written by shyam

May 16, 2007 at 8:52 am

CEO 2.0

with 2 comments

After the unconventional ways of Jonathan Schwartz (Sun's CEO), who has rather admirably kept up his blogging even after being bumped up to the post (albeit in a less interesting manner ever since), and Alan Meckler (CEO, Jupiter Media), who is not half as interesting a blogger as the former, we now have Mårten Mickos (CEO, MySQL AB) battling the hordes in the very unfriendly waters of Slashdot. I quite like the openness in these conversations, though I have to wonder how long it will be (basically around the time they list) before Mårten is also snowed under the numerous directives issued by the attorneys and the shadow of the infamous SOX. It is an interesting thread to follow all the same.


Written by shyam

April 26, 2007 at 1:57 pm

Posted in Blogs

Life in the browser

leave a comment »

Okay, before I start, I need to say one thing. If we are looking to live our lives inside the browser, it has to manage memory (yes, I am glaring at you, Firefox) a zillion times better. My normal laptop usage is to never switch it off or restart it for days on end. During the daily commute it is set to hibernate, and once I am home or in the office, I pick up from where I left off.

Now, I've had Firefox running for two days and it has eaten up a whopping 476 MB of physical memory. I really don't give a damn whether it is the 20 extensions I have that are bleeding my laptop of these resources. If the extension model is one of the core value propositions of the Firefox platform, there needs to be a solution that fixes this problem; asking me to ditch the extensions is not one. Honestly, 500 MB of RAM is what fairly graphics-intensive games take up on PCs these days; it is not something any self-respecting browser should ever have to consume.

Back to the title of the post. My switch to Google Reader has progressed considerably better than I'd expected. One very good positive from the switch is that it has saved me a lot of bandwidth. Leaving GreatNews on overnight would often cause me to pull around 100 MB worth of data (and that does not include any podcasts), whether I ended up reading any of the items or not. Google Reader treats the feeds pretty much like IMAP email: you get the 'unread' count from Google in the left pane, but you don't download the items till you click on them. And for some strange reason, I like the 'river of news' view better in Google Reader than in GreatNews.

More later.


Written by shyam

April 21, 2007 at 8:30 am

Posted in Blogs, Technology