Comments on FriendFeed Blog: Simple Update Protocol: Fetch updates from feeds faster

Looks like my last vague idea was not only already...

2009-07-17T04:32:03.400-07:00

Looks like my last vague idea was not only already developed, but already implemented. Just finished reading a sentence of Louis Gray's post.

If I understand this right we have a network effic...

2009-07-17T04:12:47.633-07:00

If I understand this right, we have a network efficiency problem and limitations based on data structure. Large providers may have millions or potentially billions of individual nodes that have a status update flag.
Current efficient polling is done by comparing the remote time stamp with the node's last update time.
Excuse the modest description; this problem space is new to me (but important to what I'm working on).

Is there a viral hub hierarchy? This would be a limited number of remote servers that simply pass along the last update time to all subscribers (and any polling system could also serve the last node update time). You would subscribe to a virtual server represented by a dynamic list of your nearest neighbors: a network similar to BitTorrent, but only concerned with last update time.

The storage could get problematic for large databases (even of a simple time), so each serving hub could keep what it needs for internal use and just push all the times to its short list of clients.

I'm trying to think of a decentralized status architecture and would appreciate expert opinions.

What's wrong with having a simple Atom “update” fee...

2008-09-10T05:03:00.000-07:00

What's wrong with having a simple Atom “update” feed listing recently changed feeds? That feed could use RFC 5005, and consumers wouldn’t need to worry about polling faster than the SUP feed turns over completely.

This is cool, but what about the xmpp pubsub? Isn...

2008-09-04T13:39:00.000-07:00

This is cool, but what about the xmpp pubsub? Isn't that what a lot of people are recommending now? What are your thoughts on this?

Here's an extended version of Atomstream.py with X...

2008-09-03T11:52:00.000-07:00

Here's an extended version of Atomstream.py with XMPP support (it's buggy and may crash after some time).

The script turns Atom feeds into XHTML-IM messages.

Might be fun to add a track capability with Pubsub, though. But I digress.

As regards data exchange, XMPP Pubsub looks like an interesting solution: contrary to the SixApart Atom stream, FriendFeed servers would only receive messages from subscribed XMPP Flickr accounts rather than all the data, which would avoid filtering.

SUP looks interesting but so 20th century.

But what do you do when you're polling thousands o...

2008-09-03T10:05:00.000-07:00

But what do you do when you're polling thousands of SUP feeds individually? Obviously you'll need a way to see if any of the feeds in any of the SUPs you are tracking have updated in order to save even more bandwidth. Hmm... maybe we need another protocol ;-)

http://www.aaronsw.com/2002/atomstream/atomstream....

2008-09-02T17:55:00.000-07:00

http://www.aaronsw.com/2002/atomstream/atomstream.py

curl -N -s http://updates.sixapart.com/atom-stream...

2008-08-29T14:38:00.000-07:00

curl -N -s http://updates.sixapart.com/atom-stream.xml | grep --line-buffered "application/atom+xml" | grep --line-buffered 'rel="self"'

I'm not sure SUP would help much for a large scale...

2008-08-29T09:07:00.000-07:00

I'm not sure SUP would help much for a large-scale generator of feed-based activity like Netflix without some notion of subscription resulting in a SUP feed per feed consumer. The personalized Netflix feeds generate about 6 million posts per day (about 2M each of queue adds, shipped DVDs and received DVDs). Given that any given feed consumer is likely only interested in a small fraction of the 8.4M+ subscribers, the signal-to-noise ratio in a general SUP feed would be quite low.

Mike D, other: Re: multiple databases. First, you c...

2008-08-28T15:34:00.000-07:00

Mike D, other:

Re: multiple databases. First, you could serve multiple SUP feeds corresponding to your multiple databases. Second, if a client like FriendFeed or Gnip is hitting all your feeds, generating a single SUP feed is provably easier on your servers: you could poll your own RSS feeds on the server generating the SUP.

Re: "In an RSS architecture, a system must generate output for only the users that are actively being polled." SUP is useful when two conditions are satisfied: a lot of your feeds are being hit by a single client, and many of those feeds are not updated during the polling interval. SUP is not designed for the use case you're talking about.
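Those two conditions can be made concrete with a minimal consumer-side sketch. It assumes the consumer keeps a mapping from SUP-ID to feed URL and has already parsed the list of SUP-IDs reported as updated. The names `subscriptions`, `updated_sup_ids` and `feeds_to_fetch`, the IDs, and the example URLs are hypothetical and are not taken from the SUP documentation.

```python
def feeds_to_fetch(subscriptions, updated_sup_ids):
    """Return the feed URLs whose SUP-ID appears in the latest SUP document.

    subscriptions: dict mapping SUP-ID -> feed URL (hypothetical structure).
    updated_sup_ids: iterable of SUP-IDs reported as recently updated.
    """
    updated = set(updated_sup_ids)
    return sorted(url for sup_id, url in subscriptions.items() if sup_id in updated)


# Hypothetical subscriptions held by a single client.
subscriptions = {
    "a1b2": "http://example.com/feeds/alice",
    "c3d4": "http://example.com/feeds/bob",
    "e5f6": "http://example.com/feeds/carol",
}

# Only one subscribed feed changed in this interval; "zzzz" belongs to someone else.
to_fetch = feeds_to_fetch(subscriptions, ["c3d4", "zzzz"])
print(to_fetch)
```

The saving comes from the feeds that are skipped: the fewer subscribed feeds change per interval, the fewer individual requests the client has to make.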

Re: "it appears there is no way to actually fetch the contents of an update without knowing the URL that originally led you to a SUP-ID" Yes, that's the point.

Re: push solutions. Maintaining and producing/consuming an open connection is harder for both ends.

Atom feeds solved this problem a long time ago, as...

2008-08-28T13:00:00.000-07:00

Atom feeds solved this problem a long time ago, as evidenced at six apart with http://updates.sixapart.com

This solution is neither simple nor does it take into account years of prior art of solving this problem through push.

I like this idea, simple and elegant. I agree that...

2008-08-28T12:33:00.000-07:00

I like this idea, simple and elegant. I agree that constant polling sucks and is a terrible solution for what services like FriendFeed or Feedheads are doing; however, I struggle to imagine something like this being promptly adopted by most feed producers.

Google Reader's shared feed doesn't even make use of If-Modified-Since headers. They're Google!!
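For context, a conditional GET with If-Modified-Since looks roughly like the sketch below, which only builds the request and does not send it. The function name `conditional_request` and the URL are placeholders chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def conditional_request(url, last_fetched):
    """Build a GET request that lets the server skip unchanged content.

    last_fetched must be a timezone-aware datetime in UTC.
    """
    request = Request(url)
    # HTTP dates end in "GMT", which usegmt=True produces.
    request.add_header("If-Modified-Since", format_datetime(last_fetched, usegmt=True))
    return request


request = conditional_request(
    "http://example.com/feed.atom",  # placeholder URL
    datetime(2008, 8, 28, 12, 33, tzinfo=timezone.utc),
)
# urllib stores header names capitalized as "If-modified-since".
print(request.get_header("If-modified-since"))
```

A server that honors the header can answer 304 Not Modified with no body instead of resending the whole feed, which saves bandwidth but still costs one request per feed.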

I'm still having trouble envisioning what a provid...

2008-08-28T08:56:00.000-07:00

I'm still having trouble envisioning what a provider's implementation of this SUP feed would look like. MikeD makes good points about federated databases making this far more complicated and expensive. Paul Watson rightly points out that this feed would be gigantic. If it were hypothetically to contain 80,000 items, Flickr's would completely turn over every 25 minutes or so.
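The turnover figure implies an update rate that is easy to work out. This is plain arithmetic on the two hypothetical numbers above (80,000 items, 25 minutes), assuming each item in the feed is one feed update.

```python
ITEMS_IN_FEED = 80_000   # hypothetical SUP feed size from the comment above
TURNOVER_MINUTES = 25    # time for the feed to turn over completely

# Rate at which new entries must arrive for the whole feed to be replaced.
updates_per_minute = ITEMS_IN_FEED / TURNOVER_MINUTES
updates_per_second = updates_per_minute / 60

print(f"{updates_per_minute:.0f} updates/minute, {updates_per_second:.1f} updates/second")
```

Under these assumptions, a consumer would have to fetch the feed at least once every 25 minutes to avoid missing updates.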

Rather than continuing to complain endlessly, I'd like to hear from FriendFeed themselves how they would go about creating a SUP stream for FriendFeed itself. I want to be convinced that this is a valid easy-way-out of the much longer road of popularizing XMPP.

oh god... this is not a "simple idea that no ...

2008-08-28T08:13:00.000-07:00

oh god... this is not a "simple idea that no one thought of before". This is weblogs.com's changes.xml (courtesy of Dave Winer) all over again.

In fact, the inherent scaling problems in such a solution led to the RSS <cloud>-Element which attempted to solve the problem at the source, but through XML-RPC, which is why the cool kids don't support it.

(That and that it breaks if you get 1000s of timeouts from unavailable RPC endpoints)

Also: what Mike D said in his 2nd post.

His first one missed the point. HTTP HEAD is not going to solve this problem.

Please take this back to the drawing board. You're right, this problem needs a solution.

But the solution will not be simple, because it's a really complex problem.

Does FriendFeed even run a ping server?

Is there a document somewhere with the format of t...

2008-08-28T07:35:00.000-07:00

Is there a document somewhere with the format of the <link> tags and the SUP feed? Examples aren't enough.

So Flickr would have one SUP feed that lists any r...

2008-08-28T06:08:00.000-07:00

So Flickr would have one SUP feed that lists any recently updated feeds? That could be 80,000 items, right? Is that going to scale?

Guys, what do you think of Atom streams: http://up...

2008-08-27T23:59:00.001-07:00

Guys, what do you think of Atom streams: http://updates.sixapart.com/ ?

Your first and third benefits incorrectly assume t...

2008-08-27T23:59:00.000-07:00

Your first and third benefits incorrectly assume that all information resides in a single database, which is not the case for any site large enough to benefit from your proposal. Most large architectures work on the concept of sharding: groups of systems that each handle a smaller subset of the overall user population, without a need to know about each other. Unless I am misunderstanding some vital component, what you are asking providers to do is create a single point of aggregation that must know about all updates within all shards. This in itself creates a far more complex engineering problem than an end-site making 200 requests a second (which, remember, in a sharded architecture scales by adding a few extra queries each to a number of autonomous systems).
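To make the aggregation step concrete, here is a minimal sketch of merging per-shard update lists into one list. It assumes each shard can already report its own recent updates as time-sorted `(timestamp, feed_id)` pairs, which is the hard part in practice and is not shown. The function name, timestamps and feed IDs are hypothetical.

```python
import heapq


def merge_shard_updates(shard_updates, since):
    """Merge per-shard update lists into one time-ordered list.

    shard_updates: list of lists of (timestamp, feed_id), each sorted by timestamp.
    since: only updates with timestamp >= since are kept.
    """
    merged = heapq.merge(*shard_updates)  # lazily merges already-sorted inputs
    return [(ts, feed_id) for ts, feed_id in merged if ts >= since]


# Two hypothetical shards, each reporting its own recent updates.
shard_a = [(100, "feed-1"), (160, "feed-7")]
shard_b = [(120, "feed-3"), (170, "feed-9")]

recent = merge_shard_updates([shard_a, shard_b], since=110)
print(recent)
```

The merge itself is cheap; the cost the comment points at is that every shard has to publish its updates to one place at all.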

SUP would also multiply the volume of raw computing work that needs to be done by a data provider. In an RSS architecture, a system must generate output for only the users that are actively being polled. In a "SUP" system, resources must now be spent to generate output for all users, regardless of whether anyone cares to come looking for it. Your claim that it reduces load on both ends is false; the only reduction in work is on the consuming end.

Your claim of not exposing secret feed URLs is also incorrect, as it appears there is no way to actually fetch the contents of an update without knowing the URL that originally led you to a SUP-ID.

I do not dispute that it works over HTTP, or that it is compact.

I don't see why it would not work, but still, it f...

2008-08-27T22:56:00.000-07:00

I don't see why it would not work, but still, it feels like another layer of duct tape.

Why insist on HTTP? Those 43 services that you poll constantly are the ones who have the engineering resources to do this right.

Even LiveJournal did it better 2 or 3 years ago with the TCP-based ping stream.

I understand that you have a problem to solve and you are looking for a quick way out, but really, do it right and move to a stream based solution, TCP, XMPP, other, take your pick.

Very simple and clever. Friendfeed-like!

2008-08-27T21:29:00.000-07:00

Very simple and clever. Friendfeed-like!

I like this very much! Did you think of using the e...

2008-08-27T21:06:00.000-07:00

I like this very much!

Did you think of using the existing RSS format for the list of feeds that are updated instead of the new JSON SUP format?

Probably because spitting a feed of updated items ...

2008-08-27T19:27:00.000-07:00

Probably because spitting out a feed of updated items is a 20 minute job, while implementing XMPP would be a major PITA.

Here's my comment from the Read Write Web article:...

2008-08-27T18:45:00.000-07:00

Here's my comment from the Read Write Web article:

"""
"On July 21st, 2008, FriendFeed crawled Flickr 2.9 million times to get the latest photos of 45,754 users of which 6,721 of that 45,754 visited Flickr in that 24 hour period, and could have 'potentially' uploaded a photo."

Source: http://www.slideshare.net/kellan/beyond-rest (Slide 16)

RSS is simply not going to get us there.
"""

Given that this still requires the services y'all are scraping to Do Something, why not become the killer-app for XMPP?
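The crawl figures quoted above work out as follows. This is arithmetic on the quoted numbers only, assuming the crawls were spread evenly across users and over the day.

```python
CRAWLS = 2_900_000     # FriendFeed requests to Flickr on July 21st, 2008
USERS = 45_754         # users whose photos were being polled
ACTIVE_USERS = 6_721   # users who visited Flickr in that 24 hour period

polls_per_user = CRAWLS / USERS
minutes_between_polls = 24 * 60 / polls_per_user
active_fraction = ACTIVE_USERS / USERS

print(f"{polls_per_user:.1f} polls per user, one every {minutes_between_polls:.1f} minutes")
print(f"{active_fraction:.1%} of polled users could have uploaded a photo")
```

Roughly 63 polls per user per day, while fewer than 15% of those users could have had anything new to fetch.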

sounds a bit like newnews for nntp did to help new...

2008-08-27T18:26:00.000-07:00

Sounds a bit like what NEWNEWS did for NNTP to help news traverse the various NNTP servers faster. Here's a crazy thought: maybe a new HTTP verb, "POLL", in combination with If-Modified-Since. If used against a single feed, it would limit the transmitted items to new items only. If used against a domain, it could return modified pages/feeds for that domain. It could even speed up crawlers for search engines. The Accept header could even limit what types of updates are returned via POLL.

This is one of those simple ideas that one wonders...

2008-08-27T18:17:00.000-07:00

This is one of those simple ideas that one wonders why no one thought of before. It's a good proposal and a required one in the rapidly growing aggregation/Lifestreaming world. I am sure the proposal will be widely and quickly SUPported.