When you add a web site like Flickr or Google Reader to FriendFeed, FriendFeed's servers constantly download your feed from the service to get your updates as quickly as possible. FriendFeed's user base has grown quite a bit since launch, and our servers now download millions of feeds from over 43 services every hour.
One of the limitations of this approach is that it is difficult to get updates from services quickly without FriendFeed's crawler overloading other sites' servers with update checks. Gary Burd and I have thought quite a bit about ways we could augment existing feed formats like Atom and RSS to make fetching updates faster and more efficient. Our proposal, which we have named Simple Update Protocol, or SUP, is below. You can read more details and check out sample code on Google Code. Discuss the proposal in the SUP FriendFeed room.
SUP is just a proposal at this stage. We are eager to get feedback and ideas, and we expect to update the protocol based on feedback over the next few months.
Simple Update Protocol
SUP (Simple Update Protocol) is a simple and compact "ping feed" that web services can produce in order to alert the consumers of their feeds when a feed has been updated. This reduces update latency and improves efficiency by eliminating the need for frequent polling.
Benefits include:
- Simple to implement. Most sites can add support with only a few lines of code if their database already stores timestamps.
- Works over HTTP, so it's very easy to publish and consume.
- Cacheable. A SUP feed can be generated by a cron job and served from a static text file or from memcached.
- Compact. Updates can be about 21 bytes each (8 bytes with gzip encoding).
- Does not expose usernames or secret feed URLs (such as Google Reader Shared Items feeds).
SUP is designed to be especially easy for feed publishers to create. It's not ideal for small feed consumers because they will only be interested in a tiny fraction of the updates. However, intermediate services such as Gnip or others could easily consume a SUP feed and convert it into a subscribe/push model using XMPP or HTTP callbacks.
Sites wishing to produce a SUP feed must do two things:
- Add a special <link> tag to their SUP-enabled Atom or RSS feeds. This <link> tag includes the feed's SUP-ID and the URL of the appropriate SUP feed.
- Generate a SUP feed which lists the SUP-IDs of all recently updated feeds.
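The two publisher steps might look roughly like this in Python. This is a sketch, not the specification: the JSON field names (`updated_time`, `period`, `updates`), the `rel` value, the convention of carrying the SUP-ID in the URL fragment, and the function names are all assumptions made for illustration. The details on Google Code are authoritative. The sketch assumes the site's database can return `(sup_id, last_updated)` pairs.

```python
import json
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import quoteattr

# Hypothetical rel value; the real one is defined by the SUP specification.
SUP_REL = "http://example.com/sup"


def sup_link_tag(sup_url, sup_id):
    """Build the <link> tag that a feed embeds to advertise its SUP feed."""
    # Assumption: the SUP-ID travels in the URL fragment.
    href = f"{sup_url}#{sup_id}"
    return f'<link rel={quoteattr(SUP_REL)} href={quoteattr(href)} type="application/json"/>'


def build_sup_document(rows, now, period_seconds=60):
    """List the SUP-IDs of feeds updated within the last `period_seconds`.

    `rows` is an iterable of (sup_id, last_updated) pairs, where
    last_updated is a timezone-aware datetime from the site's database.
    """
    since = now - timedelta(seconds=period_seconds)
    # A set removes duplicates when a feed was updated more than once.
    updated = sorted({sup_id for sup_id, ts in rows if since < ts <= now})
    return json.dumps({
        "updated_time": now.isoformat(),
        "period": period_seconds,
        "updates": updated,
    })
```

A cron job could write the string returned by `build_sup_document` to a static file, which fits the caching model described above.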
Feed consumers can add SUP support by:
- Storing the SUP-IDs of the Atom/RSS feeds they consume.
- Watching for those SUP-IDs in their associated SUP feeds.
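On the consumer side, the two steps reduce to a lookup table and a membership check. A minimal sketch, assuming the SUP document is JSON with an `updates` list of SUP-IDs (an assumed layout, not something stated above) and using hypothetical class and method names:

```python
import json


class SupWatcher:
    """Tracks which consumed feeds are announced by which SUP feed."""

    def __init__(self):
        # sup_url -> {sup_id -> feed_url}
        self._index = {}

    def register(self, feed_url, sup_url, sup_id):
        """Step 1: store the SUP-ID found in a feed's <link> tag."""
        self._index.setdefault(sup_url, {})[sup_id] = feed_url

    def feeds_to_refetch(self, sup_url, sup_document):
        """Step 2: return feed URLs whose SUP-IDs appear in the SUP document."""
        known = self._index.get(sup_url, {})
        announced = json.loads(sup_document).get("updates", [])
        # IDs the consumer does not follow are simply ignored.
        return sorted({known[sup_id] for sup_id in announced if sup_id in known})
```

Feeds returned by `feeds_to_refetch` are fetched immediately; all others can wait for the next slow polling cycle.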
By using SUP-IDs instead of feed URLs, we avoid having to expose the feed URL, avoid URL canonicalization issues, and produce a more compact update feed (because SUP-IDs can be a database ID or some other short token assigned by the service).
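As one illustration of a "short token assigned by the service", a publisher could derive SUP-IDs from internal feed IDs with a keyed hash, so the token reveals neither the feed URL nor the database ID. This is only one possible scheme and an assumption of this sketch; the proposal does not prescribe how SUP-IDs are generated, and a plain database ID works too. The key, the function name, and the 10-character length are arbitrary choices.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice it would live outside the source tree.
SECRET_KEY = b"replace-with-a-private-key"


def make_sup_id(internal_feed_id, length=10):
    """Derive a short, stable, opaque token from an internal feed ID."""
    message = str(internal_feed_id).encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256)
    # Truncate the hex digest to keep each update entry compact.
    return digest.hexdigest()[:length]
```

Shorter tokens make the SUP feed more compact but raise the chance that two feeds end up sharing a token.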
Because it is still possible to miss updates due to server errors or other malfunctions, SUP does not completely eliminate the need for polling. However, when using SUP, feed consumers can reduce polling frequency while simultaneously reducing update latency. For example, if a site such as FriendFeed switched from polling feeds every 30 minutes to polling every 300 minutes (5 hours), and also monitored the appropriate SUP feed every 3 minutes, the total amount of feed polling would be reduced by about 90%, and new updates would typically appear 10 times as fast.
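The arithmetic behind that example can be checked directly. The sketch below counts polls per feed per hour and compares the worst-case delay before an update is noticed. The function names are illustrative, and the calculation ignores both the shared cost of fetching the SUP feed itself and the fetches triggered when a SUP-ID appears.

```python
def polls_per_hour(interval_minutes):
    """Number of polls of one feed per hour at a fixed interval."""
    return 60 / interval_minutes


def polling_reduction(old_interval, new_interval):
    """Fractional drop in feed polls when the polling interval grows."""
    return 1 - polls_per_hour(new_interval) / polls_per_hour(old_interval)


def latency_speedup(old_interval, sup_interval):
    """How many times sooner an update is noticed, comparing worst cases."""
    return old_interval / sup_interval


print(f"{polling_reduction(30, 300):.0%}")  # 90%
print(f"{latency_speedup(30, 3):.0f}x")     # 10x
```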
Update: Several people have asked how using SUP compares with using HTTP If-Modified-Since headers. The two features are complementary. With SUP, feed consumers can monitor thousands of feeds with a single HTTP request (to fetch the latest SUP document) instead of having to request each feed individually. For example, each user's feed on FriendFeed has a unique SUP-ID (mine is "53924729"), but all of the feeds point to a single SUP URL, http://friendfeed.com/api/sup.json. Therefore, it's possible to watch for activity on thousands of separate FriendFeed URLs by polling just one URL, http://friendfeed.com/api/sup.json. If my SUP-ID appears in that SUP document, then you know that my feed has updated and it's time to fetch a new copy. This is substantially more efficient than polling each of those thousands of URLs individually.
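To illustrate how the two mechanisms combine, the sketch below uses the contents of a SUP document to decide which feeds to fetch, then attaches an If-Modified-Since header to each of those requests. It only builds the requests and does not send them. The helper name, the shape of `watched`, and the assumption that the SUP document has already been parsed into a set of SUP-IDs are all illustrative.

```python
from email.utils import formatdate
from urllib.request import Request


def plan_conditional_fetches(updated_sup_ids, watched):
    """Build conditional GET requests for feeds flagged by a SUP document.

    `updated_sup_ids` is the set of SUP-IDs listed in the latest SUP document.
    `watched` maps sup_id -> (feed_url, last_fetch_unix_time).
    """
    requests = []
    for sup_id, (feed_url, last_fetch) in sorted(watched.items()):
        if sup_id not in updated_sup_ids:
            continue  # SUP reports no change, so no request is made at all
        # HTTP dates are expressed in GMT, hence usegmt=True.
        headers = {"If-Modified-Since": formatdate(last_fetch, usegmt=True)}
        requests.append(Request(feed_url, headers=headers))
    return requests
```

The resulting objects can be passed to `urllib.request.urlopen`. Note that `urlopen` reports a 304 Not Modified response by raising `HTTPError`, so a caller would need to catch that case.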