Revisiting the Activity Feed

The activity feed has been a great component of the LayerVault experience for 6 months now. We find ourselves using it every day to get a quick overview of what the team has been up to. However, one thing that has become apparent over time is that the activity feed is extremely verbose. It’s full of redundant data that makes it difficult to see what’s important at a glance. On top of that, we found that the activity feed skeleton quickly became an unscalable solution due to the sheer number of elements it added to the DOM. So how can we make it better?

Getting Smarter

The bottom line is that the activity feed needed to be significantly smarter. Instead of blindly spitting out each activity item, it would have to do some processing to figure out the best way of displaying the feed data. We spent a day doing research and found a fantastic presentation from Etsy on how they process their activity feed. We decided to use it as a guideline and adapt our own version.

Tackling Information Density

In order to increase information density, the activity feed combines multiple related events into a single story item by batching them together. Activity items are still stored in MySQL, but item generation is very write-heavy. This is because not only does each user get their own copy of each feed item, but we also want the ability to filter feed items based on the current context of the page. Storing the items this way makes it significantly easier to generate the finished feed. Regardless, processing the raw activity feed can be time- and resource-intensive, so all of this happens in the background, outside of the web request.

Taming the DOM

Since we’re doing away with the skeleton, the activity feed has to be paginated in a sensible fashion. We can simulate the infinite scroll experience by heavily preloading activity feed data and making sure the server is extremely fast at serving activity feed pages. Activity feed HTML is always rendered from data stored in Redis. The API call that fetches an activity page also returns HTML, which lessens client-side processing in JavaScript.

What is an Event?

It’s a funny question to ask, but we realized we had to break down what exactly defines an event in the context of LayerVault in order to proceed. We came up with a generic structure that defines every event on the site: every event has a subject, a verb, one or more objects, and some optional extra attributes. The subject is the person performing the event, the verb is the action that subject is taking, and the object is the item with which the action is being performed. This is a complex way of saying we’re making simple English sentences. Thinking about all of this may seem trivial, but it plays a big role in how we’ve structured our code. Rails and Ruby expose some great ways to handle a situation like this.
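
As a rough illustration of that structure, here is a minimal sketch in plain Ruby. The Event struct and its to_sentence method are hypothetical names chosen for this example, not the actual LayerVault models.

```ruby
# Hypothetical names; a sketch of the subject / verb / objects / extras shape.
Event = Struct.new(:subject, :verb, :objects, :extras) do
  # Read the event back as a simple English sentence.
  def to_sentence
    "#{subject} #{verb} #{objects.join(', ')}."
  end
end

event = Event.new("Ryan", "modified", ["Test.psd", "Homepage.psd"], {})
puts event.to_sentence
# prints "Ryan modified Test.psd, Homepage.psd."
```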

Creating an Event

Activity events can be created around a number of items on LayerVault: files, folders, and users, to name a few. Each of these items is already cleanly represented by a model, so adding the activity feed functionality was surprisingly simple. In a language that supports multiple inheritance, you would make each of those models extend a new activity item class, but the Ruby way is to use mixins. The code looks something like this:

# ActivityItemable.rb
module ActivityItemable
  def generate_activity_item(action, actor, options={})
    # Add our events to the database
  end
end

# File.rb
class File < ActiveRecord::Base
  include ActivityItemable

  # ...
end

# FileController.rb
file.delay.generate_activity_item :modified, current_user

In order to help clarify activity item generation, there are two concepts we created called scopes and filters. Scopes define who should be able to see a given activity item. Filters define which objects the activity item should be attached to. The number of activity items generated comes out to be scopes × filters, because we create an item for each user in the scope for each filter.

scope.each do |owner|
  filters.each do |filter|
    # generate feed item based on the
    # owner and filter.
  end
end

This means that every file, folder, and project on LayerVault essentially has its own activity feed. Again, heavy writes and fast reads.
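
To make the scopes × filters arithmetic concrete, here is a small worked example. The user names and filter identifiers are made up for illustration.

```ruby
# Hypothetical data: two users may see the event, and it is attached to
# three objects (the file, its folder, and its project).
scope   = ["ryan", "kelly"]
filters = ["file:12", "folder:3", "project:1"]

# One feed item per (owner, filter) pair, mirroring the nested loop above.
feed_items = scope.product(filters).map do |owner, filter|
  { owner: owner, filter: filter, action: :modified }
end

puts feed_items.length # prints 6 (2 owners x 3 filters)
```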

Single-Table Inheritance

We take advantage of a feature in Rails called single-table inheritance that allows us to cleanly separate the responsibilities of each activity item. Some activity types are very straightforward (Ryan created Test.psd), while others are a bit more complex (Ryan delivered Test.psd to Basecamp, Campfire, and Dropbox). Using single-table inheritance makes it much simpler to handle each of these situations separately.

Each feed item type has its own class that extends the base ActivityFeedItem class. Rails does some magic when reading and writing this data by automatically using the proper class for each specific activity type.
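
The idea can be imitated in plain Ruby without a database. In the sketch below, the subclass names, the instantiate helper, and the row hash are all assumptions for illustration. In Rails itself, the class name is stored in the table's type column and the matching class is picked automatically.

```ruby
# Plain-Ruby imitation of what Rails does with the `type` column; the
# subclass names here are hypothetical.
class ActivityFeedItem
  attr_reader :actor, :object_name, :extras

  def initialize(actor:, object_name:, extras: {})
    @actor = actor
    @object_name = object_name
    @extras = extras
  end

  # Build the right subclass from a stored row, keyed on row[:type].
  def self.instantiate(row)
    klass = Object.const_get(row[:type])
    klass.new(actor: row[:actor], object_name: row[:object_name],
              extras: row.fetch(:extras, {}))
  end
end

class CreatedFeedItem < ActivityFeedItem
  def story
    "#{actor} created #{object_name}"
  end
end

class DeliveredFeedItem < ActivityFeedItem
  def story
    "#{actor} delivered #{object_name} to #{extras[:services].join(', ')}"
  end
end

row = { type: "DeliveredFeedItem", actor: "Ryan", object_name: "Test.psd",
        extras: { services: ["Basecamp", "Dropbox"] } }
puts ActivityFeedItem.instantiate(row).story
# prints "Ryan delivered Test.psd to Basecamp, Dropbox"
```

Each subclass only has to know how to describe its own kind of event, which is the separation of responsibilities described above.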

Processing the Feed

Processing the activity feed mainly revolves around the idea of batching. Creating a batch is deceptively simple. We loop through all of the activity items, compare the current item to the one we just looked at, and ask the question, “Is this event related to the last one?” If yes, the items are combined; otherwise, the last batch is sent off for processing and a new batch is created.

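batches = []
current_batch = []
last_item = nil

feed_items.each do |feed_item|
  # Close the current batch when this item is unrelated to the previous one.
  if last_item && !feed_item.should_batch(last_item)
    batches.push current_batch
    current_batch = []
  end

  last_item = feed_item
  current_batch.push feed_item
end

# The final batch is still open once the loop ends.
batches.push current_batch unless current_batch.empty?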

While the criteria for determining relatedness differ from type to type, most of the time all we have to check is whether the feed items describe the same action (created, modified, etc.) and whether their objects are of the same type (file, folder, etc.).
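
A minimal sketch of such a relatedness check might look like the following. The FeedItem struct and its fields are hypothetical, and the check on the actor is an extra assumption that the paragraph above does not state.

```ruby
# Hypothetical fields; the real model's attributes are not shown in the post.
FeedItem = Struct.new(:actor, :action, :object_type, :object_name) do
  # Related when the action and the object type match. Comparing the actor
  # as well is an assumption made for this sketch.
  def should_batch(other)
    actor == other.actor &&
      action == other.action &&
      object_type == other.object_type
  end
end

a = FeedItem.new("Ryan", :modified, :file, "Test.psd")
b = FeedItem.new("Ryan", :modified, :file, "Homepage.psd")
c = FeedItem.new("Ryan", :created, :folder, "Mockups")

# Enumerable#slice_when starts a new batch wherever the block returns true.
batches = [a, b, c].slice_when { |prev, cur| !cur.should_batch(prev) }.to_a
puts batches.map(&:length).inspect # prints [2, 1]
```

Enumerable#slice_when is one compact way to express the same loop; it splits the list wherever two neighbouring items are not related.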

Aggregation

Once all of the items are collected into these batches, we then send each batch off to the aggregator. The aggregator is responsible for processing the batched data and outputting the actual feed text, along with a few other important items.

There are 3 different aggregation types that we handle: no aggregation, item aggregation, and verb aggregation. No aggregation is the simplest, and simply means that the feed story consists of only 1 event. Item aggregation means that the batched feed items share a common action and object, for example: Ryan modified Test.psd 4 times. Verb aggregation means that the batched items only share a common action, for example: Ryan modified Test.psd, Homepage.psd, and Wireframe.psd. The beauty of this is the distillation of information into what’s important.
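
The three cases can be sketched as a single function. The hash keys and the join_names helper below are hypothetical, and the sketch assumes every item in a batch shares the same actor and action.

```ruby
# Join names as an English list: "A", "A and B", or "A, B, and C".
def join_names(names)
  return names.first if names.length == 1
  return names.join(" and ") if names.length == 2
  "#{names[0..-2].join(', ')}, and #{names.last}"
end

# Turn one batch (assumed to share an actor and action) into a story sentence.
def aggregate(batch)
  first = batch.first
  names = batch.map { |item| item[:object] }.uniq
  lead  = "#{first[:actor]} #{first[:action]}"

  if batch.length == 1
    "#{lead} #{names.first}."                        # no aggregation
  elsif names.length == 1
    "#{lead} #{names.first} #{batch.length} times."  # item aggregation
  else
    "#{lead} #{join_names(names)}."                  # verb aggregation
  end
end

batch = %w[Test.psd Homepage.psd Wireframe.psd].map do |name|
  { actor: "Ryan", action: "modified", object: name }
end
puts aggregate(batch)
# prints "Ryan modified Test.psd, Homepage.psd, and Wireframe.psd."
```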

Text Processing

Once the actual sentence for each batch has been generated, our work isn’t quite done. Each file, folder, signpost, etc that’s mentioned in the feed story needs to be linked. We can’t link the entire story because multiple items could be batched into a single story, so this means that we’ll have to do some processing on the text itself. When the story text is generated, it’s actually generated using a few concepts from Markdown, which was chosen for its dead simple markup. A raw feed story looks like:

**Ryan L.** modified [Test.psd](/~/LayerVault/Test.psd) 3 times.

This allows us to generate 3 different copies of the story for various consumption mediums: the raw output shown above, an HTML version, and a plain text version. We also analyze the text to generate an array of “entities”, which is useful for JSON output.
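
As a rough sketch of how the plain text and HTML versions could be derived from the raw story, the helpers below use two regular expressions. The method names and BOLD_REGEX are assumptions for this example, and the ActivityActorName class is borrowed from the sample output later in this post.

```ruby
require "erb"

BOLD_REGEX = /\*\*(.+?)\*\*/
LINK_REGEX = /\[(.*?)\]\((.*?)\)/

# Strip the markup, keeping only the visible words.
def to_plain_text(raw)
  raw.gsub(LINK_REGEX) { $1 }.gsub(BOLD_REGEX) { $1 }
end

# Escape the raw text first, then turn the markup into tags.
def to_html(raw)
  ERB::Util.html_escape(raw)
           .gsub(LINK_REGEX) { "<a href=\"#{$2}\">#{$1}</a>" }
           .gsub(BOLD_REGEX) { "<span class='ActivityActorName'>#{$1}</span>" }
end

raw = "**Ryan L.** modified [Test.psd](/~/LayerVault/Test.psd) 3 times."
puts to_plain_text(raw)
# prints "Ryan L. modified Test.psd 3 times."
puts to_html(raw)
```

Escaping before substituting keeps any angle brackets or ampersands in file names from ending up in the page as markup.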

In order to parse the entities properly, we had to do a little bit of fun math, but in the end it’s just regular-expression-based scanning:

URL_REGEX = /\[(.*?)\]\((.*?)\)/

def self.parse_entities(text)
  offset = 0
  entities = []

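  text.scan(URL_REGEX) do |match|
    # $~ is the MatchData for the current link. Subtracting the markup
    # removed so far gives positions in the text with link syntax stripped.
    start = $~.offset(0)[0] - offset
    # Drop this link's URL plus its 4 markup characters: [ ] ( )
    finish = $~.offset(0)[1] - offset - match[1].length - 4
    offset += 4 + match[1].length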

    entities.push({
      begin: start,
      end: finish,
      text: match[0],
      url: match[1]
    })
  end

  entities
end

The final output of a feed story looks like:

"data": [{
    "ids": [12859, 12867, 12875, 12883],
    "date": "2012-12-13T18:59:15Z",
    "subject_id": 17,
    "subject_email": "ryan@example.com",
    "raw": "**Ryan L.** left feedback for [Test.psd](/permalink/kjhDFkjhf8) 4 times",
    "text": "Ryan L. left feedback for Test.psd 4 times",
    "html": "<span class='ActivityActorName'>Ryan L.</span> left feedback for <a href=\"/permalink/kjhDFkjhf8\">Test.psd</a> 4 times",
    "entities": [{
        "begin": 30,
        "end": 43,
        "text": "Test.psd",
        "url": "/permalink/kjhDFkjhf8"
    }]
}]

Updating the Feed

Once we have a complete activity feed, all is well and good until we need to add a new event. Since it might be possible to batch this new event with the latest feed story, we have to reprocess a bit of data. Luckily, it’s only the latest page of activity data that gets reprocessed, since data older than that will never change.

Caching

Caching for the activity feed is done at two separate stages in order to make fetching pages as fast as possible. We use a simple cache key scheme to get the exact data we need. Redis is configured to run in append-only mode with writes every second.

Key Scheme

An activity feed cache key is based on a few items: the current user, the current filter (if any), and the current page (if any).

activity/#{owner.id}/#{filter.type}:#{filter.id}/#{page_id}

The latest page of activity data is not assigned a page number, and the page IDs of the pages after it decrease monotonically as you go further back. Each cached page also stores the ID of the next page, so we can easily paginate through them.
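
A minimal sketch of the key scheme and the next-page pointer follows. The feed_cache_key helper is hypothetical, a plain Hash stands in for Redis, and leaving out the absent segments is an assumption about how missing filters and pages are handled.

```ruby
# Hypothetical helper. Omitting absent segments is an assumption; the post
# does not show how a missing filter or page is encoded.
def feed_cache_key(owner_id, filter: nil, page_id: nil)
  parts = ["activity", owner_id]
  parts << "#{filter[:type]}:#{filter[:id]}" if filter
  parts << page_id if page_id
  parts.join("/")
end

# A plain Hash stands in for Redis. The newest page has no page ID, and
# every page stores the ID of the next (older) page.
cache = {
  feed_cache_key(17)             => { stories: ["newest"], next_page_id: 2 },
  feed_cache_key(17, page_id: 2) => { stories: ["older"],  next_page_id: 1 },
  feed_cache_key(17, page_id: 1) => { stories: ["oldest"], next_page_id: nil }
}

# Walk from the newest page back through the older ones.
page_id = nil
stories = []
loop do
  page = cache.fetch(feed_cache_key(17, page_id: page_id))
  stories.concat(page[:stories])
  page_id = page[:next_page_id]
  break if page_id.nil?
end
puts stories.inspect # prints ["newest", "older", "oldest"]
```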

Data Caching

The data that is produced during the feed processing and aggregation is thrown into Redis for later retrieval. This is most important for the latest page of activity feed data, because its HTML is rendered as a part of the page request every time. Since we don’t have to hit the database for this, its impact on page load is minimal.

Possible Improvements

All of these changes are a big leap in the right direction for the activity feed, but there are a few improvements that we still have in mind.

Optimizing Feed Reprocessing

Right now, when a new event comes in, we reprocess the entire newest page of activity items. This doesn’t take terribly long, but in reality, we only need to consider the very latest activity item on the newest page until we implement story weighting.

Story Weighting

Right now all stories are considered equally important. In the future, we would like to be able to weight each type of story such that the most important actions are given greater precedence in the feed order.

Object Aggregations

If multiple actions are applied to the same object (or set of objects), we should be able to batch that as well. For example, Ryan created and modified Test.psd. The batching process currently does not handle this situation.

Realtime Events

Unfortunately, realtime events are currently disabled due to the new complexity of the activity feed. We definitely plan to add them back in the near future, though.

DOM Node Caching

The new activity feed works great, even when scrolling back far into past events, but when you scroll really far back you end up with a lot of nodes being injected into the DOM. To fix this, nodes that are not visible should be removed from the DOM as you scroll, and reinserted when they need to be visible again. This complicates pagination considerably, and no one on LayerVault has a feed that goes back far enough yet to cause issues, which is why it didn’t make it into this version.

  • Ryan