Mokum architecture: timelines and access control

Published by @squadette on 2016-11-19

In this chapter we’ll discuss how Mokum controls access to posts.

The "posts" table schema has no surprises: id, user_id, text, created_at/updated_at timestamps and a few more fields, which we will discuss in a chapter on data consistency.

Primary feeds in Mokum are called “timelines”. The ‘timelines’ table has a very simple schema:

CREATE TABLE timelines (
id INTEGER,
their_id INTEGER,
what VARCHAR(255),
created_at TIMESTAMP
);

The following timelines are defined for each user:

  • user feed;

  • private subfeed of user;

  • “For Your Eyes Only” posts of user;

  • direct messages sent by user;

  • direct messages received by user;

  • posts liked by user;

  • posts commented on by user;

  • posts faved by user;

  • posts hidden by user;

Each group has an associated timeline containing the posts posted to that group.

There are also several special aggregating timelines:

  • timelines for ‘best of’ posts (per language);

  • timelines for all posts (per language);

  • timeline for ‘most faved’ posts (per language);

There is also a timeline for each search query. We omit several timelines from this list for now for the sake of brevity.

The object associated with a timeline is specified by the tuple (what, their_id). For example, the timeline ("user_comments", 20) holds the posts commented on by the user with id 20. The timeline ("group", 30) corresponds to the group with id 30. The timeline ("best_of", 4) corresponds to the pseudo-object “best posts written in Turkish (language id 4)”.
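The (what, their_id) addressing can be sketched as a small lookup helper. This is only an illustration on an in-memory SQLite database, not Mokum's actual code: the primary key, the UNIQUE constraint, the created_at default and the helper name timeline_id are all assumptions added for the example.

```python
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    # Same columns as the schema above; PRIMARY KEY, DEFAULT and UNIQUE
    # are assumptions made so the example is self-contained.
    conn.execute(
        "CREATE TABLE timelines ("
        " id INTEGER PRIMARY KEY,"
        " their_id INTEGER,"
        " what VARCHAR(255),"
        " created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,"
        " UNIQUE (what, their_id))"
    )
    return conn

def timeline_id(conn, what, their_id):
    """Return the id of the timeline for (what, their_id), creating it if needed."""
    row = conn.execute(
        "SELECT id FROM timelines WHERE what = ? AND their_id = ?",
        (what, their_id),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO timelines (what, their_id) VALUES (?, ?)",
        (what, their_id),
    )
    return cur.lastrowid

conn = open_db()
comments_of_user_20 = timeline_id(conn, "user_comments", 20)
group_30 = timeline_id(conn, "group", 30)
best_of_turkish = timeline_id(conn, "best_of", 4)
```

Asking for the same tuple twice returns the same id, so the tuple behaves as a natural key for the timeline.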

Posts and timelines are joined through the ‘timeline_entries’ table (the name is a bit unfortunate; we will find out why later). The schema for that table looks like this:

CREATE TABLE timeline_entries (
timeline_id INTEGER,
post_id INTEGER,
created_at TIMESTAMP,
fresh_at TIMESTAMP
);

This is the largest table in Mokum, both by number of rows and by physical size, and it is heavily indexed: [post_id, timeline_id], [timeline_id, created_at] and [timeline_id, fresh_at]. There are plans to split this table in two, but we’ll discuss this later.
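To make the role of the indexes concrete, here is a rough sketch of two lookups they could serve: finding every timeline a post belongs to, and listing the newest posts of one timeline. It runs on an in-memory SQLite database; the index names, the sample rows and both helper functions are invented for illustration and are not taken from Mokum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE timeline_entries (
  timeline_id INTEGER,
  post_id INTEGER,
  created_at TIMESTAMP,
  fresh_at TIMESTAMP
);
-- The three column lists come from the text; the index names are hypothetical.
CREATE INDEX idx_entries_post ON timeline_entries (post_id, timeline_id);
CREATE INDEX idx_entries_created ON timeline_entries (timeline_id, created_at);
CREATE INDEX idx_entries_fresh ON timeline_entries (timeline_id, fresh_at);
""")

# Invented sample data: post 100 sits on timelines 1 and 2, post 101 on timeline 1.
conn.executemany(
    "INSERT INTO timeline_entries VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2016-11-01 10:00:00", "2016-11-01 10:00:00"),
        (1, 101, "2016-11-02 10:00:00", "2016-11-02 10:00:00"),
        (2, 100, "2016-11-03 10:00:00", "2016-11-03 10:00:00"),
    ],
)

def timelines_of_post(conn, post_id):
    """All timelines a post appears on (a lookup by post_id)."""
    rows = conn.execute(
        "SELECT timeline_id FROM timeline_entries"
        " WHERE post_id = ? ORDER BY timeline_id",
        (post_id,),
    )
    return [r[0] for r in rows]

def newest_posts(conn, timeline_id, limit=30):
    """Newest posts of one timeline (a lookup by timeline_id, sorted by created_at)."""
    rows = conn.execute(
        "SELECT post_id FROM timeline_entries"
        " WHERE timeline_id = ? ORDER BY created_at DESC LIMIT ?",
        (timeline_id, limit),
    )
    return [r[0] for r in rows]
```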

Suppose we have a post written by Alice, liked by Bob and Carol, and commented on by Carol, Dave and Alice. It is also faved by Eugene, and it is written in English.

This post will be associated with eight timelines:

  • ("user", alice_id);
  • ("user_likes", bob_id);
  • ("user_likes", carol_id);
  • ("user_comments", carol_id);
  • ("user_comments", dave_id);
  • ("user_comments", alice_id);
  • ("user_favs", eugene_id);
  • ("everything", 1).
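The list above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the post goes only to its author's own feed; the function name and the numeric ids are made up for the example.

```python
def timelines_for_post(author_id, liker_ids, commenter_ids, faver_ids, language_id):
    """Build the (what, their_id) keys a post is associated with.

    Assumes the post was published to the author's own feed only.
    """
    keys = [("user", author_id)]
    keys += [("user_likes", u) for u in liker_ids]
    keys += [("user_comments", u) for u in commenter_ids]
    keys += [("user_favs", u) for u in faver_ids]
    keys.append(("everything", language_id))
    return keys

# Hypothetical ids for the people in the example.
ALICE, BOB, CAROL, DAVE, EUGENE = 1, 2, 3, 4, 5
ENGLISH = 1  # the text uses ("everything", 1) for the English-language post

keys = timelines_for_post(
    ALICE,
    liker_ids=[BOB, CAROL],
    commenter_ids=[CAROL, DAVE, ALICE],
    faver_ids=[EUGENE],
    language_id=ENGLISH,
)
```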

Timelines are never accessed directly: they are always filtered by the access control mechanism and materialized in secondary timelines (so-called “rivers”). We’ll discuss rivers in the following chapter.

Some timeline types are “decisive”; others are not. Only decisive timelines are used for access control. The following timelines are decisive: a) user primary feed; b) private sub-feed; c) FYEO; d) directs sent by user; e) directs received by user; f) group timelines; and several others omitted for brevity.

All other timeline types are non-decisive: user comments, likes, favs, “everything”, “best of”, “search”, etc.

Suppose Frank clicks on a link to the post described above. How do we check whether he can see it? We fetch the list of decisive timelines associated with the post, which in this case has only one element: ("user", alice_id). A user has access to a post if he has access to at least one of the decisive timelines associated with it. That is, if Alice posts to her public feed and to a private group that Frank has no access to, Frank will still see her post. If she posts to two private groups and Frank is subscribed to neither of them, he will not see the post, because he is not allowed to access either group timeline.
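The rule can be written down in a few lines. The sketch below is an illustration under assumptions, not Mokum's implementation: the set of timelines a viewer may read is passed in as plain data (how it is computed is not covered here), the ids are invented, and the private sub-feed and FYEO types are left out because their ‘what’ values are not given in this chapter.

```python
# Decisive timeline types whose `what` values appear in the text.
DECISIVE = {"user", "group", "user_directs_sent", "user_directs_received"}

def can_see(post_timelines, readable):
    """True if at least one decisive timeline of the post is in `readable`.

    post_timelines: iterable of (what, their_id) tuples attached to the post.
    readable: set of (what, their_id) tuples the viewer may read
              (hypothetical input for this example).
    """
    decisive = [t for t in post_timelines if t[0] in DECISIVE]
    return any(t in readable for t in decisive)

# Hypothetical ids.
ALICE, PRIVATE_GROUP_1, PRIVATE_GROUP_2 = 1, 30, 31

# Frank can read Alice's public feed but neither private group.
frank_readable = {("user", ALICE)}

public_and_group = [("user", ALICE), ("group", PRIVATE_GROUP_1), ("everything", 1)]
two_private_groups = [("group", PRIVATE_GROUP_1), ("group", PRIVATE_GROUP_2), ("everything", 1)]

frank_sees_first = can_see(public_and_group, frank_readable)
frank_sees_second = can_see(two_private_groups, frank_readable)
```

Non-decisive timelines such as ("everything", 1) never grant access on their own, which is why they are filtered out before the check.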

Another example: suppose Lena sends a direct message in Italian to Michael and Nick. This post will appear on four timelines:

  • ("user_directs_sent", lena_id);

  • ("user_directs_received", michael_id);

  • ("user_directs_received", nick_id);

  • ("everything", 3).

This post will be visible to all three of them, because they are obviously allowed to see their own sent/received timelines. Because of that, they (and only they) will see this message in /filter/everything/it.

Suppose Carol comments on Bob’s post (it is added to her user_comments timeline), and later Bob makes his feed private while Carol is not subscribed. In that case, even when Carol looks at her own comments page (/carol/comments), this post is not accessible to her, because she has no access to Bob’s decisive user feed timeline.
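Carol's comments page can be sketched as a listing taken from a non-decisive timeline and then filtered post by post. As before, this is an illustrative sketch: the post ids, user ids and the readable set are invented, and only decisive types named in the text are included.

```python
DECISIVE = {"user", "group", "user_directs_sent", "user_directs_received"}

def visible_posts(listing, post_timelines, readable):
    """Keep the posts from `listing` that have a readable decisive timeline."""
    result = []
    for post_id in listing:
        keys = post_timelines[post_id]
        if any(k[0] in DECISIVE and k in readable for k in keys):
            result.append(post_id)
    return result

# Hypothetical ids.
BOB, CAROL = 2, 3
post_timelines = {
    500: [("user", BOB), ("user_comments", CAROL)],    # Bob's post, Carol commented
    501: [("user", CAROL), ("user_comments", CAROL)],  # Carol's own post
}
carol_comments_page = [500, 501]  # contents of ("user_comments", CAROL)

# Bob's feed is now private and Carol is not subscribed, so ("user", BOB)
# is missing from the timelines she can read.
carol_readable = {("user", CAROL), ("user_comments", CAROL)}

shown = visible_posts(carol_comments_page, post_timelines, carol_readable)
```

Post 500 drops out of the page even though it is still present in Carol's user_comments timeline, because that timeline is not decisive.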


2015-2016 Mokum.place