System Design

How to Approach a System Design

Reference

Step 1: Outline use cases, constraints, and assumptions

Who is going to use it?
How are they going to use it?
How many users are there?
What does the system do?

Step 2: Create a high level design

Step 3: Design core components

SQL or NoSQL
Database schema

Step 4: Scale the design

Load balancer
Horizontal scaling
Caching
Database sharding

Messaging Application

Key Features

One-on-one-chat and group chats between users.
Allow users to send text, pictures, videos and other files.
Store chat history.
Show the online/offline status of users.
Sent/delivered/read notifications.

Key Requirements

The service should be able to handle millions of users, with around >10B messages sent each day.
Real-time chatting with minimum latency.
The messaging service should be highly available, but consistency is of higher priority than availability in this case.
The system needs to be highly consistent. In other words, all users should see messages in the same order through any device they login from.

Server-Client Connection

Communication between the server and the client:
Long Polling and WebSockets.
Client communicates with service using HTTP - widely used protocol.

Receive Messages

Pull Approach

The receiving client periodically requests the server for any new messages.
The server maintains a queue of messages for the client and as soon as the client makes a get request, the server will send all the pending messages in the queue for that client.
To minimize delay in receiving messages, the receiving client needs to make requests to the server at a high frequency, even if many of these requests are returned with no new messages for the client.
With requests and responses at such high frequencies sent for each client, the system will ask for a lot of resources, making it infeasible for a messaging app.

Push Approach

The active clients will maintain a connection with the server in the form of a thread or a process and will wait for the server to deliver them any new messages that appear for the client through that open connection.
Since the server immediately relays all incoming messages to the clients, the server will not need to handle any pending messages and the overheads involved in establishing a new connection will be eliminated.
This approach lowers the latency to a minimum, allowing for a faster chat between clients, which is exactly what we want in your messaging system.
Note: push approach is a more fitting option since it involves minimum resources and minimum latency.

Storing Online/Offline Status

Users’ statuses are also stored in addition to messages.
However, status is just the memory cache.
To know if the user is online or offline or when the user was ‘Last Seen’, we need a heartbeat that the online users periodically send to the server.
The server continues updating the heartbeat timestamp in a table stored in the memory cache.
All the active users will have an entry in this table with a timestamp value against their user ID.
For example if user B wants to know A’s status, the server can read A’s timestamp in the table to check if he/she is online or offline and display it to B.

Send/Receive Notification

A sends a message for B via a server.
Because the connection will follow some messaging protocol, for example TCP, an acknowledgment will be sent to A when the server receives the message.
The server sends the message to the database for storage and also to the open connection for user B.
When user B receives the message, it sends an acknowledgment to the server.
The server sends a notification to A that the message has been delivered to B.
This is the delivered notification as shown in the chat or channel.
As soon as B opens his/her phone and sees the message, an acknowledgment will need to be sent to the server.
Now, this acknowledgment does not need to be sent to the same server through which the message was sent.
The read notification may be treated as an independent message with a unique message ID.
The notification will be sent over a server to A. and A sees this message as let’s say double blue tick.

Group Chats

Each group chat is considered as a unique object and is assigned a GroupChatID.
If user A sends a message in a group with 1 as the GroupChatID. The load balancer will look up the server that handles group 1 and sends the message to that server.
The server has a service that handles group messages and it queries this service for the other users that are also a part of the group chat.
The group service maintains a table that stores all the members of each GroupChatID as you can see in the picture.

Online Learning

Search and Ranking (Slack)

Ranking of candidate results from input query (TFIDF, Word2Vec)
Things to think about when ranking:
- If a user shares channels/interactions with specific authors, give them higher rankings.
- We can rank by actions like number of replies, reactions, or recency

Method

Feature Engineering

Security: Can’t leak any information across companies/channels/etc.
- To help account for this, we can create non-private attributes.
- Example: is user ‘A’ a member of channel XYZ?
“Work Graph”
- Affinity between searcher and author/channel
- Message-level engagement stats
  - Some might be more frequently engaged with
- Search-specific engagement rates
- Text Features
- Recency
- Solr Signals

Offline and updates to the Solr Rank Service

Labels (e.g. Relevance or Ranking)
API sends results of queries and other logs to storage
Capture metrics like whether or not the user clicked or engaged with the candidates returned from the search results (relevance or click rate).
Data is sent to a Data Warehouse with Airflow job to join data

Model Training

Model: XGboost
Single Model:
- Does not train a model per team (very sparse; won’t work on small teams; not enough data).

Success Metrics

Clicks alone might be difficult to determine success (could mean different things) Add surveys using Slack bots to determine if the returned results provided what they were looking for.

Evaluation

Updates to the rankings are performed offline and evaluated by:
- Mean reciprocal rank
- Mean average precision
Generally works well with online data
Perform A/B testing
- Control/Treatment,
- Statistical Power (how many users do we need to get statistical power).

Online Architecture

Offline Architecture

Predicting User Engagement (Slack)

The engagement problem is formulated as a logistic regression using features such as summary statistics capturing information about the message’s author, content, channel, reactions, and interactions.
The output of the regression is a vector of feature weights that can be fed into the logistic function to produce values between 0 and 1.
We can interpret these values as the probability of a given message triggering a particular engagement from a particular user.
Features: view time, clicks, time since following channel, #hashtags, tagging, file/content uploads, engagement with other users, user/author meta info.
- Affinity Scores as Features
- To predict the likelihood that a user engages with a message, we employ a measure of the user-to-author affinity (defined roughly as the general propensity of the user to read the author’s messages) as well as a measure of the user’s priority for the channel in which the message was posted.
- In addition, we create personalized features for each class of message engagements, weighted by the affinities of the users who engaged.
- These features allow the model to capture predictive signals like people being more likely to reply to a message that several close colleagues have already replied to.
Class imbalance: Since most users engage with only a tiny fraction of the messages they see in Slack, we’re faced with a class-imbalance when training our regression model.
- To combat this problem, we employ a number of proven techniques.
- We use stratified sampling to create training sets, picking an engaged message from a channel and then selecting a fixed number of unengaged ones from the surrounding conversation to preserve a fixed ratio of positive to negative examples.

Relevancy

Relevancy, but after trying a few different proxy metrics, we ultimately settled on an approximation of time spent in channel based on read and write activity.

Search Engine

Database Schema
- Tables:
- User: UserID/TimeStamp
- Search: Query/UserID
- Articles/Docs: SearchTerms/Keywords

Note

Scaling: if we have large amounts of documents, we cannot fit all in one database and will need to shard the documents.

Learn to Rank (Wayfair)

Company uses LTR coupled with machine learning and natural language processing (NLP) techniques to understand a customer’s intent for queries.
Extract text information from different datasets including user reviews, product catalog, and clickstream.
Next, we can use a variety of NLP techniques to extract entities, analyze sentiments, and transform data.
Then feed the results into a microservice to identify user intent on a large portion of incoming queries.
When the Intent Engine can’t make a direct match, they use the keyword search model.

Autocomplete

Reference