
Decoding the Tech Battle: Monoliths and Microservices Clash in the Digital Arena

Monolith:

Advantages:

  1. Single Codebase: All components of the app exist in one codebase.
  2. Easy Development: Simpler to add new features.
  3. Easy Testing: Easier to simulate and test scenarios.
  4. Easy Deployment: Easy to deploy the entire platform.
  5. Easy Debugging: Easier to trace bugs.
  6. Easy Performance Monitoring: Easier to monitor performance of all features.

Disadvantages:

  1. Slower Development Speed: As the system grows, adding new features can slow down.
  2. Scalability Issues: As the user base grows, the entire application must be scaled together, even if only one part is under load.
  3. Reliability: A bug in one part can bring down the entire system.
  4. Flexibility Issues: Can’t add a feature if it requires a different tech stack.
  5. Deployment Complexity: Small changes require complete deployment.

Microservices:

Advantages:

  1. Agile Development: Faster development, can update services independently.
  2. Scalability: Can scale only necessary services.
  3. Highly Testable & Maintainable: Each microservice can be tested and maintained separately.
  4. Flexibility: Different services can use different technology stacks.
  5. Independent Deployment: Each microservice can be deployed independently.

Disadvantages:

  1. Management & Maintenance Overhead: Managing many independent services can be complex.
  2. Infrastructure Costs: Can increase due to separate databases and servers for each service.
  3. Organizational Issues: Communication challenges can arise among teams working on different services.
  4. Debugging Issues: Requires advanced tools for efficient debugging.
  5. Lack of Standardization: Different standards among services can create integration issues.
  6. Lack of Code Ownership: Potential issues in shared code areas due to divided responsibilities.

Monolith to microservices migration:

Strangler Fig Pattern

The Strangler Fig Pattern can be an effective method for migrating a monolithic system to microservices. Here’s how it can be applied to a stock market application:

  1. Identify Part to Migrate: Identify a part of the existing system that needs migration. For instance, we may choose the ‘Buy/Sell transaction’ functionality in the monolith application that we wish to replace.
  2. Implement New Microservice: Implement this functionality into a new microservice. We could create a new ‘Transaction Service’ that handles all the buy/sell operations independently.
  3. Redirect Requests: Start gradually redirecting the requests from the old monolithic system to the new ‘Transaction Service’. This can be done using a routing mechanism, which routes a specific portion of requests to the new service. This allows the new microservice to start handling real-world requests while also giving a chance to monitor its performance and correct any issues before it fully takes over the functionality from the monolith.
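
To make step 3 concrete, here is a minimal sketch of such a routing layer in TypeScript (Node 18+). The service URLs, the /transactions path, and the 10% rollout fraction are all assumptions for illustration, not a definitive implementation:

```ts
import http from "node:http";

const MONOLITH = "http://monolith:8080";            // assumed legacy backend
const TRANSACTION_SVC = "http://transactions:8081"; // assumed new microservice
const ROLLOUT = 0.1; // start by sending 10% of transaction traffic to the new service

http.createServer((req, res) => {
  const chunks: Buffer[] = [];
  req.on("data", (c) => chunks.push(c));
  req.on("end", async () => {
    // Route a fraction of Buy/Sell requests to the new service; everything
    // else (and the remaining 90%) keeps hitting the monolith untouched.
    const path = req.url ?? "/";
    const useNew = path.startsWith("/transactions") && Math.random() < ROLLOUT;
    const target = (useNew ? TRANSACTION_SVC : MONOLITH) + path;

    const body = chunks.length ? Buffer.concat(chunks) : undefined;
    const upstream = await fetch(target, { method: req.method, body });
    res.writeHead(upstream.status);
    res.end(await upstream.text());
  });
}).listen(8000);
```

As confidence in the new service grows, ROLLOUT is raised toward 1.0, at which point the monolith’s transaction code has been “strangled” and can be removed.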

Branch By Abstraction Pattern

  1. Abstraction: Identify the ‘Buy/Sell transaction’ part of the monolithic system. Create an interface called ‘TransactionService’ that defines the operations like ‘buy’ and ‘sell’. The existing monolith codebase would implement this interface.
  2. New Implementation: Now, start developing the new microservice which will also implement the ‘TransactionService’ interface. This new microservice is designed to handle the ‘Buy/Sell transaction’ operations independently.
  3. Switch: Once the microservice is ready and thoroughly tested, gradually start redirecting the ‘Buy/Sell transaction’ requests from the monolithic system to the new microservice. This could be accomplished through feature toggles or a routing mechanism, which allows you to control which requests are processed by the new microservice.
  4. Remove Legacy Code: When the new microservice has fully taken over the ‘Buy/Sell transaction’ operations and is working as expected, the legacy ‘Buy/Sell transaction’ code in the monolith system can be safely removed.

Branch by Abstraction allows this transition to happen smoothly, without disrupting the functioning of the system. The old and new systems can coexist and operate in parallel during the transition, reducing risks and enabling continuous delivery.
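
A minimal sketch of the abstraction in TypeScript. The interface, the class names, the service URL, and the environment-variable toggle are illustrative assumptions, not anyone’s actual production code:

```ts
interface TransactionService {
  buy(symbol: string, quantity: number): Promise<void>;
  sell(symbol: string, quantity: number): Promise<void>;
}

// Step 1: the existing monolith code is wrapped behind the interface.
class MonolithTransactionService implements TransactionService {
  async buy(symbol: string, quantity: number) { /* legacy in-process logic */ }
  async sell(symbol: string, quantity: number) { /* legacy in-process logic */ }
}

// Step 2: the new implementation calls the remote microservice instead.
class RemoteTransactionService implements TransactionService {
  async buy(symbol: string, quantity: number) {
    await fetch("http://transactions:8081/buy", { // hypothetical service URL
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ symbol, quantity }),
    });
  }
  async sell(symbol: string, quantity: number) {
    await fetch("http://transactions:8081/sell", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ symbol, quantity }),
    });
  }
}

// Step 3: a feature toggle decides which implementation callers get.
// Callers only ever see TransactionService, so the switch is invisible to them.
const transactions: TransactionService =
  process.env.USE_NEW_TRANSACTION_SERVICE === "true"
    ? new RemoteTransactionService()
    : new MonolithTransactionService();
```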

Remote Procedure Call (RPC):

A Remote Procedure Call (RPC) is similar to a function call, but it’s used in the context of networked applications. It allows a program running on one machine to call a function on a different machine (a remote server) as if it were a local function.

For example, consider a client-server application where the server provides a function to add two numbers. But instead of calling this function locally, a client on a different machine can use RPC to call this function on the server.
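
To illustrate the idea, here is a hand-rolled sketch of that add-two-numbers call as JSON over HTTP. Real RPC frameworks generate the stub for you; the endpoint here is a made-up example:

```ts
// Local version: an ordinary function call.
function add(a: number, b: number): number {
  return a + b;
}

// Remote version: same signature from the caller's point of view, but the
// work happens on another machine. This client-side wrapper is the "stub".
async function addRemote(a: number, b: number): Promise<number> {
  const res = await fetch("http://calc-server:4000/rpc/add", { // assumed endpoint
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ a, b }), // arguments are serialized ("marshalled")
  });
  const { result } = await res.json(); // the result is deserialized back
  return result;
}
```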

gRPC:

gRPC is a modern, open-source, high-performance RPC framework developed by Google. It uses Protocol Buffers (protobuf) as its interface definition language, which describes the service interface and the structure of the payload messages. This is an efficient binary format that provides a simpler and faster data exchange compared to JSON and XML.

Protocol Buffers:

Protocol Buffers (Protobuf) is a binary encoding format that lets you define a schema for your data using a specification language. The schema is used to generate data-access code in multiple languages, and the serialized data it produces is compact and quick to encode and decode.
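
As a small sketch of what that looks like in practice, the snippet below uses the protobufjs library to parse a schema at runtime and round-trip a message through the binary format. The package and message names are made up for the example:

```ts
import * as protobuf from "protobufjs";

// The schema: the field numbers (= 1, = 2) identify fields on the wire,
// which is part of what keeps the encoding compact.
const schema = `
  syntax = "proto3";
  package calc;
  message AddRequest {
    int32 a = 1;
    int32 b = 2;
  }
`;

const { root } = protobuf.parse(schema);
const AddRequest = root.lookupType("calc.AddRequest");

const message = AddRequest.create({ a: 2, b: 3 });
const binary = AddRequest.encode(message).finish(); // a handful of bytes,
                                                    // far smaller than JSON
const decoded = AddRequest.decode(binary);
console.log(decoded.toJSON()); // { a: 2, b: 3 }
```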

gRPC in Action:

Here’s a simplified version of how gRPC works in a client-server architecture:

  1. gRPC Client: The process starts from the gRPC client. The client makes a call through a client stub, which has the same methods as the server. The data for the call is serialized using Protobuf into a binary format.
  2. Transport: The serialized data is then sent over the network via the underlying transport layer.
  3. HTTP/2: gRPC utilizes HTTP/2 as its transport protocol. One significant benefit of HTTP/2 is that it allows multiplexing, which is the ability to send multiple streams of messages over a single, long-lived TCP connection. This reduces latency and increases the performance of network communication.
  4. gRPC Server: The server receives the serialized data, deserializes it back into the method inputs, and executes the method. The result is then sent back in the reverse direction: serialized and sent back to the client via HTTP/2 and the transport layer, then deserialized by the client stub.
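
Here is a compact sketch of those four steps using the @grpc/grpc-js and @grpc/proto-loader packages. The adder.proto file, the service name, and the port are assumptions for illustration:

```ts
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

// adder.proto (hypothetical) is assumed to contain:
//   syntax = "proto3";
//   package calc;
//   service Adder { rpc Add (AddRequest) returns (AddReply); }
//   message AddRequest { int32 a = 1; int32 b = 2; }
//   message AddReply   { int32 sum = 1; }
const def = protoLoader.loadSync("adder.proto");
const proto = grpc.loadPackageDefinition(def) as any;

// Server side: receives the serialized request, runs the method,
// and sends the serialized reply back (steps 2-4).
const server = new grpc.Server();
server.addService(proto.calc.Adder.service, {
  Add: (call: any, cb: any) => cb(null, { sum: call.request.a + call.request.b }),
});
server.bindAsync("0.0.0.0:50051", grpc.ServerCredentials.createInsecure(), () => {
  // (older grpc-js versions also need server.start() here)
  // Client side: the stub makes the remote call look like a local method (step 1).
  const client = new proto.calc.Adder(
    "localhost:50051",
    grpc.credentials.createInsecure()
  );
  client.Add({ a: 2, b: 3 }, (err: any, reply: any) => {
    if (!err) console.log(reply.sum); // 5
  });
});
```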

The adoption of gRPC in web client and server communication has been slower due to a few significant factors:

  1. Browser Compatibility: Not all browsers fully support HTTP/2, the protocol underlying gRPC. Even where HTTP/2 is supported, the necessary HTTP/2 features such as trailers might not be available.
  2. gRPC-Web: While gRPC-Web, a JavaScript implementation of gRPC for browsers, does exist, it doesn’t support all the features of gRPC, such as bidirectional streaming, and is less mature than other gRPC libraries.
  3. Text-Based Formats: In the context of web development, formats like JSON and XML are very common and convenient for data interchange. They’re directly compatible with JavaScript and are human-readable. gRPC, on the other hand, defaults to Protocol Buffers, a binary format that’s more efficient but not as straightforward to use on the web.
  4. Firewalls and Proxies: Some internet infrastructure might not support HTTP/2 or might block gRPC traffic, causing potential network issues.
  5. REST Familiarity: REST over HTTP is a well-understood model with broad support in many programming languages, frameworks, and tools. It’s simpler to use and understand, which can speed up development and debugging.
  6. Increased Complexity: While gRPC has performance benefits, it also adds complexity to the system. The performance gain might not always be worth the added complexity, particularly for applications that don’t require high-performance inter-service communication.

Webhooks and Event-Driven Architecture:

Webhooks are a method of augmenting or altering the behavior of a web page or application with custom callbacks. These callbacks can be maintained, modified, and managed by third-party users and developers who may not necessarily be affiliated with the originating website or application.

In the context of a stock market application like Zerodha, this translates to the following:

Zerodha, a brokerage platform, wants to stay updated on price changes from the stock exchange (the original example names SEBI, which is strictly India’s market regulator rather than an exchange such as NSE or BSE, but the flow is the same). To achieve this, Zerodha provides a webhook, essentially a callback URL, to SEBI. This URL is designed to be hit whenever a specific event of interest occurs, such as a particular stock reaching a certain price.

This is an example of an Event-Driven Architecture where communication happens based on events, rather than constant polling or maintaining a persistent connection.

Here’s the sequence of steps in more detail:

  1. Register: Zerodha first registers a webhook with SEBI. This is a callback URL that Zerodha exposes and asks SEBI to call when a certain event happens. In this case, when a particular stock price reaches a specified value.
  2. Trigger Event: When the stock price reaches the specified value, the event is triggered on the SEBI side.
  3. Invoke Webhook: SEBI then sends an HTTP request (usually a POST request) to the registered webhook URL provided by Zerodha. The request would contain information about the event in its body, typically formatted in JSON or XML.
  4. Receive and Process: Zerodha receives the HTTP request and processes the data contained in the body of the request. Based on the information received, it can take necessary action, such as notifying the user about the price change.
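
On the Zerodha side, the webhook endpoint is just an ordinary HTTP handler. Here is a minimal Express sketch; the route, payload shape, and port are assumptions:

```ts
import express from "express";

const app = express();
app.use(express.json());

// The callback URL registered with the exchange in step 1.
app.post("/webhooks/price-alert", (req, res) => {
  const { symbol, price } = req.body; // assumed payload shape
  console.log(`${symbol} reached ${price}, notifying subscribed users...`);
  // Acknowledge immediately and do heavy processing asynchronously,
  // so the sender doesn't retry because of a slow response.
  res.sendStatus(200);
});

app.listen(3000);
```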

This event-driven method allows efficient communication and helps Zerodha stay updated with real-time changes in stock prices. It avoids the need for long polling and persistent connections, which could be expensive and not scalable when dealing with millions of clients.

Other examples:

  1. CI/CD Deployment Actions
  2. MailChimp
  3. Zapier
  4. Stripe

CAP THEOREM – Decoding the Complexity: Unveiling the Intricacies of CAP Theorem in Distributed Systems

Consistency:

  1. Eventual Consistency.
  2. Strong Consistency.

Eventual Consistency: As the name suggests, eventual consistency means that changes to the value of a data item will eventually propagate to all replicas, but there is a lag, and during this lag, the replicas might return stale data. A scenario where changes in Database 1 take a minute to replicate to Databases 2 and 3 is an example of eventual consistency. 

Suppose you have a blog post counter. If you increment the counter in Database 1, Databases 2 and 3 might still show the old count until they sync up after that one-minute lag: a read issued immediately after the write returns the old value because the sync is delayed. This is where Read-Your-Writes (RYW) consistency comes in: a system provides RYW consistency when it guarantees that any attempt to read a record after it has been updated will return the updated value. RDBMSs typically provide read-your-writes consistency; an eventually consistent store like the one above does not.

Strong Consistency: In strong consistency, all replicas agree on the value of a data item before any of them responds to a read or a write. If a write operation occurs, it’s not considered successful until the update has been received by all replicas. For example, consider a banking transaction. If you withdraw money from an ATM (Database 1), the new balance is propagated to Databases 2 and 3 before the transaction is considered complete. This ensures that any subsequent transaction, perhaps from another ATM (representing Database 2 or 3), will see the correct balance, and you won’t be able to withdraw more money than you have. Even an immediate read returns the new value, because the replicas sync before the write is acknowledged.
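
A toy simulation of the two models, with in-memory maps standing in for the replicas and a setTimeout standing in for replication lag (numbers and key names are made up):

```ts
const db1 = new Map<string, number>(); // primary
const db2 = new Map<string, number>(); // replica

// Eventual consistency: acknowledge at once, replicate later.
function eventualWrite(key: string, value: number) {
  db1.set(key, value);
  setTimeout(() => db2.set(key, value), 60_000); // ~1-minute lag
}

// Strong consistency: don't acknowledge until every replica has the value.
function strongWrite(key: string, value: number) {
  db1.set(key, value);
  db2.set(key, value); // in a real system, a network round trip per replica
}

eventualWrite("post-count", 43);
console.log(db2.get("post-count")); // undefined/stale: read-your-writes is violated
```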

Functional Requirements vs Non-Functional Requirements:

Functional Requirements are the basic things a system must do. They describe the tasks or processes the system needs to perform. For example, an e-commerce site must be able to process payments and track orders.

Non-Functional Requirements are qualities a system must have. They describe characteristics or attributes of the system. For example, the e-commerce site must be secure (to protect user data), fast (for a good user experience), available (the system shouldn’t be down for long), and scalable (to support growth in users and orders).

Availability

Availability in information technology refers to the ability of a system or service to be operational and accessible when users need it. It’s usually expressed as the percentage of time the system is up over a predefined period.

Let’s illustrate it with an example:

Consider an e-commerce website like Amazon. Availability refers to the system being operational and accessible for users to browse products, add items to the cart, and make purchases. If Amazon’s website is down and users can’t access it to shop, then the website is experiencing downtime and its availability is affected.

In the world of distributed systems, we often aim for high availability. The term “Five Nines” (99.999%) availability is often mentioned as the gold standard, meaning the service is guaranteed to be operational 99.999% of the time, which translates to about 5.26 minutes of downtime per year.

SLA stands for Service Level Agreement. It’s a contract or agreement between a service provider and a customer that specifies, usually in measurable terms, what services the provider will furnish.

Availability              Downtime per year
90% (one nine)            More than 36 days
95%                       About 18 days
98%                       About 7 days
99% (two nines)           About 3.65 days
99.9% (three nines)       About 8.76 hours
99.99% (four nines)       About 52.6 minutes
99.999% (five nines)      About 5.26 minutes
99.9999% (six nines)      About 31.5 seconds
99.99999% (seven nines)   About 3.15 seconds
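
The downtime figures fall straight out of the percentage. A quick sketch of the arithmetic:

```ts
// Downtime allowed per year for a given availability percentage.
function downtimePerYear(availabilityPercent: number): string {
  const secondsPerYear = 365.25 * 24 * 60 * 60; // ~31,557,600 s
  const downSeconds = secondsPerYear * (1 - availabilityPercent / 100);
  if (downSeconds >= 86_400) return `${(downSeconds / 86_400).toFixed(2)} days`;
  if (downSeconds >= 3_600) return `${(downSeconds / 3_600).toFixed(2)} hours`;
  if (downSeconds >= 60) return `${(downSeconds / 60).toFixed(2)} minutes`;
  return `${downSeconds.toFixed(2)} seconds`;
}

console.log(downtimePerYear(99.999)); // "5.26 minutes" (five nines)
console.log(downtimePerYear(99.99));  // "52.60 minutes" (four nines)
```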

To increase the availability of the system:

  1. Replication: Creating duplicate instances of data or services. Example: keeping multiple copies of a database, so if one crashes, others can handle requests.
  2. Redundancy: Having backup components that can take over if the primary one fails. Example: using multiple servers to host a website, so if one server goes down, others can continue serving.
  3. Scaling: Adding more resources to a system to handle increased load. Example: adding more servers during peak traffic times to maintain system performance.
  4. Geographical Distribution (CDN): Distributing resources in different physical locations. Example: using a Content Delivery Network (CDN) to serve web content to users from the closest server.
  5. Load Balancing: Distributing workload across multiple systems to prevent any single system from getting overwhelmed. Example: using a load balancer to distribute incoming network traffic across several servers.
  6. Failover Mechanisms: Automatically switching to a redundant system upon the failure of a primary system. Example: if the primary server fails, an automatic failover process redirects traffic to backup servers.
  7. Monitoring: Keeping track of system performance and operation. Example: using monitoring software to identify when system performance degrades or a component fails.
  8. Cloud Services: Using cloud resources that can be scaled as needed. Example: using cloud-based storage that can be increased or decreased based on demand.
  9. Scheduled Maintenance: Performing regular system maintenance during off-peak times. Example: scheduling system updates and maintenance during times when user traffic is typically low.
  10. Testing & Simulation: Regularly testing system performance and failover procedures. Example: conducting stress tests to simulate high load conditions and ensure the system can handle them.

CAP THEOREM

The CAP theorem is a fundamental principle that specifies that it’s impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

Consistency (C): Every read from the system receives the latest write or an error.

Availability (A): Every request to the system receives a non-error response, without guarantee that it contains the most recent write.

Partition Tolerance (P): The system continues to operate despite an arbitrary number of network failures.

Let’s illustrate this with an example:

Think of a popular social media platform where users post updates (like Twitter). This platform uses a distributed system to store all the tweets. The system is designed in such a way that it spreads its data across many servers for better performance, scalability, and resilience.

Consistency: When a user posts a new tweet, it becomes instantly visible to everyone reading the feed. When this holds, the system has a high level of consistency.

Availability: Every time a user tries to fetch a tweet, the system guarantees to return a tweet (although it might not be the most recent one). This is a high level of availability.

Partition Tolerance: If a network problem happens and servers can’t communicate with each other, the system continues to operate and serve tweets. It might show outdated tweets, but it’s still operational.

According to the CAP theorem, only two of these guarantees can be met at any given time. So, if the network fails (Partition), the system must choose between Consistency and Availability. It might stop showing new tweets until the network problem is resolved (Consistency over Availability), or it might show outdated tweets (Availability over Consistency). It can’t guarantee to show new tweets (Consistency) and never fail to deliver a tweet (Availability) at the same time when there is a network problem.

CA in a distributed system:

In a single-node system (a system that is not distributed), we can indeed have Consistency and Availability (CA), since the issue of network partitions doesn’t arise. Every read receives the latest write (Consistency), and every request receives a non-error response (Availability). There’s no need for Partition Tolerance because there are no network partitions within a single-node system.

However, once you move to a distributed system where data is spread across multiple nodes (computers, servers, regions), you need to handle the possibility of network partitions. Network partitions are inevitable in a distributed system due to various reasons such as network failures, hardware failures, etc. The CAP theorem stipulates that during a network partition, you can only have either Consistency or Availability.

That is why it’s said you can’t achieve CA in a distributed system. You have to choose between Consistency and Availability when a Partition happens. This choice will largely depend on the nature and requirements of your specific application. For example, a banking system might prefer Consistency over Availability, while a social media platform might prefer Availability over Consistency.

Stateful Systems vs Stateless systems:

Stateful Systems: systems that maintain or remember the state of previous interactions. Example: an e-commerce website remembering the items in your shopping cart.

Stateless Systems: systems that don’t maintain any state information from previous interactions. Example: the HTTP protocol treating each request independently.

Unlocking Real-time Communication: Exploring Server-to-Client Data Exchange


There are several ways to achieve this:
WebSocket, Server-Sent Events (SSE), Polling, WebRTC, Push Notifications, MQTT, and Socket.IO.

Polling

In the context of client-server communication, polling is like continually asking “do you have
any updates?” from the client side. For example, imagine you’re waiting for a friend to finish
a task. You keep asking “Are you done yet?” – that’s like polling.

Short Polling:

In short polling, the client sends a request to the server asking if there’s any new information.
The server immediately responds with the data if it’s available or says “no data” if it’s not.
The client waits for a short period before sending another request. It’s like asking your friend
“Are you done yet?” every 6 minutes.

Advantages:

  1. Simple to Implement: Short polling is simple and requires little work to set up. It doesn’t
    require any special type of server-side technology.
  2. Instantaneous Error Detection: If the server is down, the client will know almost
    immediately when it tries to poll.

Disadvantages:

  1. High Network Overhead: Short polling can cause a lot of network traffic as the client
    keeps polling the server at regular intervals.
  2. Wasted Resources: Many of the requests might return empty responses (especially if
    data updates are infrequent), wasting computational and network resources.
  3. Not Real-Time: There is a delay between when the new data arrives at the server and
    when the client receives it. This delay could be up to the polling interval.
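
A minimal browser-side sketch of short polling; the /updates endpoint, the payload shape, and the 5-second interval are assumptions:

```ts
// Ask "do you have any updates?" on a fixed schedule, whether or not
// anything has actually changed.
async function shortPoll(): Promise<void> {
  const res = await fetch("/updates"); // hypothetical endpoint
  const data = await res.json();
  if (data.hasUpdate) {
    console.log("new data:", data);    // render it, notify the user, etc.
  }
  // If there's nothing new, this request was wasted: that's the trade-off.
}

setInterval(shortPoll, 5_000); // re-ask every 5 seconds
```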

Long Polling:

In long polling, the client asks the server if there’s any new information, but this time the
server does not immediately respond with “no data”. Instead, it waits until it has some data
or until a timeout occurs. Once the client receives a response, it immediately sends another
request. In our friend example, it’s like asking “Let me know when you’re done” and waiting
until your friend signals they’ve finished before asking again.

Advantages:

  1. Reduced Network Overhead: Compared to short polling, long polling reduces network traffic
    as it waits for an update before responding.
  2. Near Real-Time Updates: The client receives updates almost instantly after they arrive on the
    server, because the server holds the request until it has new data to send.

Disadvantages:

  1. Complexity: Long polling is more complex to implement than short polling, requiring better
    handling of timeouts and more server resources to keep connections open.
  2. Resource Intensive: Keeping connections open can be resource-intensive for the server if
    there are many clients.
  3. Delayed Error Detection: If the server is down, the client might not know until a timeout occurs.
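
A client-side sketch of the long-polling loop. The endpoint and the status-code convention (200 = data available, 204 = server timed out with nothing new) are assumptions:

```ts
async function longPoll(): Promise<void> {
  try {
    // The server holds this request open until it has data or times out.
    const res = await fetch("/updates?wait=30"); // hypothetical endpoint
    if (res.status === 200) {
      console.log("update:", await res.json());
    }
    // On 204 (timeout, nothing new) we simply fall through and re-ask.
  } catch {
    // Server unreachable: back off briefly instead of hammering it.
    await new Promise((resolve) => setTimeout(resolve, 2_000));
  }
  void longPoll(); // immediately open the next request
}

void longPoll();
```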

WebSocket:

WebSocket is a communication protocol that provides full-duplex communication between a
client and a server over a long-lived connection. It’s commonly used in applications that require
real-time data exchange, such as chat applications, real-time gaming, and live updates.

How WebSocket works:

  1. Opening Handshake: The process begins with the client sending a standard HTTP request to the server with an “Upgrade: websocket” header. This header indicates that the client wishes to establish a WebSocket connection.
  2. Server Response: If the server supports the WebSocket protocol, it agrees to the upgrade and responds with an “HTTP/1.1 101 Switching Protocols” status code, along with an “Upgrade: websocket” header. This completes the opening handshake, and the initial HTTP connection is upgraded to a WebSocket connection.
  3. Data Transfer: Once the connection is established, data can be sent back and forth between the client and the server. This is different from the typical HTTP request/response paradigm; with WebSocket, both the client and the server can send data at any time. The data is sent in the form of WebSocket frames.
  4. Pings and Pongs: The WebSocket protocol includes built-in “ping” and “pong” messages for keeping the connection alive. The server can periodically send a “ping” to the client, who should respond with a “pong”. This helps to ensure that the connection is still active and that the client is still responsive.
  5. Closing the Connection: Either the client or the server can choose to close the WebSocket connection at any time. This is done by sending a “close” frame, which can include a status code and a reason for closing. The other party can then respond with its own “close” frame, at which point the connection is officially closed.
  6. Error Handling: If an error occurs at any point, such as a network failure or a protocol violation, the WebSocket connection is closed immediately.
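
A minimal echo example using the ws package on the server and the browser’s built-in WebSocket API on the client; the port and messages are illustrative:

```ts
// --- Server (Node, using the "ws" package) ---
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.send("welcome");             // server can push unprompted
  socket.on("message", (data) => {
    socket.send(`echo: ${data}`);     // ...or reply to the client
  });
});

// --- Client (browser) ---
// const ws = new WebSocket("ws://localhost:8080"); // handshake + upgrade
// ws.onopen = () => ws.send("hello");              // client -> server
// ws.onmessage = (e) => console.log(e.data);       // server -> client, any time
```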

Key Differences (Long-Polling vs WebSocket):

  1. Bidirectional vs Unidirectional: WebSockets provide a bidirectional communication channel between client and server, meaning data can be sent in both directions independently. Long polling is essentially unidirectional, with the client initiating all requests.
  2. Persistent Connection: WebSockets establish a persistent connection between client and server that stays open for as long as needed. In contrast, long polling uses a series of requests and responses, which are essentially separate HTTP connections.
  3. Efficiency: WebSockets are generally more efficient for real-time updates, especially when updates are frequent, because they avoid the overhead of establishing a new HTTP connection for each update. Long polling can be less efficient because it involves more network overhead and can tie up server resources keeping connections open.
  4. Complexity: WebSockets can be more complex to set up and may require specific server-side technology. Long polling is easier to implement and uses traditional HTTP connections.

Server-Sent Events (SSE):

Server-Sent Events (SSE) is a standard that allows a web server to push updates to the client
whenever new information is available. This is particularly useful for applications that require
real-time data updates, such as live news updates, sports scores, or stock prices.

Here’s a detailed explanation of how SSE works:

  1. Client Request: The client (usually a web browser) makes an HTTP request to the server,
    asking to subscribe to an event stream. This is done by setting the “Accept” header
    to “text/event-stream”.
  2. Server Response: The server responds with an HTTP status code of 200 and a “Content-Type” header set to “text/event-stream”. From this point on, the server can send events to the client at any time.
  3. Data Transfer: The server sends updates in the form of events. Each event is a block of
    text that is sent over the connection. An event can include an “id”, an “event” type, and “data”.
    The “data” field contains the actual message content.
  4. Event Handling: On the client side, an EventSource JavaScript object is used to handle incoming events. The EventSource object has several event handlers that can be used to handle different types of events, including “onopen”, “onmessage”, and “onerror”.
  5. Reconnection: If the connection is lost, the client will automatically try to reconnect to the
    server after a few seconds. The server can also suggest a reconnection time by including a “retry”
    field in the response.
  6. Closing the Connection: Either the client or the server can choose to close the connection at any time. The client can close the connection by calling the EventSource object’s “close” method. The server can close the connection by simply not sending any more events.
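
A minimal sketch of both sides: a Node server that streams an event every second, and the browser’s EventSource client. The port, event name, and payload are assumptions:

```ts
import http from "node:http";

http.createServer((req, res) => {
  // Step 2: the server commits to an event stream.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // Step 3: each event is a text block with optional id/event fields,
  // terminated by a blank line.
  let n = 0;
  const timer = setInterval(() => {
    res.write(`id: ${n}\nevent: tick\ndata: {"count": ${n}}\n\n`);
    n += 1;
  }, 1_000);

  req.on("close", () => clearInterval(timer)); // client went away
}).listen(3000);

// --- Client (browser) ---
// const source = new EventSource("http://localhost:3000/");
// source.addEventListener("tick", (e) => console.log(e.data)); // step 4
// source.onerror = () => { /* the browser reconnects automatically (step 5) */ };
```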