
Understanding Capital Markets: Your Gateway to Investment Opportunities

What are Capital markets?

Capital markets are a segment of financial markets where individuals, companies, and governments can secure long-term funding by selling financial instruments to investors. 

Trade Cycle:

▪ Trade Initiation: Trade initiation is the process of placing an order in the market.

▪ Order Placement (Contract Note is issued at EOD): Request to buy or sell a security.

▪ Trade Execution: Trade execution is when a buy or sell order gets fulfilled.

▪ Trade Capture: Sends trade details to the clearing house.

▪ Trade Enrichment: Adds detailed information to the trade.

▪ Trade Validation and Confirmation (net obligations are determined).

▪ Clearing (confirmation and updating of custodian records): funds pay-in/pay-out and securities pay-in/pay-out.

▪ Settlement: Ownership of the securities is transferred, and shares/funds are settled (debited and credited in the demat and bank accounts) for both parties.

Settlement cycle timeline: T+3 (1995), T+2 (2017), T+1 (2024).

Trade Execution: The settlement process starts once a trade has been executed, indicating that a buyer and seller have agreed on the transaction’s terms, including price, quantity, and other relevant details. Trades can be executed through various channels, such as traditional stock exchanges, electronic trading platforms, or over the counter (OTC) markets.

Trade Confirmation: Once a trade is executed, both the buyer and seller receive trade confirmations from their brokers or financial institutions. These confirmations include details of the trade, such as the trade date, settlement date, price, quantity, and any associated fees.

Clearance: Clearance involves verifying trade details and confirming that both the buyer and seller have the necessary resources to complete the transaction. This process often includes a clearinghouse, which serves as an intermediary to facilitate the trade. The clearinghouse ensures that the trade is valid and that both parties can meet their obligations.

Settlement Date: The settlement date is when the actual transfer of shares and funds occurs. In many markets this happens one or two business days after the trade date, referred to as T+n (trade date plus n business days), for example T+1 or T+2. However, settlement periods can vary depending on the market and region.

In Indian Market

In a bold initiative to modernize and strengthen India’s financial markets, the Securities and Exchange Board of India (SEBI) has introduced an optional same-day (T+0) settlement cycle. This system allows trades to be settled on the same day the transaction occurs, going beyond the shorter settlement cycles adopted in recent years. The T+0 settlement cycle reduces transactional risk and improves market efficiency by offering immediate liquidity to investors.

In US Market

The United States transitioned to a T+1 settlement cycle on May 28, 2024. Previously, most broker-dealer transactions settled on a T+2 basis, but now they settle within one business day of the transaction date. Under the new T+1 cycle, securities transactions involving U.S. financial institutions settle more quickly, affecting stocks, bonds, municipal securities, exchange-traded funds, and certain mutual funds.

Delivery of Shares: 

On the settlement date, the seller’s brokerage or custodian transfers the shares from the seller’s account to the buyer’s account. This transfer is usually conducted electronically via a central securities depository (CSD) or a similar institution. At this point, the shares are legally owned by the buyer.

Payment: 

At the same time as the shares are delivered, the buyer’s brokerage transfers the agreed-upon payment to the seller’s brokerage or custodian. This payment can be made in cash or through an electronic funds transfer.

Confirmation: 

Once the shares have been successfully delivered and payment has been received, the trade is deemed settled. Both the buyer and the seller receive final confirmation statements from their brokers or financial institutions, confirming that the settlement has been completed.


Mastering Scrollable Cursors in DB2: Efficient Data Navigation and Retrieval

Scrollable cursors in DB2:

Processing large volumes of data on your mainframe can be challenging. In this blog post, we introduce a valuable DB2 feature called scrollable cursors that simplifies managing extensive datasets. We’ll cover what scrollable cursors are, why they’re beneficial, and how you can start using them to improve your data handling.

What Are Scrollable Cursors?

Scrollable cursors in DB2 offer more flexibility when working with data. Unlike regular cursors that only move forward through the result set, scrollable cursors let you move forward, backward, or jump to a specific point in the data. This makes it easier to navigate and work with the result table.

Benefits of Scrollable Cursors:

  1. Move both forward and backward within the result set.
  2. Skip directly to specific rows without retrieving all previous rows.
  3. Retrieve only the needed rows, minimizing unnecessary data processing.

Types of scrollable cursors:

  1. Sensitive Scrollable Cursors:

Sensitive scrollable cursors are designed to reflect any changes made to the database while the cursor is open. This means that if other transactions update, insert, or delete data in the result set, these changes will be visible as you scroll through the cursor.

There are two types of sensitive scrollable cursors:

  • Sensitive Static Scrollable Cursor: The size and order of the result table are fixed when the cursor is opened. Updates and deletes made after the cursor is opened can become visible as you fetch rows again, but newly inserted rows are not added to the result table.
  • Sensitive Dynamic Scrollable Cursor: This cursor type displays real-time updates, inserts, or deletes as they happen, ensuring that the data you see is always current as you navigate through the result set.

Sample Declaration:

EXEC SQL

    DECLARE EMP_CUR SENSITIVE STATIC SCROLL CURSOR FOR

       SELECT EMPNO, ENAME, DEPT_ID

              FROM DBTST.EMP_TBL

       ORDER BY EMPNO

END-EXEC.

EXEC SQL

    DECLARE EMP_CUR SENSITIVE DYNAMIC SCROLL CURSOR FOR

      SELECT EMPNO, ENAME, DEPT_ID

             FROM DBTST.EMP_TBL

      ORDER BY EMPNO

END-EXEC.

  2. Insensitive Scrollable Cursors:

Insensitive scrollable cursors do not show changes made to the database while they are open. Any updates, inserts, or deletes performed by other transactions will not be visible as you scroll through the cursor. The result set remains static and unchanged throughout the cursor’s use.

Sample Declaration:

EXEC SQL

    DECLARE EMP_CUR INSENSITIVE SCROLL CURSOR FOR

       SELECT EMPNO, ENAME, DEPT_ID

              FROM DBTST.EMP_TBL

              ORDER BY EMPNO

END-EXEC.

Note: The default scrollable cursor in Db2 is the INSENSITIVE scrollable cursor. This means that once the result set is retrieved, it does not reflect any changes made to the underlying data (such as updates, inserts, or deletes) after the cursor is opened. The result set remains static and unchanging during the cursor’s lifetime.

Fetching the rows using Scrollable cursor:

Scrollable cursors let you move around the result table and start fetching data from any point. Unlike normal cursors, which only read rows one by one from the start, scrollable cursors allow you to jump to different positions using the Fetch Orientation keyword.

Here are the different Fetch Orientation Keywords:

  • BEFORE – Positions the cursor before the first row of the result table.
  • AFTER – Positions the cursor after the last row of the result table.
  • FIRST – Positions the cursor on the first row.
  • LAST – Positions the cursor on the last row.
  • NEXT – Positions the cursor on the next row. This is the default.
  • CURRENT – Fetches the current row from the result table.
  • PRIOR – Fetches the previous row.
  • ABSOLUTE n – Positions the cursor on the nth row of the result table. ‘n’ can be positive, negative, or zero; a positive value counts from the start of the result table, and a negative value counts back from the end.
  • RELATIVE n – Positions the cursor n rows from the current position. ‘n’ can be positive (forward) or negative (backward).

Example code using Scrollable cursors:

-- Declare the scrollable cursor

DECLARE my_cursor SCROLL CURSOR FOR

SELECT emp_id, emp_name, salary

FROM employees

ORDER BY emp_id;

-- Open the cursor

OPEN my_cursor;

-- Fetch the first row

FETCH FIRST FROM my_cursor INTO :emp_id, :emp_name, :salary;

-- Fetch the next row

FETCH NEXT FROM my_cursor INTO :emp_id, :emp_name, :salary;

-- Fetch the previous row

FETCH PRIOR FROM my_cursor INTO :emp_id, :emp_name, :salary;

-- Position the cursor before the first row of the result table (positioning only, so no INTO clause)

FETCH BEFORE FROM my_cursor;

-- Position the cursor after the last row of the result table (positioning only, so no INTO clause)

FETCH AFTER FROM my_cursor;

-- Fetch the row 3 rows before the current cursor position

FETCH RELATIVE -3 FROM my_cursor INTO :emp_id, :emp_name, :salary;

-- Fetch the row 4 rows after the current cursor position

FETCH RELATIVE +4 FROM my_cursor INTO :emp_id, :emp_name, :salary;

-- Fetch the third row of the result table

FETCH ABSOLUTE 3 FROM my_cursor INTO :emp_id, :emp_name, :salary;

-- Close the cursor

CLOSE my_cursor;

Sample COBOL Program:

Here is a COBOL program that demonstrates how to use a scrollable cursor to fetch the last 10 rows (it assumes the scrollable cursor my_cursor declared in the previous example):

       IDENTIFICATION DIVISION.

       PROGRAM-ID. FetchLast10Rows.

       DATA DIVISION.

       WORKING-STORAGE SECTION.

       01 emp_id           PIC 9(5).

       01 emp_name         PIC X(20).

       01 salary           PIC 9(7)V99.


       01 ws-row-count     PIC 9(5) VALUE 0.

       01 ws-max-rows      PIC 9(5) VALUE 10.       

       EXEC SQL INCLUDE SQLCA END-EXEC.

       PROCEDURE DIVISION.

       MAIN-PARA.

           EXEC SQL

               OPEN my_cursor

           END-EXEC.

           IF SQLCODE = 0 THEN

               PERFORM FETCH-LAST-10-ROWS

           ELSE

                DISPLAY 'Error opening cursor: ' SQLCODE

           END-IF.

           EXEC SQL

               CLOSE my_cursor

           END-EXEC.

           STOP RUN.

       FETCH-LAST-10-ROWS.

           EXEC SQL

               FETCH LAST FROM my_cursor

               INTO :emp_id, :emp_name, :salary

           END-EXEC.

           IF SQLCODE = 0 THEN

                DISPLAY 'Row: ' emp_id ', ' emp_name ', ' salary

               PERFORM VARYING ws-row-count FROM 1 BY 1 UNTIL ws-row-count >= ws-max-rows

                   EXEC SQL

                       FETCH PRIOR FROM my_cursor

                       INTO :emp_id, :emp_name, :salary

                   END-EXEC

                   IF SQLCODE = 0 THEN

                        DISPLAY 'Row: ' emp_id ', ' emp_name ', ' salary

                   ELSE

                       EXIT PERFORM

                   END-IF

               END-PERFORM

           ELSE

                DISPLAY 'Error fetching last row: ' SQLCODE

           END-IF.

Advantages and Disadvantages of Scrollable Cursors in DB2:

Advantages:

  1. Move forward and backward through data easily.
  2. Access only the rows you need, saving time.
  3. Ideal for interactive applications needing dynamic data access.

Disadvantages:

  1. Adds complexity to the code and requires careful error handling.
  2. Best for specific scenarios, not always needed for simple tasks.

Conclusion:

Scrollable cursors in DB2 are excellent for navigating large datasets more efficiently. They enhance data retrieval and boost your DB2 application’s performance. Experiment with different fetch orientations and use scrollable cursors when needed to make the most of this feature.


Mastering Spring Transaction Management: Best Practices and Strategies


What is transaction management? For example, let’s say a money transfer is going to take place between two parties. In order to have a successful transaction, two instructions need to take place:

  1. The source balance must be decreased. 
  2. The receiver balance must be increased.

If either instruction fails, the other must also be stopped, or else the transfer would be faulty. This scenario occurs in our day-to-day transactions.

So, in simple terms, transaction management ensures that either all the operations in a transaction are executed or none of them are. Transactions are atomic and keep the database’s data consistent despite failures or errors.

ACID PROPERTIES OF A TRANSACTION:

Atomicity: Atomicity, a fundamental concept in database transactions, ensures that a sequence of operations is treated as a single unit. Either the entire transaction is successfully completed, or if any part fails, the entire transaction is reverted to its previous state, preserving database consistency. For instance, in a financial transaction involving the transfer of funds between accounts, both operations must be executed within a single unit of work to guarantee data integrity.

Consistency: Consistency is a fundamental property that ensures all data written to the database adheres to the established rules and constraints. It guarantees the validity and accuracy of data, reflecting real-world relationships between entities. A transaction serves as a mechanism to transition the database from one consistent state to another.

Isolation: Isolation is a crucial database property that ensures concurrent transactions do not interfere. It guarantees that each transaction is executed independently, as if it were the sole operation occurring, even in the presence of multiple simultaneous transactions. This mechanism is essential to prevent inconsistencies and maintain data integrity.

In Relational Database Management Systems (RDBMS), four primary isolation levels are commonly used. These levels, typically arranged in ascending order of restrictiveness, are Read Uncommitted, Read Committed, Repeatable Read, and Serializable. Among these, Read Committed stands as the default isolation level in many systems, proving adequate for most use cases. However, the selection of an appropriate isolation level should be carefully considered based on the specific requirements of the application.

It is important to note that performance is inversely proportional to the level of isolation implemented. Higher isolation levels may introduce additional overhead and potentially impact system performance. Therefore, striking a balance between isolation and performance is essential to optimize database operations and ensure both data integrity and efficient system functioning.

Durability: Durability is the characteristic that guarantees that once a transaction has been committed (i.e., completed and saved), it will not be lost even in the event of a system failure. This is typically achieved through logs and backups, which enable the database to be restored to a known, functional state in case of a failure.

In summary, the ACID properties are fundamental to preserving the integrity and reliability of a database. They ensure that transactions are atomic, consistent, isolated, and durable, which are essential requirements for any DBMS.

The actual working of transaction management is as follows: once the transaction begins, the queries execute one by one, but their effects are not reflected in the database yet. Once all the queries have completed, the ‘commit’ command is executed, and only then are the changes made visible in the database. A transaction starts with ‘begin’, the statements then execute, and finally ‘commit’ applies the changes to the database. If there is no transaction management, the default mode is ‘auto-commit on’. A minimal JDBC sketch of this flow follows the outline below.

Begin
Query1
Query2
Query3
Commit
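
To make the begin/commit flow concrete, here is a minimal plain-JDBC sketch in Java; the in-memory H2 URL and the accounts table are illustrative assumptions, not part of the original outline:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ManualTransactionDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical JDBC URL; any database with an "accounts" table would do
        String url = "jdbc:h2:mem:demo";
        try (Connection con = DriverManager.getConnection(url)) {
            con.setAutoCommit(false);                  // "begin": turn off auto-commit
            try (PreparedStatement debit = con.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = con.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setBigDecimal(1, new BigDecimal("100"));
                debit.setLong(2, 1L);
                debit.executeUpdate();                 // Query1: decrease source balance
                credit.setBigDecimal(1, new BigDecimal("100"));
                credit.setLong(2, 2L);
                credit.executeUpdate();                // Query2: increase receiver balance
                con.commit();                          // make both changes visible together
            } catch (SQLException e) {
                con.rollback();                        // undo everything if either step fails
                throw e;
            }
        }
    }
}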

The @Transactional annotation is used to mark a method or a class as transactional, meaning that any database operations performed within the marked method or class will be executed within a transaction. If the transaction is successful, the changes will be committed to the database. If an error occurs and the transaction is rolled back, the changes will not be persisted in the database. The default propagation is ‘REQUIRED’.
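
As a minimal sketch of how this looks in practice (the service name and the accounts table are assumptions for illustration, not from the original text), the money-transfer example could be written as a Spring service like this:

import java.math.BigDecimal;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TransferService {

    private final JdbcTemplate jdbc;

    public TransferService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Both UPDATEs run in a single transaction (default propagation REQUIRED).
    // If the second statement throws a runtime exception, Spring rolls the
    // first one back as well, so neither balance change is persisted.
    @Transactional
    public void transfer(long fromId, long toId, BigDecimal amount) {
        // "accounts" table and its columns are illustrative assumptions
        jdbc.update("UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, fromId);
        jdbc.update("UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, toId);
    }
}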

Propagations:

The default propagation is REQUIRED; it is applied if no propagation is specified explicitly. A short usage sketch follows the list of propagations below.

There are a total of 6 propagations:

  • REQUIRED – Always executes in a transaction. If there is an existing transaction, it is used; only if none exists is a new one created.
  • SUPPORTS – May or may not run in a transaction. If a current transaction exists, it takes part in it; otherwise it runs without a transaction.
  • NOT_SUPPORTED – Always executes without a transaction. It will not use any existing transaction and will not create a new one. If there is an existing transaction, it is suspended before execution and resumed once done.
  • REQUIRES_NEW – Always executes in a new transaction. Regardless of whether a previous transaction exists, a new transaction is created.
  • NEVER – Always executes without any transaction. It is similar to NOT_SUPPORTED, but it also throws an exception if there is an existing transaction.
  • MANDATORY – Always executes in a transaction and always uses an existing one. If there is no existing transaction, it throws an exception.
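
As a brief hedged sketch (the service and the audit_log table are assumptions), REQUIRES_NEW is often used for audit records that must be kept even if the caller’s business transaction rolls back:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AuditService {

    private final JdbcTemplate jdbc;

    public AuditService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Always runs in its own transaction: the caller's transaction (if any) is
    // suspended, a new one is started, and the audit row is committed
    // independently of whether the caller later rolls back.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void record(String event) {
        // "audit_log" table is an illustrative assumption
        jdbc.update("INSERT INTO audit_log (event) VALUES (?)", event);
    }
}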

ISOLATION LEVELS:

1. DEFAULT

  • Uses the default isolation level of the underlying database.

2. READ_UNCOMMITTED

  • Allows dirty reads, meaning that changes made by one transaction can be read by other transactions before being committed.
  • This level provides the highest performance but the lowest data consistency.

3. READ_COMMITTED

  • Prevents dirty reads by allowing other transactions to read only committed changes.
  • This level provides a balance between performance and consistency.

4. REPEATABLE_READ

  • Ensures that if a value is read multiple times within the same transaction, the result will always be the same, unless it has been changed by that same transaction.
  • This level provides higher consistency but may impact performance.

5. SERIALIZABLE

  • Ensures complete isolation from other transactions, meaning that transactions are executed in a way that produces the same outcome as if they were executed serially.
  • This level provides the highest consistency but may have a significant impact on performance.
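
Building on the levels listed above, here is a hedged sketch of setting an isolation level explicitly on @Transactional; the service, the orders table, and the choice of SERIALIZABLE are assumptions made only to illustrate the attributes:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportService {

    private final JdbcTemplate jdbc;

    public ReportService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Runs the query in a SERIALIZABLE, read-only transaction so repeated reads
    // see a consistent view; expect lower concurrency at this isolation level.
    @Transactional(isolation = Isolation.SERIALIZABLE, readOnly = true)
    public Long countOrders() {
        // "orders" table is an illustrative assumption
        return jdbc.queryForObject("SELECT COUNT(*) FROM orders", Long.class);
    }
}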

Key Strategies for Building a Robust Backend API Solution

Creating a robust backend API solution requires a strategic approach. Here are some key strategies to consider when planning an API/Backend Service:

  1. Input Validation: 
    1. Malformed Syntax: If the request violates protocol standards or contains typos or errors, it’s considered malformed. For example, missing required headers, parameters, or an improperly formatted request body could trigger a 400 error.
    2. Missing Required Parameters: When a specific parameter is essential for processing a request, and it’s absent, returning a 400 error is appropriate. For instance, if your web form requires an EstateId parameter, you can throw a 400 error if it’s not present.
    3. Invalid Parameter Values: If the client sends parameter values that don’t adhere to the expected format or constraints, a 400 error is suitable. For example, passing an invalid estateId value could trigger this error.

Remember that a 400 error indicates a client-side issue, so consider providing a clear error message to help users understand what went wrong. Whether you choose to display an error message or redirect to a search page depends on your application’s context and user experience goals.
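
As a small hedged sketch in Spring (the controller and the numeric-only constraint on estateId are assumptions based on the example above), a missing or invalid parameter can be rejected with a 400 and a clear message:

import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;

@RestController
public class EstateController {

    @GetMapping("/estates")
    public String findEstate(@RequestParam(name = "estateId", required = false) String estateId) {
        // Missing required parameter -> 400 with an explanatory message
        if (estateId == null || estateId.isBlank()) {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "estateId is required");
        }
        // Invalid parameter value (assumed digits-only constraint) -> 400
        if (!estateId.matches("\\d+")) {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "estateId must be numeric");
        }
        return "Estate " + estateId; // placeholder response body
    }
}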

  2. Input Authentication:

A 401 Unauthorized status code in a REST API indicates that the client tried to operate on a protected resource without providing the proper authorization. Here are some scenarios when you might return a 401 error:

  1. Missing or Invalid Authentication: When the client doesn’t provide valid credentials (e.g., missing access token or expired token), returning a 401 is appropriate.
  2. Incorrect Credentials: If the provided credentials (such as API keys, tokens, or basic authentication) are incorrect, a 401 response is suitable.
  3. Expired or Revoked Tokens: When an access token has expired or been revoked, returning a 401 communicates that the client needs to re-authenticate.

Remember to include a WWW-Authenticate header field in the response, which provides a challenge applicable to the requested resource.

  3. Input Access Authorization:

A 403 Forbidden status code in a REST API indicates that the client is authenticated but lacks permission to access the requested resource. Here are some scenarios when you might use a 403 error:

  1. Insufficient Permissions: When a user is logged in but doesn’t have the necessary permissions for a specific action, such as modifying a document or accessing a restricted endpoint, a 403 response is appropriate.
  2. Rate Limiting: If a user exceeds their allowed rate limit (e.g., too many requests within a specific time window), returning a 403 error helps enforce rate limits.
  3. Payment Required: In some cases, you might want to prompt the user to pay or upgrade their account. Alongside a 406 error (Not Acceptable), a 403 response can indicate insufficient account limits.

Remember to provide clear error messages in the response body to help users understand the reason for the forbidden access.

  4. Input Resource Integrity:

When designing a RESTful API, returning a 404 Not Found status code is appropriate in specific scenarios. Let’s explore when to use it:

  1. Resource Not Found: If a client requests a specific resource (e.g., /foos/5), and that resource doesn’t exist (e.g., no foo with ID 5), returning a 404 is expected. It indicates that the requested resource is missing.
  2. Malformed URI: When the URL path is incorrect (e.g., /food/1 instead of /foos/1), a 404 response is suitable. It signifies either a badly constructed URI or a reference to a non-existent resource.
  3. Multiple Resources Referenced: If a URI references multiple resources (e.g., /products/5), a simple 404 response doesn’t specify which resource was not found. To mitigate this, consider providing additional information in the response body to clarify what wasn’t found.

Remember that 404 indicates either a missing resource or a structural issue in the URI. Providing clear error messages helps API users understand the problem and improve their experience.
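
A minimal hedged sketch of the resource-not-found case, using an in-memory map as a stand-in for a real data store (the controller and data are assumptions for illustration):

import java.util.Map;
import java.util.Optional;

import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;

@RestController
public class FooController {

    // In-memory stand-in for a real data store, for illustration only
    private final Map<Long, String> foos = Map.of(1L, "first foo", 2L, "second foo");

    @GetMapping("/foos/{id}")
    public String getFoo(@PathVariable long id) {
        // Resource not found -> 404 with a message naming what was missing
        return Optional.ofNullable(foos.get(id))
                .orElseThrow(() -> new ResponseStatusException(
                        HttpStatus.NOT_FOUND, "No foo with id " + id));
    }
}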

  5. System Failures:

The 500 Internal Server Error is an HTTP status code that indicates the server encountered an unexpected condition that prevented it from fulfilling the request. Here are some key points about when to throw a 500 Internal Server Error from an API:

Server-side issues: The 500 Internal Server Error is typically used to indicate that an unexpected condition was encountered by the server, and it prevented the request from being fulfilled. This error falls within the range of 5xx error codes, which signify issues on the server side.

Unpredictable errors: If the error is caused by anything other than the inputs explicitly or implicitly supplied by the request, then a 500 error is likely appropriate. So, a failed database connection or other unpredictable error is accurately represented by a 500 series error.

Unknown error conditions: This response is often used when the exact error condition is unknown or does not fit any other error condition.

Not client-side errors: It is a server error, not a client error. If server errors were never meant to be returned to the client, an entire status code class (5xx) would not have been created for them.

Remember, when users encounter a 500 error, there is little they can do to resolve the issue themselves. Therefore, it’s important to handle these errors properly on the server side and provide meaningful error messages whenever possible. If you want to differentiate between a handled and an unhandled server exception, you can do that with logging (on the server side) or in the response body.
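
One common way to apply these rules consistently is a global exception handler. The sketch below (assuming Spring Framework 6 / Spring Boot 3) lets deliberately thrown 4xx errors pass through with their status and message, while anything unexpected is logged server-side and reported as a generic 500 without leaking internals:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import org.springframework.web.server.ResponseStatusException;

@RestControllerAdvice
public class ApiExceptionHandler {

    private static final Logger log = LoggerFactory.getLogger(ApiExceptionHandler.class);

    // Errors thrown deliberately (400, 401, 403, 404, ...) keep their original
    // status and reason. getStatusCode() assumes Spring Framework 6.
    @ExceptionHandler(ResponseStatusException.class)
    public ResponseEntity<String> handleKnown(ResponseStatusException ex) {
        return ResponseEntity.status(ex.getStatusCode()).body(ex.getReason());
    }

    // Anything unexpected is logged in full on the server and returned to the
    // client as a generic 500.
    @ExceptionHandler(Exception.class)
    public ResponseEntity<String> handleUnexpected(Exception ex) {
        log.error("Unhandled server error", ex);
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body("Internal server error");
    }
}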


Mastering Solid Principles: Enhancing Object-Oriented Design in Software Development

SOLID principles are object-oriented design concepts relevant to software development.

SOLID is an acronym for the following five class-design principles:

  •  Single Responsibility Principle,
  • Open-Closed Principle,
  • Liskov Substitution Principle,
  • Interface Segregation Principle,
  • Dependency Inversion Principle.

  1. Single-Responsibility Principle (SRP):

The single-responsibility principle states that:

A class should have only one reason to change.

This means that a class should have only one responsibility, as expressed through its methods. If a class takes care of more than one task, then you should separate those tasks into separate classes.

This principle is closely related to the concept of separation of concerns, which suggests that you should split your programs into different sections. Each section must address a separate concern.

To illustrate the single-responsibility principle and how it can help you improve your object-oriented design, say that you have the following FileManager class:

Python

# file_manager_srp.py

from pathlib import Path
from zipfile import ZipFile

class FileManager:
    def __init__(self, filename):
        self.path = Path(filename)

    def read(self, encoding="utf-8"):
        return self.path.read_text(encoding)

    def write(self, data, encoding="utf-8"):
        self.path.write_text(data, encoding)

    def compress(self):
        with ZipFile(self.path.with_suffix(".zip"), mode="w") as archive:
            archive.write(self.path)

    def decompress(self):
        with ZipFile(self.path.with_suffix(".zip"), mode="r") as archive:
            archive.extractall()

In this example, your FileManager class has two different responsibilities. It uses the .read() and .write() methods to manage the file. It also deals with ZIP archives by providing the .compress() and .decompress() methods.

This class violates the single-responsibility principle because it has two reasons for changing its internal implementation. To fix this issue and make your design more robust, you can split the class into two smaller, more focused classes, each with its own specific concern:

Python

# file_manager_srp.py

from pathlib import Path
from zipfile import ZipFile

class FileManager:
    def __init__(self, filename):
        self.path = Path(filename)

    def read(self, encoding="utf-8"):
        return self.path.read_text(encoding)

    def write(self, data, encoding="utf-8"):
        self.path.write_text(data, encoding)

class ZipFileManager:
    def __init__(self, filename):
        self.path = Path(filename)

    def compress(self):
        with ZipFile(self.path.with_suffix(".zip"), mode="w") as archive:
            archive.write(self.path)

    def decompress(self):
        with ZipFile(self.path.with_suffix(".zip"), mode="r") as archive:
            archive.extractall()

Now you have two smaller classes, each having only a single responsibility. FileManager takes care of managing a file, while ZipFileManager handles the compression and decompression of a file using the ZIP format. These two classes are smaller, so they’re more manageable. They’re also easier to reason about, test, and debug.

The concept of responsibility in this context may be subjective. Having a single responsibility doesn’t necessarily mean having a single method. Responsibility isn’t directly tied to the number of methods but to the core task that your class is responsible for, depending on your idea of what the class represents in your code. However, that subjectivity shouldn’t stop you from striving to use the SRP.

  2. Open-Closed Principle (OCP):

The open-closed principle (OCP) for object-oriented design was originally introduced by Bertrand Meyer in 1988 and means that:

Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification.

To understand what the open-closed principle is all about, consider the following Shape class:

Python

# shapes_ocp.py

from math import pi

class Shape:
    def __init__(self, shape_type, **kwargs):
        self.shape_type = shape_type
        if self.shape_type == "rectangle":
            self.width = kwargs["width"]
            self.height = kwargs["height"]
        elif self.shape_type == "circle":
            self.radius = kwargs["radius"]

    def calculate_area(self):
        if self.shape_type == "rectangle":
            return self.width * self.height
        elif self.shape_type == "circle":
            return pi * self.radius**2

The initializer of Shape takes a shape_type argument that can be either "rectangle" or "circle". It also takes a specific set of keyword arguments using the **kwargs syntax. If you set the shape type to "rectangle", then you should also pass the width and height keyword arguments so that you can construct a proper rectangle.

In contrast, if you set the shape type to "circle", then you must also pass a radius argument to construct a circle.

Shape also has a .calculate_area() method that computes the area of the current shape according to its .shape_type:

Python

>>> from shapes_ocp import Shape

>>> rectangle = Shape("rectangle", width=10, height=5)
>>> rectangle.calculate_area()
50
>>> circle = Shape("circle", radius=5)
>>> circle.calculate_area()
78.53981633974483

The class works. You can create circles and rectangles, compute their area, and so on. However, the class looks bad. Something seems wrong with it at first sight.

Imagine that you need to add a new shape, maybe a square. How would you do that? Well, the option here is to add another elif clause to .__init__() and to .calculate_area() so that you can address the requirements of a square shape.

Having to make these changes to create new shapes means that your class is open to modification. That violates the open-closed principle. How can you fix your class to make it open to extension but closed to modification? Here is a solution:

Python

# shapes_ocp.py

from abc import ABC, abstractmethod
from math import pi

class Shape(ABC):
    def __init__(self, shape_type):
        self.shape_type = shape_type

    @abstractmethod
    def calculate_area(self):
        pass

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def calculate_area(self):
        return pi * self.radius**2

class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def calculate_area(self):
        return self.width * self.height

class Square(Shape):
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def calculate_area(self):
        return self.side**2

In this code, you completely refactored the Shape class, turning it into an abstract base class (ABC). This class provides the required interface (API) for any shape that you’d like to define. That interface consists of a .shape_type attribute and a .calculate_area() method that you must override in all the subclasses.

This update closes the class to modifications. Now you can add new shapes to your class design without the need to modify Shape. In every case, you’ll have to implement the required interface, which also makes your classes polymorphic.

  3. Liskov Substitution Principle (LSP):

The Liskov substitution principle (LSP) was introduced by Barbara Liskov at an OOPSLA conference in 1987. Since then, this principle has been a fundamental part of object-oriented programming. The principle states that:

Subtypes must be substitutable for their base types.

For example, if you have a piece of code that works with a Shape class, then you should be able to substitute that class with any of its subclasses, such as Circle or Rectangle, without breaking the code.

In practice, this principle is about making your subclasses behave like their base classes without breaking anyone’s expectations when they call the same methods. To continue with shape-related examples, say you have a Rectangle class like the following:

Python

# shapes_lsp.py

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def calculate_area(self):
        return self.width * self.height

In Rectangle, you’ve provided the .calculate_area() method, which operates with the .width and .height instance attributes.

Because a square is a special case of a rectangle with equal sides, you think of deriving a Square class from Rectangle in order to reuse the code. Then, you override the setter method for the .width and .height attributes so that when one side changes, the other side also changes:

Python

# shapes_lsp.py

# …

class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)

    def __setattr__(self, key, value):
        super().__setattr__(key, value)
        if key in ("width", "height"):
            self.__dict__["width"] = value
            self.__dict__["height"] = value

In this snippet of code, you’ve defined Square as a subclass of Rectangle. As a user might expect, the class constructor takes only the side of the square as an argument. Internally, the .__init__() method initializes the parent’s attributes, .width and .height, with the side argument.

You’ve also defined a special method, .__setattr__(), to hook into Python’s attribute-setting mechanism and intercept the assignment of a new value to either the .width or .height attribute. Specifically, when you set one of those attributes, the other attribute is also set to the same value:

Python

>>> from shapes_lsp import Square

>>> square = Square(5)
>>> vars(square)
{'width': 5, 'height': 5}

>>> square.width = 7
>>> vars(square)
{'width': 7, 'height': 7}

>>> square.height = 9
>>> vars(square)
{'width': 9, 'height': 9}

Now you’ve ensured that the Square object always remains a valid square, making your life easier for the small price of a bit of wasted memory. Unfortunately, this violates the Liskov substitution principle because you can’t replace instances of Rectangle with their Square counterparts.

When someone expects a rectangle object in their code, they might assume that it’ll behave like one by exposing two independent .width and .height attributes. Meanwhile, your Square class breaks that assumption by changing the behavior promised by the object’s interface. That could have surprising and unwanted consequences, which would likely be hard to debug.

While a square is a specific type of rectangle in mathematics, the classes that represent those shapes shouldn’t be in a parent-child relationship if you want them to comply with the Liskov substitution principle. One way to solve this problem is to create a base class for both Rectangle and Square to extend:

Python

# shapes_lsp.py

from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def calculate_area(self):
        pass

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def calculate_area(self):
        return self.width * self.height

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def calculate_area(self):
        return self.side ** 2

Shape becomes the type that you can substitute through polymorphism with either Rectangle or Square, which are now siblings rather than a parent and a child. Notice that both concrete shape types have distinct sets of attributes, different initializer methods, and could potentially implement even more separate behaviors. The only thing that they have in common is the ability to calculate their area.

With this implementation in place, you can use the Shape type interchangeably with its Square and Rectangle subtypes when you only care about their common behavior:

Python

>>> from shapes_lsp import Rectangle, Square

>>> def get_total_area(shapes):
...     return sum(shape.calculate_area() for shape in shapes)

>>> get_total_area([Rectangle(10, 5), Square(5)])
75

Here, you pass a pair consisting of a rectangle and a square into a function that calculates their total area. Because the function only cares about the .calculate_area() method, it doesn’t matter that the shapes are different. This is the essence of the Liskov substitution principle.

  4. Interface Segregation Principle (ISP):

The interface segregation principle (ISP) comes from the same mind as the single-responsibility principle. Yes, it’s another feather in Uncle Bob’s cap. The principle’s main idea is that:

Clients should not be forced to depend upon methods that they do not use. Interfaces belong to clients, not to hierarchies.

In this case, clients are classes and subclasses, and interfaces consist of methods and attributes. In other words, if a class doesn’t use methods or attributes, then those methods and attributes should be segregated into more specific classes.

Consider the following example of class hierarchy to model printing machines:

Python

# printers_isp.py

from abc import ABC, abstractmethod

class Printer(ABC):
    @abstractmethod
    def print(self, document):
        pass

    @abstractmethod
    def fax(self, document):
        pass

    @abstractmethod
    def scan(self, document):
        pass

class OldPrinter(Printer):
    def print(self, document):
        print(f"Printing {document} in black and white...")

    def fax(self, document):
        raise NotImplementedError("Fax functionality not supported")

    def scan(self, document):
        raise NotImplementedError("Scan functionality not supported")

class ModernPrinter(Printer):
    def print(self, document):
        print(f"Printing {document} in color...")

    def fax(self, document):
        print(f"Faxing {document}...")

    def scan(self, document):
        print(f"Scanning {document}...")

In this example, the base class, Printer, provides the interface that its subclasses must implement. OldPrinter inherits from Printer and must implement the same interface. However, OldPrinter doesn’t use the .fax() and .scan() methods because this type of printer doesn’t support these functionalities.

This implementation violates the ISP because it forces OldPrinter to expose an interface that the class doesn’t implement or need. To fix this issue, you should separate the interfaces into smaller and more specific classes. Then you can create concrete classes by inheriting from multiple interface classes as needed:

Python

# printers_isp.py

from abc import ABC, abstractmethod

class Printer(ABC):
    @abstractmethod
    def print(self, document):
        pass

class Fax(ABC):
    @abstractmethod
    def fax(self, document):
        pass

class Scanner(ABC):
    @abstractmethod
    def scan(self, document):
        pass

class OldPrinter(Printer):
    def print(self, document):
        print(f"Printing {document} in black and white...")

class NewPrinter(Printer, Fax, Scanner):
    def print(self, document):
        print(f"Printing {document} in color...")

    def fax(self, document):
        print(f"Faxing {document}...")

    def scan(self, document):
        print(f"Scanning {document}...")

Now Printer, Fax, and Scanner are base classes that provide specific interfaces with a single responsibility each. To create OldPrinter, you only inherit the Printer interface. This way, the class won’t have unused methods. To create the NewPrinter class, you need to inherit from all the interfaces. In short, you’ve segregated the Printer interface.

This class design allows you to create different machines with different sets of functionalities, making your design more flexible and extensible.

  5. Dependency Inversion Principle (DIP):

The dependency inversion principle (DIP) is the last principle in the SOLID set. This principle states that:

Abstractions should not depend upon details. Details should depend upon abstractions.

That sounds complex. Here’s an example that will help to clarify it. Say you’re building an application and have a Frontend class to display data to the users in a friendly way. The app currently gets its data from a database, so you end up with the following code:

Python

# app_dip.py

class FrontEnd:
    def __init__(self, back_end):
        self.back_end = back_end

    def display_data(self):
        data = self.back_end.get_data_from_database()
        print("Display data:", data)

class BackEnd:
    def get_data_from_database(self):
        return "Data from the database"

In this example, the FrontEnd class depends on the BackEnd class and its concrete implementation. You can say that both classes are tightly coupled. This coupling can lead to scalability issues. For example, say that your app is growing fast, and you want the app to be able to read data from a REST API. How would you do that?

You may think of adding a new method to BackEnd to retrieve the data from the REST API. However, that will also require you to modify FrontEnd, which should be closed to modification, according to the open-closed principle.

To fix the issue, you can apply the dependency inversion principle and make your classes depend on abstractions rather than on concrete implementations like BackEnd. In this specific example, you can introduce a DataSource class that provides the interface to use in your concrete classes:

Python

# app_dip.py

from abc import ABC, abstractmethod

class FrontEnd:
    def __init__(self, data_source):
        self.data_source = data_source

    def display_data(self):
        data = self.data_source.get_data()
        print("Display data:", data)

class DataSource(ABC):
    @abstractmethod
    def get_data(self):
        pass

class Database(DataSource):
    def get_data(self):
        return "Data from the database"

class API(DataSource):
    def get_data(self):
        return "Data from the API"

In this redesign of your classes, you’ve added a DataSource class as an abstraction that provides the required interface, or the .get_data() method. Note how FrontEnd now depends on the interface provided by DataSource, which is an abstraction.

Then you define the Database class, which is a concrete implementation for those cases where you want to retrieve the data from your database. This class depends on the DataSource abstraction through inheritance. Finally, you define the API class to support retrieving the data from the REST API. This class also depends on the DataSource abstraction.

Here’s how you can use the FrontEnd class in your code:

Python

>>> from app_dip import API, Database, FrontEnd

>>> db_front_end = FrontEnd(Database())
>>> db_front_end.display_data()
Display data: Data from the database

>>> api_front_end = FrontEnd(API())
>>> api_front_end.display_data()
Display data: Data from the API

Here, you first initialize FrontEnd using a Database object and then again using an API object. Every time you call .display_data(), the result will depend on the concrete data source that you use. Note that you can also change the data source dynamically by reassigning the .data_source attribute in your FrontEnd instance.


Unlocking the Secrets of Cryptography: Symmetric vs. Asymmetric Encryption

Cryptography is broadly divided into two levels. One level offers safe cryptographic recipes that require little to no configuration choices; these are safe and easy to use and don’t require developers to make many decisions. The other level consists of low-level cryptographic primitives, which are often dangerous and can be used incorrectly. They require making decisions and having in-depth knowledge of the cryptographic concepts at work. Because of the potential danger in working at this level, it is referred to as the “hazardous materials” or “hazmat” layer. In the Python cryptography library, these primitives live in cryptography.hazmat.

Keys:

  • Symmetric Key: The same key is used for both encryption and decryption.
  • Asymmetric Key: Two different keys are used – a public key for encryption and a private key for decryption.

Fernet (Symmetric Encryption):

Fernet guarantees that a message encrypted using it cannot be manipulated or read without the key. Fernet is an implementation of symmetric (also known as “secret key”) authenticated cryptography. Fernet also has support for implementing key rotation via MultiFernet.

class cryptography.fernet.Fernet(key).

This class provides both encryption and decryption facilities.

Ex:

>>> from cryptography.fernet import Fernet

>>> key = Fernet.generate_key()

>>> f = Fernet(key)

>>> token = f.encrypt(b"my deep dark secret")

>>> f.decrypt(token)

Output: b'my deep dark secret'

Parameters:

key (bytes or str) – A URL-safe base64-encoded 32-byte key. This must be kept secret; anyone with this key is able to create and read messages.


classmethod generate_key():

Generates a fresh Fernet key. Keep this some place safe! If you lose it you’ll no longer be able to decrypt messages; if anyone else gains access to it, they’ll be able to decrypt all of your messages, and they’ll also be able to forge arbitrary messages that will be authenticated and decrypted.

encrypt(data):

Encrypts the data passed. The result of this encryption is known as a “Fernet token” and has strong privacy and authenticity guarantees.

decrypt(token, ttl=None):

       Decrypts a Fernet token. If successfully decrypted you will receive the original plaintext as the result, otherwise an exception will be raised. It is safe to use this data immediately as Fernet verifies that the data has not been tampered with prior to returning it.

Parameters:

• token (bytes or str) – The Fernet token. This is the result of calling encrypt().

• ttl (int) – Optionally, the number of seconds old a message may be for it to be valid. If the message is older than ttl seconds (from the time it was originally created) an exception will be raised. If ttl is not provided (or is None), the age of the message is not considered.

Returns bytes:  The original plaintext.

Raises:

 • cryptography.fernet.InvalidToken – If the token is in any way invalid, this exception is raised. A token may be invalid for a number of reasons: it is older than the ttl, it is malformed, or it does not have a valid signature.

• TypeError – This exception is raised if token is not bytes or str.

Algorithms:

  • Symmetric Algorithms:
    • AES (Advanced Encryption Standard): Widely used, secure, and efficient.
    • DES (Data Encryption Standard): Outdated and less secure, replaced by AES.
  • Asymmetric Algorithms:
    • RSA (Rivest-Shamir-Adleman): Commonly used for secure data transmission.
    • ECC (Elliptic Curve Cryptography): More efficient and secure compared to RSA.
  • Hash Functions:
    • SHA-256 (Secure Hash Algorithm): Produces a 256-bit hash value, widely used for integrity checks.
    • MD5 (Message Digest Algorithm 5): Produces a 128-bit hash value, considered insecure for most applications.

Symmetric Encryption with AES:

import javax.crypto.Cipher;

import javax.crypto.KeyGenerator;

import javax.crypto.SecretKey;

import javax.crypto.spec.IvParameterSpec;

import java.security.SecureRandom;

import java.util.Base64;

public class AESEncryptionExample {

    public static void main(String[] args) throws Exception {

        // Generate a random key and IV (Initialization Vector)

        KeyGenerator keyGen = KeyGenerator.getInstance("AES");

        keyGen.init(128); // 128 bits key size

        SecretKey secretKey = keyGen.generateKey();

        byte[] iv = new byte[16];

        SecureRandom random = new SecureRandom();

        random.nextBytes(iv);

        IvParameterSpec ivSpec = new IvParameterSpec(iv);

        // The plaintext message to be encrypted

        String plaintext = "This is a secret message";

        // Encrypt the plaintext

        String ciphertext = encrypt(plaintext, secretKey, ivSpec);

        System.out.println("Ciphertext: " + ciphertext);

        // Decrypt the ciphertext

        String decryptedText = decrypt(ciphertext, secretKey, ivSpec);

        System.out.println("Decrypted text: " + decryptedText);

    }

    public static String encrypt(String plaintext, SecretKey key, IvParameterSpec iv) throws Exception {

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");

        cipher.init(Cipher.ENCRYPT_MODE, key, iv);

        byte[] encryptedBytes = cipher.doFinal(plaintext.getBytes());

        return Base64.getEncoder().encodeToString(encryptedBytes);

    }

    public static String decrypt(String ciphertext, SecretKey key, IvParameterSpec iv) throws Exception {

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");

        cipher.init(Cipher.DECRYPT_MODE, key, iv);

        byte[] decryptedBytes = cipher.doFinal(Base64.getDecoder().decode(ciphertext));

        return new String(decryptedBytes);

    }

}

Asymmetric algorithms:

   Asymmetric cryptography is a branch of cryptography where a secret key can be divided into two parts, a public key and a private key. The public key can be given to anyone, trusted or not, while the private key must be kept secret (just like the key in symmetric cryptography). Asymmetric cryptography has two primary use cases: authentication and confidentiality. Using asymmetric cryptography, messages can be signed with a private key, and then anyone with the public key is able to verify that the message was created by someone possessing the corresponding private key. This can be combined with a proof of identity system to know what entity (person or group) owns that private key, providing authentication. Encryption with asymmetric cryptography works in a slightly different way from symmetric encryption. Someone with the public key can encrypt a message, providing confidentiality, and then only the person in possession of the private key is able to decrypt it.

Asymmetric Encryption with RSA (in Java):

import java.security.*;

import java.util.Base64;

import javax.crypto.Cipher;

public class RSAEncryptionExample {

    public static void main(String[] args) throws Exception {

        // Generate RSA key pair

        KeyPairGenerator keyPairGen = KeyPairGenerator.getInstance("RSA");

        keyPairGen.initialize(2048); // Key size

        KeyPair keyPair = keyPairGen.generateKeyPair();

        PublicKey publicKey = keyPair.getPublic();

        PrivateKey privateKey = keyPair.getPrivate();

        // The plaintext message to be encrypted

        String plaintext = "This is a secret message";

        // Encrypt the plaintext using the public key

        String ciphertext = encrypt(plaintext, publicKey);

        System.out.println("Ciphertext: " + ciphertext);

        // Decrypt the ciphertext using the private key

        String decryptedText = decrypt(ciphertext, privateKey);

        System.out.println("Decrypted text: " + decryptedText);

    }

    // Encrypt: encrypt the plaintext with the public key

    public static String encrypt(String plaintext, PublicKey publicKey) throws Exception {

        Cipher cipher = Cipher.getInstance("RSA");

        cipher.init(Cipher.ENCRYPT_MODE, publicKey);

        byte[] encryptedBytes = cipher.doFinal(plaintext.getBytes());

        return Base64.getEncoder().encodeToString(encryptedBytes);

    }

    // Decrypt: decrypt the ciphertext with the private key

    public static String decrypt(String ciphertext, PrivateKey privateKey) throws Exception {

        Cipher cipher = Cipher.getInstance("RSA");

        cipher.init(Cipher.DECRYPT_MODE, privateKey);

        byte[] decryptedBytes = cipher.doFinal(Base64.getDecoder().decode(ciphertext));

        return new String(decryptedBytes);

    }

}

Symmetric vs. Asymmetric Cryptography

There are two major types of encryption: symmetric (also known as secret key) and asymmetric (or public key) cryptography. In symmetric cryptography, the same secret key is used to both encrypt and decrypt the data, so keeping the key private is critical to keeping the data confidential. Asymmetric cryptography, on the other hand, uses a public/private key pair to encrypt data: data encrypted with one key is decrypted with the other. A user first generates a public/private key pair and then publishes the public key in a trusted database that anyone can access. Someone who wishes to communicate securely with that user encrypts the data using the retrieved public key, and only the holder of the private key is able to decrypt it. Keeping the private key confidential is critical to this scheme.

Asymmetric algorithms (such as RSA) are generally much slower than symmetric ones. These algorithms are not designed for efficiently protecting large amounts of data. In practice, asymmetric algorithms are used to exchange smaller secret keys which are used to initialize symmetric algorithms.
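
To illustrate that last point, here is a hedged Java sketch of the hybrid pattern using only standard JCA classes already shown above: a small AES key is wrapped (encrypted) with an RSA public key for transport and unwrapped with the matching private key, after which the AES key would protect the bulk data.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;

public class HybridEncryptionExample {
    public static void main(String[] args) throws Exception {
        // Symmetric key that will actually protect the bulk data
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey aesKey = keyGen.generateKey();

        // Asymmetric key pair used only to move the AES key securely
        KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
        rsaGen.initialize(2048);
        KeyPair rsaPair = rsaGen.generateKeyPair();

        // Wrap (encrypt) the small AES key with the RSA public key
        Cipher wrapCipher = Cipher.getInstance("RSA");
        wrapCipher.init(Cipher.WRAP_MODE, rsaPair.getPublic());
        byte[] wrappedKey = wrapCipher.wrap(aesKey);

        // The receiver unwraps it with the RSA private key and then uses the
        // recovered AES key for fast symmetric encryption of the actual data
        Cipher unwrapCipher = Cipher.getInstance("RSA");
        unwrapCipher.init(Cipher.UNWRAP_MODE, rsaPair.getPrivate());
        SecretKey recovered = (SecretKey) unwrapCipher.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);

        System.out.println("Keys match: " + Arrays.equals(aesKey.getEncoded(), recovered.getEncoded()));
    }
}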

    Digital Signatures:

  • Used to verify the authenticity and integrity of a message.
  • Created using a private key and verified with the corresponding public key.
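
A minimal Java sketch of creating and verifying a digital signature with the standard java.security.Signature API, assuming a freshly generated RSA key pair like the one in the RSA example above:

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class SignatureExample {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator keyGen = KeyPairGenerator.getInstance("RSA");
        keyGen.initialize(2048);
        KeyPair keyPair = keyGen.generateKeyPair();

        byte[] message = "This is a signed message".getBytes(StandardCharsets.UTF_8);

        // Sign with the private key
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(keyPair.getPrivate());
        signer.update(message);
        byte[] signature = signer.sign();

        // Verify with the corresponding public key
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(keyPair.getPublic());
        verifier.update(message);
        System.out.println("Signature valid: " + verifier.verify(signature));
    }
}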

    Certificates and Public Key Infrastructure (PKI):

  • Certificates: Digital documents that bind a public key to an entity’s identity, issued by a Certificate Authority (CA).
  • PKI: A framework for managing public-key encryption, including the issuance, renewal, and revocation of certificates.
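
For completeness, here is a hedged sketch of reading an X.509 certificate with Java's standard CertificateFactory and inspecting the identity and public key it binds together; the certificate file path is a placeholder assumption.

import java.io.FileInputStream;
import java.security.PublicKey;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class CertificateExample {
    public static void main(String[] args) throws Exception {
        // Placeholder path to a PEM or DER encoded certificate
        try (FileInputStream in = new FileInputStream("server-cert.pem")) {
            CertificateFactory cf = CertificateFactory.getInstance("X.509");
            X509Certificate cert = (X509Certificate) cf.generateCertificate(in);

            // The certificate binds this public key to the subject's identity
            // and is signed by the issuing Certificate Authority (CA)
            PublicKey key = cert.getPublicKey();
            System.out.println("Subject: " + cert.getSubjectX500Principal());
            System.out.println("Issuer:  " + cert.getIssuerX500Principal());
            System.out.println("Key alg: " + key.getAlgorithm());
        }
    }
}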

Unleashing the Power of Google’s Serverless Dataproc

About Dataproc

Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don’t need them. Cloud Dataproc also supports Hadoop, Pig, Hive, and Spark, and has high-level APIs for job submission. It also offers connectors to BigQuery, Bigtable, Cloud Storage, and so on.

The diagram below shows the Dataproc architecture and how the process works from end to end.

Comparison of Dataproc Serverless and Dataproc on Compute Engine

Dataproc Serverless lets you run Spark workloads without requiring you to provision and manage your own Dataproc cluster.

Processing frameworks:

Batch: runtime version 2.1 (Spark 3.4, Java 17, Scala 2.13) and earlier versions

Interactive: PySpark kernels for Spark 3.4 and earlier versions

Dataproc on Compute Engine is ideal if you want to provision and manage your own infrastructure and execute workloads on Spark and other open source processing frameworks.

Processing frameworks:

Spark 3.3 and earlier versions, plus other open source frameworks such as Hive, Flink, Trino, and Kafka

The following table lists key differences between the Dataproc on Compute Engine and Dataproc Serverless for Spark.

Capability | Dataproc Serverless for Spark | Dataproc on Compute Engine
Processing frameworks | Batch: Spark 3.4 and earlier versions; Interactive: PySpark kernels for Spark 3.4 and earlier versions | Spark 3.3 and earlier versions; other open source frameworks such as Hive, Flink, Trino, and Kafka
Serverless | Yes | No
Startup time | 60s | 90s
Infrastructure control | No | Yes
Resource management | Spark based | YARN based
GPU support | Planned | Yes
Interactive sessions | Yes | No
Custom containers | Yes | No
VM access (for example, SSH) | No | Yes
Java versions | Java 17, 11 | Previous versions supported
OS Login support | No | Yes

Advantages of Dataproc over on-prem

Low Cost :

Dataproc is priced at only 1 cent per virtual CPU in your cluster per hour, on top of the other Cloud Platform resources you use. Dataproc charges you only for what you really use with second-by-second billing and a low, one-minute-minimum billing period.

Super Fast:

Dataproc clusters are quick to start, scale, and shutdown, with each of these operations taking 90 seconds or less, on average. This means you can spend less time waiting for clusters and more hands-on time working with your data.

Integrated :

Dataproc has built-in integration with other Google Cloud Platform services, such as BigQuery, Cloud Storage, Cloud Bigtable, Cloud Logging, and Cloud Monitoring, so you have more than just a Spark or Hadoop cluster—you have a complete data platform. 

Managed :

Use Spark and Hadoop clusters without the assistance of an administrator. Once you are done with a cluster, you can turn it off to save money.

There are two types of scaling:

  1. Horizontal scaling:
    1. Adding more machines that run together.
  2. Vertical scaling:
    1. Adding more capacity (CPU, memory) to a single machine.

Dataproc Components :

When you create a cluster, standard Apache Hadoop ecosystem components are automatically installed on the cluster.

Below are some of the components that are included:

  1. Hive
  2. Anaconda
  3. Docker
  4. Flink
  5. Jupyter Notebook
  6. HBase
  7. Presto
  8. Zeppelin Notebook

Below is the list of Compute Engine machine types:

  • General purpose: best price-performance ratio for a variety of workloads.
  • Storage optimized: best for workloads that are low in core usage and high in storage density.
  • Compute optimized: highest performance per core on Compute Engine; optimized for compute-intensive workloads.
  • Memory optimized: ideal for memory-intensive workloads, offering more memory per core.
  • Accelerator optimized: ideal for massively parallelized Compute Unified Device Architecture (CUDA) compute workloads, such as machine learning (ML) and high-performance computing (HPC).

Dataproc Serverless :

Dataproc Serverless lets you run Spark workloads without requiring you to provision and manage your own Dataproc cluster.

There are two ways to run Dataproc Serverless workloads:

  1. Dataproc Serverless for Spark Batch:

Use the Google Cloud console, the Google Cloud CLI, or the Dataproc API to submit a batch workload to the Dataproc Serverless service. The service runs the workload on managed compute infrastructure, autoscaling resources as needed (see the client-library sketch after this list).

  2. Dataproc Serverless for Spark Interactive:

Write and run code in Jupyter notebooks. Use the JupyterLab plugin to create multiple notebook sessions from templates that you create and manage.
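
For reference, the API route might look roughly like the sketch below, which assumes the google-cloud-dataproc Python client library; the project ID, region, file path, and batch ID are placeholders, and the exact client surface should be checked against Google's current documentation.

```python
# Hypothetical sketch: submit a Dataproc Serverless (batch) PySpark workload
# with the google-cloud-dataproc client library. All names are placeholders.
from google.cloud import dataproc_v1

project_id = "my-project"         # placeholder
region = "asia-south1"            # placeholder
batch_id = "example-batch-001"    # placeholder, 4-63 lowercase characters

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch()
batch.pyspark_batch.main_python_file_uri = "gs://my-bucket/jobs/main.py"  # placeholder

# create_batch returns a long-running operation; result() blocks until completion.
operation = client.create_batch(
    parent=f"projects/{project_id}/locations/{region}",
    batch=batch,
    batch_id=batch_id,
)
print("Batch finished with state:", operation.result().state)
```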

Dataproc Serverless Pricing :

  1. Dataproc Serverless for Spark pricing is based on the number of Data Compute Units (DCUs), the number of accelerators used, and the amount of shuffle storage used. DCUs, accelerators, and shuffle storage are billed per second, with a 1-minute minimum charge for DCUs and shuffle storage, and a 5-minute minimum charge for accelerators.
  2. Memory used by Spark drivers and executors and system memory usage are counted towards DCU(Data Compute Unit) usage.
  3. By default, each Dataproc Serverless for Spark batch and interactive workload consumes a minimum of 12 DCUs for the duration of the workload: the driver uses 4 vCPUs and 16 GB of RAM and consumes 4 DCUs, and each of the 2 executors uses 4 vCPUs and 16 GB of RAM and consumes 4 DCUs.

Data Compute Unit (DCU) pricing :

Type | Price (hourly in USD)
DCU (Standard) | $0.07848 per hour
DCU (Premium) | $0.116412 per hour
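
To make the table concrete, here is a small back-of-the-envelope calculation using the standard DCU rate above and the default 12-DCU minimum described earlier. It is illustrative only and ignores shuffle storage and accelerators.

```python
# Rough cost estimate for a Dataproc Serverless batch that runs for 10 minutes
# at the default minimum of 12 DCUs (1 driver + 2 executors, 4 DCUs each).
DCU_STANDARD_HOURLY_USD = 0.07848   # from the pricing table above
dcus = 12
runtime_minutes = 10                # example duration; billing is per second, 1-minute minimum

cost = dcus * (runtime_minutes / 60) * DCU_STANDARD_HOURLY_USD
print(f"Approximate DCU cost: ${cost:.4f}")   # about $0.157 for this example
```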

Shuffle Storage Pricing :

It is prorated and billed per second, with a 1-minute minimum charge for standard shuffle storage and a 5-minute minimum charge for Premium shuffle storage.

Type | Price (monthly in USD)
Shuffle Storage (Standard) | $0.052 per GB
Shuffle Storage (Premium) | $0.131 per GB

Accelerator pricing :

Type | Price (hourly in USD)
A100 40 GB | $4.605062 per hour
A100 80 GB | $6.165515 per hour

Use GPUs with Dataproc Serverless:

You can attach GPU accelerators to your Dataproc Serverless batch workloads to achieve the following results:

  1. Speed up the processing of large-scale data analytics workloads.
  2. Accelerate model training on large datasets using GPU machine learning libraries.
  3. Perform advanced data analytics, such as video or natural language processing.

Dataproc Serverless for Spark Auto scaling :

When you submit your Spark workload, Dataproc Serverless for Spark can dynamically scale workload resources, such as the number of executors, to run your workload efficiently. Dataproc Serverless autoscaling is the default behavior, and uses Spark dynamic resource allocation to determine whether, how, and when to scale your workload.
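
Autoscaling is driven by standard Spark dynamic-allocation properties. The snippet below is only a sketch of one way such properties might be collected and formatted for the --properties flag of gcloud dataproc batches submit; the values are illustrative, not recommendations, and explicit settings are only needed when you want tighter control than the defaults.

```python
# Illustrative Spark dynamic-allocation settings for a Serverless batch workload.
# These are standard Spark property names; the chosen values are examples only.
properties = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "100",
    "spark.dynamicAllocation.executorAllocationRatio": "0.5",
}

# Format them for the gcloud CLI, e.g. --properties=key1=value1,key2=value2
properties_flag = "--properties=" + ",".join(f"{k}={v}" for k, v in properties.items())
print(properties_flag)
```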

Dataproc Serverless security :

Dataproc Serverless workloads automatically implement the following security hardening measures:

  1. Spark RPC (remote procedure call) authentication is enabled.
  2. Spark RPC encryption is enabled.
  3. User code runs as the non-root "spark" user within the containers.

How to run a Spark batch workload:

  1. Open the Batches page.
  2. Batch info:
    a. Batch ID: specify an ID for your batch workload. This value must be 4-63 lowercase characters. Valid characters are /[a-z][0-9]-/.
    b. Region: select a region where your workload will run.
  3. Container:
    a. Batch type: Spark.
    b. Runtime version: the default runtime version is selected. You can optionally specify a non-default runtime version.
  4. Execution configuration:
    a. You can specify a service account to use to run your workload. If it is not specified, the workload runs under the default service account.
  5. Network configuration:
    a. The VPC subnetwork that executes Dataproc Serverless for Spark workloads must be enabled for Private Google Access and meet the other requirements listed in the Dataproc Serverless for Spark network configuration. The subnetwork list displays subnets in your selected network that are enabled for Private Google Access.
  6. Properties:
    a. Enter the key (property name) and value of supported Spark properties to set on your Spark batch workload. Note: unlike Dataproc on Compute Engine cluster properties, Dataproc Serverless for Spark workload properties don't include a spark: prefix.
  7. Submit the job.


Submitting a Spark batch workload using the gcloud CLI:

To submit a Spark batch workload, run the following gcloud dataproc batches submit command in Cloud Shell:

gcloud dataproc batches submit pyspark [MAIN_SCRIPT_PATH] \
    --project=che-dama-m2c-dev \
    --region=asia-south1 \
    --version=2.1 \
    --py-files=[ADD_DEPENDENCIES] \
    --subnet=[NAME_OF_THE_SUBNETWORK] \
    --service-account=[SERVICE_ACCOUNT_NAME] \
    --properties=spark.dynamicAllocation.executorAllocationRatio=1,spark.dataproc.scaling.version=2,spark.executor.instances=2,spark.executor.cores=4,spark.executor.memory=8G,spark.driver.memory=4G,spark.driver.cores=2 \
    --jars=[PATH_TO_JAR_FILES]

Note: select Spark runtime version 2.1, as it supports all the dependencies, and fill in the mandatory fields when submitting the job.

  1. Region:
    a. Specify the REGION where your workload will run.
  2. Subnetwork:
    a. If the default network's subnet for the region specified in the gcloud dataproc batches submit command is not enabled for Private Google Access, you must do one of the following:
      i. Enable the default network's subnet for the region for Private Google Access.
      ii. Use the --subnet=[SUBNET_URI] flag in the command to specify a subnet that has Private Google Access enabled. You can run the gcloud compute networks describe [NETWORK_NAME] command to list the URIs of subnets in a network.

  • --properties:
    • Add this flag to the command to set the supported Spark properties required for your job.

gcloud dataproc batches submit \
    --properties=spark.sql.catalogImplementation=hive,spark.hive.metastore.uris=METASTORE_URI,spark.hive.metastore.warehouse.dir=WAREHOUSE_DIR

  • --deps-bucket:
    • You can add this flag to specify a Cloud Storage bucket where Dataproc Serverless will upload workload dependencies. This flag is only required if your batch workload references files on your local machine.
  • Runtime version:
    • Use the --version flag to specify the Dataproc Serverless runtime version of the workload.
  • Persistent History Server:
    • The Persistent History Server (PHS) in Apache Spark is a component that allows you to persist and query historical information about completed Spark applications. It provides a web interface to view information such as completed application details, stages, tasks, and logs.
    • The following command creates a PHS on a single-node Dataproc cluster. The PHS must be located in the region where you run batch workloads, and the Cloud Storage bucket-name must exist.

gcloud dataproc clusters create PHS_CLUSTER_NAME \
    --region=REGION \
    --single-node \
    --enable-component-gateway \
    --properties=spark:spark.history.fs.logDirectory=gs://bucket-name/phs/*/spark-job-history

  • Submit a batch workload, specifying your running Persistent History Server

gcloud dataproc batches submit spark \
    --region=REGION \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    --class=org.apache.spark.examples.SparkPi \
    --history-server-cluster=projects/project-id/regions/region/clusters/PHS-cluster-name

Dataproc Serverless Templates:

Use Dataproc Serverless templates to quickly set up and run common Spark batch workloads without writing code. The following Dataproc Serverless templates are available:

  1. Cloud Spanner to Cloud Storage: https://cloud.google.com/dataproc-serverless/docs/templates/spanner-to-storage
  2. Cloud Storage to BigQuery: https://cloud.google.com/dataproc-serverless/docs/templates/storage-to-bigquery
  3. Cloud Storage to Cloud Spanner: https://cloud.google.com/dataproc-serverless/docs/templates/storage-to-spanner?hl=en
  4. Cloud Storage to Cloud Storage: https://cloud.google.com/dataproc-serverless/docs/templates/storage-to-storage?hl=en
  5. Cloud Storage to JDBC: https://cloud.google.com/dataproc-serverless/docs/templates/storage-to-jdbc
  6. Hive to BigQuery: https://cloud.google.com/dataproc-serverless/docs/templates/hive-to-bigquery
  7. Hive to Cloud Storage: https://cloud.google.com/dataproc-serverless/docs/templates/hive-to-storage
  8. JDBC to BigQuery: https://cloud.google.com/dataproc-serverless/docs/templates/jdbc-to-bigquery
  9. JDBC to Cloud Spanner: https://cloud.google.com/dataproc-serverless/docs/templates/jdbc-to-spanner
  10. JDBC to Cloud Storage: https://cloud.google.com/dataproc-serverless/docs/templates/jdbc-to-storage
  11. JDBC to JDBC: https://cloud.google.com/dataproc-serverless/docs/templates/jdbc-to-jdbc
  12. Pub/Sub to Cloud Storage: https://cloud.google.com/dataproc-serverless/docs/templates/pubsub-to-storage

BigQuery connector with Spark using the CLI

Use the spark-bigquery-connector with Apache Spark to read and write data from and to BigQuery:

gcloud dataproc batches submit pyspark \
    --region=region \
    --jars=gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-version.jar
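
To complement the CLI command above, a minimal PySpark job that uses the spark-bigquery-connector could look like the following sketch; the output dataset and temporary bucket names are placeholders, and the connector JAR is supplied at submission time as shown above.

```python
# Minimal PySpark sketch that reads a public BigQuery table through the
# spark-bigquery-connector and writes an aggregate back to BigQuery.
# The output table and temporary GCS bucket are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bigquery-connector-example").getOrCreate()

# Read: the connector is selected with format("bigquery").
words = (
    spark.read.format("bigquery")
    .option("table", "bigquery-public-data.samples.shakespeare")
    .load()
)

word_counts = words.groupBy("word").sum("word_count")

# Write: the connector needs a GCS bucket for temporary files.
(
    word_counts.write.format("bigquery")
    .option("table", "my_dataset.wordcount_output")   # placeholder
    .option("temporaryGcsBucket", "my-temp-bucket")   # placeholder
    .mode("overwrite")
    .save()
)
```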

Dataproc Serverless permissions:

Dataproc Serverless permissions allow users, including service accounts, to perform actions on Dataproc Serverless resources. The permissions needed to call the Dataproc Serverless APIs fall into the following groups:

  1. Batch permissions
  2. Session permissions
  3. Session runtime permissions
  4. Operation permissions

Dataproc Serverless roles:

You grant roles to users or groups to allow them to perform actions on the Dataproc Serverless resources in your project.

Role ID | Permissions
roles/dataproc.admin | dataproc.batches.cancel, dataproc.batches.create, dataproc.batches.delete, dataproc.batches.get, dataproc.batches.list, dataproc.sessions.create, dataproc.sessions.delete, dataproc.sessions.get, dataproc.sessions.list, dataproc.sessions.terminate, dataproc.sessionTemplates.create, dataproc.sessionTemplates.delete, dataproc.sessionTemplates.get, dataproc.sessionTemplates.list, dataproc.sessionTemplates.update
roles/dataproc.editor | dataproc.batches.cancel, dataproc.batches.create, dataproc.batches.delete, dataproc.batches.get, dataproc.batches.list, dataproc.sessions.create, dataproc.sessions.delete, dataproc.sessions.get, dataproc.sessions.list, dataproc.sessions.terminate, dataproc.sessionTemplates.create, dataproc.sessionTemplates.delete, dataproc.sessionTemplates.get, dataproc.sessionTemplates.list, dataproc.sessionTemplates.update
roles/dataproc.viewer | dataproc.batches.get, dataproc.batches.list, dataproc.sessions.get, dataproc.sessions.list, dataproc.sessionTemplates.get, dataproc.sessionTemplates.list

The flow diagram below shows the complete data flow and how data is processed from ingestion to the desired output.

WhatsApp Image 2024-05-16 at 17.41.01

Unlock Angular Mastery: Expert Tips for Web Developers!

1. Introduction to Angular and its importance in web development

As web developers, it is crucial to stay updated on the latest trends and technologies to enhance our skills and remain competitive in the industry. One such technology that has gained immense popularity in recent years is Angular. Whether you are a beginner looking to learn Angular or an experienced developer wanting to master this framework, this blog will provide you with expert tips and tricks to take your skills to the next level. From optimizing performance to implementing best practices, mastering Angular will not only boost your career but also make you stand out on platforms like LinkedIn and Facebook. So, let’s dive in and unlock the full potential of Angular for web development success.

2. Understanding the latest features and updates in Angular


To truly master Angular and stay at the forefront of web development, it is essential to keep up with the latest features and updates in this ever-evolving framework. Angular frequently releases new versions with improved functionalities and performance enhancements. By staying informed about the latest developments, you can incorporate cutting-edge features into your projects, ensuring they are up-to-date and competitive in today’s fast-paced digital landscape. Subscribe to official Angular blogs, follow key influencers on social media, and participate in relevant online communities to stay abreast of the newest trends and updates. Stay tuned for our next blog section as we delve deeper into leveraging these new features for enhanced Angular development.

3. Implementing best practices for efficient coding in Angular

Efficient coding is key to mastering Angular and maximizing the performance of your web applications. By following best practices, such as modularizing your code, using lazy loading for modules, and optimizing data binding, you can enhance the maintainability and scalability of your Angular projects. Utilizing tools like Angular CLI for scaffolding and code generation can also streamline your development process. Stay tuned as we explore these best practices in detail and provide insights on how to optimize your code for peak performance and efficiency in Angular development. Stay ahead of the curve by mastering these essential coding techniques.

4. Overcoming common challenges faced by web developers in Angular

Despite mastering Angular best practices, web developers still encounter common challenges in Angular development. From managing state using services to handling form validations efficiently, these hurdles can impact the performance of your web applications. In the upcoming section, we will delve into practical solutions and expert tips to overcome these challenges effectively. Stay updated as we guide you through navigating through these obstacles and mastering Angular with ease. Stay ahead in your Angular development journey by equipping yourself with strategies to tackle common challenges head-on.

5. Utilizing advanced techniques to enhance user experience in Angular applications

Now that we have addressed common challenges in Angular development, let’s focus on elevating user experience in your applications. Leveraging advanced techniques like lazy loading modules, prefetching data, and implementing server-side rendering can significantly boost the performance of your Angular applications. Moreover, incorporating animations, routing guards, and optimizing for mobile devices can enhance user engagement and satisfaction. Stay tuned as we explore these advanced techniques and provide expert insights on how to optimize your Angular applications for seamless user experience. Keep pushing the boundaries of your Angular development skills and create exceptional web applications that leave a lasting impression on users.

6. Exploring the future of Angular and staying ahead of the curve

As technology continues to evolve, staying current with the latest advancements in Angular is essential for web developers. With Angular constantly releasing updates and introducing new features, it is crucial to stay ahead of the curve to ensure your applications remain competitive and up-to-date. In our next discussion, we will delve into the future of Angular, exploring upcoming trends, best practices, and ways to future-proof your development projects. By embracing continuous learning and adapting to industry changes, you can position yourself as a top-tier Angular developer and create cutting-edge applications that set new standards in user experience. Stay tuned for valuable insights on how to navigate the ever-changing landscape of Angular development.

Conclusion: How mastering Angular can elevate your web development skills

Continuously honing your Angular skills can elevate your web development expertise to new heights. By mastering Angular, you not only enhance your ability to create dynamic and innovative web applications but also position yourself as a sought-after developer in the industry. Staying abreast of the latest trends and best practices in Angular empowers you to deliver high-quality solutions that meet the evolving needs of users and businesses alike. As you navigate the Angular landscape, remember that embracing lifelong learning and adaptation is key to your success. By integrating expert tips and tricks into your development process, you pave the way for a rewarding career as a proficient Angular developer. Level up your skills and unlock boundless opportunities in the world of web development with Angular.

WhatsApp Image 2024-04-26 at 18.39.17

Unleashing the Power of Integration: How SnapLogic’s Intelligent Platform Transforms Business Operations

  • SnapLogic is a commercial software company that provides an Integration Platform as a Service (iPaaS) tool for connecting cloud data sources, SaaS applications, and on-premises business software applications.
  • The SnapLogic Intelligent Integration Platform (IIP) uses AI-powered workflows to automate all stages of IT integration projects – design, development, deployment, and maintenance – whether on-premises, in the cloud, or in hybrid environments.
  • With the SnapLogic platform, organizations can connect all of their enterprise systems quickly and easily to automate business processes, accelerate analytics, and drive transformation.
  • It uses Snaps, SnapLogic's integration connectors, which allow you to easily connect any combination of SaaS and on-premises applications and data sources.

Introduction:

  • SnapLogic Enterprise Integration Cloud is an Integration Platform as a Service (iPaaS).
  • Its capability is best described by the three A's – Anything, Anywhere, and Anytime.
  • It enables you to connect anything and run the integration anywhere.
  • It enables you to execute this integration anytime.
  • It supports execution based on a schedule and execution in real time.

FEATURES OF SNAPLOGIC:

1. Unified Data Handling:

The Snaplogic Elastic Integration Platform stands out with its ability to manage both structured and unstructured data. Snap endpoints consume hierarchical data in their native format and then pass it on to downstream Snaps in a pipeline. This process eliminates the need to flatten data into records or convert a JSON document into a string or BLOB type.
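
SnapLogic itself is a low-code platform, but the idea of passing hierarchical documents downstream without flattening can be pictured with a small, purely conceptual Python sketch (this is not SnapLogic code; the field names are invented):

```python
# Conceptual illustration: each "snap" receives a hierarchical document (a nested
# dict) and passes it downstream as-is - no flattening into rows, strings, or BLOBs.
import json

def parse_snap(raw: str) -> dict:
    # Consume JSON in its native hierarchical form.
    return json.loads(raw)

def transform_snap(doc: dict) -> dict:
    # Work directly on nested fields instead of flattened records.
    doc["order"]["notional"] = doc["order"]["qty"] * doc["order"]["price"]
    return doc

raw = '{"order": {"symbol": "INFY", "qty": 100, "price": 1550.5, "legs": [{"side": "BUY"}]}}'
print(transform_snap(parse_snap(raw)))
```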

2. Advanced App and Data Integration:

The SnapLogic platform is ready to handle your data integration needs, whether they involve streaming or batch applications.

3. Beyond Point-to-Point Integration:

SnapLogic goes beyond the limitations of point-to-point cloud integration. The SnapLogic platform uses a hub-and-spoke integration architecture. This approach allows customers to integrate complex apps both in the cloud and on-premises incrementally and cost-effectively, without manual coding or re-work.

4. Hybrid Integration Use Cases:

As SaaS adoption grows and gains broader acceptance in IT organizations, enterprises need modern integration processes. They need solutions that cater to different scenarios. These scenarios include cloud-to-cloud, cloud-to-ground, ground-to-ground, and hybrid (cloud-to-cloud and ground).

5. Multi-Tenant, Software-Defined Cloud Service:

The Snaplogic Elastic Integration Platform builds on the principles of Software-Defined Networking (SDN). The “control plane” decides where and how data is processed based on user configuration and preferences. Meanwhile, the “data plane” (aka the Snaplex) processes the data as per instructions from the control plane.

SnapLogic supports all types of integration:

  1. Cloud to Cloud
  2. Cloud to Ground
  3. Ground to Ground
  4. Hybrid
    • It also allows you to run these integrations anywhere.
    • This lets you move your application closer to where your data resides, making integration more efficient by avoiding moving large amounts of data across the network.

SnapLogic consists of three primary components:

  1. Platform
  2. Snaplexes
  3. Snaps
  • The Platform is a cloud-based application which is delivered as a multi-tenant cloud service.
  • It allows you to design, monitor, and execute pipelines.
  • Snaplexes are data processing engines where pipelines are executed.
  • A Snaplex is available as a Cloudplex, which SnapLogic manages for you; a Groundplex, which you manage within your firewall or private cloud; or a Hadooplex, a specific type of Groundplex used in Hadoop environments.
  • Snaps are computational units which enable you to perform data operations and connect to endpoints.
  • SnapLogic provides more than 400 Snaps.

PLATFORM:

Designer:

  • The Snaplogic Designer is the user interface that enables you to develop pipelines.
  • The Designer comprises the Canvas, the Snap Catalog or Asset Palette, and the toolbars.
  • A Designer Canvas tab opens by default for every open Pipeline. 
  • It provides integration assistant for developing the pipelines.
  • It provides data preview.
  • We can test pipelines with manual execution.

Dashboard:

It comprises of:

  • Health wall: monitors pipeline and snaplex health.
  • Pipeline wall: view pipeline execution statistics and details.
  • Snaplex wall: view snaplex statistics.
  • Insight wall: view time series data and logs.

Project Spaces and Projects:

  • Project spaces and projects enable you to organize your assets. They also help to export and import all related artifacts.
  • A project space consists of one or more projects.
  • A project is used to store all related files, accounts, tasks, pipelines, snap packs and snaplexes.

SNAPLEX:

  • Snaplexes are the data processing engine of the SnapLogic platform.
  • A Snaplex is made up of nodes. Nodes are containers, virtual machines, or 'real' hardware-based servers.
  • A minimum of two nodes is required per Snaplex for production use.
  • A Snaplex is available to one organization.
  • It is horizontally and vertically scalable. To add more capacity, add more nodes or increase the size of the nodes.
  • You can deploy one or many Snaplexes as required to run pipelines and process data.

TYPES:

  • A Snaplex can be a Cloudplex or a Groundplex.
  • For Cloudplexes, SnapLogic installs and maintains the Snaplex nodes.
  • For Groundplexes, customers need to install the Snaplex software on nodes in the customer data center.

Cloudplex:

  • All Cloudplex instances run inside the SnapLogic IIP.
  • A Cloudplex is ideal if you require integrations that orchestrate across cloud applications (such as Salesforce, ServiceNow, and Workday), with no on-premises connections and no software that needs to run behind a firewall.
  • Use SnapLogic Manager and Dashboard to administer and monitor your Cloudplex.

Advantages:

  • No infrastructure overhead.
  • Flexibility in deployment.
  • Minimal administrative tasks required.

Disadvantages:

  • No support for customization.
  • Only endpoints reachable over the public internet are accessible from Cloudplexes.

Groundplex:

  • If you need on-premises connectivity (such as SAP, Oracle, or Microsoft Dynamics AX) then you require a Groundplex that runs behind the firewall.
  • Although Groundplex nodes run on private or virtual private data centers, Groundplex instances are managed remotely by the SnapLogic Platform’s control plane.  

Advantages:

  • Customized configurations supported
  • Choice of security implementation and network communication
  • Flexibility in choice of node resources
  • Can access private endpoints which are not reachable over the public internet.
  • Allows for data processing closer to the source/target endpoints which come with performance and security benefits.

Disadvantages:

  • Requires administrative oversight
  • Setup requires planning.

High-Level Architecture:

  • The data plane is where all the data transfers and data processing take place.
  • The data plane consists of the data sources and the Snaplexes.
  • The control plane consists of the platform and the Snaps.
  • The control plane deals only with metadata and does not interact with any data.
  • The metadata is sent to Snaplexes for processing; Snaplexes connect to endpoints, retrieve data, and process it.
  • After processing is complete, Snaplexes write data to endpoints.
  • Snaplexes only process data; they do not store or cache it.

ADVANTAGES:

  1. SnapLogic is a unified platform for data and application integration.
  2. It can connect to both legacy and modern applications.
  3. SnapLogic provides self-service capability using simple design and a platform approach.
  4. SnapLogic uses modern architecture; it doesn't use any legacy components that might prevent data and application integration services from running at cloud speed.
  5. SnapLogic provides Snaps for a vast range of applications.
  6. SnapLogic also enables you to create custom Snaps.
  7. Unlike point-to-point integration tools, SnapLogic enables you to use multiple endpoints in a single workflow.
  8. SnapLogic also provides you with Iris to expedite the development process.

Iris and the Integration Assistant:

  • SnapLogic Iris is the industry's first self-driving technology that applies AI to enterprise integration.
  • Iris uses advanced algorithms trained on millions of metadata elements and billions of data flows.
  • Iris provides the Integration Assistant, which is the recommendation engine.
  • It recommends the Snaps that are most likely to be used next.

KEY TERMS AND CONCEPTS:

  1. Snaps
  • Snap is a computational unit that performs a single data operation.
  • Snaps only process data and don’t store it.
  • Snaps increase reusability by containerizing functionality.
  • There are 6 types of snaps: read, parse, transform, flow, format and write.
  • Read snaps – read data from end points including enterprise applications, analytics platforms and databases.
  • Parse snaps – convert data from binary format to document format for further processing.
  • Transform snaps – enables you to transform or manipulate data.
  • Flow snaps – enable you to route data based on certain configurable conditions.
  • Format snaps – convert data from document format to binary format.
  • Write snaps – write or send data to various end points.
  • Snap Packs
  • A logical grouping of snaps based on functionality or endpoints.
  • Each snap within a snap pack provides a subset of the overall functionality.
  • There are 3 types of snap packs: core snap packs, premium snap packs, and custom snap packs.
    • Core Snap Packs – enable you to perform data operations and data transformations connected to commonly used endpoints. These are available by default.
    • Premium Snap Packs – enable you to interact with various data sources and applications. They are subcategorized based on the type of endpoint they connect to; these subcategories include analytics, data, enterprise, and social Snap Packs. Premium Snap Packs are purchased based on your requirements.
    • Custom Snap Packs – there might be cases where you want to connect to endpoints that are beyond the scope of existing Snap Packs, or perform custom data transformations. Custom Snap Packs are built using the SnapLogic SDK and the SnapLogic Maven Archetype.
  • Pipelines
  • A pipeline is an integration workflow which is created by connecting a set of snaps.
  • A pipeline describes the orchestration of the data flow between the endpoints.
  • Patterns
  • It is a reusable template that is used to create pipelines
  • Tasks
  • Tasks enable you to execute pipelines.
  • There are 3 types of tasks: triggered, scheduled and ultra tasks.
    • Triggered Tasks – generate URLs that can be used to execute a pipeline.
    • Scheduled Tasks – execute a pipeline based on the specific schedule.
    • Ultra Tasks – generate a URL that can be used to execute an ultra pipeline and provide low-latency processing.
  • Difference between triggered and ultra tasks: Ultra task is a pipeline execution mode which provides low latency processing where multiple instances of the pipeline are always running.
  • Note that pipelines must meet certain criteria before they can be executed using ultra tasks.
  • Pipelines which meet all the required criteria are called ultra pipelines.
  • Snaplexes
  • These are data processing engines where pipelines are executed.
  • Snaplexes process data and stream data between applications, databases, files, social and big data sources.
  • Snaplexes receive pipeline metadata from the SnapLogic control plane and connect with various endpoints to read and write data.
  • There are 2 types of Snaplexes: Cloudplexes and Groundplexes.
    • Cloudplexes – run on the SnapLogic Integration Cloud; Cloudplexes are administered and operated by SnapLogic.
    • Groundplexes – run on your servers behind the firewall. Groundplexes must be administered and operated by your organization's administrators.
  • Each Snaplex is made up of one or more nodes.
  • Nodes
  • A node is a host or server, running windows or Linux, with the SnapLogic application installed.
  • The SnapLogic application runs as a JVM.
  • If you are using Cloudplexes, you can contact your SnapLogic administrator to increase processing capacity; if you are using Groundplexes, you can scale out by adding more nodes or machines.
  • Organizations(orgs)
  • SnapLogic treats each tenant as an organization.
  • Tenants are used for different stages of development.
  • Most customers use 3 organizations: development, testing, and production.
  • Subscription Features
  • Organization-level functions that are enabled for your SnapLogic Enterprise Integration Cloud instance.
  • Features include ultra tasks and enhanced account encryption.
    • Ultra tasks – provides low-latency processing.
    • Enhanced account encryption –
      • Provides additional security for account information.
      • Encrypt account information stored on SnapLogic enterprise integration cloud using public key.
      • Decrypts the account information on groundplexes using private key.
      • Supports key rotation.

EXPRESSION LANGUAGES:

  • Enables you to write expressions to transform and manipulate data.
  • Commonly used in snaps such as filter, router and mapper.
  • Based on JavaScript with some exceptions.
  • Uses parameters and fields to interact with data.

PARAMETERS:

  • Enables you to dynamically pass inputs when invoking pipelines.
  • Variables whose values remain unchanged during the pipeline’s execution.
  • Commonly used scenarios – parent-child pipeline construct and triggered pipelines.
  • A parameter is a key-value pair: the key is a unique identifier for the data, and the value is the data assigned to it.
  • Keys must contain only alpha-numeric characters.
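
The key constraint above can be expressed as a tiny, generic validation helper. This is a plain Python sketch of the idea, not a SnapLogic API; the sample parameter names are invented.

```python
# Generic sketch: validate pipeline parameters as key-value pairs where keys
# must contain only alphanumeric characters, as described above.
import re

KEY_PATTERN = re.compile(r"^[A-Za-z0-9]+$")

def validate_parameters(params: dict) -> dict:
    for key in params:
        if not KEY_PATTERN.match(key):
            raise ValueError(f"Invalid parameter key: {key!r} (alphanumeric only)")
    return params

# Example: values stay constant for the whole pipeline execution.
print(validate_parameters({"runDate": "2024-05-31", "batchSize": "500"}))
```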


TASKS

  • Tasks are how pipelines become operational.
  • Tasks enable you to execute pipelines.
  • As a general example, consider the Pipeline designer as a developer, typically with the background of an IT specialist, who builds Pipelines that fit the data processing needs of an organization. After you design a Pipeline, you then need to run it on a schedule or after an event occurs. Hence, Tasks provide an easy way to accomplish the productization of Pipelines.  
  • Tasks enable you to execute pipelines by scheduling them or by triggering a URL.
  • Triggered Task:
  • Choose this option if you need to execute a pipeline by triggering it with an HTTP call.
  • A Triggered Task also allows passing data into and retrieving data from a Pipeline.
  • Scheduled Task:
  • Choose this option if you want to execute a pipeline at a certain time or on a schedule.
  • Ultra Task:
  • Generates a URL for executing ultra pipelines and provides low-latency processing.
  • Latency is the time it takes data to travel from source to destination; low latency means this time is minimal.
  • Choose this option either for specialized, low-latency jobs that need to process documents continuously or for Pipelines based on an always-on design.

ULTRA PIPELINES:

  • Ultra Pipelines provide the speed and scalability to run the most important integrations that require high availability, high throughput, and persistent execution.
  • This feature ensures that data arrives at its destination in an instant, regardless of the volume of data being processed, the variety of endpoints involved, or the complexity of the integration.
  • Ultra Pipelines are executed as Ultra Tasks, which enable the Pipelines to consume documents continuously from external sources (Pipeline executions that are always-on), including sources that are not compatible with Triggered Tasks or those that require low-latency processing.
  • An Ultra Task can manage and load-balance multiple Ultra Pipeline instances in multiple Snaplex nodes to process multiple documents simultaneously.  
  • The Ultra Task monitor ensures that the required number of instances are running and restarts the Task when an instance fails, or a node restart is required.
  • The Ultra Task pulls requests from the Feed Master’s queue and assigns them to its associated Ultra Pipeline instances.  

DIFFERENCE BETWEEN TRIGGERED TASKS AND ULTRA TASK:

  • Ultra task is a constantly running pipeline listening for the new documents streaming in.
  • By the time a document is sent to an Ultra Task, the underlying Pipeline is already prepared and can start processing the document instantly.

A pipeline run through the URL created from a Triggered Task has to go through the pipeline prepare stage first.

  • Depending on a variety of criteria (Pipeline size, accounts, and types of Snaps used in the Pipeline), the preparation stage can take time, which makes the Ultra Task usage beneficial when the expected response time is a matter of sub-seconds.  
  • Since Ultra Pipelines are always running, they can be used to process documents continually from external sources like message queues. Also, data passed into an Ultra pipeline is more reliably processed.
  • In terms of Pipeline design, the Ultra Task is more restrictive when compared to Triggered Tasks because of the number of unsupported Snaps and the restrictions around the input and output Snaps and Pipeline parameters.
  • In addition, the Snaplex on which the Ultra Task runs must have a Feed Master.

——————————————————————————————————————–

SUMMARY:

1. Integration connects different systems using a source and target system.

2. A source system provides data, while a target system receives data.

3. An integration tool, such as SnapLogic, is needed to connect the source and target systems.

4. SnapLogic is an iPaaS (Integration Platform as a Service) tool.

5. SnapLogic combines platform as a service and integration.

6. As a cloud-based tool, SnapLogic can be used to connect different systems.

7. SnapLogic is defined by the Three A’s: anything, anywhere, and anytime.

8. SnapLogic provides a platform for developers to use snaps and integrate systems.

SnapLogic is an Integration Platform as a Service (iPaaS) tool that connects source and target systems using business rules: a source system is where data is read from, and a target system is where data is sent to. As a cloud-based platform defined by the three A's – Anything, Anywhere, and Anytime – it can integrate any system, including files, APIs, and applications. By combining platform as a service with integration, SnapLogic gives developers a platform and a library of Snaps to integrate different systems seamlessly, simplifying the integration process and reducing the need for extensive coding so that developers can focus on building custom solutions for their clients.

WhatsApp Image 2024-02-14 at 16.31.02

Decoding the Tech Battle: Monoliths and Microservices clash in the Digital Arena

Monolith:

Advantages:

  1. Single Codebase: All components of the app exist in one codebase.
  2. Easy Development: Simpler to add new features.
  3. Easy Testing: Easier to simulate and test scenarios.
  4. Easy Deployment: Easy to deploy the entire platform.
  5. Easy Debugging: Easier to trace bugs.
  6. Easy Performance Monitoring: Easier to monitor performance of all features.

Disadvantages:

  1. Slower Development Speed: As the system grows, adding new features can slow down.
  2. Scalability Issues: Scaling issues can arise when the user base grows.
  3. Reliability: A bug in one part can bring down the entire system.
  4. Flexibility Issues: Can’t add a feature if it requires a different tech stack.
  5. Deployment Complexity: Small changes require complete deployment.

Microservices:

Advantages:

  1. Agile Development: Faster development, can update services independently.
  2. Scalability: Can scale only necessary services.
  3. Highly Testable & Maintainable: Each microservice can be tested and maintained separately.
  4. Flexibility: Different services can use different technology stacks.
  5. Independent Deployment: Each microservice can be deployed independently.

Disadvantages:

  1. Management Maintenance: Managing multiple services can be complex.
  2. Infrastructure Costs: Can increase due to separate databases and servers for each service.
  3. Organizational Issues: Communication challenges can arise among teams working on different services.
  4. Debugging Issues: Requires advanced tools for efficient debugging.
  5. Lack of Standardization: Different standards among services can create integration issues.
  6. Lack of Code Ownership: Potential issues in shared code areas due to divided responsibilities.

Monolith to microservices migration:

Strangler Fig Pattern

The Strangler Fig Pattern can be an effective method for migrating a monolithic system to microservices. Here's how it can be applied to a stock market application:

  1. Identify Part to Migrate: Identify a part of the existing system that needs migration. For instance, we may choose the ‘Buy/Sell transaction’ functionality in the monolith application that we wish to replace.
  2. Implement New Microservice: Implement this functionality into a new microservice. We could create a new ‘Transaction Service’ that handles all the buy/sell operations independently.
  3. Redirect Requests: Start gradually redirecting the requests from the old monolithic system to the new ‘Transaction Service’. This can be done using a routing mechanism, which routes a specific portion of requests to the new service. This allows the new microservice to start handling real-world requests while also giving a chance to monitor its performance and correct any issues before it fully takes over the functionality from the monolith.
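
Step 3 can be pictured as a thin routing layer that sends a configurable share of traffic to the new service. The sketch below is a simplified, framework-free illustration; the class names and the 10% rollout figure are invented for the example.

```python
# Simplified strangler-fig routing: send a configurable share of buy/sell
# requests to the new Transaction Service, and the rest to the legacy monolith.
import random

class LegacyMonolith:
    def execute_trade(self, order: dict) -> str:
        return f"monolith handled {order['side']} {order['qty']} {order['symbol']}"

class TransactionService:  # stand-in client for the new microservice
    def execute_trade(self, order: dict) -> str:
        return f"microservice handled {order['side']} {order['qty']} {order['symbol']}"

class StranglerRouter:
    def __init__(self, rollout_fraction: float = 0.10):  # start with ~10% of traffic
        self.rollout_fraction = rollout_fraction
        self.legacy = LegacyMonolith()
        self.new = TransactionService()

    def execute_trade(self, order: dict) -> str:
        # Route a fraction of requests to the new service; increase over time.
        target = self.new if random.random() < self.rollout_fraction else self.legacy
        return target.execute_trade(order)

router = StranglerRouter()
print(router.execute_trade({"side": "BUY", "qty": 100, "symbol": "TCS"}))
```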

Branch By Abstraction Pattern

  1. Abstraction: Identify the ‘Buy/Sell transaction’ part of the monolithic system. Create an interface called ‘TransactionService’ that defines the operations like ‘buy’ and ‘sell’. The existing monolith codebase would implement this interface.
  2. New Implementation: Now, start developing the new microservice which will also implement the ‘TransactionService’ interface. This new microservice is designed to handle the ‘Buy/Sell transaction’ operations independently.
  3. Switch: Once the microservice is ready and thoroughly tested, gradually start redirecting the ‘Buy/Sell transaction’ requests from the monolithic system to the new microservice. This could be accomplished through feature toggles or a routing mechanism, which allows you to control which requests are processed by the new microservice.
  4. Remove Legacy Code: When the new microservice has fully taken over the ‘Buy/Sell transaction’ operations and is working as expected, the legacy ‘Buy/Sell transaction’ code in the monolith system can be safely removed.

Branch by Abstraction allows this transition to happen smoothly, without disrupting the functioning of the system. The old and new systems can coexist and operate in parallel during the transition, reducing risks and enabling continuous delivery.
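
A minimal Python sketch of the steps above, with the feature toggle reduced to a plain boolean (the class and flag names are illustrative, not a prescribed implementation):

```python
# Branch by Abstraction sketch: the legacy code and the new microservice both
# implement the same TransactionService interface; a toggle picks the branch.
from abc import ABC, abstractmethod

class TransactionService(ABC):
    @abstractmethod
    def buy(self, symbol: str, qty: int) -> str: ...
    @abstractmethod
    def sell(self, symbol: str, qty: int) -> str: ...

class MonolithTransactionService(TransactionService):      # existing implementation
    def buy(self, symbol, qty):  return f"monolith: bought {qty} {symbol}"
    def sell(self, symbol, qty): return f"monolith: sold {qty} {symbol}"

class MicroserviceTransactionService(TransactionService):  # new implementation
    def buy(self, symbol, qty):  return f"microservice: bought {qty} {symbol}"
    def sell(self, symbol, qty): return f"microservice: sold {qty} {symbol}"

USE_NEW_TRANSACTION_SERVICE = True  # feature toggle (the "switch" step)

def get_transaction_service() -> TransactionService:
    if USE_NEW_TRANSACTION_SERVICE:
        return MicroserviceTransactionService()
    return MonolithTransactionService()  # removed once the migration completes

print(get_transaction_service().buy("INFY", 50))
```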

Remote Procedure Call (RPC):

A Remote Procedure Call (RPC) is similar to a function call, but it’s used in the context of networked applications. It allows a program running on one machine to call a function on a different machine (a remote server) as if it were a local function.

For example, consider a client-server application where the server provides a function to add two numbers. But instead of calling this function locally, a client on a different machine can use RPC to call this function on the server.
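
The add-two-numbers example can be demonstrated with Python's built-in xmlrpc modules, chosen here only because they ship with the standard library; production systems would more likely use gRPC or a similar framework.

```python
# Tiny RPC demo using the standard-library xmlrpc modules: a client calls
# add() on a remote server as if it were a local function.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    return a + b

# Server: expose add() over XML-RPC on localhost:8000 in a background thread.
server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True, logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client: the call looks local but executes on the server.
client = ServerProxy("http://localhost:8000")
print(client.add(2, 3))   # prints 5
server.shutdown()
```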

gRPC:

gRPC is a modern, open-source, high-performance RPC framework developed by Google. It uses Protocol Buffers (protobuf) as its interface definition language, which describes the service interface and the structure of the payload messages. This is an efficient binary format that provides a simpler and faster data exchange compared to JSON and XML.

Protocol Buffers:

Protocol Buffers (Protobuf) is a binary encoding format that allows you to specify a schema for your data using a specification language. This schema is used to generate code in various languages and provides a wide range of data structures that result in serialized data being small and quick to encode/decode.

gRPC in Action:

Here’s a simplified version of how gRPC works in a client-server architecture:

  1. gRPC Client: The process starts from the gRPC client. The client makes a call through a client stub, which has the same methods as the server. The data for the call is serialized using Protobuf into a binary format.
  2. Transport: The serialized data is then sent over the network via the underlying transport layer.
  3. HTTP/2: gRPC utilizes HTTP/2 as its transport protocol. One significant benefit of HTTP/2 is that it allows multiplexing, which is the ability to send multiple streams of messages over a single, long-lived TCP connection. This reduces latency and increases the performance of network communication.
  4. gRPC Server: The server receives the serialized data, deserializes it back into the method inputs, and executes the method. The result is then sent back in the reverse direction: serialized and sent back to the client via HTTP/2 and the transport layer, then deserialized by the client stub.

The adoption of gRPC in web client and server communication has been slower due to a few significant factors:

  1. Browser Compatibility: Not all browsers fully support HTTP/2, the protocol underlying gRPC. Even where HTTP/2 is supported, the necessary HTTP/2 features such as trailers might not be available.
  2. gRPC-Web: While gRPC-Web, a JavaScript implementation of gRPC for browsers, does exist, it doesn’t support all the features of gRPC, such as bidirectional streaming, and is less mature than other gRPC libraries.
  3. Text-Based Formats: In the context of web development, formats like JSON and XML are very common and convenient for data interchange. They’re directly compatible with JavaScript and are human-readable. gRPC, on the other hand, defaults to Protocol Buffers, a binary format that’s more efficient but not as straightforward to use on the web.
  4. Firewalls and Proxies: Some internet infrastructure might not support HTTP/2 or might block gRPC traffic, causing potential network issues.
  5. REST Familiarity: REST over HTTP is a well-understood model with broad support in many programming languages, frameworks, and tools. It’s simpler to use and understand, which can speed up development and debugging.
  6. Increased Complexity: While gRPC has performance benefits, it also adds complexity to the system. The performance gain might not always be worth the added complexity, particularly for applications that don't require high-performance inter-service communication.

Webhooks and Event-Driven Architecture:

Webhooks are a method of augmenting or altering the behavior of a web page or application with custom callbacks. These callbacks can be maintained, modified, and managed by third-party users and developers who may not necessarily be affiliated with the originating website or application.

In the context of a stock market application like Zerodha, this translates to the following:

Zerodha, a brokerage platform, wants to stay updated with price changes from the stock exchange (SEBI). To achieve this, Zerodha would provide a webhook, essentially a callback URL, to SEBI. This URL is designed to be hit whenever the specific event of interest, such as a particular stock reaching a certain price, occurs.

This is an example of an Event-Driven Architecture where communication happens based on events, rather than constant polling or maintaining a persistent connection.

Here’s the sequence of steps in more detail:

  1. Register: Zerodha first registers a webhook with SEBI. This is a callback URL that Zerodha exposes and asks SEBI to call when a certain event happens. In this case, when a particular stock price reaches a specified value.
  2. Trigger Event: When the stock price reaches the specified value, the event is triggered on the SEBI side.
  3. Invoke Webhook: SEBI then sends an HTTP request (usually a POST request) to the registered webhook URL provided by Zerodha. The request would contain information about the event in its body, typically formatted in JSON or XML.
  4. Receive and Process: Zerodha receives the HTTP request and processes the data contained in the body of the request. Based on the information received, it can take necessary action, such as notifying the user about the price change.

This event-driven method allows efficient communication and helps Zerodha stay updated with real-time changes in stock prices. It avoids the need for long polling and persistent connections, which could be expensive and not scalable when dealing with millions of clients.
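
On the receiving side, the webhook endpoint is just an HTTP route that accepts the POST and reacts to the payload. Below is a minimal sketch using Flask (assumed to be installed); the route path and payload fields are illustrative.

```python
# Minimal webhook receiver sketch (Flask assumed installed).
# The event source POSTs a JSON payload to this URL when the price event fires.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhooks/price-alert", methods=["POST"])   # the registered callback URL
def price_alert():
    event = request.get_json(silent=True) or {}
    symbol = event.get("symbol")
    price = event.get("price")
    # React to the event, e.g. notify the user; here we just log it.
    app.logger.info("Price alert received: %s at %s", symbol, price)
    return jsonify({"status": "received"}), 200

if __name__ == "__main__":
    app.run(port=5000)
```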

Other examples:

  1. CI/CD Deployment Actions
  2. MailChimp
  3. Zapier
  4. Stripe