Introduction Of ISAAC Consensus Protocol for BOSNet - Part 2
Various scenarios and explanations on ISAAC

Written by BlockchainOS DEV Team
April 12 2018

Preamble

Our previous article, “Introduction Of ISAAC Consensus Protocol For BOSNet”, introduced the concept of ISAAC; yet more challenges still remain to perfect the protocol. Among them, we are currently putting a large amount of effort to clarify what network robustness is, and in doing so, constructing a robust network topology based on quorums.

Based on the “FLP Impossibility Theorem [1]” and the theoretical analysis of faulty nodes stated in SCP documentation, we created both success and failure cases in relation to consensus and categorized the cases based on quorum safety, liveness, and fault tolerance. Through analyzing the following cases, we aim to enrich our understanding to build a better BOSNet.

Properties of a Consensus Protocol – using the FLP Impossibility Theorem

To propose a robust network, we need to initially identify the properties that compose its respective quality, including incorporating theories and concepts in distributed systems.

The most influential result in distributed systems is the “FLP Impossibility Theorem [1]“, published by Fischer, Lynch and Patterson. Their paper, ‘Impossibility of Distributed Consensus with One Faulty Process’ settled a long time dispute in distributed computing. The result of this paper, widely known as the “FLP result”, shows that there is no single distributed algorithm which solves the consensus problem in an asynchronous system.

From the FLP theorem, a strong form of consensus needs to guarantee the following three properties:

  • Agreement: All nodes output the same value.
  • Validity: The values that have been decided must have been proposed by some node within the network.
  • Termination: All non-faulty nodes eventually output a value.

 

David Mazières’ speech ‘Internet Level Consensus is Practical’ (ILC) [2], combines these three properties and ascertains consensus protocols’ concurrent inability to satisfy the following three properties concurrently:

  • Safety(Agreement + Validity): All outputs produced have the same value (agreement), and the output value equals one of the agents’ inputs (validity)
  • Liveness (Termination): Eventually non-faulty agents will output a value (termination)
  • Fault Tolerance: Recovery of the network/consensus from the failure of an agent at any point. Fail-stop protocols can handle agent crashes. Byzantine-fault-tolerant protocols are able to handle arbitrary agent behavior.

Practically, there are no deterministic consensus protocols that are able to guarantee all three elements of safety, liveness, and fault tolerance, in an asynchronous system. Following on from Part 1 of or ISAAC release, this release will illustrate the various test cases that we actually used to make a more definitive protocol. These cases are the underlying foundation of how our ISAAC progressed to achieve consensus; and the results of these tests allow us to determine the properties to be implemented.

Implemented Test Scenarios on ISAAC

We interrogated several cases below to verify the validity of quorums in ISAAC.

  • Common case: In a quorum intersection, the number of common nodes can effect the achievement of consensus.
  • Deep intersected quorum: Although the quorums may be heavily intersected, failure of safety can still occur.
  • Failure of safety with many extras: If extra nodes (heavily) outnumber common nodes, failure of safety can occur.
  • Flower Form : This topology design (as described in Part 1) is an ideal form that guarantees quorum safety.

Safety: safe-common-1-is-faulty.yml

 

The topology is designed with two quorums each having five validators, and one node in common. Node ‘n4’ belong to both quorums and is also a faulty-node. Since the equation for safety, C – 1 >= f, is not satisfied, safety is not guaranteed in this design. The case file illustrating this scenario, safe-common-1-is-faulty.yml and other cases for safety is available at https://github.com/bosnet/isaac-consensus-protocol/blob/master/examples/cases/quorum-safety/safe-common-1-is-faulty.yml .

Running

Result(Failure)

This network design has 2 quorums with only one common node, n4. If this common node is faulty, the entire network cannot reach an agreement because the broadcasted messages are unable to reach the other quorum.

Liveness: liveness-fail.yml

Liveness is guaranteed when the number of non-faulty nodes are above a certain quorum threshold. For example, liveness is not achieved when there are too many faulty-nodes, and the number of non-faulty nodes do not reach the threshold. Consensus is impossible in this case. leading to consensus not being achieved. The case file, liveness-fail.yml and the other cases for liveness are available at https://github.com/bosnet/isaac-consensus-protocol/blob/master/examples/cases/liveness/liveness-fail.yml.

 

Running

Result (Failure)

In this example, the three faulty nodes are n0, n2, and n7.

n1’s validators are n0, and n2, n3, n7; and in its validator list, n0, n2, n7 are faulty nodes. n1’s quorum does not meet the requirements of liveness as the number of faulty nodes are above 2. n3’s validators are n0, n1, n2, n5. Among n3’s validators, n0 and n2 are faulty nodes and (from above) noting that n1 has failed liveness, the number of non-faulty nodes equate to two, hence n3’s also fails liveness. In conclusion, n1, n3 cannot reach consensus within their quorums.

 

Fault Tolerance: fault-tolerance-q1-n7-f3.yml

This case shows that the number of faulty nodes exceed the maximum number of allowable tolerance preventing consensus from being achieved. The design file fault-tolerance-q1-n7-f3.yml is available at https://github.com/bosnet/isaac-consensus-protocol/blob/master/examples/cases/fault-tolerance/fault-tolerance-q1-n7-f3.yml .



In this case:
 
  • there are 8 validators constructing 1 quorum; and
  • 3 nodes, n5, n6, n7 are faulty nodes.
 
 

Running

 

This case illustrates that a quorum consisting of eight nodes can operate with up to two faulty nodes. Any more additional faulty nodes will prevent the network from achieving consensus.

 

Failure of Safety(Double Spending): double-spending-many-extras.yml

This case illustrates when a sender intentionally sends 2 messages, both with the same message_id but different data. The message is sent to two different quorums with the intention to double spend.

In this case:
  • A total of 10 nodes form two quorums, with n4 and n5 as common nodes;
  • the message is sent to n1 and n9 with the same message_id but different data.
 
 

 

Running

 

Result (Failure)

This case illustrates the situation where nodes are unable to reach consensus due to the intentional double spending message sent to nodes of the intersected quorums. By reviewing the “messages” section of http://localhost:54320 and http://localhost:54321, it is evident that even though the message_id’s are the same, the data stored are different. The fact that the two messages are both valid within the system at this moment means that there is a chance of a double spend occurung.

 

Types of Faulty Nodes

In a distributed system for consensus, the behavior and state of a node is an important factor for keeping the network available to achieve consensus. We divided the below types of faulty nodes by their behaviors and states.
 
  • Alive, but Not Voting
  • No Response, Network Unreachable
  • Divergent Vote
  • State Regression

 

 
 

Alive, but Not Voting

Nodes are connected to each other inside their quorums. Non-faulty nodes sends messages regularly to other nodes signalling that it is alive and active. In the below illustration, the faulty node is sending signals that it is alive but does not participate in voting.

Running

 

Result

 

No Response, Network Unreachable

Although there are nodes connected to each other, there are cases which some nodes are irresponsive.

 

Running

 

Result

 

Divergent Vote

Even if a message is valid, a faulty node may intentionally oppose to its validity.

Running

Result

State Regression

The regular state conversions of a node in BOSNet are INIT, SIGN, ACCEPT, and ALL CONFIRM; however a faulty node may broadcast an irregular state to other nodes.

Running

Result

How Can We Handle These Faulty Nodes?

Unreachable nodes can easily be detected by the other nodes, however in most of cases, such as the case of a ‘Divergent Vote’, it may not be publicized at the time of voting. To out filter these faulty nodes, we need to audit the history of agreement and activities of nodes. For example, if one node is connected, but it does not broadcast the ballot message to the other nodes, this node could be the problematic faulty node. To discover faulty nodes, other nodes need to review the voting history after a session of consensus has occured; and if a node did not send ballots, the other nodes can determine that node to be the ‘No Vote’ faulty node.

Future Work

To ensure the ultimate stability of the BOSCoin network, Congress Voting, and Trust Contract; we need to first build the fundamental principles of our consensus protocol. We believe the presented cases above do not represent the entire set of exceptional cases to guarantee safety and trust of the network, as such we are undertaking ongoing improvement to of our ISAAC protocol.

This article and the previous Part 1 to ISAAC were intended to showcase our dedication and effort to create the consensus for our BOS ecosystem. We will continually communicate with the BOScoin community displaying our results from experiments conducted on the ISAAC protocol. We believe the solutions to our technical challenges, which we experienced when constructing the BOSCoin network, will also contribute to the overall crypto-world.

Once again, we appreciate your interest and support to BOScoin.

References

1. Michael J. Fischer, Nancy A. Lynch and Michael S. Paterson. Impossibility of Distributed Consensus with One Faulty Process. 1982 : https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf

2. internet level of consensus is practical : https://datatracker.ietf.org/meeting/98/materials/slides-98-saag-internet-level-consensus-ilc

 

Go to ISAAC Part 1