LESLIE LAMPORT, ROBERT SHOSTAK, and MARSHALL PEASE SRI International LESLIE LAMPORT,ROBERT SHOSTAK 和 MARSHALL PEASE SRI 国际
Reliable computer systems must handle malfunctioning components that give conflicting information to different parts of the system. This situation can be expressed abstractly in terms of a group of generals of the Byzantine army camped with their troops around an enemy city. Communicating only by messenger, the generals must agree upon a common battle plan. However, one or more of them may be traitors who will try to confuse the others. The problem is to find an algorithm to ensure that the loyal generals will reach agreement. It is shown that, using only oral messages, this problem is solvable if and only if more than two-thirds of the generals are loyal; so a single traitor can confound two loyal generals. With unforgeable written messages, the problem is solvable for any number of generals and possible traitors. Applications of the solutions to reliable computer systems are then discussed. 可靠的计算机系统必须能够处理向系统不同部分提供相互矛盾信息的故障组件。这种情况可以抽象地用一组拜占庭军队的将军围绕敌城与部队扎营来表达。将军们只能通过信使进行通信,必须就一个共同的作战计划达成一致。然而,其中一个或多个可能是叛徒,试图混淆其他人。问题是找到一种算法,确保忠诚的将军能够达成一致。研究表明,仅使用口头消息时,当且仅当超过三分之二的将军是忠诚的,该问题才可解决;因此,一个叛徒可以使两个忠诚的将军陷入混乱。使用不可伪造的书面消息时,该问题对于任意数量的将军和可能的叛徒均可解决。随后讨论了这些解决方案在可靠计算机系统中的应用。
Categories and Subject Descriptors: C.2.4. [Computer-Communication Networks]: Distributed Systems-network operating systems; D.4.4 [Operating Systems]: Communications Managementnetwork communication; D.4.5 [Operating Systems]: Reliability-fault tolerance 类别和主题描述符:C.2.4. [计算机通信网络]:分布式系统-网络操作系统;D.4.4 [操作系统]:通信管理-网络通信;D.4.5 [操作系统]:可靠性-容错
General Terms: Algorithms, Reliability 通用术语:算法,可靠性
Additional Key Words and Phrases: Interactive consistency 额外关键词和短语:交互一致性
1. INTRODUCTION 1. 引言
A reliable computer system must be able to cope with the failure of one or more of its components. A failed component may exhibit a type of behavior that is often overlooked-namely, sending conflicting information to different parts of the system. The problem of coping with this type of failure is expressed abstractly as the Byzantine Generals Problem. We devote the major part of the paper to a discussion of this abstract problem and conclude by indicating how our solutions can be used in implementing a reliable computer system. 一个可靠的计算机系统必须能够应对一个或多个组件的故障。一个故障组件可能表现出一种常被忽视的行为——即向系统的不同部分发送相互矛盾的信息。应对这种类型故障的问题在抽象层面上被称为拜占庭将军问题。本文的大部分内容将讨论这一抽象问题,并最终指出我们的解决方案如何应用于实现可靠的计算机系统。
We imagine that several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. The generals can communicate with one another only by messenger. After observing the enemy, they must decide upon a common plan of action. However, some of the generals 我们设想拜占庭军队的几个师驻扎在敌城外,每个师由自己的将军指挥。将军们只能通过信使相互通信。在观察敌情后,他们必须决定一个共同的行动计划。然而,其中一些将军
may be traitors, trying to prevent the loyal generals from reaching agreement. The generals must have an algorithm to guarantee that 可能有叛徒,试图阻止忠诚的将军们达成一致。将军们必须有一个算法来保证这一点
A. All loyal generals decide upon the same plan of action. A. 所有忠诚的将军都决定采取相同的行动方案。
The loyal generals will all do what the algorithm says they should, but the traitors may do anything they wish. The algorithm must guarantee condition A regardless of what the traitors do. 忠诚的将军们都会按照算法指示去做,但叛徒可能会做任何他们想做的事。无论叛徒做什么,算法都必须保证条件 A。
The loyal generals should not only reach agreement, but should agree upon a reasonable plan. We therefore also want to insure that 忠诚的将军们不仅应该达成一致,还应该同意一个合理的方案。因此,我们还希望确保的是
B. A small number of traitors cannot cause the loyal generals to adopt a bad plan. B. 少数叛徒无法导致忠诚的将军们采纳一个糟糕的计划。
Condition B is hard to formalize, since it requires saying precisely what a bad plan is, and we do not attempt to do so. Instead, we consider how the generals reach a decision. Each general observes the enemy and communicates his observations to the others. Let v(i)v(i) be the information communicated by the ii th general. Each general uses some method for combining the values v(1),dots,v(n)v(1), \ldots, v(n) into a single plan of action, where nn is the number of generals. Condition A is achieved by having all generals use the same method for combining the information, and Condition B is achieved by using a robust method. For example, if the only decision to be made is whether to attack or retreat, then v(i)v(i) can be General ii 's opinion of which option is best, and the final decision can be based upon a majority vote among them. A small number of traitors can affect the decision only if the loyal generals were almost equally divided between the two possibilities, in which case neither decision could be called bad. 条件 B 难以形式化,因为它要求精确定义什么是糟糕的计划,而我们并未尝试这样做。相反,我们考虑将军们如何做出决策。每位将军观察敌情并将他的观察结果传达给其他人。设 v(i)v(i) 为第 ii 位将军传达的信息。每位将军使用某种方法将这些值 v(1),dots,v(n)v(1), \ldots, v(n) 合并成一个行动方案,其中 nn 为将军人数。条件 A 通过让所有将军使用相同的信息合并方法来实现,条件 B 则通过使用一种鲁棒的方法来实现。例如,如果唯一需要做出的决策是进攻还是撤退,那么 v(i)v(i) 可以是第 ii 位将军对哪个选项更好的看法,最终决策可以基于他们之间的多数投票。只有当忠诚的将军们在两种可能性之间几乎势均力敌时,少数叛徒才可能影响决策,在这种情况下,任何决策都不能被称为糟糕。
While this approach may not be the only way to satisfy conditions A and B, it is the only one we know of. It assumes a method by which the generals communicate their values v(i)v(i) to one another. The obvious method is for the ii th general to send v(i)v(i) by messenger to each other general. However, this does not work, because satisfying condition A requires that every loyal general obtain the same values v(1),dots,v(n)v(1), \ldots, v(n), and a traitorous general may send different values to different generals. For condition A to be satisfied, the following must be true: 虽然这种方法可能不是满足条件 A 和 B 的唯一方式,但这是我们所知道的唯一方法。它假设了一种将将军们的数值 v(i)v(i) 相互传达的方法。显而易见的方法是第 ii 位将军通过信使将 v(i)v(i) 发送给其他每位将军。然而,这种方法行不通,因为满足条件 A 要求每位忠诚的将军获得相同的数值 v(1),dots,v(n)v(1), \ldots, v(n) ,而叛徒将军可能会向不同的将军发送不同的数值。为了满足条件 A,必须满足以下条件:
Every loyal general must obtain the same information v(1),dots,v(n)v(1), \ldots, v(n). 每位忠诚的将军必须获得相同的信息 v(1),dots,v(n)v(1), \ldots, v(n) 。
Condition 1 implies that a general cannot necessarily use a value of v(i)v(i) obtained directly from the ii th general, since a traitorous ii th general may send different values to different generals. This means that unless we are careful, in meeting condition 1 we might introduce the possibility that the generals use a value of v(i)v(i) different from the one sent by the ii th general-even though the ii th general is loyal. We must not allow this to happen if condition B is to be met. For example, we cannot permit a few traitors to cause the loya! generals to base their decision upon the values “retreat”, … “retreat” if every loyal general sent the value “attack”. We therefore have the following requirement for each ii : 条件 1 意味着将军不一定能直接使用从第 ii 个将军那里获得的值 v(i)v(i) ,因为一个叛变的第 ii 个将军可能会向不同的将军发送不同的值。这意味着,除非我们小心,否则在满足条件 1 时,我们可能会引入将军们使用与第 ii 个将军发送的值不同的 v(i)v(i) 值的可能性——即使第 ii 个将军是忠诚的。如果要满足条件 B,我们绝不能允许这种情况发生。例如,我们不能允许少数叛徒导致忠诚的将军基于“撤退”……“撤退”的值做出决定,而每个忠诚的将军都发送了“进攻”的值。因此,我们对每个 ii 有以下要求:
2. If the ii th general is loyal, then the value that he sends must be used by every loyal general as the value of v(i)v(i). 2. 如果第 ii 个将军是忠诚的,那么他发送的值必须被每个忠诚的将军用作 v(i)v(i) 的值。
We can rewrite condition 1 as the condition that for every ii (whether or not the ii th general is loyal), 我们可以将条件 1 重写为:对于每个 ii (无论第 ii 个将军是否忠诚), 1^(')1^{\prime}. Any two loyal generals use the same value of v(i)v(i). 1^(')1^{\prime} 。任何两个忠诚的将军使用相同的 v(i)v(i) 值。
Conditions 1^(')1^{\prime} and 2 are both conditions on the single value sent by the ii th general. We can therefore restrict our consideration to the problem of how a single general sends his value to the others. We phrase this in terms of a commanding general sending an order to his lieutenants, obtaining the following problem. 条件 1^(')1^{\prime} 和 2 都是对第 ii 个将军发送的单一值的条件。因此,我们可以将考虑范围限制在单个将军如何将他的值发送给其他人这一问题上。我们将其表述为一位指挥将军向他的中尉发送命令,从而得到以下问题。
Byzantine Generals Problem. A commanding general must send an order to his n-1n-1 lieutenant generals such that 拜占庭将军问题。一位指挥将军必须向他的 n-1n-1 位中尉将军发送命令,使得
IC1. All loyal lieutenants obey the same order. IC1。所有忠诚的中尉都服从相同的命令。
IC2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends. IC2。如果指挥将军是忠诚的,那么每个忠诚的中尉都会服从他发送的命令。
Conditions IC1 and IC2 are called the interactive consistency conditions. Note that if the commander is loyal, then IC1 follows from IC2. However, the commander need not be loyal. 条件 IC1 和 IC2 被称为交互一致性条件。注意,如果指挥官是忠诚的,那么 IC1 可以从 IC2 推导出来。然而,指挥官不一定是忠诚的。
To solve our original problem, the ii th general sends his value of v(i)v(i) by using a solution to the Byzantine Generals Problem to send the order “use v(i)v(i) as my value”, with the other generals acting as the lieutenants. 为了解决我们最初的问题,第 ii 位将军通过使用拜占庭将军问题的解决方案发送他的值 v(i)v(i) ,并发送命令“使用 v(i)v(i) 作为我的值”,其他将军则充当中尉。
2. IMPOSSIBILITY RESULTS 2. 不可能性结果
The Byzantine Generals Problem seems deceptively simple. Its difficulty is indicated by the surprising fact that if the generals can send only oral messages, then no solution will work unless more than two-thirds of the generals are loyal. In particular, with only three generals, no solution can work in the presence of a single traitor. An oral message is one whose contents are completely under the control of the sender, so a traitorous sender can transmit any possible message. Such a message corresponds to the type of message that computers normally send to one another. In Section 4 we consider signed, written messages, for which this is not true. 拜占庭将军问题看似简单。其难度体现在一个令人惊讶的事实:如果将军们只能发送口头消息,那么除非超过三分之二的将军是忠诚的,否则没有解决方案可行。特别是,当只有三位将军时,在存在一个叛徒的情况下,没有解决方案能够奏效。口头消息是指其内容完全由发送者控制的消息,因此叛徒发送者可以传递任何可能的消息。这种消息类型对应于计算机通常相互发送的消息类型。在第 4 节中,我们将考虑签名的书面消息,对于这种消息情况则不同。
We now show that with oral messages no solution for three generals can handle a single traitor. For simplicity, we consider the case in which the only possible decisions are “attack” or “retreat”. Let us first examine the scenario pictured in Figure 1 in which the commander is loyal and sends an “attack” order, but Lieutenant 2 is a traitor and reports to Lieutenant 1 that he received a “retreat” order. For IC2 to be satisfied, Lieutenant 1 must obey the order to attack. 我们现在证明,对于口头信息,三将军问题中没有解决方案能够应对单个叛徒。为简便起见,我们考虑唯一可能的决策是“进攻”或“撤退”的情况。首先让我们考察图 1 所示的情景,其中指挥官是忠诚的并下达了“进攻”命令,但中尉 2 是叛徒,他向中尉 1 报告说他收到了“撤退”命令。为了满足 IC2,中尉 1 必须服从进攻命令。
Now consider another scenario, shown in Figure 2, in which the commander is a traitor and sends an “attack” order to Lieutenant 1 and a “retreat” order to Lieutenant 2. Lieutenant 1 does not know who the traitor is, and he cannot tell what message the commander actually sent to Lieutenant 2 . Hence, the scenarios in these two pictures appear exactly the same to Lieutenant 1 . If the traitor lies consistently, then there is no way for Lieutenant 1 to distinguish between these two situations, so he must obey the “attack” order in both of them. Hence, whenever Lieutenant 1 receives an “attack” order from the commander, he must obey it. 现在考虑另一个情景,如图 2 所示,其中指挥官是叛徒,向中尉 1 下达“进攻”命令,向中尉 2 下达“撤退”命令。中尉 1 不知道谁是叛徒,也无法判断指挥官实际上向中尉 2 发送了什么信息。因此,这两幅图中的情景对中尉 1 来说看起来完全相同。如果叛徒始终说谎,那么中尉 1 无法区分这两种情况,因此他必须在两种情况下都服从“进攻”命令。因此,每当中尉 1 收到指挥官的“进攻”命令时,他必须服从。
Fig. 1. Lieutenant 2 a traitor. 图 1。中尉 2 是叛徒。
Fig. 2. The commander a traitor. 图 2. 指挥官是叛徒。
However, a similar argument shows that if Lieutenant 2 receives a “retreat” order from the commander then he must obey it even if Lieutenant 1 tells him that the commander said “attack”. Therefore, in the scenario of Figure 2, Lieutenant 2 must obey the “retreat” order while Lieutenant 1 obeys the “attack” order, thereby violating condition IC1. Hence, no solution exists for three generals that works in the presence of a single traitor. 然而,类似的论证表明,如果中尉 2 接到指挥官的“撤退”命令,那么即使中尉 1 告诉他指挥官说的是“进攻”,他也必须服从。因此,在图 2 的情景中,中尉 2 必须服从“撤退”命令,而中尉 1 服从“进攻”命令,从而违反了条件 IC1。因此,三位将军中存在一个叛徒时,不存在可行的解决方案。
This argument may appear convincing, but we strongly advise the reader to be very suspicious of such nonrigorous reasoning. Although this result is indeed correct, we have seen equally plausible “proofs” of invalid results. We know of no area in computer science or mathematics in which informal reasoning is more likely to lead to errors than in the study of this type of algorithm. For a rigorous proof of the impossibility of a three-general solution that can handle a single traitor, we refer the reader to [3]. 这个论点看似令人信服,但我们强烈建议读者对这种非严格的推理保持高度怀疑。尽管这个结果确实正确,但我们也见过同样看似合理的“证明”却得出了错误的结论。在计算机科学或数学的任何领域中,我们都未见过比研究此类算法更容易因非正式推理而导致错误的情况。关于三将军问题中无法解决单个叛徒的严格证明,读者可参见文献[3]。
Using this result, we can show that no solution with fewer than 3m+13 m+1 generals can cope with mm traitors. ^(1){ }^{1} The proof is by contradiction-we assume such a 利用这一结果,我们可以证明少于 3m+13 m+1 位将军的方案无法应对 mm 个叛徒。 ^(1){ }^{1} 证明方法是反证法——我们假设存在这样的方案。
solution for a group of 3m3 m or fewer and use it to construct a three-general solution to the Byzantine Generals Problem that works with one traitor, which we know to be impossible. To avoid confusion between the two algorithms, we call the generals of the assumed solution Albanian generals, and those of the constructed solution Byzantine generals. Thus, starting from an algorithm that allows 3m3 m or fewer Albanian generals to cope with mm traitors, we construct a solution that allows three Byzantine generals to handle a single traitor. 针对一组不超过 3m3 m 个成员的解决方案,并利用它构造一个适用于拜占庭将军问题的三将军解决方案,该方案能够应对一个叛徒,而我们知道这是不可能的。为了避免混淆这两种算法,我们将假设解决方案中的将军称为阿尔巴尼亚将军,而构造的解决方案中的将军称为拜占庭将军。因此,从一个允许不超过 3m3 m 个阿尔巴尼亚将军应对 mm 个叛徒的算法出发,我们构造了一个允许三名拜占庭将军处理一个叛徒的解决方案。
The three-general solution is obtained by having each of the Byzantine generals simulate approximately one-third of the Albanian generals, so that each Byzantine general is simulating at most mm Albanian generals. The Byzantine commander simulates the Albanian commander plus at most m-1m-1 Albanian lieutenants, and each of the two Byzantine lieutenants simulates at most mm Albanian lieutenants. Since only one Byzantine general can be a traitor, and he simulates at most mm Albanians, at most mm of the Albanian generals are traitors. Hence, the assumed solution guarantees that IC1 and IC2 hold for the Albanian generals. By IC1, all the Albanian lieutenants being simulated by a loyal Byzantine lieutenant obey the same order, which is the order he is to obey. It is easy to check that conditions IC1 and IC2 of the Albanian generals solution imply the corresponding conditions for the Byzantine generals, so we have constructed the required impossible solution. 三将军解法是通过让每个拜占庭将军模拟大约三分之一的阿尔巴尼亚将军,从而使每个拜占庭将军最多模拟 mm 个阿尔巴尼亚将军。拜占庭指挥官模拟阿尔巴尼亚指挥官加上最多 m-1m-1 个阿尔巴尼亚中尉,而两个拜占庭中尉中的每一个最多模拟 mm 个阿尔巴尼亚中尉。由于只有一个拜占庭将军可能是叛徒,且他最多模拟 mm 个阿尔巴尼亚人,因此最多有 mm 个阿尔巴尼亚将军是叛徒。因此,假设的解法保证了阿尔巴尼亚将军满足 IC1 和 IC2 条件。根据 IC1,所有被忠诚的拜占庭中尉模拟的阿尔巴尼亚中尉都服从相同的命令,即他应服从的命令。很容易验证,阿尔巴尼亚将军解法的 IC1 和 IC2 条件蕴含了拜占庭将军相应的条件,因此我们构造了所需的不可能解法。
One might think that the difficulty in solving the Byzantine Generals Problem stems from the requirement of reaching exact agreement. We now demonstrate that this is not the case by showing that reaching approximate agreement is just as hard as reaching exact agreement. Let us assume that instead of trying to agree on a precise battle plan, the generals must agree only upon an approximate time of attack. More precisely, we assume that the commander orders the time of the attack, and we require the following two conditions to hold: 人们可能认为解决拜占庭将军问题的难点在于必须达成精确一致。我们现在证明情况并非如此,通过展示达成近似一致与达成精确一致同样困难。假设将军们不必就精确的作战计划达成一致,而只需就攻击的近似时间达成一致。更准确地说,我们假设指挥官下达攻击时间的命令,并要求满足以下两个条件:
IC1’. All loyal lieutenants attack within 10 minutes of one another. IC1’。所有忠诚的中尉的攻击时间相互之间不超过 10 分钟。
IC2’. If the commanding general is loyal, then every loyal lieutenant attacks within 10 minutes of the time given in the commander’s order. IC2’。如果指挥将军是忠诚的,那么每个忠诚的中尉的攻击时间与指挥官命令中给出的时间相差不超过 10 分钟。
(We assume that the orders are given and processed the day before the attack and that the time at which an order is received is irrelevant-only the attack time given in the order matters.) (我们假设命令在攻击前一天下达并处理,接收命令的时间无关紧要——只有命令中给出的攻击时间才重要。)
It follows from IC2’ that if the commander is loyal, then a loyal lieutenant will obtain the correct order in step (1), so IC2 is satisfied. If the commander is loyal, then IC1 follows from IC2, so we need only prove IC1 under the assumption that the commander is a traitor. Since there is at most one traitor, this means that both lieutenants are loyal. It follows from IC1’ that if one lieutenant decides to attack in step (1), then the other cannot decide to retreat in step (1). Hence, either they will both come to the same decision in step (1) or at least one of them will defer his decision until step (2). In this case, it is easy to see that they both arrive at the same decision, so IC1 is satisfied. We have therefore constructed a three-general solution to the Byzantine Generals Problem that handles one traitor, which is impossible. Hence, we cannot have a three-general algorithm that maintains IC1’ and IC2’ in the presence of a traitor. 由 IC2’可知,如果指挥官忠诚,那么忠诚的中尉将在步骤(1)获得正确的命令,因此满足 IC2。如果指挥官忠诚,则 IC1 由 IC2 推导而来,因此我们只需在假设指挥官是叛徒的情况下证明 IC1。由于最多只有一个叛徒,这意味着两个中尉都是忠诚的。由 IC1’可知,如果一个中尉在步骤(1)决定进攻,那么另一个中尉在步骤(1)不能决定撤退。因此,他们要么在步骤(1)达成相同的决定,要么至少有一个中尉将决定推迟到步骤(2)。在这种情况下,很容易看出他们都会达成相同的决定,因此满足 IC1。因此,我们构造了一个解决拜占庭将军问题的三将军方案,该方案能处理一个叛徒,这是不可能的。因此,在存在叛徒的情况下,我们不可能有一个三将军算法同时保持 IC1’和 IC2’。
The method of having one general simulate mm others can now be used to prove that no solution with fewer than 3m+13 m+1 generals can cope with mm traitors. The proof is similar to the one for the original Byzantine Generals Problem and is left to the reader. 现在可以使用让一个将军模拟 mm 个其他将军的方法来证明,少于 3m+13 m+1 个将军的解决方案无法应对 mm 个叛徒。证明类似于最初拜占庭将军问题的证明,留给读者完成。
3. A SOLUTION WITH ORAL MESSAGES 3. 使用口头消息的解决方案
We showed above that for a solution to the Byzantine Generals Problem using oral messages to cope with mm traitors, there must be at least 3m+13 m+1 generals. We now give a solution that works for 3m+13 m+1 or more generals. However, we first specify exactly what we mean by “oral messages”. Each general is supposed to execute some algorithm that involves sending messages to the other generals, and we assume that a loyal general correctly executes his algorithm. The definition of an oral message is embodied in the following assumptions which we make for the generals’ message system: 我们在上文中已经证明,对于使用口头消息来应对 mm 个叛徒的拜占庭将军问题的解决方案,必须至少有 3m+13 m+1 个将军。现在我们给出一个适用于 3m+13 m+1 个或更多将军的解决方案。然而,我们首先明确“口头消息”的具体含义。每个将军应执行某个算法,该算法涉及向其他将军发送消息,我们假设忠诚的将军会正确执行其算法。口头消息的定义体现在我们对将军消息系统所做的以下假设中:
A1. Every message that is sent is delivered correctly. A1. 发送的每条消息都能被正确传达。
A2. The receiver of a message knows who sent it. A2. 消息的接收者知道是谁发送的。
A 3 . The absence of a message can be detected. A3. 可以检测到消息的缺失。
Assumptions A1 and A2 prevent a traitor from interfering with the communication between two other generals, since by A1 he cannot interfere with the messages they do send, and by A2 he cannot confuse their intercourse by introducing spurious messages. Assumption A3 will foil a traitor who tries to prevent a decision by simply not sending messages. The practical implementation of these assumptions is discussed in Section 6. 假设 A1 和 A2 防止叛徒干扰两个其他将军之间的通信,因为根据 A1,他不能干扰他们发送的消息,根据 A2,他不能通过引入伪造消息来混淆他们的交流。假设 A3 将挫败试图通过不发送消息来阻止决策的叛徒。这些假设的实际实现将在第 6 节中讨论。
The algorithms in this section and in the following one require that each general be able to send messages directly to every other general. In Section 5, we describe algorithms which do not have this requirement. 本节及下一节中的算法要求每个将军能够直接向其他每个将军发送消息。在第 5 节中,我们描述了不具备此要求的算法。
A traitorous commander may decide not to send any order. Since the lieutenants must obey some order, they need some default order to obey in this case. We let RETREAT be this default order. 一个叛变的指挥官可能决定不发送任何命令。由于中尉必须服从某个命令,因此在这种情况下他们需要一个默认命令。我们让“撤退”成为这个默认命令。
We inductively define the Oral\operatorname{Oral} Message algorithms OM(m)\mathrm{OM}(m), for all nonnegative integers mm, by which a commander sends an order to n-1n-1 lieutenants. We show that OM(m)\mathrm{OM}(m) solves the Byzantine Generals Problem for 3m+13 m+1 or more generals in the presence of at most mm traitors. We find it more convenient to describe this algorithm in terms of the lieutenants “obtaining a value” rather than “obeying an order”. 我们通过归纳法定义了 Oral\operatorname{Oral} 消息算法 OM(m)\mathrm{OM}(m) ,适用于所有非负整数 mm ,其中指挥官向 n-1n-1 名中尉发送命令。我们证明了 OM(m)\mathrm{OM}(m) 在最多有 mm 名叛徒的情况下,能够解决拜占庭将军问题,适用于 3m+13 m+1 名或更多将军。我们发现用中尉“获得一个值”而不是“服从命令”来描述该算法更为方便。
The algorithm assumes a function majority with the property that if a majority of the values v_(i)v_{i} equal vv, then majority (v_(1),dots,v_(n-1))\left(v_{1}, \ldots, v_{n-1}\right) equals vv. (Actually, it assumes a sequence of such functions-one for each nn.) There are two natural choices for the value of majority (v_(1),dots,v_(n-1))\left(v_{1}, \ldots, v_{n-1}\right) : 该算法假设存在一个多数函数 majority,其性质是如果大多数值 v_(i)v_{i} 等于 vv ,则 majority (v_(1),dots,v_(n-1))\left(v_{1}, \ldots, v_{n-1}\right) 等于 vv 。(实际上,它假设存在一系列这样的函数——每个 nn 对应一个。)对于 majority (v_(1),dots,v_(n-1))\left(v_{1}, \ldots, v_{n-1}\right) 的值,有两个自然的选择:
The majority value among the v_(i)v_{i} if it exists, otherwise the value RETREAT; 如果存在,则为 v_(i)v_{i} 中的多数值;否则为值“撤退”;
The median of the v_(i)v_{i}, assuming that they come from an ordered set. v_(i)v_{i} 的中位数,假设它们来自一个有序集合。
The following algorithm requires only the aforementioned property of majority. 以下算法仅需要上述多数的性质。
Algorithm OM(0). 算法 OM(0)。
(1) The commander sends his value to every lieutenant. (1)指挥官将他的值发送给每个中尉。
(2) Each lieutenant uses the value he receives from the commander, or uses the value RETREAT if he receives no value. (2)每个中尉使用他从指挥官那里收到的值,或者如果没有收到值,则使用值“撤退”。
Algorithm OM(m),m > 0O M(m), m>0. 算法 OM(m),m > 0O M(m), m>0 .
(1) The commander sends his value to every lieutenant. (1)指挥官将他的值发送给每个中尉。
(2) For each ii, let v_(i)v_{i} be the value Lieutenant ii receives from the commander, or else be RETREAT if he rezeives no value. Lieutenant ii acts as the commander in Algorithm OM(m-1)\mathrm{OM}(m-1) to send the value v_(i)v_{i} to each of the n-2n-2 other lieutenants. (2)对于每个 ii ,令 v_(i)v_{i} 为中尉 ii 从指挥官那里收到的值,或者如果他没有收到值,则为“撤退”。中尉 ii 在算法 OM(m-1)\mathrm{OM}(m-1) 中充当指挥官,将值 v_(i)v_{i} 发送给其他 n-2n-2 个中尉。
(3) For each ii, and each j!=ij \neq i, let v_(j)v_{j} be the value Lieutenant ii received from Lieutenant jj in step (2) (using Algorithm OM(m-1)\mathrm{OM}(m-1) ), or else RETREAT if he received no such value. Lieutenant ii uses the value majority (v_(1),dots,v_(n-1))\left(v_{1}, \ldots, v_{n-1}\right). (3) 对于每个 ii 和每个 j!=ij \neq i ,令 v_(j)v_{j} 为中尉 ii 在步骤(2)中从中尉 jj 处获得的值(使用算法 OM(m-1)\mathrm{OM}(m-1) ),如果未收到此类值,则为撤退(RETREAT)。中尉 ii 使用值多数 (v_(1),dots,v_(n-1))\left(v_{1}, \ldots, v_{n-1}\right) 。
To understand how this algorithm works, we consider the case m=1,n=4m=1, n=4. Figure 3 illustrates the messages received by Lieutenant 2 when the commander sends the value vv and Lieutenant 3 is a traitor. In the first step of OM(1)\mathrm{OM}(1), the commander sends vv to all three lieutenants. In the second step, Lieutenant 1 sends the value vv to Lieutenant 2, using the trivial algorithm OM(0)\mathrm{OM}(0). Also in the second step, the traitorous Lieutenant 3 sends Lieutenant 2 some other value xx. In step 3, Lieutenant 2 then has v_(1)=v_(2)=vv_{1}=v_{2}=v and v_(3)=xv_{3}=x, so he obtains the correct value v=v= majority (v,v,x)(v, v, x). 为了理解该算法的工作原理,我们考虑情况 m=1,n=4m=1, n=4 。图 3 展示了当指挥官发送值 vv 且中尉 3 为叛徒时,中尉 2 收到的消息。在 OM(1)\mathrm{OM}(1) 的第一步,指挥官向所有三名中尉发送 vv 。在第二步,中尉 1 使用简单算法 OM(0)\mathrm{OM}(0) 向中尉 2 发送值 vv 。同样在第二步,叛徒中尉 3 向中尉 2 发送了另一个值 xx 。在第三步,中尉 2 拥有 v_(1)=v_(2)=vv_{1}=v_{2}=v 和 v_(3)=xv_{3}=x ,因此他获得了正确的值多数 v=v= 。
Next, we see what happens if the commander is a traitor. Figure 4 shows the values received by the lieutenants if a traitorous commander sends three arbitrary values x,yx, y, and zz to the three lieutenants. Each lieutenant obtains v_(1)=x,v_(2)=yv_{1}=x, v_{2}=y, and v_(3)=zv_{3}=z, so they all obtain the same value majority (x,y,z)(x, y, z) in step (3), regardless of whether or not any of the three values x,yx, y, and zz are equal. 接下来,我们来看如果指挥官是叛徒会发生什么。图 4 显示了如果叛徒指挥官向三名中尉发送三个任意值 x,yx, y 、 zz 和 v_(1)=x,v_(2)=yv_{1}=x, v_{2}=y ,中尉们收到的值。每个中尉获得 v_(3)=zv_{3}=z 和 (x,y,z)(x, y, z) ,因此他们在步骤(3)中都获得相同的值多数 x,yx, y ,无论这三个值 zz 、{7} 和 {8} 是否相等。
The recursive algorithm OM(m)\mathrm{OM}(m) invokes n-1n-1 separate executions of the algorithm OM(m-1)\mathrm{OM}(m-1), each of which invokes n-2n-2 executions of OM(m-2)\mathrm{OM}(m-2), etc. This means that, for m > 1m>1, a lieutenant sends many separate messages to each other lieutenant. There must be some way to distinguish among these different messages. The reader can verify that all ambiguity is removed if each lieutenant ii prefixes the number ii to the value v_(i)v_{i} that he sends in step (2). As the recursion “unfolds,” the algorithm OM(m-k)\mathrm{OM}(m-k) will be called (n-1)cdots(n-k)(n-1) \cdots(n-k) times to send a value prefixed by a sequence of kk lieutenants’ numbers. 递归算法 OM(m)\mathrm{OM}(m) 调用 n-1n-1 次独立的算法 OM(m-1)\mathrm{OM}(m-1) 执行,每次执行又调用 n-2n-2 次 OM(m-2)\mathrm{OM}(m-2) ,依此类推。这意味着,对于 m > 1m>1 ,一名中尉会向每个其他中尉发送许多独立的消息。必须有某种方法来区分这些不同的消息。读者可以验证,如果每个中尉 ii 在他在步骤(2)中发送的值 v_(i)v_{i} 前加上数字 ii 作为前缀,所有歧义都将被消除。随着递归“展开”,算法 OM(m-k)\mathrm{OM}(m-k) 将被调用 (n-1)cdots(n-k)(n-1) \cdots(n-k) 次,以发送一个由 kk 个中尉编号序列作为前缀的值。
Fig. 4. Algorithm OM(1)\mathrm{OM}(1); the commander a traitor. 图 4. 算法 OM(1)\mathrm{OM}(1) ;指挥官是叛徒。
Fig. 3. Algorithm OM(1); Lieutenant 3 a traitor. 图 3. 算法 OM(1);中尉 3 是叛徒。
To prove the correctness of the algorithm OM(m)\mathrm{OM}(m) for arbitrary mm, we first prove the following lemma. 为了证明算法 OM(m)\mathrm{OM}(m) 对任意 mm 的正确性,我们首先证明以下引理。
Lemma 1. For any mm and kk, Algorithm OM(m)O M(m) satisfies IC2 if there are more than 2k+m2 k+m generals and at most kk traitors. 引理 1。对于任意的 mm 和 kk ,如果将军人数超过 2k+m2 k+m 且叛徒人数不超过 kk ,算法 OM(m)O M(m) 满足 IC2。
Proof. The proof is by induction on mm. IC2 only specifies what must happen if the commander is loyal. Using A1, it is easy to see that the trivial algorithm OM(0)\mathrm{OM}(0) works if the commander is loyal, so the lemma is true for m=0m=0. We now assume it is true for m-1,m > 0m-1, m>0, and prove it for mm. 证明。证明通过对 mm 进行归纳。IC2 仅规定了指挥官忠诚时必须发生的情况。利用 A1,可以很容易看出当指挥官忠诚时,简单算法 OM(0)\mathrm{OM}(0) 是有效的,因此该引理对 m=0m=0 成立。现在假设其对 m-1,m > 0m-1, m>0 成立,并证明其对 mm 也成立。
In step (1), the loyal commander sends a value vv to all n-1n-1 lieutenants. In step (2), each loyal lieutenant applies OM(m-1)\mathrm{OM}(m-1) with n-1n-1 generals. Since by hypothesis n > 2k+mn>2 k+m, we have n-1 > 2k+(m-1)n-1>2 k+(m-1), so we can apply the induction hypothesis to conclude that every loyal lieutenant gets v_(j)=vv_{j}=v for each loyal Lieutenant jj. Since there are at most kk traitors, and n-1 > 2k+(m-1)n-1>2 k+(m-1)>= 2k\geq 2 k, a majority of the n-1n-1 lieutenants are loyal. Hence, each loyal lieutenant has v_(i)=vv_{i}=v for a majority of the n-1n-1 values ii, so he obtains majority (v_(1),dots:}\left(v_{1}, \ldots\right., v_(n-1)v_{n-1} ) =v=v in step (3), proving IC2. 在步骤(1)中,忠诚的指挥官向所有 n-1n-1 名中尉发送值 vv 。在步骤(2)中,每个忠诚的中尉对 n-1n-1 名将军应用 OM(m-1)\mathrm{OM}(m-1) 。由于假设 n > 2k+mn>2 k+m ,我们有 n-1 > 2k+(m-1)n-1>2 k+(m-1) ,因此可以应用归纳假设得出每个忠诚的中尉对每个忠诚的中尉 jj 都获得了 v_(j)=vv_{j}=v 。由于叛徒最多有 kk 名,且 n-1 > 2k+(m-1)n-1>2 k+(m-1)>= 2k\geq 2 k ,大多数 n-1n-1 名中尉是忠诚的。因此,每个忠诚的中尉对大多数 n-1n-1 个值 ii 拥有 v_(i)=vv_{i}=v ,所以他在步骤(3)中获得了多数 (v_(1),dots:}\left(v_{1}, \ldots\right. , v_(n-1)v_{n-1} ) =v=v ,从而证明了 IC2。
The following theorem asserts that Algorithm OM(m)\mathrm{OM}(m) solves the Byzantine Generals Problem. 下述定理断言算法 OM(m)\mathrm{OM}(m) 解决了拜占庭将军问题。
Theorem 1. For any m, Algorithm OM(m) satisfies conditions IC1 and IC2 if there are more than 3m3 m generals and at most mm traitors. 定理 1. 对于任意的 m,如果将军人数超过 3m3 m 且叛徒人数不超过 mm ,算法 OM(m)满足条件 IC1 和 IC2。
Proof. The proof is by induction on mm. If there are no traitors, then it is easy to see that OM(0)\mathrm{OM}(0) satisfies IC1 and IC2. We therefore assume that the theorem is true for OM(m-1)\mathrm{OM}(m-1) and prove it for OM(m),m > 0\mathrm{OM}(m), m>0. 证明. 证明通过对 mm 进行归纳完成。如果没有叛徒,那么很容易看出 OM(0)\mathrm{OM}(0) 满足 IC1 和 IC2。因此,我们假设定理对 OM(m-1)\mathrm{OM}(m-1) 成立,并证明其对 OM(m),m > 0\mathrm{OM}(m), m>0 也成立。
We first consider the case in which the commander is loyal. By taking kk equal to mm in Lemma 1, we see that OM(m)\mathrm{OM}(m) satisfies IC2. IC1 follows from IC2 if the commander is loyal, so we need only verify IC1 in the case that the commander is a traitor. 我们首先考虑指挥官忠诚的情况。通过在引理 1 中令 kk 等于 mm ,我们看到 OM(m)\mathrm{OM}(m) 满足 IC2。如果指挥官忠诚,IC1 由 IC2 推导而来,因此我们只需验证指挥官为叛徒时的 IC1。
There are at most mm traitors, and the commander is one of them, so at most m-1m-1 of the lieutenants are traitors. Since there are more than 3m3 m generals, there are more than 3m-13 m-1 lieutenants, and 3m-1 > 3(m-1)3 m-1>3(m-1). We may therefore apply the induction hypothesis to conclude that OM(m-1)\mathrm{OM}(m-1) satisfies conditions IC1 and IC2. Hence, for each jj, any two loyal lieutenants get the same value for v_(j)v_{j} in step (3). (This follows from IC2 if one of the two lieutenants is Lieutenant jj, and from IC1 otherwise.) Hence, any two loyal lieutenants get the same vector of values v_(1),dots,v_(n-1)v_{1}, \ldots, v_{n-1}, and therefore obtain the same value majority (v_(1),dots,v_(n-1))\left(v_{1}, \ldots, v_{n-1}\right) in step (3), proving IC1. 至多有 mm 个叛徒,指挥官是其中之一,因此至多有 m-1m-1 个中尉是叛徒。由于将军人数超过 3m3 m ,中尉人数也超过 3m-13 m-1 ,并且 3m-1 > 3(m-1)3 m-1>3(m-1) 。因此,我们可以应用归纳假设得出 OM(m-1)\mathrm{OM}(m-1) 满足条件 IC1 和 IC2。因此,对于每个 jj ,任何两个忠诚的中尉在步骤(3)中对 v_(j)v_{j} 的值都相同。(如果两个中尉中的一个是中尉 jj ,则根据 IC2 得出此结论,否则根据 IC1 得出。)因此,任何两个忠诚的中尉获得相同的值向量 v_(1),dots,v_(n-1)v_{1}, \ldots, v_{n-1} ,并且因此在步骤(3)中获得相同的多数值 (v_(1),dots,v_(n-1))\left(v_{1}, \ldots, v_{n-1}\right) ,从而证明了 IC1。
4. A SOLUTION WITH SIGNED MESSAGES 4. 带签名消息的解决方案
As we saw from the scenario of Figures 1 and 2, it is the traitors’ ability to lie that makes the Byzantine Generals Problem so difficult. The problem becomes easier to solve if we can restrict that ability. One way to do this is to allow the generals to send unforgeable signed messages. More precisely, we add to A1-A3 the 正如我们从图 1 和图 2 的场景中看到的,正是叛徒的撒谎能力使拜占庭将军问题如此困难。如果我们能够限制这种能力,问题就变得更容易解决。实现这一点的一种方法是允许将军发送不可伪造的签名消息。更准确地说,我们在 A1-A3 的基础上增加了
following assumption: 以下假设:
A4 (a) A loyal general’s signature cannot be forged, and any alteration of the contents of his signed messages can be detected. A4 (a) 忠诚将军的签名无法被伪造,且其签署信息内容的任何篡改都能被检测出来。
(b) Anyone can verify the authenticity of a general’s signature. (b) 任何人都可以验证将军签名的真实性。
Note that we make no assumptions about a traitorous general’s signature. In particular, we allow his signature to be forged by another traitor, thereby permitting collusion among the traitors. 注意,我们对叛变将军的签名不做任何假设。特别是,我们允许他的签名被另一名叛徒伪造,从而允许叛徒之间的勾结。
Now that we have introduced signed messages, our previous argument that four generals are required to cope with one traitor no longer holds. In fact, a three-general solution does exist. We now give an algorithm that copes with mm traitors for any number of generals. (The problem is vacuous if there are fewer than m+2m+2 generals.) 既然我们已经引入了签名消息,之前关于需要四个将军来应对一个叛徒的论点不再成立。事实上,存在一个三将军的解决方案。我们现在给出一个算法,可以应对任意数量将军中的 mm 个叛徒。(如果将军少于 m+2m+2 个,则问题无意义。)
In our algorithm, the commander sends a signed order to each of his lieutenants. Each lieutenant then adds his signature to that order and sends it to the other lieutenants, who add their signatures and send it to others, and so on. This means that a lieutenant must effectively receive one signed message, make several copies of it, and sign and send those copies. It does not matter how these copies are obtained; a single message might be photocopied, or else each message might consist of a stack of identical messages which are signed and distributed as required.
Our algorithm assumes a function choice which is applied to a set of orders to pbtain a single one. The only requirements we make for this function are
If the set VV consists of the single element vv, then choice(V)=v\operatorname{choice}(V)=v.
choice (O/)=(\varnothing)= RETREAT, where O/\varnothing is the empty set.
Note that one possible definition is to let choice (V)(V) be the median element of VV-assuming that there is an ordering of the elements.
In the following algorithm, we let x:ix: i denote the value xx signed by General ii. Thus, v:j:iv: j: i denotes the value vv signed by jj, and then that value v:jv: j signed by ii. We let General 0 be the commander. In this algorithm, each lieutenant ii maintains a set V_(i)V_{i}, containng the set of properly signed orders he has received so far. (If the commander is loyal, then this set should never contain more than a single element.) Do not confuse V_(i)V_{i}, the set of orders he has received, with the set of messages that he has received. There may be many different messages with the same order.
Algorithm SM(m).
Initially V_(i)=O/V_{i}=\varnothing.
(1) The commander signs and sends his value to every lieutenant.
(2) For each ii :
(A) If Lieutenant ii receives a message of the form v:0v: 0 from the commander and he has not yet received any order, then
(i) he lets V_(i)V_{i} equal {v}\{v\};
(ii) he sends the message v:0:iv: 0: i to every other lieutenant.
Fig. 5. Algorithm SM(1); the commander a traitor.
(B) If Lieutenant ii receives a message of the form v:0:j_(1):cdots:j_(k)v: 0: j_{1}: \cdots: j_{k} and vv is not in the set V_(i)V_{i}, then
(i) he adds vv to V_(i)V_{i};
(ii) if k < mk<m, then he sends the message v:0:j_(1):cdots:j_(k):iv: 0: j_{1}: \cdots: j_{k}: i to every lieutenant other than j_(1),dots,j_(k)j_{1}, \ldots, j_{k}.
(3) For each ii : When Lieutenant ii will receive no more messages, he obeys the order choice ( V_(i)V_{i} ).
Note that in step (2), Lieutenant ii ignores any message containing an order vv that is already in the set V_(i)V_{i}.
We have not specified how a lieutenant determines in step (3) that he will receive no more messages. By induction on kk, one easily shows that for each sequence of lieutenants j_(1),dots,j_(k)j_{1}, \ldots, j_{k} with k <= mk \leq m, a lieutenant can receive at most one message of the form v:0:j_(1):cdots:j_(k)v: 0: j_{1}: \cdots: j_{k} in step (2). If we require that Lieutenant j_(k)j_{k} either send such a message or else send a message reporting that he will not send such a message, then it is easy to decide when all messages have been received. (By assumption A3, a lieutenant can determine if a traitorous lieutenant j_(k)j_{k} sends neither of those two messages.) Alternatively, time-out can be used to determine when no more messages will arrive. The use of time-out is discussed in Section 6.
Note that in step (2), Lieutenant ii ignores any messages that do not have the proper form of a value followed by a string of signatures. If packets of identical messages are used to avoid having to copy messages, this means that he throws away any packet that does not consist of a sufficient number of identical, properly signed messages. (There should be (n-k-2)(n-k-3)cdots(n-m-2)(n-k-2)(n-k-3) \cdots(n-m-2) copies of the message if it has been signed by kk lieutenants.)
Figure 5 illustrates Algorithm SM(1) for the case of three generals when the commander is a traitor. The commander sends an “attack” order to one lieutenant and a “retreat” order to the other. Both lieutenants receive the two orders in step (2), so after step (2) V_(1)=V_(2)=V_{1}=V_{2}= {“attack”, “retreat”}, and they both obey the Figure 2, the lieutenants know the commander is a traitor because his signature
appears on two different orders, and A4 states that only he could have generated those signatures.
In Algorithm SM (m)(m), a lieutenant signs his name to acknowledge his receipt of an order. If he is the mm th lieutenant to add his signature to the order, then that signature is not relayed to anyone else by its recipient, so it is superfluous. (More precisely, assumption A2 makes it unnecessary.) In particular, the lieutenants need not sign their messages in SM(1).
We now prove the correctness of our algorithm.
Theorem 2. For any m, Algorithm SM(m)S M(m) solves the Byzantine Generals Problem if there are at most mm traitors.
Proof. We first prove IC2. If the commander is loyal, then he sends his signed order v:0v: 0 to every lieutenant in step (1). Every loyal lieutenant will therefore receive the order vv in step (2)(A). Moreover, since no traitorous lieutenant can forge any other message of the form v^('):0v^{\prime}: 0, a loyal lieutenant can receive no additional order in step (2)(B). Hence, for each loyal Lieutenant ii, the set V_(i)V_{i} obtained in step (2) consists of the single order vv, which he will obey in step (3) by property 1 of the choice function. This proves IC2.
Since IC1 follows from IC2 if the commander is loyal, to prove IC1 we need only consider the case in which the commander is a traitor. Two loyal lieutenants ii and jj obey the same order in step (3) if the sets of orders V_(i)V_{i} and V_(j)V_{j} that they receive in step (2) are the same. Therefore, to prove IC1 it suffices to prove that, if ii puts an order vv into V_(i)V_{i} in step (2), then jj must put the same order vv into V_(j)V_{j} in step (2). To do this, we must show that jj receives a properly signed message containing that order. If ii receives the order vv in step (2)(A), then he sends it to jj in step (2)(A)(ii); so jj receives it (by A1). If ii adds the order to V_(i)V_{i} in step (2)(B), then he must receive a first message of the form v:0:j_(1):dots:j_(k)v: 0: j_{1}: \ldots: j_{k}. If jj is one of the j_(r)j_{r}, then by A4 he must already have received the order vv. If not, we consider two cases:
k < mk<m. In this case, ii sends the message v:0:j_(1):cdots:j_(k):iv: 0: j_{1}: \cdots: j_{k}: i to jj; so jj must receive the order vv.
k=mk=m. Since the commander is a traitor, at most m-1m-1 of the lieutenants are traitors. Hence, at least one of the lieutenants j_(1),dots,j_(m)j_{1}, \ldots, j_{m} is loyal. This loyal lieutenant must have sent jj the value vv when he first received it, so jj must therefore receive that value.
This completes the proof.
5. MISSING COMMUNICATION PATHS
Thus far, we have assumed that a general can send messages directly to every other general. We now remove this assumption. Instead, we suppose that physical barriers place some restrictions on who can send messages to whom. We consider the generals to form the nodes of a simple, ^(2){ }^{2} finite undirected graph GG, where an arc between two nodes indicates that those two generals can send messages
Fig. 6. A 3-regular graph.
Fig. 7. A graph that is not 3-regular.
directly to one another. We now extend Algorithms OM(m)\mathrm{OM}(m) and SM(m)\mathrm{SM}(m), which assumed GG to be completely connected, to more general graphs.
To extend our oral message algorithm OM(m)\mathrm{OM}(m), we need the following definition, where two generals are said to be neighbors if they are joined by an arc.
Definition 1.
(a) A set of nodes {i_(1),dots,i_(p)}\left\{i_{1}, \ldots, i_{p}\right\} is said to be a regular set of neighbors of a node ii if
(i) each i_(j)i_{j} is a neighbor of ii, and
(ii) for any general kk different from ii, there exist paths gamma_(j,k)\gamma_{j, k} from i_(j)i_{j} to kk not passing through ii such that any two different paths gamma_(j,k)\gamma_{j, k} have no node in common other than kk.
(b) The graph GG is said to be p-regular if every node has a regular set of neighbors consisting of pp distinct nodes.
Figure 6 shows an example of a simple 3-regular graph. Figure 7 shows an example of a graph that is not 3-regular because the central node has no regular set of neighbors containing three nodes.
We extend OM(m)\mathrm{OM}(m) to an algorithm that solves the Byzantine Generals Problem in the presence of mm traitors if the graph GG of generals is 3m3 m-regular. (Note that a 3m3 m-regular graph must contain at least 3m+13 m+1 nodes.) For all positive integers mm and pp, we define the algorithm OM(m,p)\mathrm{OM}(m, p) as follows when the graph GG of generals is pp-regular. ( OM(m,p)\mathrm{OM}(m, p) is not defined if GG is not pp-regular.) The definition uses induction on mm.
Algorithm OM(m,p)O M(m, p).
(0) Choose a regular set NN of neighbors of the commander consisting of pp lieutenants.
(1) The commander sends his value to every lieutenant in NN.
(2) For each ii in NN, let v_(i)v_{i} be the value Lieutenant ii receives from the commander, or else RETREAT if he receives no value. Lieutenant ii sends v_(i)v_{i} to every other lieutenant kk as follows:
(A) If m=1m=1, then by sending the value along the path gamma_(i,k)\gamma_{i, k} whose existence is guaranteed by part (a)(ii) of Definition 1.
(B) If m > 1m>1, then by acting as the commander in the algorithm OM(m-1,p-1)\mathrm{OM}(m-1, p-1), with the graph of generals obtained by removing the original commander from GG.
(3) For each kk, and each ii in NN with i!=ki \neq k, let v_(i)v_{i} be the value Lieutenant kk received from Lieutenant ii in step (2), or RETREAT if he received no value. Lieutenant kk uses the value majority (v_(i),dots,v_(i_(p)))\left(v_{i}, \ldots, v_{i_{p}}\right), where N={i_(1),dots,i_(p)}N=\left\{i_{1}, \ldots, i_{p}\right\}.
Note that removing a single node from a pp-regular graph leaves a (p-1)(p-1) regular graph. Hence, one can apply the algorithm OM(m-1,p-1)\mathrm{OM}(m-1, p-1) in step (2) (B).
We now prove that OM(m,3m)\mathrm{OM}(m, 3 m) solves the Byzantine Generals Problem if there are at most mm traitors. The proof is similar to the proof for the algorithm OM(m)\mathrm{OM}(m) and will just be sketched. It begins with the following extension of Lemma 1.
Lemma 2. For any m > 0m>0 and any p >= 2k+mp \geq 2 k+m, Algorithm OM(m,p)O M(m, p) satisfies IC2I C 2 if there are at most kk traitors.
Proof. For m=1m=1, observe that a lieutenant obtains the value majority (v_(1):}\left(v_{1}\right., dots,v_(p)\ldots, v_{p} ), where each v_(i)v_{i} is a value sent to him by the commander along a path disjoint from the path used to send the other values to him. Since there are at most kk traitors and p >= 2k+1p \geq 2 k+1, more than half of those paths are composed entirely of loyal lieutenants. Hence, if the commander is loyal, then a majority of the values v_(i)v_{i} will equal the value he sent, which implies that IC2 is satisfied.
Now assume the lemma for m-1,m > 1m-1, m>1. If the commander is loyal, then each of the pp lieutenants in NN gets the correct value. Since p > 2kp>2 k, a majority of them are loyal, and by the induction hypothesis each of them sends the correct value to every loyal lieutenant. Hence, each loyal lieutenant gets a majority of correct values, thereby obtaining the correct value in step (3).
The correctness of Algorithm OM(m,3m)\mathrm{OM}(m, 3 m) is an immediate consequence of the following result.
Theorem 3. For any m > 0m>0 and any p >= 3mp \geq 3 m, Algorithm OM(m,p)O M(m, p) solves the Byzantine Generals Problem if there are at most mm traitors.
Proof. By Lemma 2, letting k=mk=m, we see that OM(m,p)\mathrm{OM}(m, p) satisfies IC2. If the commander is loyal, then IC1 follows from IC2, so we need only prove IC1 under the assumption that the commander is a traitor. To do this, we prove that every loyal lieutenant gets the same set of values v_(i)v_{i} in step (3). If m=1m=1, then this follows because all the lieutenants, including those in NN, are loyal and the paths gamma_(i,k)\gamma_{i, k} do not pass through the commander. For m > 1m>1, a simple induction argument can be applied, since p >= 3mp \geq 3 m implies that p-1 >= 3(m-1)p-1 \geq 3(m-1).
Our extension of Algorithm OM(m)\mathrm{OM}(m) requires that the graph GG be 3m3 m-regular, which is a rather strong connectivity hypothesis. ^(3){ }^{3} In fact, if there are only 3m+3 m+ 1 generals (the minimum number required), then 3m3 m-regularity means complete connectivity, and Algorithm OM(m,3m)\mathrm{OM}(m, 3 m) reduces to Algorithm OM(m)\mathrm{OM}(m). In contrast, Algorithm SM(m)\operatorname{SM}(m) is easily extended to allow the weakest possible connectivity hypothesis. Let us first consider how much connectivity is needed for the Byzantine Generals Problem to be solvable. IC2 requires that a loyal lieutenant obey a loyal commander. This is clearly impossible if the commander cannot communicate with the lieutenant. In particular, if every message from the
commander to the lieutenant must be relayed by traitors, then there is no way to guarantee that the lieutenant gets the commander’s order. Similarly, IC1 cannot be guaranteed if there are two lieutenants who can only communicate with one another via traitorous intermediaries.
The weakest connectivity hypothesis for which the Byzantine Generals Problem is solvable is that the subgraph formed by the loyal generals be connected. We show that under this hypothesis, the algorithm SM(n-2)\operatorname{SM}(n-2) is a solution, where nn is the number of generals-regardless of the number of traitors. Of course, we must modify the algorithm so that generals only send messages to where they can be sent. More precisely, in step (1), the commander sends his signed order only to his neighboring lieutenants; and in step (2)(B), Lieutenant ii only sends the message to every neighboring lieutenant not among the j_(r)j_{r}.
We prove the following more general result, where the diameter of a graph is the smallest number dd such that any two nodes are connected by a path containing at most dd arcs.
Theorem 4. For any mm and dd, if there are at most mm traitors and the subgraph of loyal generals has diameter d, then Algorithm SM(m+d-1)\operatorname{SM}(m+d-1) (with the above modification) solves the Byzantine Generals Problem.
Proof. The proof is quite similar to that of Theorem 2 and is just sketched here. To prove IC2, observe that by hypothesis there is a path from the loyal commander to a lieutenant ii going through d-1d-1 or fewer loyal lieutenants. Those lieutenants will correctly relay the order until it reaches ii. As before, assumption A4 prevents a traitor from forging a different order.
To prove IC1, we assume the commander is a traitor and must show that any order received by a loyal lieutenant ii is also received by a loyal lieutenant jj. Suppose ii receives an order v:0:j_(1):cdots:j_(k)v: 0: j_{1}: \cdots: j_{k} not signed by jj. If k < mk<m, then ii will send it to every neighbor who has not already received that order, and it will be relayed to jj within d-1d-1 more steps. If k >= mk \geq m, then one of the first mm signers must be loyal and must have sent it to all of his neighbors, whereupon it will be relayed by loyal generals and will reach jj within d-1d-1 steps.
Corollary. If the graph of loyal generals is connected, then SM(n-2)S M(n-2) (as modified above) solves the Byzantine Generals Problem for nn generals.
Proof. Let dd be the diameter of the graph of loyal generals. Since the diameter of a connected graph is less than the number of nodes, there must be more than dd loyal generals and fewer than n-dn-d traitors. The result follows from the theorem by letting m=n-d-1m=n-d-1.
Theorem 4 assumes that the subgraph of loyal generals is connected. Its proof is easily extended to show that even if this is not the case, if there are at most mm traitors, then the algorithm SM(m+d-1)\mathrm{SM}(m+d-1) has the following two properties:
Any two loyal generals connected by a path of length at most dd passing through only loyal generals will obey the same order.
If the commander is loyal, then any loyal lieutenant connected to him by a path of length at most m+dm+d passing only through loyal generals will obey his order.
6. RELIABLE SYSTEMS
Other than using intrinsically reliable circuit components, the only way we know to implement a reliable computer system is to use several different “processors” to compute the same result, and then to perform a majority vote on their outputs to obtain a single value. (The voting may be performed within the system, or externally by the users of the output.) This is true whether one is implementing a reliable computer using redundant circuitry to protect against the failure of individual chips, or a ballistic missile defense system using redundant computing sites to protect against the destruction of individual sites by a nuclear attack. The only difference is in the size of the replicated “processor”.
The use of majority voting to achieve reliability is based upon the assumption that all the nonfaulty processors will produce the same output. This is true so long as they all use the same input. However, any single input datum comes from a single physical component-for example, from some other circuit in the reliable computer, or from some radar site in the missile defense system-and a malfunctioning component can give different values to different processors. Moreover, different processors can get different values even from a nonfaulty input unit if they read the value while it is changing. For example, if two processors read a clock while it is advancing, then one may get the old time and the other the new time. This can only be prevented by synchronizing the reads with the advancing of the clock.
In order for majority voting to yield a reliable system, the following two conditions should be satisfied:
All nonfaulty processors must use the same input value (so they produce the same output).
If the input unit is nonfaulty, then all nonfaulty processes use the value it provides as input (so they produce the correct output).
These are just our interactive consistency conditions IC1 and IC2, where the “commander” is the unit generating the input, the “lieutenants” are the processors, and “loyal” means nonfaulty.
It is tempting to try to circumvent the problem with a “hardware” solution. For example, one might try to insure that all processors obtain the same input value by having them all read it from the same wire. However, a faulty input unit could send a marginal signal along the wire-a signal that can be interpreted by some processors as a 0 and by others as a 1 . There is no way to guarantee that different processors will get the same value from a possibly faulty input device except by having the processors communicate among themselves to solve the Byzantine Generals Problem.
Of course, a faulty input device may provide meaningless input values. All that a Byzantine Generals solution can do is guarantee that all processors use the same input value. If the input is an important one, then there should be several separate input devices providing redundant values. For example, there should be redundant radars as well as redundant processing sites in a missile defense system. However, redundant inputs cannot achieve reliability; it is still necessary to insure that the nonfaulty processors use the redundant data to produce the same output.
In case the input device is nonfaulty but gives different values because it is read while its value is changing, we still want the nonfaulty processors to obtain a reasonable input value. It can be shown that, if the functions majority and choice are taken to be the median functions, then our algorithms have the property that the value obtained by the nonfaulty processors lies within the range of values provided by the input unit. Thus, the nonfaulty processors will obtain a reasonable value so long as the input unit produces a reasonable range of values.
We have given several solutions, but they have been stated in terms of Byzantine generals rather than in terms of computing systems. We now examine how these solutions can be applied to reliable computing systems. Of course, there is no problem implementing a “general’s” algorithm with a processor. The problems lie in implementing a message passing system that meets assumptions A1-A3 (assumptions A1-A4 for Algorithm SM (m)(m) ). We now consider these assumptions in order.
A1. Assumption A1 states that every message sent by a nonfaulty processor is delivered correctly. In real systems, communication lines can fail. For the oral message algorithms OM(m)\mathrm{OM}(m) and OM(m,p)\mathrm{OM}(m, p), the failure of the communication line joining two processors is indistinguishable from the failure of one of the processors. Hence, we can only guarantee that these algorithms will work in the presence of up to mm failures, be they processor or communication line failures. (Of course, the failure of several communication lines attached to the same processor is equivalent to a single processor failure.) If we assume that a failed communication line cannot result in the forgery of a signed message-an assumption which we will see below is quite reasonable, then our signed message algorithm SM(m)\operatorname{SM}(m) is insensitive to communication line failure. More precisely, Theorem 4 remains valid even with communication line failure. A failed communication line has the same effect as simply removing the communication line-it lowers the connectivity of the processor graph.
A2. Assumption A2 states that a processor can determine the originator of any message that it received. What is actually necessary is that a faulty processor not be able to impersonate a nonfaulty one. In practice, this means that interprocess communication be over fixed lines rather than through some message switching network. (If a switching network is used, then faulty network nodes must be considered, and the Byzantine Generals Problem appears again.) Note that assumption A2 is not needed if A4 is assumed and all messages are signed, since impersonation of another processor would imply forging its messages.
A3. Assumption A3 requires that the absence of a message can be detected. The absence of a message can only be detected by its failure to arrive within some fixed length of time-in other words, by the use of some time-out convention. The use of time-out to satisfy A3 requires two assumptions:
There is a fixed maximum time needed for the generation and transmission of a message.
The sender and receiver have clocks that are synchronized to within some fixed maximum error.
The need for the first assumption is fairly obvious, since the receiver must know
how long he needs to wait for the message to arrive. (The generation time is how long it takes the processor to send the message after receiving all the input necessary to generate it.) The need for the second assumption is less obvious. However, it can be shown that either this assumption or an equivalent one is necessary to solve the Byzantine Generals Problem. More precisely, suppose that we allow algorithms in which the generals take action only in the following circumstances:
At some fixed initial time (the same for all generals).
Upon the receipt of a message.
When a randomly chosen length of time has elapsed. (I.e., a general can set a timer to a random value and act when the timer goes off.)
(This yields the most general class of algorithms we can envision which does not allow the construction of synchronized clocks.) It can be shown that no such algorithm can solve the Byzantine Generals Problem if messages can be transmitted arbitrarily quickly, even if there is an upper bound on message transmission delay. Moreover, no solution is possible even if we restrict the traitors so that the only incorrect behavior they are permitted is the failure to send a message. The proof of this result is beyond the scope of this parer. Note that placing a lower as well as an upper bound on transmission delay aliows processors to implement clocks by sending messages back and forth.
The above two assumptions make it easy to detect unsent messages. Let mu\mu be the maximum message generation and transmission delay, and assume the nonfaulty processors have clocks that differ from one another by at most tau\tau at any time. Then any message that a nonfaulty process should begin to generate by time TT on its clock will arrive at its destination by time T+mu+tauT+\mu+\tau on the receiver’s clock. Hence, if the receiver has not received the message by that time, then it may assume that it was not sent. (If it arrives later, then the sender must be faulty, so the correctness of our algorithms does not depend upon the message being sent.) By fixing the time at which the input processor sends its value, one can calculate until what time on its own clock a processor must wait for each message. For example, in Algorithm SM(m)\mathrm{SM}(m) a processor must wait until time T_(0)T_{0}+k(mu+tau)+k(\mu+\tau) for any message having kk signatures, where T_(0)T_{0} is the time (on his clock) at which the commander starts executing the algorithm.
No two clocks run at precisely the same rate, so no matter how accurately the processors’ clocks are synchronized initially, they will eventually drift arbitrarily far apart unless they are periodically resynchronized. We therefore have the problem of keeping the processors’ clocks all synchronized to within some fixed amount, even if some of the processors are faulty. This is as difficult a problem as the Byzantine Generals Problem itself. Solutions to the clock synchronization problem exist which are closely related to our Byzantine Generals solutions. They will be described in a future paper.
A4. Assumption A4 requires that processors be able to sign their messages in such a way that a nonfaulty processor’s signature cannot be forged. A signature is a piece of redundant information S_(i)(M)S_{i}(M) generated by process ii from a data item MM. A message signed by ii consists of a pair ( M,S_(i)(M)M, S_{i}(M) ). To meet parts (a)
and (b) of A4, the function S_(i)S_{i} must have the following two properties:
(a) If processor ii is nonfaulty, then no faulty processor can generate S_(i)(M)S_{i}(M).
(b) Given MM and XX, any process can determine if XX equals S_(i)(M)S_{i}(M).
Property (a) can never be guaranteed, since S_(i)(M)S_{i}(M) is just a data item, and a faulty processor could generate any data item. However, we can make the probability of its violation as small as we wish, thereby making the system as reliable as we wish. How this is done depends upon the type of faults we expect to encounter. There are two cases of interest:
Random Malfunction. By making S_(i)S_{i} a suitably “randomizing” function, we can make the probability that a random malfunction in a processor generates a correct signature essentially equal to the probability of its doing so through a random choice procedure-that is, the reciprocal of the number of possible signatures. The following is one method for doing this. Assume that messages are encoded as positive integers less than PP, where PP is a power of two. Let S_(i)(M)S_{i}(M) equal M**K_(i)mod PM * K_{i} \bmod P, where K_(i)K_{i} is a randomly chosen odd number less than PP. Letting K_(i)^(-1)K_{i}^{-1} be the unique number less than PP such that K_(i)**K_(i)^(-1)-=1mod PK_{i} * K_{i}^{-1} \equiv 1 \bmod P, a process can check that X=S_(i)(M)X=S_{i}(M) by testing that M-=X**K_(i)^(-1)mod PM \equiv X * K_{i}^{-1} \bmod P. If another processor does not have K_(i)K_{i} in its memory, then the probability of its generating the correct signature M**K_(i)M * K_{i} for a single (nonzero) message MM should be 1//P1 / P : its probability of doing so by random choice. (Note that if the processor could obtain K_(i)K_{i} by some simple procedure, then there might be a larger probability of a faulty processor jj forging ii 's signature by substituting K_(i)K_{i} for K_(j)K_{j} when trying to compute S_(j)(M)S_{j}(M).)
Malicious Intelligence. If the faulty processor is being guided by a malicious intelligence-for example, if it is a perfectly good processor being operated by a human who is trying to disrupt the system-then the construction of the signature function S_(i)S_{i} becomes a cryptography problem. We refer the reader to [1] and [4] for a discussion of how this problem can be solved.
Note that it is easy to generate the signature S_(i)(M)S_{i}(M) if the process has already seen that signature. Hence, it is important that the same message never have to be signed twice. This means that, when using SM(m)\operatorname{SM}(m) repeatedly to distribute a sequence of values, sequence numbers should be appended to the values to guarantee uniqueness.
7. CONCLUSION
We have presented several solutions to the Byzantine Generals Problem, under various hypotheses, and shown how they can be used in implementing reliable computer systems. These solutions are expensive in both the amount of time and the number of messages required. Algorithms OM(m)\mathrm{OM}(m) and SM(m)\mathrm{SM}(m) both require message paths of length up to m+1m+1. In other words, each lieutenant may have to wait for messages that originated at the commander and were then relayed via mm other lieutenants. Fischer and Lynch have shown that this must be true for any solution that can cope with mm traitors, so our solutions are optimal in that respect. Our algorithms for a graph that is not completely connected require
message paths of length up to m+dm+d, where dd is the diameter of the subgraph of loyal generals. We suspect that this is also optimal.
Algorithms OM(m)\mathrm{OM}(m) and SM(m)\mathrm{SM}(m) involve sending up to (n-1)(n-2)dots(n-1)(n-2) \ldots ( n-m-1n-m-1 ) messages. The number of separate messages required can certainly be reduced by combining messages. It may also be possible to reduce the amount of information transferred, but this has not been studied in detail. However, we expect that a large number of messages will still be required.
Achieving reliability in the face of arbitrary malfunctioning is a difficult problem, and its solution seems to be inherently expensive. The only way to reduce the cost is to make assumptions about the type of failure that may occur. For example, it is often assumed that a computer may fail to respond but will never respond incorrectly. However, when extremely high reliability is required, such assumptions cannot be made, and the full expense of a Byzantine Generals solution is required.
REFERENCES
Diffie, W., and Hellman, M.E. New directions in cryptography. IEEE Trans. Inf. Theory IT-22 (Nov. 1976), 644-654.
Dolev, D. The Byzantine generals strike again. J. Algorithms 3, 1 (Jan. 1982).
Pease, M., Shostak, R., and Lamport, L. Reaching agreement in the presence of faults. J. ACM 27, 2 (Apr. 1980), 228-234.
Rivest, R.L., Shamir, A., and Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21, 2 (Feb. 1978), 120-126.
Received April 1980; revised November 1981; accepted November 1981