Network Working Group                                        P. Tsuchiya
INTERNET-DRAFT V1                                               Bellcore
                                                               June 1991

            Discovery and Routing over the SMDS Service

Status of this Memo

The difference between this version (1) and the previous version (0), besides the different formatting, is that this version introduces the use of ARP for routers using OSPF to discover the address of other OSPF routers, even though those other routers are not on the same group address.

For now, this memo is an internet-draft, and in fact this version of it is very rough. I'm sure much of the language will need extensive work, especially the musts, shoulds, and mays. In addition, parts of this memo are currently under-specified.

There are some relatively complex protocol mechanisms described in this memo, which need extensive critical review. In particular, there are some (hopefully minor) departures from the traditional use of IP addresses. The techniques described in this memo need to be implemented as soon as possible.

Please send comments to the IP Over Large Public Data Networks working group, iplpdn@nri.reston.va.us, or if the comments are particularly humiliating to the author, send them to tsuchiya@thumper.bellcore.com.

The above paragraphs of course won't appear in the final RFC. The following paragraph will, but for now please ignore it.

This memo defines a protocol for both intra- and inter-domain discovery and routing over the Switched Multi-megabit Data Service Network. This RFC specifies an IAB standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "IAB Official Protocol Standards" for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Abstract

RFC 1209 [1] describes the encapsulation of IP over the SMDS service generally, and the use of ARP over the SMDS service configured as a Logical IP Subnetwork (LIS).
This memo expands on RFC 1209 by describing how to do discovery and routing, both for private (intra-domain) and public (inter-domain) applications, over the SMDS service. As such, this memo considers cases where a private network is spread over multiple LISs. In particular, this memo allows hosts or routers in different LISs to exchange packets directly--without going through an intervening router.

IPLPDN WG                                                       [Page 1]

INTERNET-DRAFT V1      SMDS Routing and Discovery              June 1991

Introduction

RFC 1209 gives an overview of SMDS. This memo assumes an understanding of the material in RFC 1209.

SMDS can be configured to appear to the user to be both a private, usually multicast network and a public, usually non-multicast network. The former is achieved mainly through the use of group addresses and address filtering. The latter is possible because SMDS uses globally unique addressing at its interfaces.

The purpose of the private/multicast configuration is to emulate to the extent possible a multicast LAN. As a result, SMDS is easily integrated into a LAN-based, private network.

Unfortunately, because of the scope of the SMDS service, it is impossible and indeed undesirable to emulate every aspect of a multicast LAN. In particular, the membership of a single SMDS group address, which forms a Logical IP Subnetwork (LIS), must be limited (currently, to 128 members). Therefore, a private network that has more than 128 systems (hosts or routers) to connect to SMDS cannot treat the SMDS service as a single LIS, and must, in some respects, view the SMDS service as multiple LISs connected by routers.

This multiple LIS configuration can result in a path whereby a packet enters and exits the SMDS service more than once. For instance, consider a packet transmission between two hosts X and Y on different LISs. Since the two hosts don't view each other as being on the same LIS, they will use a router that belongs to both LISs to send traffic between them.
The SMDS service is crossed twice when strictly speaking it only need be crossed once.

Note: The extent to which this is a serious problem depends on many factors, such as how often it occurs, and how non-optimal the two-hop path is. For instance, it is much worse if both hosts are on the east coast and the router is on the west coast than if the router is close to one of the hosts. In other words, these multi-hop paths may or may not be acceptable.

A system attached to the SMDS service, therefore, may exchange packets with one or more of the following:

o a private system on one of its LISs--private-local

o a private system not on one of its LISs--private-remote

o a public system (which by definition is not on one of its LISs)--public

RFC 1209 describes how a system can discover the SMDS address of a private-local system. This memo describes how systems can discover the SMDS addresses of, and therefore exchange packets directly with, private-remote and public systems. It describes how this should be done both with existing protocol mechanisms and with new protocol mechanisms. In the former case, static configuration is the primary means of discovery. In the latter, a new protocol mechanism, the unsolicited ARP Reply, is used to avoid the burden of static configuration for hosts.

Addressing Considerations

Except for the special case of two routers connected by a point-to-point link, all systems have one or more IP address/subnet mask pairs associated with every network interface.

Note: This may not be true for pre-subnetting implementations. In this memo we ignore such implementations on the basis that if one is willing to equip a system with an SMDS interface, then one should also be willing to equip it with up-to-date software.

The subnet mask, when applied to the IP address, gives the network/subnet number of the attached network [2].
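As an illustration only (not part of the protocol), the mask operation can be sketched in Python. The helper names below are hypothetical, and the dotted-hex values are the example addresses this memo uses in its addressing example.

```python
def parse_hex_quad(text):
    """Parse a dotted-hex string such as '80.1d.42.c9' into a 32-bit integer."""
    value = 0
    for octet in text.split("."):
        value = (value << 8) | int(octet, 16)
    return value


def format_hex_quad(value):
    """Format a 32-bit integer back into dotted-hex notation."""
    return ".".join(f"{(value >> shift) & 0xFF:02x}" for shift in (24, 16, 8, 0))


def network_number(address, mask):
    """Apply the subnet mask to the address (bitwise AND)."""
    return format_hex_quad(parse_hex_quad(address) & parse_hex_quad(mask))


def same_network(local_address, mask, destination):
    """True if the destination has the same network/subnet number as the
    local address under the given mask."""
    return network_number(local_address, mask) == network_number(destination, mask)
```

Under these assumptions, network_number("80.1d.42.c9", "ff.ff.ff.80") yields "80.1d.42.80", and same_network() reports whether a destination falls on the attached network.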
The following conditions apply to the use of IP network/subnet numbers [3,4]:

1. If the network/subnet number of an IP address matches that of the connected network, then the system with that IP address is directly reachable over the connected network.

2. If the network/subnet number of an IP address (excluding those of neighbor routers that are explicitly configured) does not match that of the connected network, then the system with that IP address can only be reached through a router.

3. Routers do not necessarily require that a neighbor router on the same network share a network number with it. This depends on the routing protocol. Both BGP [5] and certain OSPF [6] configurations (un-numbered point-to-point links and virtual links) do not require a shared network number.

4. Hosts may or may not require that routers reachable over the connected network share a network number with them. RFC 1122 is not clear on this point [3]. In any event, it seems likely that many hosts will require that even explicitly configured routers share a network/subnet number.

Given the ambiguity of 4, it is safer to assume that all hosts require that routers reachable over the connected network share a network number with them.

With these conditions, routers won't be able to exchange packets with hosts, and hosts won't be able to exchange packets with any system, that does not share a network/subnet number.

However, it is not possible for a system to have one network/subnet number for packet exchanges with both private-local and private-remote systems. This is because the system will use ARP to discover the SMDS address of private-local systems, and will use some other mechanism (static configuration or unsolicited ARP Reply) to discover the SMDS address of private-remote systems.
But the only way a system can distinguish between private-local and private-remote systems is by comparing the IP address against two network/subnet numbers, one for private-local and one for private-remote.

This leads to the following requirements for systems attached to the SMDS service.

o An address is considered private-local if ARP is enabled for the network/subnet number that the address matches. Each system must have some local means for determining whether or not ARP is enabled for a network/subnet number. From the perspective of SNMP, however, ARP is considered disabled if there is no ipOverSMDSAddressEntry entry in the SMDS MIB [7] for the IP address associated with that network/subnet number.

o All systems that are members of the same LIS (i.e., are private-local with respect to each other) must share a network/subnet number (this requirement is stated in RFC 1209).

o Further, systems that are not members of a LIS must not have the network/subnet number for that LIS.

o If two systems, one or both of which are not routers, are private-remote or public with respect to each other, and those two systems wish to exchange packets directly, then those two systems must share a network/subnet number. (Below we show that public network/subnet numbers are difficult to form and of potentially limited use.)

A convenient IP network/subnet numbering arrangement for a private network that spans multiple LISs would be to assign a subnetted part of a class B network number to all systems of the private network that are attached to the SMDS service. Each LIS would be further subnetted. Therefore, each system could have two addresses and masks, as follows:

                     address (hex)    mask
   private-local:    80.1d.42.c9      ff.ff.ff.80
   private-remote:   80.1d.42.ca      ff.ff.f0.00

The private-local mask allows for 126 addresses (excluding -1 and 0), which just about fills up the 128 maximum on group address members.
The private-remote mask allows for five bits to distinguish the various LISs, for a total of (2**5) - 2 = 30 LISs. The remaining values of the class B could be used for networks "behind" the SMDS service (LANs and such).

To continue the example, if the system with address 80.1d.42.c9 received an IP packet with destination address 80.1d.48.9e, it would see that the address was not on its LIS:

   ((80.1d.42.c9 & ff.ff.ff.80 = 80.1d.42.80) !=
    (80.1d.48.9e & ff.ff.ff.80 = 80.1d.48.80))

It would therefore not ARP for the SMDS address.

If the system with address 80.1d.42.c9 received an IP packet with destination address 80.1d.42.e1, then there is an ambiguity, because this address matches both the private-local and private-remote masks. In this case, the system should know to take the "more specific" match, which is the one where the mask has the most 1's. If it doesn't then the packet may take an extra hop. This results in the following requirement.

o If a system arranges its addresses so that its private-local network/subnet number is a subnetted portion of its private-remote network/subnet number, or if its private-remote network/subnet number is a subnetted portion of its public network/subnet number, then it should match on the most specific mask.

o If the system is not capable of picking the most specific match, then its private-local, private-remote, and public network/subnet numbers should not overlap.

Note: The above addressing arrangement resulted in the system having two separate IP addresses (for its two logical interfaces) even though it only has one physical interface. While it would of course be possible to build a system that could handle having one IP address with multiple logical interfaces with different masks, this seems to be too much of a departure from IP address fundamentals. Therefore, systems configured with multiple logical interfaces must have multiple IP addresses.
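The "most specific match" rule above can be sketched with Python's standard ipaddress module. This is only an illustration: the table layout and the classify() helper are hypothetical, and the example values are the memo's hex addresses rewritten in dotted decimal (80.1d.42.80 with mask ff.ff.ff.80 becomes 128.29.66.128/25, and 80.1d.40.00 with mask ff.ff.f0.00 becomes 128.29.64.0/20).

```python
import ipaddress

# Hypothetical per-system table: one entry per logical interface.
LOGICAL_INTERFACES = [
    ("private-local", ipaddress.ip_network("128.29.66.128/25")),
    ("private-remote", ipaddress.ip_network("128.29.64.0/20")),
]


def classify(destination):
    """Return the label of the most specific matching network/subnet number,
    or None if the destination matches none (it must be reached via a router)."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (network.prefixlen, label)
        for label, network in LOGICAL_INTERFACES
        if dest in network
    ]
    if not matches:
        return None
    # The longest prefix is the mask with the most 1 bits.
    return max(matches)[1]
```

With this table, 128.29.66.225 (80.1d.42.e1) matches both entries and is classified as private-local, so the system would ARP for it, while 128.29.72.158 (80.1d.48.9e) matches only the private-remote entry.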
Fortunately, most systems should have no more than two such addresses.

Forming public network/subnet numbers is problematic, and potentially not very useful. To form a public network/subnet number, a large network number, almost certainly a class A, is needed. This number would be assigned to all systems attached to the SMDS service, thus increasing the complexity and overhead of all systems. But the only case where the public network/subnet number is needed is where one or both of the systems communicating are hosts (because routers are able to directly exchange packets without sharing a network/subnet number).

Host to host or host to router public packet exchange is likely to be the least common type of packet exchange over an SMDS network. Router-to-router packet exchange, both public and private, should be much more common, and directly attached hosts will more likely exchange packets privately than publicly. And, a host can directly exchange packets publicly without a public network/subnet number by configuring itself as a router and running a scaled down version of BGP (this is discussed later). Or, a host can always send public packets by going through one of its private routers, thus suffering multi-hops.

For the above reasons, it seems unnecessary to have a public network/subnet number.

Router Configurations

Before discussing the mechanisms of discovery and routing over SMDS, we discuss some issues concerning router configuration.

Any two routers that have routing table entries for each other and can forward packets to each other are called neighbors. Depending on the routing protocol, neighbor routers may or may not exchange routing updates. For instance, with OSPF, because of designated routers, it is possible for two routers to learn of each other and forward IP packets to each other without ever directly exchanging routing updates.
The IP addresses of neighbor routers are learned from the designated routers in OSPF packets, and the SMDS addresses of neighbor routers are learned from the designated routers acting as ARP servers (see section "Router Operation"). With other protocols, such as RIP [9], two routers can only be neighbors if they exchange routing updates directly.

A private domain will have some number R of routers connected to the SMDS service. If all R routers are neighbors of each other (each router has R-1 neighbors), then a packet will almost never traverse more than two routers. The worst-case multi-hop would be three: host-router, router-router, router-host. We call this configuration of routers the all-neighbors configuration.

The partial-neighbors configuration, then, is one where not all routers in a private domain are neighbors of each other. With the partial-neighbors configuration, a packet may take any number of hops across the SMDS service, depending on how sparse the neighbor connectivity is. If there are 5 routers, A, B, C, D, and E, and the neighbor relationships are A-B, B-C, C-D, and D-E, then a packet entering at A and exiting at E will cross the SMDS service four times.

The advantage of the all-neighbors configuration is shorter paths across the SMDS service, and this memo recommends it whenever possible. Depending on the routing protocol used, and the group address configuration, the disadvantage may be in increased routing traffic over the SMDS service. Another disadvantage of the all-neighbors configuration is in the amount of state each router has to keep, but this is unlikely to be a problem except perhaps for extremely large configurations (say many hundreds of routers directly attached to the SMDS service). In some cases (discussed in section "Routing Protocol Operations"), the amount of configuration needed to maintain an all-neighbors configuration may be prohibitive.
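The difference between the two configurations can be checked with a small breadth-first search over the neighbor relationships. The sketch below is illustrative only; the function name and data layout are hypothetical, and it counts only router-to-router crossings of the SMDS service, as in the A-to-E example above.

```python
from collections import deque
from itertools import combinations


def smds_crossings(neighbor_pairs, entry, exit_router):
    """Return the fewest router-to-router crossings of the SMDS service
    between entry and exit_router, or None if no path exists."""
    graph = {}
    for a, b in neighbor_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    distances = {entry: 0}
    queue = deque([entry])
    while queue:
        router = queue.popleft()
        if router == exit_router:
            return distances[router]
        for peer in graph.get(router, ()):
            if peer not in distances:
                distances[peer] = distances[router] + 1
                queue.append(peer)
    return None


# Partial-neighbors example from the text: A-B, B-C, C-D, D-E.
PARTIAL = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
# All-neighbors configuration: every pair of the five routers is a neighbor pair.
ALL_NEIGHBORS = list(combinations("ABCDE", 2))
```

Here smds_crossings(PARTIAL, "A", "E") gives four crossings, while smds_crossings(ALL_NEIGHBORS, "A", "E") gives one, which is the motivation for recommending the all-neighbors configuration.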
Because common network/subnet addresses are not necessarily available for public systems, it is necessary to find a means of discovering SMDS addresses purely in the context of the public routing protocol, which is BGP. Since there will be many hundreds or thousands of BGP routers on the SMDS service, it is critical that BGP can be operated with minimal configuration, traffic, or memory overhead. This memo defines three modes for BGP operation:

   Mode 1:  Two BGP peers exchange BGP information directly

   Mode 2:  Two BGP peers exchange BGP information via a "BGP server"

      Mode 2a: Full router--the BGP router maintains complete BGP information

      Mode 2b: Partial router--the BGP router maintains only what it needs

The purpose of Mode 2 is to minimize configuration, traffic, and memory overhead when there are a large number of BGP routers connected. With BGP servers, BGP routers need only configure a handful of BGP servers, not every other BGP router. The BGP servers, then, act as a distribution point for BGP routing information. (Note that a BGP server runs standard BGP. It is a server by virtue of its configuration parameters.)

While a full router must receive and store roughly the same amount of information whether it peers with a BGP server or directly with every other BGP router, the KEEPALIVE traffic is substantially reduced with BGP servers. For instance, with 5000 domains and a KEEPALIVE interval of 5 minutes, a router peering directly with every domain would handle an average of roughly 16 KEEPALIVEs per second (5000 / 300 seconds).

Moreover, BGP servers allow for a "partial router" (Mode 2b). This is a router that maintains partial or no permanent routing information. Instead, the partial router sends its IP packets to a BGP server, which forwards the packet appropriately, and sends the partial router "on-demand" BGP Update information for only the destination in the IP packet.

Finally, the use of BGP servers eases the configuration problem.
There is no automatic way to configure BGP peers. Therefore, the more BGP peers a BGP router has, the more manual configuration is necessary.

A BGP router can mix Modes 1 and 2. In other words, it can peer directly with some BGP routers, but otherwise receive its information from BGP servers.

Routing and Discovery over SMDS

All hosts and routers have an IP-to-physical address translation table. (For the purposes of this memo, the physical address corresponds to the SMDS address.) For a system to send a packet directly to another system, it must be able to translate the IP address to a physical address, either by indexing the table or through an algorithmic manipulation of the IP address. (The latter does not apply to SMDS.)

Hosts can learn IP-to-physical address translations in only one of two ways: static configuration of the IP-to-physical address translation table, or reception of an ARP Reply. Routers can learn IP-to-physical address translations in the same two ways as the hosts, plus via the BGP attribute NEXT_HOP_SNPA [8] (the latter applies mainly to public internetworking).

While it is always possible to avoid multi-hops by statically configuring the IP-to-physical address translation table, it is preferable to do so automatically via the reception of ARP replies for hosts, or BGP Updates for routers. The NEXT_HOP_SNPA information in the BGP Update is adequate for conveying SMDS addresses for public internetworking. Since ARP requests cannot be sent for systems that are private-remote, we define a new mechanism for learning SMDS addresses, which is the Unsolicited ARP Reply (UARP Reply).

o The reception of a UARP Reply is handled exactly the same as the reception of a (requested) ARP Reply.

o Hosts must never send UARP Replies.
o When a router F receives an IP packet P from a system S over its SMDS interface for which the next hop system N on the path to the destination is back over the SMDS interface, the router F forwards the IP packet P to N, and may send S a UARP Reply or an "on-demand" BGP UPDATE depending on the following.

  The router F searches its routing databases for a neighbor whose SMDS address matches the source address of the received packet P. The address may match nothing, in which case the packet will have been received from a host, or the address may match an entry for a BGP (public) neighbor, or an entry for a private neighbor.

  If the source SMDS address in packet P matches nothing, then an ICMP Redirect followed immediately by a UARP Reply is sent. The UARP Reply contains the SMDS address of the next hop system N (in ar$sha), and the IP address of the next hop system N given in the Redirect (in ar$spa). The source IP address in the IP header of the UARP Reply should contain the IP address of F. The destination IP address in the IP header of the UARP Reply contains the source IP address of the received packet P.

  Otherwise, if the source SMDS address in packet P matches that of a private router neighbor R, and the next hop system N is not a router neighbor of F (which is determined by comparing the SMDS address of the next hop system with those of the router neighbors), and the next hop system N is either private-local or private-remote, then a UARP Reply is sent to R. The packet fields ar$sha, ar$spa, and the source IP address are set as in the previous paragraph. However, the destination IP address in the IP header of the UARP Reply contains the IP address of router neighbor R. Note that router neighbor R might also be a BGP neighbor.
  However, since R is private, it would be an internal BGP neighbor, and will have therefore already received all of the BGP information that F has (internal BGP neighbors are always full routers).

  Otherwise, if the source SMDS address in packet P matches that of a BGP neighbor R, then a BGP Update is sent to R. It contains the IP address of the next hop router F in the NEXT_HOP attribute and the SMDS address of F in the NEXT_HOP_SNPA attribute, and the network of the destination address in packet P must be one of the networks listed in the BGP Update. Note that F should not have received packet P if BGP neighbor R is a full router. F may therefore wish to check to make sure that R is a partial router, and if not, to report an error to system management.

  The reason for sending the ICMP Redirect in the case of sending a UARP Reply to a host is to give the host an IP number to relate the UARP Reply with. If the destination IP address of packet P is not on the SMDS service, the host will not recognize that it can reach the destination directly and may not accept the UARP Reply. With the ICMP Redirect, the host knows that it is routing to a router on the attached network.

  With the technique of UARP Replies, if either the SMDS entry or exit system is a host, then a direct path will be found across the SMDS service even if the partial-router configuration is used. However, if both the SMDS entry and exit systems are routers, and a multi-hop path is found by routing, then that path will persist. The reason for this is that it just doesn't work to have a router try to redirect another router to still another router. This memo doesn't go into detail about this except to say that the IP architecture is such that routers fundamentally expect to know everything they need to know from the beginning, and getting them to cache things on the fly generally mucks things up.
  Only by putting limitations on the spreading of BGP routing information can we get away with "on-demand" BGP updates for partial routers.

o A router must have some mechanism to prevent its sending an excessive number of the same UARP Replies or BGP Updates. This might happen if the system receiving the UARP Replies or BGP Updates did not honor them, for instance in the case of UARP Replies because its mask was not configured correctly.

  One such algorithm would be to establish three variables associated with a particular UARP Reply or BGP Update: arpEnabled, arpRate and arpPersistance. When the "first" UARP Reply or BGP Update is sent, create the three variables, and set arpEnabled to ON, set arpRate to some constant, say 20, and set arpPersistance to some other constant, say 5. Each time a packet is received that should result in sending the UARP Reply or BGP Update, check arpEnabled. If it is OFF, then do nothing (that is, don't send the UARP Reply or BGP Update). If arpEnabled is ON, decrement arpRate. If arpRate does not decrement to 0, then do nothing. If arpRate decrements to 0, then send a UARP Reply (preceded by the ICMP Redirect if necessary) or BGP Update, and decrement arpPersistance. If arpPersistance does not decrement to 0, reset arpRate to its constant (20). If arpPersistance does decrement to 0, then set arpEnabled to OFF. After some timeout period, destroy the three variables (so that another identical UARP Reply or BGP Update will be considered the "first" one).

  This algorithm has the effect of constraining the rate at which UARP Replies or BGP Updates will be sent, and of giving up on sending them for a period of time if the recipient seems to be ignoring them.

o Routers and Hosts must time-out the information learned from UARP Replies and on-demand BGP Updates, just as they do for ARP Replies.
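The rate-limiting algorithm described above (arpEnabled, arpRate, arpPersistance) can be sketched as follows. This is a minimal illustration, not a required implementation. The class name, the key used to identify "the same" UARP Reply or BGP Update, the 600-second timeout, and measuring the timeout from creation of the variables are all assumptions; the memo only says "after some timeout period".

```python
import time


class RedirectLimiter:
    """Decides whether a UARP Reply or BGP Update should be sent for a
    triggering packet, following the three-variable scheme in the text."""

    def __init__(self, rate=20, persistence=5, timeout=600.0, clock=time.monotonic):
        self._rate = rate                # constant that arpRate is reset to
        self._persistence = persistence  # initial arpPersistance
        self._timeout = timeout          # assumed lifetime of the variables (seconds)
        self._clock = clock
        self._state = {}                 # key -> per-reply variables

    def should_send(self, key):
        now = self._clock()
        entry = self._state.get(key)
        # After the timeout, destroy the variables so the next one is "first".
        if entry is not None and now - entry["created"] >= self._timeout:
            del self._state[key]
            entry = None
        if entry is None:
            # The "first" UARP Reply or BGP Update: create the variables and send.
            self._state[key] = {
                "arpEnabled": True,
                "arpRate": self._rate,
                "arpPersistance": self._persistence,
                "created": now,
            }
            return True
        if not entry["arpEnabled"]:
            return False
        entry["arpRate"] -= 1
        if entry["arpRate"] > 0:
            return False
        # arpRate reached 0: send, then account for persistence.
        entry["arpPersistance"] -= 1
        if entry["arpPersistance"] > 0:
            entry["arpRate"] = self._rate
        else:
            entry["arpEnabled"] = False
        return True
```

With the constants suggested in the text (20 and 5), a stream of identical triggering packets produces the first message immediately and then one message for every 20th packet, five more times, after which the router stays silent until the variables are destroyed.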
  Routers and Hosts should refresh the time-out period upon reception of a packet with an SMDS source address and IP source address matching the information in the UARP Reply or BGP Update. Note that in the case of the BGP Update, the source IP address will be compared against a masked IP address.

Routing Protocol Operations

In what follows, we discuss the operation of specific routing protocols over SMDS. In some cases, the routing protocol takes advantage of multicasting over SMDS.

o In such cases, the routing protocol must use the same group address as that defined for sending ARP requests. In the SMDS MIB [7], this is the object-type smdsARPReq, which is a member of ipOverSMDSAddressEntry.

We assume that the reader is familiar with the protocols discussed.

OSPF and RIP

All-Neighbors Configurations:

OSPF and RIP can operate both in multicast and non-multicast modes. Multicast is preferable because it requires less configuration.

If there are fewer than 128 routers in a private network attached to the SMDS service, then those routers can form a single LIS (group address) that includes just themselves. We call this the router LIS. They can multicast OSPF or RIP packets over the LIS as they would over a multicast LAN. The only configuration necessary is that of the addresses (IP, SMDS group, and SMDS single). Designated routers are elected in order to reduce overhead.

If there are also hosts attached to the SMDS service, and the total number of hosts and routers exceeds 128, or for some other reason all hosts and routers cannot join the same LIS (for instance because group addressing is not yet available over LATA boundaries), then LISs in addition to the router LIS must be formed for the hosts. Each of these LISs must have at least one router as a member.

Note that this configuration essentially forms a 2-level hierarchy of LISs. The "core" LIS is the router LIS.
The "leaf" LISs are the host LISs, and attach to the core LIS by virtue of routers that belong to both LISs. One "level" of UARP Replies is required to allow two hosts on different LISs to exchange packets directly.

OSPF Designated Router Operation:

In some cases, it may not be possible to put all routers on the same LIS. Using the designated router election feature of OSPF, it is still possible to get an all-neighbors configuration of routers without requiring N**2 configuration of routers.

The operation of designated router election is as follows. Some or all routers are configured as eligible. This means that they may become designated routers. Some or no routers are configured as ineligible. The eligible routers have a means of sending OSPF packets to all other routers (either using ARP or by static configuration). The ineligible routers do not need to be configured with information on how to reach any other routers.

The eligible routers establish each other as neighbors. Of these, one is chosen as the designated router. The designated router then becomes neighbors with all routers, forming a star configuration. The designated router then tells all routers of all other routers in its link state advertisements.

At this point, the ineligible routers know the IP addresses of all other routers, but not the SMDS addresses. Therefore, when a packet arrives that must be routed to another router, the SMDS address for the other router must be learned. This is done by sending ARP Requests to the designated router.

To make this work, the following is required:

o A separate network/subnet number is required for all routers. This network/subnet number must be distinguishable from private-remote network/subnet numbers. This is because the private-remote systems are marked as not ARP-able, whereas other routers can be ARPed for (ineligible routers only) by virtue of the designated router.
o When an ineligible router becomes neighbors with the designated router, it must install the SMDS address of the designated router as the ARP address for the network/subnet number representing all routers (smdsARPReq in the SMDS MIB).

o Eligible routers must be able to respond to ARP Requests from neighbor routers about neighbor routers.

OSPF and RIP Partial-Neighbors Configurations:

The following configurations are not recommended for OSPF; the all-neighbors configuration using designated router election should be used instead. They may be necessary for RIP, however.

Even if all of the routers of a private network cannot join a single LIS, it is still possible to have automatic configuration. This can be done by forming multiple router LISs, where some number of routers on each router LIS belong to more than one router LIS (multi-homed router) in such a way that a connected graph is formed. By connected, we mean that there is a path from any router LIS to any other router LIS through a series of zero or more LISs connected by multi-homed routers. The number of "levels" of UARP Replies is equal to the diameter of the graph formed by multi-homed routers (as nodes) and LISs (as links). The only configuration necessary is that of the addresses (IP, SMDS group, and SMDS single). Within each LIS, designated routers are elected in order to reduce overhead (OSPF only).

Especially in the early stages of SMDS deployment, there may be cases where two LISs cannot be joined by a multi-homed router. In this case, routers in each LIS must configure a logical point-to-point link with each other. In the worst case, there may be no LISs at all (for instance, because inter-LATA group addressing is not yet available, and there is one router in each LIS). Even in this case, however, logical point-to-point configuration with all other routers can be avoided.
This can be done by configuring each router with logical links to a subset of the other routers such that the resulting graph is connected.

Note also that the above router operation applies to any routing protocol that can broadcast its routing updates.

BGP:

Mode 1 BGP Router Operation:

Operation of a BGP Router in Mode 1 is straightforward. External BGP is used, and is operated as normal [5], using IP encapsulation as described in [1]. The IP and SMDS addresses of the BGP peer are manually configured, and are obtained via means not specified by this memo.

Mode 2a Operation:

A BGP Router peers with a BGP server exactly as it would with another BGP router (i.e., Mode 1 operation). The difference between Mode 1 and Mode 2 BGP router operation is in how BGP peers are established.

In Mode 2, a BGP router must keep a list of BGP servers. Since substantially the same information will be received from all of them, it is only necessary to peer with one BGP server at a time, or two if a hot backup is desired. Therefore, a Mode 2a (and Mode 2b) BGP router needs the ability to choose active peers from its list of BGP servers.

Mode 2b Operation:

As with the Mode 2a (full) BGP router, a Mode 2b (partial) BGP router must keep a list of BGP servers, and must have an algorithm for choosing active BGP servers. The partial BGP router must additionally treat its active BGP server(s) as its default route. In other words, the partial BGP router will send any packets that it doesn't have explicit routing information for to a BGP server. Marking the BGP server as a default is a matter of local configuration. That is, the BGP server will not send the BGP router any indication that it is a default router.

Also, the partial router must be viewed as a default router by systems "behind" the BGP router (in the subscriber's network).
The partial router must not send routing information it learns from a BGP server to any other routers.

Either the partial router can receive all BGP information from the BGP servers and choose not to keep it, or the BGP server can be configured (locally) not to send the partial BGP router any routing information at all (except, of course, for the "on-demand" BGP Updates already described). In any event, the BGP router must send the BGP server its own BGP routing updates. This way, the BGP server can further distribute them to other BGP routers (and servers).

BGP Server Operation:

Much of the operation of BGP servers is given in the previous paragraphs. In this section, the exchange of BGP information between BGP servers is discussed.

There will be more than one BGP server. The reasons are both to spread the load over multiple servers, and to provide backup servers.

Every BGP server is expected to have knowledge of all destinations advertised to all BGP servers. Therefore, the BGP servers must exchange BGP information with each other.

To do this, the BGP servers should behave as though they all belong to a single autonomous system. (Strictly speaking, an SMDS service is not an autonomous system, because IP packets can transit the SMDS service without going through a router.) That is, they should use external BGP to exchange information with BGP routers, and use internal BGP to exchange information with each other. This means that every BGP server maintains a peer relationship with every other BGP server.

Note: For N BGP servers, this requires on the order of N**2 internal peer relationships. For the near term, and perhaps even long term, I don't think this will be a problem. I think even several hundred BGP servers could be handled.

BGP servers are configured to pass the NEXT_HOP and SNPA_NEXT_HOP fields untouched, both when they advertise updates internally and externally.
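As a rough check on the cost of the full internal mesh noted above: if each peer relationship is counted once per pair of servers, N servers need N*(N-1)/2 relationships, which grows on the order of N**2, while each individual server maintains N-1 of them. The short sketch below enumerates the pairs. The function names are hypothetical and the server labels are placeholders.

```python
from itertools import combinations


def internal_peerings(servers):
    """Return every unordered pair of BGP servers (full internal mesh)."""
    return list(combinations(servers, 2))


def peering_count(n):
    """Number of internal peer relationships among n servers."""
    return n * (n - 1) // 2


mesh = internal_peerings(["s1", "s2", "s3", "s4"])  # placeholder labels
```

For example, 300 servers would need 44850 internal relationships in total, with each server holding 299 of them.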
When BGP servers advertise updates externally, they should append an AS number representing the SMDS service to the AS_PATH.

General Discussion

While I have taken my best shot at coming up with clean and efficient solutions to the problems of discovery and routing over SMDS, there are several possible alternatives to the techniques discussed in this memo that should be considered. This discussion is not meant to be included in the final RFC.

The UARP Reply is used as a redirect mechanism mainly because it contains the hardware address. It might be better to use a whole new ICMP message to convey this information. The new message would contain the same information as the UARP Reply, but would have a different ICMP message number, and therefore would be distinguishable from the ARP Reply.

There is an interesting mechanism for discovery over SMDS that I considered but chose not to incorporate. With SMDS, it is possible for a system to send messages to group addresses of which it is not a member. This means that if a system were configured with a list of the group addresses for each LIS on its private network, it could ARP for systems not on its own LIS.

I decided not to do this for several reasons. First, it didn't eliminate the need for the UARP Reply mechanism, because initially one may not be able to do group addressing across LATA boundaries.

Second, the idea of non-symmetric ARP groups makes me very uncomfortable. For instance, it seems that it would require that each system have a logical interface and associated IP address and mask for each LIS that it might ARP on. But these addresses would be unusable for sending and receiving packets, and in fact things would break if those addresses were known outside the system that owned them.
Either that, or it would be necessary to modify the systems so that they could ARP over something that they could not match up against one of their interfaces' network/subnet numbers. But this again is drifting too far from the fundamental meaning of IP addresses for my comfort.

In general I am not completely comfortable with the material in this memo. To me, there are too many kludges in it--kludging addresses over logical interfaces so that a system knows that something else is reachable over the network, and kludging the ARP Reply so that a system can efficiently learn the SMDS addresses of other systems.

Another approach to this whole problem would be to create an SMDS-wide ARP service, or at least an ARP service that could handle all ARPing for a private network. However, this solution would require a whole new distributed algorithm for the purposes of collecting and disseminating the ARP requests and replies. Since the routing algorithm already has most of the information needed at hand, it seems overly expensive to create a new algorithm to handle ARPing. Also, much of the addressing weirdness didn't seem to get completely resolved even with an ARP service (unless the use of ARPing over group addresses was limited to ARP servers only).

REFERENCES

[1] Piscitello, D., Lawrence, J., "The Transmission of IP Datagrams over the SMDS Service", RFC 1209, USC/Information Sciences Institute, March 1991.

[2] Mogul, J., Postel, J., "Internet Standard Subnetting Procedure", RFC 950, USC/Information Sciences Institute, August 1985.

[3] Braden, R.T., ed., "Requirements for Internet hosts - communication layers", RFC 1122, USC/Information Sciences Institute, October 1989.

[4] Braden, R.T., Postel, J.B., "Requirements for Internet gateways", RFC 1009, USC/Information Sciences Institute, June 1987.

[5] Lougheed, K., Rekhter, Y., "A Border Gateway Protocol 3 (BGP-3)", Internet-draft, January 1991.
[6] Moy, J., "OSPF specification", RFC 1131, USC/Information Sciences Institute, October 1989.

[7] Tesink, K., ed., "Definitions of Managed Objects for the SIP Interface Type", Internet-draft, March 1991.

[8] Tsuchiya, P.T., "Border Gateway Protocol NEXT_HOP_SNPA Attribute", Internet-draft, March 1991.

[9] Hedrick, C., "Routing Information Protocol", RFC 1058, USC/Information Sciences Institute, June 1988.