Internet Draft                                         Robert L. Ullmann
draft-ietf-tpix-catnip-base-01.txt         Lotus Development Corporation
                                                        22 December 1993

CATNIP
Common Architecture Technology for Next-generation Internet Protocol

1 Status of this memo

This memo describes a common architecture for the network layer protocol. The first version of this memo, describing a possible Internet Version 7 protocol, was written by the present author in the summer and fall of 1989, and circulated informally, including to the IESG, in December 1989. Informal notes on addressing, called "Toasternet Part I and II", were circulated on the IETF mail list during November 1991 and March 1992. Subsequent work was published in June 1993 in RFCs 1475 and 1476.

It has since evolved, moving (for example) from varying length addressing to a fixed length format and then back to an ISO varying address format. Much of the thinking was paralleled by work done by Ross Callon under the name TUBA, and converged into the present document. (TUBA is, at this time, a separate development effort within the IETF; the present author is entirely responsible for the content of this document if blame is to be assigned; credit must go to many others.) The first version of TUBA was published in RFC 1347.

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. This draft is a product of the TP/IX (and possibly TUBA) working group(s).

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.
Ullmann Expires: 22 July 1994 [Page 1] Internet Draft CATNIP 22 December 1993

2 Table of Contents

   1 Status of this memo
   2 Table of Contents
   3 Introduction
      3.1 Objectives
         3.1.1 Incremental Infrastructure Deployment
         3.1.2 No Address Translation
         3.1.3 No Legacy Systems
         3.1.4 Limited Scope
      3.2 Philosophy
      3.3 Terminology
      3.4 Overview of This Document
   4 Network Layer
      4.1 Addresses and Network Numbers
      4.2 One Numbering System
      4.3 Network Layer Address Format
      4.4 Network layer datagram format
         4.4.1 NLPID/Flags
            4.4.1.1 Destination Address Omitted
            4.4.1.2 Source Address Omitted
            4.4.1.3 Report Fragmentation Done
            4.4.1.4 Mandatory Router Option
         4.4.2 Header Length
         4.4.3 Time to live
         4.4.4 Forward cache identifier
         4.4.5 Datagram Length
         4.4.6 Transport Protocol
         4.4.7 Checksum
         4.4.8 Destination
         4.4.9 Source
         4.4.10 Options
      4.5 Option Format
         4.5.1 Class (C)
         4.5.2 Copy on Fragmentation (F)
         4.5.3 Type
         4.5.4 Length
         4.5.5 Option data
      4.6 Options
         4.6.1 Null
         4.6.2 Fragment
         4.6.3 Last Fragment
         4.6.4 Don't Fragment
         4.6.5 Don't Convert
      4.7 Forward Cache Identifier
         4.7.1 Using ICMP Feedback to Provide FCIs
         4.7.2 Using a Routing Protocol to Provide FCIs
         4.7.3 Flows
         4.7.4 Circuits
         4.7.5 Mobile Hosts
      4.8 Network Layer Conversion
         4.8.1 Fragmented Datagrams
         4.8.2 Where Does the Conversion Happen?
         4.8.3 Forwarding and Redirects
         4.8.4 Design Considerations
   5 OSI Connectionless Protocol
      5.1 Network Entity Titles
      5.2 NPDU Format
      5.3 Conversion from CLNP
      5.4 Conversion to CLNP
   6 Internet Protocol
      6.1 Addressing and ADs
      6.2 Version 4 IP Address Extension Option
      6.3 IP Version 7 Datagram Format
         6.3.1 Hybrid IPv4 Systems
      6.4 Conversion from IPv4
         6.4.1 Conversion to IPv4
   7 Novell IPX
      7.1 IPX Network Numbering
      7.2 IPX Transport Control Field
      7.3 Intermediate header
         7.3.1 Destination Socket
         7.3.2 Source Socket
         7.3.3 Remainder of the TPDU Header
      7.4 Conversion from IPX
      7.5 Conversion to IPX
   8 Transport Protocols
      8.1 Internet Control Message Protocol
         8.1.1 ICMP Header Format
         8.1.2 Conversion Failed ICMP Message
         8.1.3 ICMP Conversion
      8.2 Internet Transmission Control Protocol
         8.2.1 TCP Checksum
         8.2.2 Maximum Segment Size in TCP
      8.3 Internet User Datagram Protocol
         8.3.1 UDP Checksum
      8.4 OSI TP4
      8.5 OSI CLTP
      8.6 Novell Internetwork Packet Exchange
      8.7 Novell Sequenced Packet Exchange
         8.7.1 SPX-II
   9 Notes
      9.1 MTU discovery
      9.2 RAP
      9.3 Internet DNS
         9.3.1 PTR zone
         9.3.2 Implementation
   10 References
   11 Author's Address

3 Introduction

The common architecture described in this document provides a compressed form of the existing network layer protocols. Each compression is defined so that the resulting network protocol data units are identical in format. The fixed part of the compressed format is 16 bytes in length, and may often be the only part transmitted on the subnetwork.

With some attention paid to details, it is possible for a transport layer protocol (such as TCP) to operate properly with one end system using one network layer (e.g.
IP version 4) and the other using some other network protocol, such as CLNP. All of the existing transport layer protocols used on connectionless mode network services will operate over the common infrastructure.

The architecture uses cache handles, carried in the fixed part of the network layer header, to provide both rapid identification of the next hop in high performance routing and abbreviation of the network header, by permitting the addresses to be omitted when a valid cache handle is available. The cache handles are either provided by feedback from the downstream router in response to offered traffic, or explicitly provided as part of the establishment of a circuit or flow through the network.

When used for flows, the handle is the locally significant flow identifier.

When used for circuits, the handle is the layer 3 peer to peer logical channel identifier, and permits a full implementation of network layer connection oriented service if the routers along the path provide sufficient features. The packet format of the connectionless service is retained, and hop by hop fully addressed datagrams can be used at the same time. Any intermediate model between the connection oriented and the connectionless service can thus be provided over cooperating routers.

3.1 Objectives

The first objective of CATNIP is a practical recognition of the existing state of internetworking, and an understanding that any approach must encompass the entire problem. While it is common in the IP Internet to dismiss CLNP with various amusing phrases, doing so is hardly realistic. (Although great fun sometimes: "IS-IS = 0", "The Giant Leap Sideways", "OSIfied networking")

Even though IP systems apparently outnumber CLNP systems, CLNP isn't going away. Which is fortunate for the IP cheerleaders: were a decision to be made on the size of the installed base, the winner would be IPX, with installed systems far outnumbering IP and CLNP combined.
And then there is SNA, which probably has a larger installed base in terms of capital cost than IPX. IP is in third place.

CATNIP is designed to integrate CLNP, IP, and IPX. The architecture of SNA leads more toward providing SNA tunnels through the common architecture; there isn't any way to do network layer alignment. (It isn't clear that there is a network layer in SNA, given the classic OSI and Internet definitions of that term.)

The CATNIP design provides for any of the transport layer protocols in use, for example TP4, CLTP, TCP, UDP, IPX and SPX, to run over any of the network layer protocol formats: CLNP, IP (version 4), IPX, and the CATNIP.

3.1.1 Incremental Infrastructure Deployment

The best use of the CATNIP is to begin to build a common Internet infrastructure. The routers and other components of the common system are able to use a single consistent addressing method, and common terms of reference for other aspects of the system.

CATNIP is designed to be incrementally deployable in the strong sense: you can plop a CATNIP system down in place of any existing network component and continue to operate normally with no reconfiguration. (Note: not "just a little". None at all. The number of "little changes" suggested by some proposals, and the utterly enormous amount of documentation, training, and administrative effort then required, astounds the present author.) The vendors do all of the work.

There are also no external requirements, no "border routers", no requirement that administrators apply specific restrictions to their network designs, define special tables, or add things to the DNS. Eventually, with full understanding of the combined system, the end users and administrators will want to operate differently, but in no case, not even in small ways, will they be forced.
Networks and end user organizations operate under sufficient constraints on deployment of systems anyway; they do not need a new network architecture adding to the difficulty.

Typically deployment will occur as part of normal upgrade revisions of software, and due to the "swamping" of the existing base as the network grows. (When the Internet grows by a factor of 5, at least 80% will then be "new" systems.) The users of the network may then take advantage of the new capabilities. Some of the performance improvements will be automatic; others may require some administrative understanding to get to the best performance level.

The CATNIP definitions provide stateless conversion of network datagrams to and from CATNIP and by implication directly between CLNP and the other network layer protocols. A CATNIP capable system implementing the full set of definitions will be able to interoperate with any of the existing protocols. Various subsets of the full capability may be provided by some vendors.

3.1.2 No Address Translation

Note that there is no "address translation" in the CATNIP specification. (While it may seem odd to state a negative objective, this is worth saying as people seem to assume the opposite.) There are no "mapping tables", no magic ways of digging translations out of the DNS or X.500, no routers looking up translations or asking other systems for them.

Addresses are modified with a simple algorithmic mapping, a mapping that is no more than using specific prefixes for IP and IPX addresses. Not a large set of prefixes; one prefix. The entire existing IP version 4 network is mapped with one prefix and the IPX global network with one other prefix. (The IP mapping does provide for future assignment of other IANA/IPv4 domains, disjoint from the existing one.)

This means that there is no immediate effect on addresses embedded in higher level protocols.
Higher level protocols not using the full form (those native to IP and IPX) will eventually be extended to use the full addressing to extend their usability over all of the network layers.

3.1.3 No Legacy Systems

The CATNIP leaves no systems behind: any system presently capable of IP, CLNP, or IPX retains at least the connectivity it has now with no reconfiguration. With some administrative changes (such as assigning IPX domain addresses to some CLNP hosts for example) on other systems, unmodified systems may gain significant connectivity. IPX systems with registered network numbers may gain the most.

3.1.4 Limited Scope

This specification defines a common network layer packet format and basic architecture. It intentionally does not specify ES-IS methods, routing, naming systems, autoconfiguration and other subjects not part of the core Internet wide architecture. The related problems and their (many) solutions are not within the scope of the specification of the basic common packet format. There are some related issues discussed in the last section.

3.2 Philosophy

Protocols should become simpler as they evolve.

"Perfection is attained not when there is nothing left to add, but when there is nothing left to take away."

3.3 Terminology

The following specification attempts to use simple terminology where possible. Words like address, route, flow, circuit, and mobile are used with specific abstract concepts in mind that can differ from other uses of the same word. In Internet specifications, this is seen as generally preferable to the alphabet soup typical of OSI specification: we prefer datagram to NPDU, even at the risk of being mistaken for the datagrams of the UDP, which is a different animal. But it requires some care on the part of the reader.

This isn't unique to networking of course.
Linnaeus invented a system for naming other sorts of flora and fauna, giving us wonderful terms like Drosophila Melanogaster (or Nepeta Cataria) for times when one wants to be pedantic. When one doesn't, one can use ordinary common terms, as long as one is willing to duck the flying fruit thrown by those who misunderstood.

3.4 Overview of This Document

Section 4 describes the common architecture and network layer addressing and packet format from a point of view independent of the network layer protocols.

Section 5 specifies the detailed use of the common format to compress CLNP NPDUs and take advantage of the cache architecture.

Section 6 details the use of the common architecture to support IP (version 4) in an extended version in the common internet.

Section 7 describes using the common format to support IPX internetworking.

Section 8 discusses the various transport protocols, and the fine details of ensuring that they work directly over the CATNIP infrastructure, as well as over the network protocols other than their "native" protocol.

Section 9 is a small collection of notes on higher level protocols and details that in a strict sense are out of the scope of the network layer specification.

4 Network Layer

4.1 Addresses and Network Numbers

The Internet's version 4 numbering system has proven to be very flexible, (mostly) expandable, and simple. In short: it works. There are two problems, neither of them serious when this specification was first developed in 1988 and 1989, but both have, as expected, become more serious:

   o  The division into network, and then subnet, is insufficient. Almost all sites need a network assignment large enough to subnet. At the top of the hierarchy, there is a need to assign administrative domains.

   o  As bit-packing is done to accomplish the desired network structure, the 32 bit limit causes more and more aggravation.
Another major addressing system used in open internetworking is the OSI method of specifying Network Service Access Points (NSAPs). The NSAP consists of an authority and format identifier, a number assigned to that authority, an address assigned by that authority, and a selector identifying the next layer (transport layer) protocol. This is actually a general multi-level hierarchy, often obscured by the details of specific profiles. (For example, CLNP doesn't specify 20 octet NSAPs, it allows any length. But various GOSIPs profile the NSAP as 20 octets, and IS-IS makes specific assumptions about the last 1-8 octets. And so on.)

The NSAP does not directly correspond to an IP address, as the selector in IP is separate from the address. The concept that does correspond is the NSAP less the selector, called the Network Entity Title or NET. (An unfortunate acronym, but one we will use to avoid repeating the full term.) The actual strict definition of NET is an NSAP with the selector set to 0; the NET used here omits the 0 selector.

There is also a network numbering system used by IPX, a product of Novell, Inc. (which will be referred to from here on as simply Novell) and other vendors making compatible software. While IPX is not yet well connected into a global network, it has a larger installed base than either of the other network layers.

4.2 One Numbering System

Given the several systems in use, it is absurd to try to resolve the differences by introducing another. (Absurd only after much consideration; this is not to be construed to mean that it was obvious.) The differing systems already cause serious problems in the administration of networks; introducing another is not appropriate as long as an existing system can serve or be extended to serve.

This leads to two possible paths.
   o  One path is to extend the existing version 4 addressing in a logical manner, possibly to a 64 bit or longer fixed length address. This is the approach taken in the first published version of IPv7, described in RFC1475. One problem with this approach is that the result is not usable for CLNP or the OSI CONS without major modifications to those protocols.

   o  The other path, similar to the development work preceding RFC1475 on version 7, (and, interestingly, very similar to the ideas developed independently in the IAB proposal of June 1992), is to incorporate the version 4 addressing into the OSI NET addressing in such a way that it is usable for version 7 as well as both CONS and CLNS.

The second path, leading to a single addressing plan for the Internet, OSI, and Novell protocols is described in this document. A similar approach can be used to integrate other network layer protocols (of sufficient generality) into the common architecture.

4.3 Network Layer Address Format

The network layer address looks like:

   +----------+----------+---------------+---------------+
   |  length  |   AFI    |    IDI ...    |    DSP ...    |
   +----------+----------+---------------+---------------+

The fields are named in the usual OSI terminology although that leads to an oversupply of acronyms. A more detailed description of each field:

   length  the number of bytes (octets) in the remainder of the address.

   AFI     the Authority and Format Identifier. A single byte value, from a set of well-known values registered by ISO, that determines the semantics of the IDI field.

   IDI     the Initial Domain Identifier, a number assigned by the authority named by the AFI, formatted according to the semantics implied by the AFI, that determines the authority for the remainder of the address.

   DSP     Domain Specific Part, an address assigned by the authority identified by the value of the IDI.
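
The layout above can be illustrated with a minimal Python sketch. It assumes the length field is a single byte (the destination field description later calls it a "count byte") and treats the IDI and DSP as already-encoded byte strings, since where the IDI ends depends on AFI semantics that are outside this section. The function names and the sample byte values are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple, Tuple


class NetAddress(NamedTuple):
    afi: int     # Authority and Format Identifier (one byte)
    rest: bytes  # IDI followed by DSP; where one ends depends on the AFI


def encode_address(afi: int, idi: bytes, dsp: bytes) -> bytes:
    """Prefix AFI + IDI + DSP with a one-byte count of what follows."""
    if not 0 <= afi <= 0xFF:
        raise ValueError("AFI must fit in one byte")
    body = bytes([afi]) + idi + dsp
    if len(body) > 0xFF:
        raise ValueError("address too long for a one-byte length field")
    return bytes([len(body)]) + body


def decode_address(buf: bytes) -> Tuple[NetAddress, int]:
    """Return the address at the start of buf and the bytes it occupied."""
    if not buf:
        raise ValueError("empty buffer")
    length = buf[0]
    if length < 1 or len(buf) < 1 + length:
        raise ValueError("truncated address")
    return NetAddress(buf[1], bytes(buf[2:1 + length])), 1 + length


# Made-up sample values, not real AFI or IDI registrations.
wire = encode_address(0x2F, b"\x01\x02", b"\xaa\xbb\xcc")
addr, used = decode_address(wire + b"trailing")
print(wire.hex(), addr, used)
```

The decoder deliberately does not split IDI from DSP: only the authority named by the AFI defines that boundary.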
Note that there are several levels of authority: ISO identifies (with the AFI) a set of numbering authorities (like X.121, the numbering plan for the PSPDN, or E.164, the numbering plan for the telephone system). Each authority numbers a set of organizations or individuals or other entities. (For example, E.164 assigns 16172477959 to me as a telephone subscriber.)

The entity then is the authority for the remainder of the address. I can do what I please with the addresses starting with (AFI=E.164) (IDI=16172477959). Note that this is a delegation of authority, and not (as is often erroneously concluded) an embedding of a data-link address (the telephone number) in a network layer address. The actual routing of the network layer address has nothing to do with the authority numbering.

The domain specific part is variable length, and can be allocated in whatever way the authority identified by the AFI+IDI desires. (But note that things like GOSIPs and ES-IS as presently implemented put other, probably ill-advised, constraints on the DSP.)

4.4 Network layer datagram format

The common architecture format for network layer datagrams is described below. The design is a balance between use on high performance networks and routers and a desire to minimize the number of bits in the fixed header. One mistake that will not be made is to make a fixed field too small.

Using the current state of processor technology as a reference, the fixed header is all loaded into CPU registers on the first memory cycle, and all fits within the operation bandwidth. The header leaves the remaining data aligned on the header size (128 bits); with 64 bit addresses present and no options it leaves the transport header 256 bit aligned.

Other things: the FCI precedes the length and transport protocol, being needed as early as possible after format identification.
The checksum is at the end of the fixed part, being updated last. (These may not be important, given the likelihood that it is all going to be loaded in parallel anyway.) And so on.

On very slow and low performance networks, it is still fairly small, and could be further compressed by methods similar to those used with IP version 4 on links that consider every bit precious. In between, it fits nicely into ATM cells and radio packets, leaving sufficient space for the transport header and application data.

   +-------+-+-+-+-+---------------+-------------------------------+
   | NLPID |D|S|R|M|  Header Size  |         Time To Live          |
   +-------+-+-+-+-+---------------+-------------------------------+
   |                   Forward Cache Identifier                    |
   +---------------------------------------------------------------+
   |                        Datagram Length                        |
   +---------------------------------------------------------------+
   |      Transport Protocol       |           Checksum            |
   +---------------------------------------------------------------+
   |                    Destination Address ...                    |
   +---------------------------------------------------------------+
   |                      Source Address ...                       |
   +---------------------------------------------------------------+
   |                          Options ...                          |
   +---------------------------------------------------------------+

4.4.1 NLPID/Flags

The top part of the first byte (the network layer protocol identifier in OSI) is a four bit constant 0111.

The lower part of the first byte holds four one-bit flags. These are positioned with the NLPID and header size so that an implementation can use known values of the first 16-bit word to profile further processing of the datagram.

This uses a block of 16 NLPID values, in the joint ISO/CCITT range. While this is technically good, it will need to be re-visited from a political viewpoint.

4.4.1.1 Destination Address Omitted

When the destination address omitted (DAO) flag is zero, the destination address is present as shown in the datagram format diagram.
When a datagram is sent with an FCI that identifies the destination and the DAO flag is set, the address does not appear in the datagram.

4.4.1.2 Source Address Omitted

The source address omitted (SAO) flag is zero when the source address is present in the datagram. When a datagram is sent with an FCI that identifies the source and the SAO flag is set, the source address is omitted from the datagram.

4.4.1.3 Report Fragmentation Done

When this bit (RFD) is set, an intermediate router that fragments the datagram (because it is larger than the next subnetwork MTU) should report the event with an ICMP Datagram Too Big message. (Unlike IP version 4, which uses DF for MTU discovery, RFD allows the fragmented datagram to be delivered.)

4.4.1.4 Mandatory Router Option

The mandatory router option (MRO) flag indicates that routers forwarding the datagram must look at the network header options. If not set, an intermediate router should not look at the header options. (But it may anyway; this is a necessary consequence of transparent network layer conversion, which may occur anywhere.)

The destination host, or an intermediate router doing conversion, must look at the header options regardless of the setting of MRO.

A router doing fragmentation will normally only use the F field within options to determine whether options should be copied within the fragmentation code path. (It might also recognize and elide null options.) If MRO is not set, the router may not act on an option even though it copies it properly during fragmentation.

If there are no options present, MRO should always be zero, so that routers can follow the no-option profile path in their implementation. (Remember that the presence of options cannot be divined from the header length, since the addresses are variable length.)

4.4.2 Header Length

The header length is an 8-bit count of the number of 32 bit words in the header.
This allows a header to be up to 1020 bytes in length.

4.4.3 Time to live

The time to live is a 16-bit count, nominally in 1/16 seconds. Each hop is required to decrement TTL by at least one.

4.4.4 Forward cache identifier

The identifier provided by the next hop router via ICMP or a routing protocol. The next hop router uses it to find the following hop. (A more complete description is given below.)

If an FCI is not available, this field must be zero, the SAO and DAO flags must be clear, and both destination and source addresses must appear in the datagram.

4.4.5 Datagram Length

The 32-bit length of the entire datagram in octets. A datagram can therefore be up to 4294967295 bytes in overall length. Particular networks normally impose lower limits.

4.4.6 Transport Protocol

The transport layer protocol. For example, TCP is 6.

4.4.7 Checksum

The checksum is a 16-bit checksum of the entire header, using the familiar algorithm used in IP version 4.

4.4.8 Destination

The destination address, a count byte followed by the destination NET with the zero selector omitted. This field is present only if the DAO flag is zero.

If the count field is not 3 modulo 4 (the destination is not an integral multiple of 32-bit words) zero bytes are added to pad to the next multiple of 32 bits. These pad bytes are not required to be ignored: routers may rely on them being zero.

4.4.9 Source

The source address, in the same format as the destination. Present only if the SAO flag is zero. The source is padded in the same way as destination to arrive at a 32-bit boundary.

4.4.10 Options

Options may follow. They are variable length, and always 32 bit aligned. If the MRO flag in the header is not set, routers will usually not look at or take action on any option, regardless of the setting of the class field.
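
The field descriptions in section 4.4 can be pulled together in a short Python sketch that builds and parses the fixed part of the header. Several details here are assumptions rather than statements of the specification: the flags are taken to occupy the low four bits of the first byte in the order D, S, R, M (most significant first, as drawn), the checksum is computed with the checksum field set to zero (the IP version 4 convention), and all function names are hypothetical.

```python
import struct

NLPID = 0b0111                            # constant top four bits of byte 0
FIXED = struct.Struct("!BBHIIHH")         # 16-byte fixed part, network order
DAO, SAO, RFD, MRO = 0x8, 0x4, 0x2, 0x1   # assumed bit positions, D first


def ip_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pad32(field: bytes) -> bytes:
    """Zero-pad a count-prefixed address to a 32-bit boundary."""
    return field + b"\x00" * (-len(field) % 4)


def build_header(ttl, fci, payload_len, proto, dst=b"", src=b"", flags=0):
    """dst and src are count-prefixed addresses; b"" means omitted."""
    if fci == 0 and not (dst and src):
        raise ValueError("without an FCI both addresses must be present")
    if not dst:
        flags |= DAO
    if not src:
        flags |= SAO
    tail = pad32(dst) + pad32(src)
    size = FIXED.size + len(tail)         # header bytes, a multiple of 4
    first = (NLPID << 4) | flags
    draft = FIXED.pack(first, size // 4, ttl, fci, size + payload_len, proto, 0)
    csum = ip_checksum(draft + tail)      # checksum field is zero in draft
    return draft[:14] + struct.pack("!H", csum) + tail


def parse_fixed(header: bytes) -> dict:
    first, words, ttl, fci, length, proto, csum = FIXED.unpack_from(header)
    if first >> 4 != NLPID:
        raise ValueError("first four bits are not 0111")
    return {"dao": bool(first & DAO), "sao": bool(first & SAO),
            "rfd": bool(first & RFD), "mro": bool(first & MRO),
            "header_bytes": words * 4, "ttl": ttl, "fci": fci,
            "datagram_length": length, "transport": proto, "checksum": csum}


# Made-up addresses: count bytes of 6 and 3, so only the first needs padding.
DST = bytes.fromhex("062f0102aabbcc")
SRC = bytes.fromhex("032f0102")
hdr = build_header(ttl=960, fci=0, payload_len=100, proto=6, dst=DST, src=SRC)
print(len(hdr), hdr[:2].hex(), ip_checksum(hdr), parse_fixed(hdr))
```

A TTL of 960 corresponds to 60 seconds at the nominal 1/16 second unit. Recomputing the checksum over a header that already carries a correct checksum yields zero, which gives a receiver a simple validity test.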
4.5 Option Format

Each option begins with a 32-bit fixed header, followed by the option data and zero padding if needed:

   +---+-+-------------------------+-------------------------------+
   | C |F|       Option Type       |          Data Length          |
   +---+-+-------------------------+---------------+---------------+
   |                  Option Data                  |    Padding    |
   +-----------------------------------------------+---------------+

A description of each field:

4.5.1 Class (C)

This two-bit field tells implementations what to do with datagrams that contain options the implementations do not understand. This specification does not require an implementation to implement (i.e. understand) any particular option.

Classes:

   0  use or forward and include this option unmodified
   1  use this datagram, but do not forward the datagram
   2  discard, or forward and include this option unmodified
   3  discard this datagram

A host receiving a datagram addressed to itself will use it if there are no unknown options of class 2 or 3. A router receiving a datagram not addressed to it will forward the datagram if and only if there are no unknown options of class 1 or 3. (The astute reader will note that the bits can also be seen as having individual interpretations, one allowing use even if unknown, one allowing forwarding if unknown.)

Note that classes 0 and 2 are imperative: if the datagram is forwarded, the unknown option must be included.

Class and type are entirely orthogonal; different implementations might use different classes for the same option, except where restricted by the option definition.

Also note that for options that are known to (implemented by) the host or router, the class has no meaning; the option definition totally determines the behavior. (Although it should be noted that the option might explicitly define a class dependent behavior.)
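
The class rules reduce to two small predicates. The Python sketch below is an illustration only: the function and constant names are hypothetical, and each function takes the classes of just those options the implementation does not understand. The loop at the end checks the "astute reader" observation that the two bits of the class field can be read independently.

```python
def host_may_use(unknown_option_classes) -> bool:
    """A host uses the datagram unless an unknown option is class 2 or 3."""
    return not any(c in (2, 3) for c in unknown_option_classes)


def router_may_forward(unknown_option_classes) -> bool:
    """A router forwards unless an unknown option is class 1 or 3."""
    return not any(c in (1, 3) for c in unknown_option_classes)


# The same rules read as two independent bits of the class field.
NO_USE_IF_UNKNOWN = 0b10
NO_FORWARD_IF_UNKNOWN = 0b01

for cls in range(4):
    assert host_may_use([cls]) == (not cls & NO_USE_IF_UNKNOWN)
    assert router_may_forward([cls]) == (not cls & NO_FORWARD_IF_UNKNOWN)
    print(cls, host_may_use([cls]), router_may_forward([cls]))
```

A datagram with no unknown options passes both predicates, since known options are governed by their own definitions rather than by the class.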
4.5.2 Copy on Fragmentation (F)

If the F bit is set, this option must be copied into all fragments when a datagram is fragmented. If the F bit is reset (zero), the option must only be copied into the first (zero-offset) fragment.

4.5.3 Type

The Type field (13 bits) identifies the particular option, types being registered as well-known values in the Internet. A few of the options with their types are described below in section 4.6.

4.5.4 Length

Length of the option data, in bytes. The offset from the start of this option to the start of the next option is length plus 4, rounded up to a multiple of 4 bytes.

4.5.5 Option data

Variable length specified by the length field, plus 0-3 bytes of zeros to pad to a 32-bit boundary. Fields within the option data that are 64 bits long are normally placed on the assumption that the option header is aligned (the usual case when the option is the only one present), and immediately follows the fixed part of the header and the addresses (if present) are the same size.

4.6 Options

The following sections describe the options defined to provide features of the network layer protocols being represented, or necessary in the basic structure of the protocol.

Other options will need to be defined to carry some of the idiosyncrasies of the various network layer services through the common infrastructure to be reproduced on the other side. These are not yet specified.

4.6.1 Null

The null option, type 0, provides for a space filler in the option area. The data may be of any size, including 0 bytes (which is perhaps the most useful case).

The coding of type, class, fragment, and length is chosen so that an all-zero 32-bit word is interpreted as a null option, 32 bits in overall length.
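
As an illustration of the option layout and of the all-zero null option, the following Python sketch walks an option area. It assumes the C, F, and Type fields are packed most significant bit first in the order drawn in the option format diagram; the names `Option` and `walk_options` and the sample option type 0x0123 are hypothetical.

```python
import struct
from typing import List, NamedTuple


class Option(NamedTuple):
    cls: int      # two-bit class (C)
    copy: bool    # copy-on-fragmentation bit (F)
    type: int     # 13-bit option type
    data: bytes   # option data without padding


def walk_options(area: bytes) -> List[Option]:
    """Split an option area into options, skipping the zero padding."""
    options = []
    offset = 0
    while offset < len(area):
        if len(area) - offset < 4:
            raise ValueError("truncated option header")
        first, length = struct.unpack_from("!HH", area, offset)
        end = offset + 4 + length
        if end > len(area):
            raise ValueError("option data runs past the option area")
        options.append(Option(first >> 14, bool(first & 0x2000),
                              first & 0x1FFF, area[offset + 4:end]))
        # Next option starts at length plus 4, rounded up to a multiple of 4.
        offset += (length + 4 + 3) & ~3
    return options


# One all-zero word, then a made-up class 2 option with F set and 5 data bytes.
area = bytes(4) + struct.pack("!HH", 0xA123, 5) + b"abcde" + bytes(3)
for opt in walk_options(area):
    print(opt)
```

The all-zero word comes out as class 0, F clear, type 0, with no data, which is exactly a 32-bit null option.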
Null may be used to change alignment of the options that follow it or to replace an option being deleted, by setting type to 0 and class to 0, leaving the length and content of the data unmodified. (Note that this implies that options must not contain "secret" data, relying on class 3 to prevent the data from leaving the domain of routers that understand the option.)

Null is normally class 0, and need not be implemented to serve its function.

4.6.2 Fragment

Fragment (type 1) indicates that the datagram is part of a complete IP datagram. It is always class 2.

The data consists of one of the addresses of the router doing the fragmentation, a 64-bit datagram ID generated by that router, and a 32-bit fragment offset. The IDs should be generated so as to be very likely unique over a period of time larger than the TCP MSL (maximum segment lifetime).

   +---+-+-------------------------+-------------------------------+
   | C |F|      Type (1 or 2)      |          Data Length          |
   +---+-+-------------------------+-------------------------------+
   |                        Fragment Offset                        |
   +---------------------------------------------------------------+
   |                                                               |
   +                          Datagram ID                          |
   |                                                               |
   +---------------+---------------+-------------------------------+
   | Address length|  Router AFI   |        Router IDI ...         |
   +---------------+---------------+-------------------------------+
   |                        Router DSP ...                         |
   +---------------------------------------------------------------+

If a datagram must be re-fragmented, the original address and ID are preserved, so that the datagram can be reassembled from any sufficient set of the resulting fragments.

A router implementing Fragment (doing fragmentation) must recognize the Don't Fragment option.

4.6.3 Last Fragment

Last Fragment (type 2) has the same format as Fragment, but implies that this datagram is the last fragment needed to reassemble the original datagram.
Note that an implementation can reasonably add arriving datagrams with Fragment to a cache. It can then attempt a reassembly when a datagram with Last Fragment arrives (and the total length is known). This will work well when datagrams are not reordered in the network.

4.6.4 Don't Fragment

This option (type 3, class 0) indicates that the datagram may not be fragmented. If it can not be forwarded without fragmentation, it is discarded, and the appropriate ICMP message sent. (Unless, of course, the datagram is an ICMP message.)

There is no data field in the Don't Fragment option.

4.6.5 Don't Convert

The Don't Convert option prohibits conversion from the common format to IP version 4, CLNP or IPX, requiring instead that the datagram be discarded and an ICMP message sent (Conversion Failed/Don't Convert Set). It is type 4, usually class 0, and must be implemented by any router implementing conversion. A host is under no such constraint; like any protocol specification, only the "bits on the wire" can be specified, the host receiving the datagram may convert it as part of its procedure.

There is no data present in this option.

4.7 Forward Cache Identifier

Each datagram carries a 32 bit field, called "forward cache identifier", that is updated (if the information is available) at each hop. This field's value is derived from ICMP messages sent back by the next hop router, a routing protocol (e.g. RAP), or some other method.

The FCI is used to expedite routing decisions by preserving knowledge where possible between consecutive routers. It can also be used to make datagrams stay within reserved flows, circuits, and mobile host tunnels.

4.7.1 Using ICMP Feedback to Provide FCIs

4.7.2 Using a Routing Protocol to Provide FCIs

Consider 3 routers, A, B, and C. Traffic is passing through them, between two other hosts (or networks), X and Y. Packets are going XABCY and YCBAX.
Consider only one direction: routing information flowing from C to A, to provide a route from A to C. The same thing will be happening in the other direction.

An explanation of the notation:

R(r,d,i,h)
A route that means: "from router r, to go toward final destination d, replace the forward route identifier in the packet with i, and take next hop h."

Ri(r,d)
An opaque (outside of router r) identifier, that can be used by r to find R(r,d,...).

Flowi(r,rt)
An opaque (outside of router r) identifier, that router r can use to find a flow or tunnel with which the datagram is associated, and from that the route rt on which the flow or tunnel is built, as well as the Flowi() for the subsequent hop.

Ri(Dgram)
The forward route identifier in a datagram.

One possible sequence of events:

o Router C announces a route R(C,Y,0,Y) to router B. It includes an identifier Ri(C,Y), internal to C, that allows C to find the route rapidly. (The identifier may be a table index, or an actual memory address.)

o Router B creates a route R(B,Y,Ri(C,Y),C) via router C and announces it to A. The route includes an identifier Ri(B,Y), internal to B, and used by A as an opaque object.

o Router A creates a route R(A,Y,Ri(B,Y),B) via router B. It has no one to announce it to. (Poor thing.)

o Now: X originates a datagram addressed to Y. It has no routing information, and sets Ri(Dgram) to zero. It forwards the datagram to router A (X's default gateway).

o A finds no valid Ri(Dgram), and looks up the destination (Y) in its routing tables. It finds R(A,Y,Ri(B,Y),B), sets Ri(Dgram) to Ri(B,Y), and forwards the datagram to B.

o Router B looks at Ri(Dgram) which directly identifies the next hop route R(B,Y,Ri(C,Y),C), sets Ri(Dgram) to Ri(C,Y) and forwards it to router C.

o Router C looks at Ri(Dgram) which directly locates R(C,Y,0,Y), sets Ri(Dgram) to 0 and forwards to Y.
o Y recognizes its own address in Dest(Dgram), ignores Ri(Dgram). Of course, the routers will validate the Ri's received, particularly if they are memory addresses (e.g. M(a) < Ri < M(b), Ri mod N == 0), and probably check that the route in fact describes the destination of the datagram. If the Ri is invalid, the router must use the ordinary method of finding a route (this is what it would have done if Ri was 0), and silently ignore the invalid Ri. When a route has been implicitly or explicitly aggregated at some router, the router will find that the incoming Ri(Dgram) at most can identify the aggregation, and that it must make a decision. The router inserts into the forwarded datagram the Ri for the specific route. (Note this may happen well upstream of the point at which the routes actually diverge.) This routing procedure allows all cooperating routers to make immediate forwarding decisions, without any searching of tables or caches once the datagram has entered the routing domain. If the host participates in the routing, at least to the extent of acquiring the initial Ri required from the first router, then only routers that have done aggregations need make decisions. (If the routing changes with datagrams in flight, some router will be required to make a decision to re-rail each datagram.) 4.7.3 Flows If a "flow" is to be set up, the identifiers are replaced by Flowi(router,route). In this case, each router's structure for the flow contains a pointer to the route on which the flow is built. Datagrams can drop out of the flow at some point, and can be inserted either by the originating host or by a cooperating router near the originator. Since the forward route identifier field is opaque to the sending router, and implicitly meaningful only to the next hop router, use for flows (or similar optimizations) need not be otherwise defined by the protocol. 
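The sequence of events in section 4.7.2 (routes announced from C back to A, then a datagram forwarded from X through A, B and C to Y) can be sketched as a small simulation. This is an illustration only, assuming that each router uses a table index as its Ri, that index 0 means "no identifier", and that an invalid Ri falls back to an ordinary lookup by destination. The class, method and field names are hypothetical.

```python
class Router:
    def __init__(self, name):
        self.name = name
        self.routes = [None]   # index 0 is reserved: "no identifier"
        self.by_dest = {}      # ordinary routing table: destination -> index
        self.lookups = 0       # counts ordinary (slow path) lookups

    def install(self, dest, out_ri, next_hop):
        """Create R(self, dest, out_ri, next_hop) and return Ri(self, dest)."""
        self.routes.append((dest, out_ri, next_hop))
        self.by_dest[dest] = len(self.routes) - 1
        return self.by_dest[dest]

    def forward(self, dgram):
        """Rewrite Ri(Dgram) and return the next hop."""
        ri = dgram["ri"]
        # Validate the incoming Ri: in range, and describing this destination.
        valid = 0 < ri < len(self.routes) and self.routes[ri][0] == dgram["dest"]
        if not valid:
            # Silently ignore the Ri and find the route the ordinary way.
            self.lookups += 1
            ri = self.by_dest[dgram["dest"]]
        _, out_ri, next_hop = self.routes[ri]
        dgram["ri"] = out_ri
        return next_hop


# Route announcements flow from C toward A.
a, b, c = Router("A"), Router("B"), Router("C")
ri_c = c.install("Y", 0, "Y")      # R(C,Y,0,Y)
ri_b = b.install("Y", ri_c, "C")   # R(B,Y,Ri(C,Y),C)
ri_a = a.install("Y", ri_b, "B")   # R(A,Y,Ri(B,Y),B)
routers = {"A": a, "B": b, "C": c}


def deliver(dgram, first_hop):
    """Forward hop by hop until the next hop is not a router."""
    path, hop = [], first_hop
    while hop in routers:
        path.append(hop)
        hop = routers[hop].forward(dgram)
    path.append(hop)
    return path
```

In this sketch a datagram from X with Ri(Dgram) set to zero causes one ordinary lookup at A and none at B or C, which is the effect the procedure is intended to have.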
(This presumes that a router issuing both Ri's and Flowi's will take care to make sure that it can distinguish them by some private method.)

If a flow has been set up by (for example) a restricted target RAP route announcement, it looks no different from a route in the implementation. If this announcement originates from the host itself, the Ri in incoming datagrams can be used to determine whether they followed the flow. The incoming Ri can also be used to optimize delivery of the datagrams to the next layer protocol.

If the Flow Setup option is included in the route, datagrams can use the DAO option and omit the destination address.

4.7.4 Circuits

In a similar manner to flows, a circuit can be established by propagating a route from a destination host to an identified source. This sets up half of a full two-way circuit. Since the two directions are independent, the term circuit is used here to refer only to the establishment of the route in one direction. (Never mind that the origin of the term "circuit" imputes the existence of both.)

If the circuit is set up with RAP (it can be set up by other methods), and the Circuit Setup option is included, both the source and destination addresses can be omitted from datagrams transmitted on the circuit.

If the circuit is established by a method other than RAP, that method will need to provide some way to specify whether this can be done, i.e., whether all routers in the circuit will support forwarding without the destination address.

A datagram traveling on an established circuit may have both SAO and DAO set. If there are also no options present, the entire header is 16 bytes in length.

Circuits do not have to be established all the way from source to destination to make this possible.
If an intermediate router is the entering endpoint of a circuit, it can insert matching datagram traffic as it arrives while removing the source and destination addresses. A similar operation can insert datagrams into a single destination flow while removing the destination address.

If a router is the exit endpoint of a circuit or flow, it must add the addresses into the datagram and clear the corresponding flags.

4.7.5 Mobile Hosts

First, a definition: A "mobile host" is a host that can move around, connecting via different networks at different times, while maintaining open TCP connections. It is distinguished from a "portable host", which is simply a host that can appear in various places in the net, without continuity. A portable host can be implemented by assigning a new address for each location (more or less automatically), and arranging to update the domain system. Supporting truly mobile hosts is the more interesting problem.

To implement mobile host support in a general way, either some layer of the protocol suite must provide network-wide routing, or the datagrams must be tunneled from the "home" network of the host to its present location. In the real network, some combination of these is probable: most of the net will forward datagrams toward the home network, and then the datagrams will follow a specific host route to the mobile host.

The requirement on the routing system is that it must be able to propagate a host route at least to the home network; any other distribution is useful optimization.

When a host route is propagated by RAP as a targeted route and the routers use the resulting Ri's, the datagram follows an effective tunnel to the mobile host. (Not a real tunnel, in the strict sense; the datagrams are following an actual route at the network protocol layer.)

As explained in RAP, a targeted route can be issued when desired.
In particular, it can be triggered by the establishment of a TCP connection or by the arrival of datagrams that do not carry Ri's indicating that they have followed a direct route. The more serious problem with mobile hosts is not finding them or routing to them; that works altogether too well. The problem is authenticating them. 4.8 Network Layer Conversion 4.8.1 Fragmented Datagrams The converting host or router must reassemble datagrams that have been fragmented before conversion. Where the conversion is being done by the destination host (for example, the case of a native CATNIP host receiving IP version 4 datagrams), this is similar to the present fragmentation model. When it is being done by an intermediate router (acting as an internetwork layer gateway) the router should use all of source, destination, and datagram ID for identification of fragments. Note that destination is used implicitly in the usual reassembly at the destination. If the fragments take different paths through the net, and arrive at different conversion points, the datagram is lost. 4.8.2 Where Does the Conversion Happen? The objective of conversion is to be able to upgrade systems, both hosts and routers, in whatever order desired by their owners. Organizations must be able to upgrade any given system without reconfiguration or modification of any other, and existing hosts must be able to interoperate essentially forever. (Non-CATNIP routers will probably be effectively eliminated at some point, except where they exist in their own remote or isolated corners.) Each CATNIP system, whether host or router, must be able to recognize adjacent systems in the topology that are (only) IP version 4, CLNP, or IPX and call the appropriate conversion routine just before sending the datagram. 
Digression: I believe CATNIP hosts will get much better performance by doing everything internally on the common format and using conversion to filter datagrams when necessary. This keeps the usual code path simple, with only a "hook" right after receiving to convert incoming datagrams and just before sending to convert as necessary. Routers may prefer to keep datagrams in their incoming version, at least until after the routing decision is made, and then do the conversion only if necessary. In either case, this is an implementation specific decision.

4.8.3 Forwarding and Redirects

It may be important for a router to not send ICMP redirects when it finds that it must do a conversion as part of forwarding the datagram. In this case, the hosts involved may not be able to interact directly. The sending host could ignore the redirect, but this results in an unpleasant level of noise as the sequence continually recurs.

4.8.4 Design Considerations

The conversions are designed to be fairly efficient in implementation, especially on RISC architectures, assuming they can either do a conditional move (or store), or do a short forward branch without losing the instruction cache. The other conditional branches in the body of the code are usually not-taken out to the failure/discard case.

Handling options does involve a loop and a dispatch (case) operation. The options in IP version 4 are more difficult to handle, not being designed for speed on a 32-bit aligned RISCish architecture -- but they do not occur often, except perhaps the address extension option.

For CISC machines, the same considerations will lead to fairly efficient code.

The conversion code must be extremely careful to be robust when presented with invalid input. In particular, it may be presented with truncated transport layer headers when called recursively from the ICMP conversion.
5 OSI Connectionless Protocol

5.1 Network Entity Titles

5.2 NPDU Format

5.3 Conversion from CLNP

Conversion from CLNP version 1 to the common architecture NPDU is mostly a matter of moving fields. The steps follow; the order is not necessarily significant. The NPDU must have been reassembled if it had been segmented.

o Verify header checksum.

o Verify NLPID is 129 (hex 81), version is 1.

o Set first octet of CATNIP datagram to 112 (hex 70).

o Verify type is DT (data; 11100).

o If "Don't Segment" set, set RFD.

o Multiply lifetime by 8, store 16 bit result in TTL.

o Set FCI to 0.

o Copy destination address, without transport selector.

o Add 192 modulo 256 to destination selector, store in transport protocol.

o Copy source address, without transport selector.

o Calculate header length.

o Calculate new network header checksum.

5.4 Conversion to CLNP

o Verify header checksum.

o Verify first octet is 112 to 115 (hex 70-73), i.e. addresses are present.

o Set NLPID to 129 (hex 81), version to 1.

o If RFD set, set "Don't Segment".

o Set type to DT (11100).

o Divide TTL by 8, store in lifetime field. If zero discard the datagram and send ICMP unreachable.

o Copy destination address.

o If transport protocol is greater than 255, fail.

o Add 64 modulo 256 to transport protocol, put in destination transport selector.

o Copy source address, add copy of destination selector.

o Compute header length and segment length.

o Compute new header checksum.

6 Internet Protocol

6.1 Addressing and ADs

All existing version 4 numbers are defined as belonging to the Internet by using a new AFI, to be assigned to IANA by the ISO. This document uses 192 at present for clarity in examples; it is to be replaced with the assigned AFI.
The AFI specifies that the IDI is two bytes long, containing an administrative domain number.

The AD (Administrative Domain) identifies an administration which may be an international authority (such as the existing InterNIC), a national administration, or a large multi-organization (e.g., a government). The idea is that there should not be more than a few hundred of these at first, and eventually thousands or tens of thousands at most. Most individual organizations would not be ADs.

In the short term, ADs are known to the "core routing"; it pays to keep the number smallish, a few thousand given current routing technology. In the long term, this is not necessary. Big administrations (i.e. with tens of millions of networks) get small blocks where needed, or additional single AD numbers when needed.

AD numbers are assigned by IANA. Initially, the only assignment is the number 0.0, assigned to the InterNIC, encompassing the entire existing version 4 Internet.

Some ADs (e.g. the InterNIC) may make permanent assignments; others (such as a telephone company defining a network number for each subscriber line) may tie the assignment to such a subscription. But in no case does this require traffic to be routed via the AD.

The mapping from/to version 4 IP addresses:

+----------+----------+---------------+---------------------+
|  length  |   AFI    |    IDI ...    |       DSP ...       |
+----------+----------+---------------+---------------------+
|    7     |   192    |   AD number   |  version 4 address  |
+----------+----------+---------------+---------------------+

While the address (DSP) is initially always the 4 byte version 4 address, it can be extended to arbitrary levels of subnetting within the existing Internet numbering plan. Hosts with DSPs longer than 4 bytes will not be able to interoperate with version 4 hosts.
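The mapping between version 4 addresses and the common address format can be sketched as follows. This is a minimal illustration, assuming the placeholder AFI of 192 used in this document's examples and an AD number held as a 16-bit integer in network byte order. The function names are hypothetical.

```python
import ipaddress

AFI_INTERNET = 192   # placeholder AFI from the examples; to be replaced


def v4_to_catnip(addr: str, ad: int = 0) -> bytes:
    """Build the 8-byte common-format address: length (7), AFI,
    2-byte AD number, then the 4-byte version 4 address as the DSP."""
    dsp = ipaddress.IPv4Address(addr).packed
    body = bytes([AFI_INTERNET]) + ad.to_bytes(2, "big") + dsp
    return bytes([len(body)]) + body


def catnip_to_v4(address: bytes):
    """Return (AD number, dotted version 4 address), or raise ValueError
    if the address is not in the 7-byte Internet form."""
    if len(address) < 8 or address[0] != 7 or address[1] != AFI_INTERNET:
        raise ValueError("not a 7-byte Internet (version 4) address")
    ad = int.from_bytes(address[2:4], "big")
    return ad, str(ipaddress.IPv4Address(address[4:8]))


mapped = v4_to_catnip("192.0.2.1")   # AD 0.0, the existing Internet
```

An address whose DSP is longer than 4 bytes is rejected by catnip_to_v4, reflecting the statement that such hosts cannot interoperate with version 4 hosts.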
6.2 Version 4 IP Address Extension Option

When a datagram is converted to version 4, the AD and extra subnet bytes in the address are moved into an address extension option so that they may be restored if the datagram is converted back. (The datagram may start on a native CATNIP host, be converted to IP version 4 along the way, and end in CLNP, or vice versa.)

+---------------+---------------+-------------------------------+
|  Type (147)   |    Length     |           Source AD           |
+---------------+---------------+---------------+---------------+
|        Destination AD         | Source count  | Source subnet |
+---------------+---------------+---------------+---------------+
| (cont.) bytes |  Dest. count  |      Dest. subnet bytes       |
+---------------+---------------+-------------------------------+

The source and destination are in this order, with source first, for consistency with version 4.

The type code is 147.

The additional bytes when the networks are subnetted further than in version 4 (i.e., when the source or destination NET is longer than 7 bytes) are included in counted fields. If both addresses were in the 7 byte form, the option looks like:

+---------------+---------------+-------------------------------+
|  Type (147)   |  Length (8)   |           Source AD           |
+---------------+---------------+---------------+---------------+
|        Destination AD         | Src. count (0)| Dest count (0)|
+---------------+---------------+---------------+---------------+

Note that even if both ADs are also zero, the option still has meaning: the converting router is to restore the zero ADs to the full addresses, rather than its local AD.

This option can be used by version 4 hosts to participate in the extended addressing, even without implementation of any other part of the protocol; see the description of hybrid systems below.
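The layout of the address extension option can be illustrated with a short encoder and decoder. This is a sketch, assuming that the length byte counts the whole option including the type and length bytes (consistent with the value 8 shown for the minimal form) and that AD numbers are carried in network byte order. The function names are hypothetical.

```python
ADDRESS_EXTENSION = 147   # option type code


def pack_address_extension(src_ad: int, dst_ad: int,
                           src_extra: bytes = b"",
                           dst_extra: bytes = b"") -> bytes:
    """Encode source AD, destination AD, and the counted extra subnet
    bytes for source and destination, source first."""
    body = (src_ad.to_bytes(2, "big") + dst_ad.to_bytes(2, "big")
            + bytes([len(src_extra)]) + src_extra
            + bytes([len(dst_extra)]) + dst_extra)
    return bytes([ADDRESS_EXTENSION, 2 + len(body)]) + body


def unpack_address_extension(opt: bytes):
    """Decode to (src_ad, dst_ad, src_extra, dst_extra)."""
    if len(opt) < 8 or opt[0] != ADDRESS_EXTENSION or opt[1] != len(opt):
        raise ValueError("malformed address extension option")
    src_ad = int.from_bytes(opt[2:4], "big")
    dst_ad = int.from_bytes(opt[4:6], "big")
    n = opt[6]
    src_extra = opt[7:7 + n]
    m = opt[7 + n]
    dst_extra = opt[8 + n:8 + n + m]
    if len(src_extra) != n or len(dst_extra) != m:
        raise ValueError("truncated subnet bytes")
    return src_ad, dst_ad, src_extra, dst_extra


minimal = pack_address_extension(0, 0)   # both addresses in 7-byte form
```

The minimal form is 8 bytes long and still meaningful with both ADs zero, as noted above.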
6.3 IP Version 7 Datagram Format

The common architecture form of the IP datagram, Internet Version 7, is defined to increase the size of the address field while removing other fields not always used. This results in some simplification, a length less than twice the size of IP even with both extended addresses present, and an expanded space for options.

There is a change in the option philosophy from version 4. Version 4 specified that implementation of options was not optional, what was optional was the existence of options in any given datagram. This is changed in version 7: no option need be implemented to be fully conformant. However, implementations must understand the option classes; and a future Host Requirements specification for hosts and routers used in the "connected Internet" may require some options in its profile, for example, Fragment would probably be required.

Digression: In IPv4, options are often "considered harmful". It is the opinion of the present author that this is because they are rarely needed, and not designed to be processed rapidly on most architectures. This leads to little or no attempt to improve performance in implementations, while at the same time enormous effort is dedicated to optimization of the no-option case.
The network layer datagram looks like this:

+-------+-+-+-+-+---------------+-------------------------------+
|Version|D|S|R|M|  Header Size  |         Time To Live          |
+-------+-+-+-+-+---------------+-------------------------------+
|                 Forward Cache Identifier (0)                  |
+---------------------------------------------------------------+
|                        Datagram Length                        |
+-------------------------------+-------------------------------+
|      Transport Protocol       |           Checksum            |
+---------------+---------------+-------------------------------+
| Dest Len (7)  | Dest AFI (192)|     Destination AD (0.0)      |
+---------------------------------------------------------------+
|                      Destination Address                      |
+---------------------------------------------------------------+
| Src Len (7)   | Src AFI (192) |        Source AD (0.0)        |
+---------------------------------------------------------------+
|                        Source Address                         |
+---------------------------------------------------------------+
|                          Options ...                          |
+---------------------------------------------------------------+

The version number is 7.

The usual version 4 default TTL is scaled by a factor of 16 into a larger number of hops. This is desirable because the forward cache architecture enables the construction of simpler, faster switches, and this may cause the network diameter to increase. This definition should allow continuation of the useful (even though not entirely valid) interpretation of TTL as a hop count, while we move to faster networks and routers. (The most familiar use is by "traceroute", which really ought to be directly implemented by one or more ICMP messages.)

Source and destination addresses are the version 4 addresses.

Other fields are as described in the common architecture.

6.3.1 Hybrid IPv4 Systems

In the course of implementing the new common layer, especially in constrained environments such as small terminal servers, it may be useful to implement the IPv4 address extension option directly.
This regains connectivity within the extended Internet, permitting the host to reach other administrative domains as well as hosts within extended subnets. It still does not provide the full addressing: IPX hosts and large parts of the CLNP domain will not be reachable. This may be a useful interim step for vendors not prepared to do a major rework of an implementation.

A hybrid IPv4 plus address extension system does not have to implement the conversion, it places this onus on its neighbors.

The implication of hybrid systems is that it is not valid to assume that a host that appears to have a CATNIP address is a native implementation.

6.4 Conversion from IPv4

Individual steps in the conversion; the order is in most cases not significant.

o Verify checksum.

o Verify fragment offset is 0, MF flag is 0.

o Verify version is 4.

o Extend TTL to 16 bits, multiply by 16.

o Set forward route identifier to 0.

o Set first 4 octets of destination to length (7), AFI (192) and local AD, copy v4 address to next 4 octets.

o Do the same mapping for the source address.

o If Address Extension option present copy ADs and extra subnet bytes if present into the addresses.

o Copy protocol, set high 8 bits to zero.

o If DF flag set, set RFD flag. (Do not generate Don't Fragment option.)

o Convert other options where possible. If an unknown option with Copy-on-Fragment is found, fail. If Copy-on-Fragment is not set, ignore the option. (I.e., the flag is (ab)used as an indicator of whether the option is mandatory.)

o Compute new IP header length.

o Compute new overall datagram length.

o Calculate new checksum.

6.4.1 Conversion to IPv4

The steps to convert to IPv4 follow.
Note that the converting router or host is partly in the role of destination host; it checks both bits of class in IP options, and (as in the other direction) must reassemble fragmented datagrams.

o Verify checksum.

o Verify version is 7.

o Set type-of-service to 0 (there may be an option defined, that will be handled later).

o If length is greater than (about) 65549, fail. (That number is not a typographical error. Note that the header adds up to 14 bytes more than the corresponding version 4 header in the usual case.) This check is only to avoid useless work, the precise check is later.

o Generate an ID (using an ISN based sequence generator, possibly also based on destination or source or both).

o Set flags and fragment field to 0.

o Divide TTL by 16, if zero, fail (send ICMP Time Exceeded). If greater than 255, set to 255.

o If next layer protocol is greater than 255, fail. Else copy.

o If first 2 octets of destination are 7, 192, copy bytes 5-8 to destination address, else fail (ICMP code 12).

o Do the same mapping for source address.

o Generate v4 address extension option. (If enabled; this probably should be a configuration option, should default to on.)

o Process options. If any unknown options of class not 0 found, fail.

o If Don't Convert option found, fail.

o Convert other options where possible, or fail.

o Compute new IP header length. This may fail (too large), fail conversion if so.

o Compute new overall datagram length. If greater than 65535, fail.

o Compute IPv4 checksum.

7 Novell IPX

The Internetwork Packet Exchange protocol, developed by Novell based on the XNS protocol (Xerox Network System) has many of the same capabilities as the Internet and OSI protocols.
At first look, it appears to confuse the network and transport layers, as IPX includes both the network layer service and the user datagram service of the transport layer, while SPX (sequenced packet exchange) includes the IPX network layer and provides service similar to TCP or TP4. This turns out to be mostly a matter of the naming and ordering of fields in the packets, rather than any architectural difference. The terminology may be a little confusing. Just remember that SPX/IPX does not correspond to TCP/IP in the "obvious" way; rather, IPX is UDP/IP (CLTP/CLNP) and SPX is TCP/IP (TP4/CLNP). The mapping of transport layers over IP version 4 and IPX is not as useful as it might seem because an IPX host does not have an address usable in the Internet version 4 domain, and vice versa. The major objective is accomplished: IPX systems can communicate over the common infrastructure, and a native CATNIP system can implement the IPX and SPX transport layer protocols if it has an IPX domain address assigned to it. A host implementing both IP version 4 and IPX (or implementing CLNP or CATNIP), and having addresses in both domains will be able to use any of TCP, UDP, TP4, CLTP, IPX, or SPX to communicate with any other host. 7.1 IPX Network Numbering IPX uses a 32-bit LAN network number, implicitly concatenated with the 48-bit MAC layer address to form an internet address. Initially, the network numbers were not assigned by any central authority, and thus were not useful for inter-organizational traffic without substantial prior arrangement. There is now an authority established by Novell to assign unique 32-bit numbers and blocks of numbers to organizations that desire inter-organization networking with the IPX protocol. The Novell/IPX authority may be contacted to request assignments by calling +1 408 321 1506 or by sending mail to registry@novell.com. The Novell/IPX numbering plan uses an ICD, to be assigned, to designate an address as an IPX address. 
This means Novell uses the authority (AFI=47)(ICD=Novell) and delegates assignments of the following 32 bits.

An IPX address in the common form looks like:

+----------+----------+---------------+---------------------+
|  length  |   AFI    |    IDI ...    |       DSP ...       |
+----------+----------+---------------+---------------------+
|    13    |    47    |  Novell ICD   | network+MAC address |
+----------+----------+---------------+---------------------+

This will always be followed by two bytes of zero padding when it appears in a common network layer datagram. Note that the socket numbers included in the native form IPX address are part of the transport layer.

7.2 IPX Transport Control Field

The IPX concept corresponding to time to live is a field that starts at zero and counts upward, with 16 being considered expired.

There is no nominal time associated with each hop in IPX. We use 4 seconds, to give a similar range to existing Internet TTL values. This is a compromise between extending the limited range of the IPX field to the Internet diameter and avoiding extremely large TTL values in the CATNIP.

An IPX transport control of 0 (the initial value) corresponds to an Internet version 4 TTL of 64, a CLNP TTL of 128, and a CATNIP TTL of 1024. The limiting value of 16 corresponds to a TTL of zero in the other domains.

Some care must be taken in the math at each conversion to ensure that the time to live (best thought of in the nominal real time) actually decreases at each hop.

7.3 Intermediate header

Each Novell-class transport protocol has a transport layer data unit beginning with a common header. This contains fields expelled from the network layer header.
It is 4 bytes in length:

+-------------------------------+-------------------------------+
|         Source Socket         |      Destination Socket       |
+-------------------------------+-------------------------------+

Note that when a non-IPX transport protocol is converted into IPX, the first two 16 bit words of the TPDU are unceremoniously moved into the "socket" fields in the addresses. If the protocol is TCP or UDP, the effect is serendipitous. (Or would be, had it not been intentional.) For example, if the protocol is TP4 or CLTP, it means that some fields are to be found in odd places in the resulting IPX packet.

7.3.1 Destination Socket

Note that "socket" is the IPX terminology; this is a "port number" in Internet terminology (almost, but not quite, the OSI "destination reference"). This is the 16-bit network order ("high-low") socket number left out of the address in the network layer header.

SPX also uses a "connection ID", not visible here (it is inside the TPDU header); this is needed because connections are not identified by the full Internet socket-pair concept.

7.3.2 Source Socket

The 16-bit source socket number from the IPX source address.

7.3.3 Remainder of the TPDU Header

Any other fields in the packet header (i.e. after the source IPX address) then follow the intermediate header in the transport layer header.

7.4 Conversion from IPX

As mentioned previously, these conversions are a bit more involved because network and transport layer fields need to be sorted out.

o Subtract "transport control" from 16, multiply by 64, store in TTL.

o Add 160 modulo 256 to packet type, put in transport protocol.

o Set FCI to 0.

o Set first 4 bytes of destination to 13, 47, (2 for Novell ICD).

o Copy 10 bytes from IPX destination address.

o Set next 2 bytes (pad) to 0.

o Repeat last 3 steps for source address.

o Compute header length (usually 48 bytes).
o Copy last 2 bytes of destination to intermediate header.

o Copy last 2 bytes of source to intermediate header.

o Compute network header checksum.

7.5 Conversion to IPX

In converting back to IPX and SPX, the appropriate fields are borrowed from the intermediate header to complete the addresses.

o Verify header checksum.

o Verify version is 7.

o Divide TTL by 64, subtract from 16, if less than 0 set to 0, store in "transport control" field.

o If transport protocol is greater than 255, fail.

o Set IPX packet type to transport protocol plus 96 modulo 256.

o If length is greater than 65553, fail. (If length will be greater than destination interface MTU, fail.)

o If first 4 bytes of destination are not 13, 47, Novell ICD, fail.

o Copy next 10 bytes of destination to IPX destination.

o Copy destination socket from intermediate header into destination address.

o Repeat last three steps for source.

Followed, of course, by copying the remainder of the TPDU into the IPX packet.

8 Transport Protocols

This section describes specific implications for the various transport layer protocols operating on the CATNIP. ICMP is included here because of its place in the layering.

The following table lists some of the transport layer protocols with their assigned numbers. It does not attempt to be complete. IANA holds authority for number assignments. The transport protocol code points native to OSI and IPX are rotated (since all three spaces use similar small numbers) so that all of the transports can be used over any of the underlying network layers.
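The arithmetic behind the rotation of code points, and behind the IPX transport control mapping in sections 7.4 and 7.5, can be sketched as follows. The constants (192 and 64 for CLNP, 160 and 96 for IPX, 64 TTL units per IPX hop) are taken from the conversion steps; the function names are hypothetical, and integer division is an assumption about how "divide" is to be read.

```python
def _rotate(value: int, shift: int) -> int:
    return (value + shift) % 256


def clnp_selector_to_catnip(selector: int) -> int:
    return _rotate(selector, 192)


def catnip_to_clnp_selector(protocol: int) -> int:
    if protocol > 255:
        raise ValueError("CATNIP native protocol cannot be carried on CLNP")
    return _rotate(protocol, 64)


def ipx_type_to_catnip(packet_type: int) -> int:
    return _rotate(packet_type, 160)


def catnip_to_ipx_type(protocol: int) -> int:
    if protocol > 255:
        raise ValueError("CATNIP native protocol cannot be carried on IPX")
    return _rotate(protocol, 96)


def ipx_tc_to_ttl(transport_control: int) -> int:
    """Subtract transport control from 16, multiply by 64."""
    return (16 - transport_control) * 64


def ttl_to_ipx_tc(ttl: int) -> int:
    """Divide TTL by 64, subtract from 16, clamp at 0."""
    return max(0, 16 - ttl // 64)


tcp_on_ipx = catnip_to_ipx_type(6)         # 102
tcp_on_clnp = catnip_to_clnp_selector(6)   # 70
spx_in_catnip = ipx_type_to_catnip(5)      # 165
```

Because the division rounds down, a CATNIP TTL of 1000 maps to a transport control of 1, which corresponds to a TTL of 960 on the way back. The remaining lifetime never increases across a conversion, which is the care in the math that section 7.2 asks for.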
Number     Transport layer protocol

1          Internet Control Message Protocol
6          Internet Transmission Control Protocol
17         Internet User Datagram Protocol
160-191    Novell/IPX block
164        Novell Internetwork Packet Exchange
165        Novell Sequenced Packet Exchange
177        Novell NCP
192-255    OSI selector block
192+x      OSI TP4
192+x      OSI CLTP
256-65535  CATNIP native protocols

This table shows the numbers in CATNIP (and IP version 4) space. On
IPX, the "packet type" field has values:

Type       Transport layer protocol

0-31       Novell/IPX block
4          Novell Internetwork Packet Exchange
5          Novell Sequenced Packet Exchange
17         Novell NCP
32-95      OSI selector block
32+x       OSI TP4
32+x       OSI CLTP
97         Internet Control Message Protocol
102        Internet Transmission Control Protocol
113        Internet User Datagram Protocol

On CLNP (to repeat the table once more in a different view), the
destination selector is:

Selector   Transport layer protocol

0-63       OSI selector block
x          OSI TP4
x          OSI CLTP
65         Internet Control Message Protocol
70         Internet Transmission Control Protocol
81         Internet User Datagram Protocol
224-255    Novell/IPX block
228        Novell Internetwork Packet Exchange
229        Novell Sequenced Packet Exchange
241        Novell Netware Core Protocol

8.1 Internet Control Message Protocol

The ICMP protocol is very similar to ICMP on IP version 4, in some
cases not requiring any conversion. The complication is that datagrams
are nested within ICMP messages and must be converted. This is
discussed later.

8.1.1 ICMP Header Format

The ICMP header format is the same as in Internet version 4.
+---------------+---------------+-------------------------------+
|     Type      |     Code      |           Checksum            |
+---------------+---------------+-------------------------------+
|                    Type-specific parameter                    |
+---------------------------------------------------------------+
|                      Type-specific data                       |
+---------------------------------------------------------------+

Type and code are well-known values, defined in [RFC792]. The codes
have meaning only within a particular type; they are not orthogonal.

The next 32-bit word is usually defined for the specific type;
sometimes it is unused.

For many types, the data consists of a nested IP datagram (usually
truncated) which is a copy of the datagram causing the event being
reported. In IPv4, the nested datagram consists of the IP header, and
another 64 bits (at least) of the original datagram.

For CATNIP, the nested datagram must include the header plus 64 bits
of the remaining datagram, and should include the first 256 bytes of
the datagram. That is, in most cases where the original datagram was
not large, it will return the entire datagram.

8.1.2 Conversion Failed ICMP Message

The introduction of network layer conversion requires a new message
type, to report conversion errors. Note that an invalid datagram
should result in the sending of some other ICMP message (for example,
a Parameter Problem message) or the silent discarding of the datagram.
This message is only sent when a valid datagram cannot be converted.

Note: implementations are not expected to, and should not, check the
validity of incoming datagrams just to accomplish this. It simply
means that an error detected during conversion that is known to be an
actual error in the incoming datagram should be reported as such, not
as a conversion failure.
+---------------+---------------+-------------------------------+
|     Type      |     Code      |           Checksum            |
+---------------+---------------+-------------------------------+
|                    Pointer to problem area                    |
+---------------------------------------------------------------+
|       Copy of datagram that could not be converted ...        |
+---------------------------------------------------------------+

The type for Conversion Failed is 31.

The codes are:

0    Unknown/unspecified error
1    Don't Convert option present
2    Unknown mandatory option present
3    Known unsupported option present
4    Unsupported transport protocol
5    Overall length exceeded
6    Network layer header length exceeded
7    Transport protocol out of range
8    Port conversion out of range
9    Transport header length exceeded
10   (unused)
11   Address not in compatible prefix

The use of code 0 should be avoided; any other condition found by
implementors should be assigned a new code requested from IANA. When
code 0 is used, it is particularly important that the pointer is set
properly.

The pointer is an offset from the start of the original datagram to
the beginning of the offending field.

The data is part of the datagram that could not be converted. It must
be at least the IP and transport headers, and must include the field
pointed to by the previous parameter. For code 4, the transport header
is probably not identifiable; the data should include 256 bytes of the
original datagram.

8.1.3 ICMP Conversion

ICMP messages are converted by copying the type and code into the new
packet, and copying the other type specific fields directly.

If the message contains an encapsulated and possibly truncated
datagram, the conversion routine is called recursively to translate it
as far as possible.

There are some special considerations:

o The encapsulated datagram is less likely to be valid, given that it
  did generate an error of some kind.
o The conversion should attempt to complete all fields available,
  even if some would cause failures in the general case.

  Note, in particular, that in the course of converting a datagram,
  when a failure occurs, an ICMP message (conversion failed) is sent;
  this message itself may immediately require conversion. Part of
  that conversion will involve converting the original datagram.

o Conditions such as overall datagram length too large are not
  checked.

o The addresses generated in the nested conversion may not be
  sensible if an address extension option is not present and the
  datagram has strayed from the expected domain. (Not unlikely, given
  that we know a priori that some error occurred.)

o The conversion must be very sure not to make another recursive call
  if the nested datagram is an ICMP message. (This should not occur,
  but obviously may.)

o It is probably impossible to generate a correct transport layer
  checksum in the nested datagram. The conversion may prefer to just
  zero the checksum field. Likewise, validating the original checksum
  is pointless.

It may be best in a given implementation to have a separate code path
for the nested conversion, that handles these issues out of the
optimized usual path.

8.2 Internet Transmission Control Protocol

8.2.1 TCP Checksum

The TCP checksum uses network layer addresses. In a native
implementation on the common architecture, the TCP uses the last 4
octets from the address(es), re-aligned to a 16 bit boundary.

8.2.2 Maximum Segment Size in TCP

It is probably advisable for IP version 4 implementations to reduce
the MSS offered by a small amount where possible, to avoid
fragmentation when datagrams are converted to version 7. This arises
when version 4 hosts are communicating through the common
infrastructure, with the same MTU as the local networks of the hosts.
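As a rough illustration of the adjustment, the sketch below computes an
offered MSS that still fits the MTU once the network header has grown
in conversion. The function name is hypothetical, the 20 byte figures
assume version 4 and TCP headers without options, and the version 7
header length is supplied by the caller; the 32 bytes used in the usage
note is an assumed value for illustration, not one stated in this
section.

```python
V4_HEADER = 20   # assumption: IP version 4 header without options
TCP_HEADER = 20  # assumption: TCP header without options


def offered_mss(mtu: int, v7_header: int) -> int:
    """MSS that still fits in mtu after conversion to a v7_header byte header."""
    # Leave room for whichever network header is larger, so the segment
    # fits both before and after conversion.
    network_header = max(V4_HEADER, v7_header)
    return mtu - network_header - TCP_HEADER
```

With an MTU of 1500 and an assumed 32 byte version 7 header, this
offers 1448 rather than the usual 1460.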
If MTU discovery is used to control the TCP segmentation, this is not
necessary, as MTU discovery will make the correct determination of the
MTU of the entire path. (See RFC 1191.)

8.3 Internet User Datagram Protocol

8.3.1 UDP Checksum

The UDP checksum is similar to the TCP checksum, in using the network
layer addresses. As in TCP, hosts using the full common addressing
should use only the last 4 octets when computing the UDP checksum.

8.4 OSI TP4

8.5 OSI CLTP

8.6 Novell Internetwork Packet Exchange

8.7 Novell Sequenced Packet Exchange

8.7.1 SPX-II

9 Notes

9.1 MTU discovery

Note that the ICMP datagram too large message must report the size of
the transport layer data unit that can be sent, not the NPDU size. The
network layer header size can vary; the source host does not know the
size at the router, and the router cannot determine the size of the
header as it left the source host.

This will cause Internet version 4 hosts doing MTU discovery to use a
size somewhat smaller than the maximum possible.

9.2 RAP

9.3 Internet DNS

CATNIP addresses are represented in the DNS with the NSAP RR. The data
in the resource record is the NSAP, including the zero selector at the
end. The zone file syntax for the data is a string of hexadecimal
digits, with a period "." inserted between any two octets where
desired for readability. For example:

ariel    IN  NSAP  C0.0000.82.67.22.96.00
         IN  A     130.103.34.150

9.3.1 PTR zone

The inverse (PTR) zone is .NSAP, with the CATNIP address (reversed).
That is, like .IN-ADDR.ARPA, but with .NSAP instead. The octets are
represented as hexadecimal numbers, with leading 0's. (Zero is always
written as ".00.")

This respects the difference in actual authority: the IANA is the
authority for the entire space rooted in .IN-ADDR.ARPA. in the version
4 Internet, while in the new Internet it holds the authority only for
C0.NSAP.
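The zone file syntax and the reversed naming can be sketched as
follows. The function names are hypothetical, and the sketch simply
reverses whatever octets it is given; whether the trailing zero
selector belongs in the PTR name is not settled by the text above, so
that choice is left to the caller.

```python
def nsap_octets(zone_text: str) -> bytes:
    """Parse NSAP RR data such as 'C0.0000.82.67.22.96.00' into octets."""
    # Periods are only for readability and may fall between any two octets.
    return bytes.fromhex(zone_text.replace(".", ""))


def ptr_name(address: bytes) -> str:
    """Build the name under .NSAP: one two-digit hex label per octet, reversed."""
    labels = [f"{octet:02X}" for octet in reversed(address)]
    return ".".join(labels) + ".NSAP"
```

For the address octets C0 00 00 82 67 22 96 (the example above without
its selector), ptr_name gives 96.22.67.82.00.00.C0.NSAP, which falls
under 00.00.C0.NSAP.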
The domain 00.00.C0.NSAP is to be delegated by IANA to the InterNIC.
(Understanding that in present practice the InterNIC is the operator
of the authoritative root.)

9.3.2 Implementation

These mappings should not require administrative work to create new
zone files from the existing files. Vendors of DNS software are
expected to provide the capability of automatically generating the new
zones and RRs from the old, and generate the old from the new where
the administrator is defining zones in the new world order.

The automatic generation of new from old should default to on, while
the generation of old from new should be off by default. Both must be
configurable.

A host serving zones (as the zone primary) in multiple ADs will not be
able to automatically generate new RRs from the old; it must be
configured using version 7 addresses in the zone files.

Servers acting as secondaries should request both the new and old
zones automatically for the PTR zone; if a host is secondary for PTR
sub-zones in more than one AD it will need to be configured with the
new zone names.

10 References

[Chapin93] A. Lyman Chapin, David M. Piscitello. Open Systems
Networking. Addison-Wesley, Reading, Massachusetts, 1993.

[Perlman92] Radia Perlman. Interconnections: Bridges and Routers.
Addison-Wesley, Reading, Massachusetts, 1992.

[RFC768] Jon Postel. User Datagram Protocol. August, 1980.

[RFC791] Jon Postel, editor. Internet Protocol. DARPA Internet Program
Protocol Specification, ISI/USC, September, 1981.

[RFC792] Jon Postel, editor. Internet Control Message Protocol. DARPA
Internet Program Protocol Specification, ISI/USC, September, 1981.

[RFC793] Jon Postel, editor. Transmission Control Protocol. DARPA
Internet Program Protocol Specification, ISI/USC, September, 1981.

[RFC801] Jon Postel. NCP/TCP Transition Plan. November, 1981.

[RFC1058] C. Hedrick. Routing Information Protocol. June, 1988.

[RFC1191] J.
Mogul, S. Deering. Path MTU Discovery. November, 1990.

[RFC1234] D. Provan. Tunneling IPX Traffic through IP Networks.
Novell, Inc., June, 1991.

[RFC1247] J. Moy. OSPF Version 2. Proteon, Inc., July, 1991.

[RFC1287] D. Clark, L. Chapin, V. Cerf, R. Braden, R. Hobby. Towards
the Future Internet Architecture. December, 1991.

[RFC1323] V. Jacobson, R. T. Braden, D. A. Borman. TCP Extensions for
High Performance. May, 1992.

[RFC1335] Z. Wang, J. Crowcroft. Two-tier address structure for the
Internet: A solution to the problem of address space exhaustion. May,
1992.

[RFC1338] V. Fuller, T. Li, J. Yu, K. Varadhan. Supernetting: an
Address Assignment and Aggregation Strategy. June, 1992.

[RFC1347] R. W. Callon. TCP and UDP with Bigger Addresses (TUBA), A
simple proposal for Internet addressing and routing. June, 1992.

[RFC1466] E. Gerich. Guidelines for Management of IP Address Space.
Merit, May, 1993.

[RFC1475] Robert Ullmann. TP/IX: The Next Internet. Process Software
Corporation. June, 1993.

[RFC1476] Robert Ullmann. RAP: Internet Route Access Protocol. Process
Software Corporation. June, 1993.

[Rose90] Marshall T. Rose. The Open Book. Prentice-Hall, Englewood
Cliffs, New Jersey, 1990.

11 Author's Address

Robert Ullmann
Lotus Development Corporation
1 Rogers Street
Cambridge Massachusetts 02142 USA

Phone: +1 617 693 1315
Email: ariel@world.std.com