Date: Mon, 29 Jun 92 19:59:20 EDT
From: "William Allen Simpson" <bsimpson@angband.stanford.edu>
Message-Id: <486.bsimpson@angband.stanford.edu>
To: big-internet@munnari.oz.au
Reply-To: bsimpson@angband.stanford.edu
Subject: PIPE
Status: Read

We're having some very long discussions here, but I haven't seen any
proposal that does the few simple things that I would like to see.  My
last attempt to get the group to focus on practical details wasn't
widely received.

I was enthused last year by Noel's routing ideas.  I haven't been able
to get too excited by any of the other proposals, and was leaning toward
DNAT because it seemed "practical".

I finished a contract on June 15th, and set aside some time to write
a draft which is both practical and implements some of the "ideal" that
Noel first brought to my attention.  It may not get theoretical acclaim,
but it should move us in the right direction.


                PRACTICAL INTERNET PROTOCOL EXTENSIONS (PIPE)

Abstract

Current IP addresses are used for both host identification and routing.
This memo proposes a simple and practical migration to routes using a
variable length "path", which is separate from the final host
identifier.  It extends the concept of "areas" already present in some
routing protocols to reduce the size of current routing tables and
provide for future expansion of the IP address space.

To reduce the size of routing tables, a pair of IP4 header options is
defined to implement area to area source routing.  These variable length
options will be present in both IP4 and future IP(n) headers.  It is hoped
that the simplicity of the extension will ease implementation.

To deal with several perceived limitations with the current IP4 header,
an IP(n) header is defined which is very similar to the IP4 header.  The
similarity should allow IP(n) routers to continue to route IP4 packets
with very little effort.  It is intended that this migration occur over
a period of many years.

The text of this proposal is preliminary, and it is hoped that comments
and extensions will be offered to make this document more complete.


Motivation

This author is convinced that any benefit from extending the IP address
space by simply making it larger is outweighed by the explosion of routing
table space needed for the larger size.  Also, a fixed size address is still
subject to future exhaustion, since the address is burdened with the
need to carry the routing information.  Such an address space is likely
to be highly fragmented.  Therefore, a scheme is proposed which
separates the IP address from the IP routing at the inter-domain level,
and can be extended into the intra-domain level.  The proposed routing
mechanism is extremely hierarchical, which facilitates route folding,
and extensible, to adapt to future growth.

In addition, any scheme which requires significantly more network
resources for final implementation is doomed to failure.  Current Domain
Name Service consumes 10% of the bandwidth; doubling this would be a
significant problem.  This proposal requires no more DNS accesses than
currently, and does not require new servers.  New services such as
inter-domain policy routing and mobile routes are handled entirely
within the usual routing table update mechanism.

Finally, this proposal does not require any immediate changes to current
IP practices at any level, does not require deployment be universal, and
provides an extremely gradual migration path to a new IP header.  The
new header is designed to be routed concurrently with older IP by
maintaining most fields in their current relative offsets within the
IP header.


Terms and Concepts

An "Internet Protocol System Address" (IP-SA) is the number
associated with the end producer/consumer of packets.  An actual host
may have several numbers, perhaps corresponding to different network
attachment points, but it is expected that eventually each system will
have a single number.  This number is also called the "IP address".

A "Routing Area" is defined as a region in the internet routing space
which is reachable with a common defined set of quality of service and
security.  The area may contain other areas.  Several area numbers may
be assigned to the same region when non-default qualities of service or
security are provided.

A "Routing Path" is a sequence of areas which defines the connectivity
of an area to a designated root.  The leaf areas in a path MUST NOT
partially overlap with another leaf.  Each set of quality of service or
security creates a separate path.

Consider the following topology:

                              Backbone
        Router A ----------------------------------- Router B
           |                                            |
           + Host 1.2.3.4                               + Host 6.7.8.9
           |                                            |
           + Router AC ----+------------+---- Router BC +
           |               |            |
           + Host 4.3.2.1 -+            +- Host 9.8.7.6


The default path to these hosts would be defined as:

                0:RA                            0:RB
                A:RA                            B:RB
                A:1.2.3.4                       B:6.7.8.9
                A:4.3.2.1                       B:RBC
                A:RAC                           B.C:RBC
                A.C:RAC                         B.C:9.8.7.6
                A.C:4.3.2.1                     B.C:4.3.2.1
                A.C:9.8.7.6


Defining System Addresses

In the example above, note that the System Address is used only in the
final hop.  It need not be subnetted, nor have any particular structure.
This allows a very dense IP address space, and should extend the life of
the current globally unique IP address.

When fully deployed, it is only necessary that the IP-SA be unique
within a routing area, and together with the complete routing path would
provide unique identification of a host.  However, to promote ease of
migration and support for mobile routing and automatic partition repair,
the IP-SA is required to be unique within the last three levels of the
completely specified routing path.


Defining Routing Areas

Routing Areas will be defined from the hosts upward.  A host which is
attached to a single network is in the same area with others attached to
the same network with the same policy.  A host which is attached at two
points is in both areas (has two routing paths), regardless of whether
it also acts as a router between the areas.

Routing Areas are bounded by routers which are connected to more than
one level of area or are separated by policy.

The Area numbers range from 1 to 254.  Area 0 means this level.  Area
255 means all areas at this level.
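
The numbering rule above can be sketched in Python as follows.  This is
an illustration only: the names AREA_THIS_LEVEL, AREA_ALL_AREAS and
classify_area are hypothetical, and the assumption that an area number
occupies exactly one octet is inferred from the 0 to 255 range rather
than stated by this memo.

```python
# Hypothetical constants; the memo defines only the numeric values.
AREA_THIS_LEVEL = 0    # "Area 0 means this level"
AREA_ALL_AREAS = 255   # "Area 255 means all areas at this level"


def classify_area(number: int) -> str:
    """Classify one area number (assumed to fit in a single octet)."""
    if not 0 <= number <= 255:
        raise ValueError(f"area number {number} does not fit in one octet")
    if number == AREA_THIS_LEVEL:
        return "this-level"
    if number == AREA_ALL_AREAS:
        return "all-areas"
    return "assigned"  # ordinary area numbers, 1 through 254
```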


Defining Routing Paths

Routing Paths will be defined from the root backbone downward.  At each
separation of level, and for each quality of service or security, an
additional number is assigned to an area.

In the above example, suppose that Host 4.3.2.1 was not participating in
a security agreement with all of the other hosts.  Even though Host
4.3.2.1 appears on the same network, it would simply not have an entry
in the security Routing Paths, and would not be visible to Hosts 9.8.7.6
and 1.2.3.4 for that type of traffic.

Automatic discovery of Routing Paths is possible, but would require
cooperation between the hosts and the DNS for the automatic assignment
to be propagated.  Any host, including routers, would request the path
at each point of attachment.  As the routers discover their position in
the path, they could assign area numbers and respond to the requests
that appear on other points of attachment.  The hosts could then inform
the DNS of the current path(s).

The path may be padded with area zero.  When found at the beginning of
the path, it means root.  When found at the end of the path, it means
this level.  Area zero may never be found in the interior of a path.
All zeroes may be removed from the path without a change in meaning.
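
The zero padding rules above might be applied as in the following
sketch.  It assumes a path is held as a list of integer area numbers;
the function name normalize_path is hypothetical and not part of this
proposal.

```python
def normalize_path(areas):
    """Strip leading and trailing zero areas; reject interior zeros."""
    areas = list(areas)
    start, end = 0, len(areas)
    # Leading zeros mean "root" and carry no further information.
    while start < end and areas[start] == 0:
        start += 1
    # Trailing zeros mean "this level" and are padding.
    while end > start and areas[end - 1] == 0:
        end -= 1
    core = areas[start:end]
    if 0 in core:
        raise ValueError("area zero may not appear in the interior of a path")
    return core
```

Under these assumptions, normalize_path([0, 1, 3, 0, 0]) yields [1, 3],
while [1, 0, 3] is rejected.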


Final Hop Delivery

This definition extends Routing Paths only to a particular area.
Delivery to the final destination continues to be accomplished with
current methods.

One advantage of this design is that routing paths may be gradually and
automatically extended as routing implementations are upgraded.  The
Routing Path may be specified at the source or the inter-domain router,
and used until it is no longer recognized.  From that point on, the
routing takes place using the current subnet methods.
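
One way a router might act on "used until it is no longer recognized"
is sketched below.  The memo does not specify a lookup algorithm, so
the longest-prefix match, the path_routes table, and the legacy_lookup
callable are all assumptions made for illustration.

```python
def choose_next_hop(path, destination, path_routes, legacy_lookup):
    """Pick a next hop from the longest recognized prefix of the path.

    path_routes   -- hypothetical table: tuple of area numbers -> next hop
    legacy_lookup -- callable standing in for current address-based routing
    """
    for length in range(len(path), 0, -1):
        hop = path_routes.get(tuple(path[:length]))
        if hop is not None:
            return hop
    # No part of the path is recognized: fall back to current methods.
    return legacy_lookup(destination)
```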

The separation from local delivery mechanisms allows the design to adapt
without change to all current and future link media, and to benefit from
improvements in such things as automatic host configuration and router
discovery without change to the overall routing scheme.

Another advantage is that mobile systems may continue to receive traffic
without changing the Routing Path, by specifying an incomplete path and
relying on local routing to make the final delivery.  This reduces
propagation of changes and hides such details from the global internet.

This feature also facilitates automatic repair of partitioned areas.
The path can be locally shortened to a level which includes the
partitioned area, and delivery can take place by internal means.


Migration to Path Routing

Initially, the path will simply be the Autonomous System number.  These
are centrally administered, and are already distributed to backbone
routers.  The AS number can easily be reorganized as a two level
hierarchy with no effect on current backbone routing tables.  Backbone
routers will add the Routing Path (AS number) to IP4 datagrams, which
will improve the speed of routing over subsequent backbone routers.

Next, a new record must be added to the Domain Name System to maintain
the default Routing Path for each SOA, and the Routing Path must be
returned in the information part of every DNS access, including inverse
address lookups.

Eventually, all participating domains connecting to the backbone will
be required to add the Routing Path when it is missing from IP4
datagrams.  At that point, the massive backbone routing tables may be
eliminated.  The conversion tables in the border routers are likely to
be considerably smaller than in the backbone routers, since most domains
do not continuously send to all other domains.

In the long term, every IP4 host will be converted to PIPE, and include
the Routing Path in every datagram.  After this is completed, system
addresses will no longer need to be globally unique.


New IP Header

  Note that each tick mark represents one bit position, and bits are
  numbered from most significant to least significant.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | IHV |   IHL   |  Path Level   |          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |     Flags     |  Time to Live |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Protocol           |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Source Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Destination Address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  IHV:  3 bits

    Internet Header Version indicates the format of the internet header.
    This document describes version 011 binary.

        The IP4 Version is 0100 binary, but is encoded in a 4 bit field.
        Therefore, this choice is equivalent to IP4 Version 0110 whenever
        the high order bit of the IHL is zero (headers under 64 octets).

  IHL:  5 bits

    Internet Header Length is the length of the internet header in 32
    bit words, and thus points to the beginning of the data.  Note that
    the minimum value for a correct header is 5.  The maximum header
    length is 124 octets, and a typical header is 28 octets.

        The IP4 IHL field occupied this location.  The field was
        expanded in order to provide more option space.  The minimum
        header size is always known by the Version field.
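
The relationship between the 3 bit IHV, the 5 bit IHL, and the 4 bit
IP4 Version field can be illustrated with a short Python sketch.  The
function names are hypothetical; only the field widths and the value
011 binary come from this memo.

```python
PIPE_IHV = 0b011  # version value given in this memo


def pack_first_octet(ihl_words: int, ihv: int = PIPE_IHV) -> int:
    """Combine the 3 bit IHV and the 5 bit IHL into the first header octet."""
    if not 0 <= ihv <= 0b111:
        raise ValueError("IHV must fit in 3 bits")
    if not 5 <= ihl_words <= 31:
        raise ValueError("IHL must be between 5 and 31 words")
    return (ihv << 5) | ihl_words


def ip4_version_nibble(first_octet: int) -> int:
    """Return what an IP4 parser would read from its 4 bit Version field."""
    return first_octet >> 4
```

For a 7 word (28 octet) header the first octet is 0x67, which an IP4
parser reads as version 6.  Once the IHL reaches 16 words, its high
order bit falls into that nibble and the parser would read 7 instead.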

  Path Level:  8 bits

    The Path Level indicates the area level to which the sender is
    routing.  A zero indicates the root backbone, a one indicates the
    first area in the path, etc.  This field may be used as an index
    into the Path Option.

        The IP4 Type of Service field was removed, since TOS was rarely
        used, and is now encoded in the Routing Path.

  Total Length:  16 bits

    Total Length is the length of the datagram, measured in octets,
    including internet header and data.  This field allows the length of
    a datagram to be up to 65,535 octets.  Such long datagrams are
    impractical for most hosts and networks.  All hosts must be prepared
    to accept datagrams of up to 1480 octets (whether they arrive whole
    or in fragments).  It is recommended that hosts only send datagrams
    larger than 1480 octets if they have assurance that the destination
    is prepared to accept the larger datagrams.

    The number 1480 is selected to allow a reasonable sized data block
    to be transmitted in addition to the required header information,
    and to allow typical lower-layer encapsulations room for their
    respective headers.

        IP4 has maximum 65,535, minimum 576 octet datagrams.  Over time,
        memory limitations have eased considerably, and there have been
        some indications that a larger minimum datagram size throughout
        the internet would be beneficial.

        Raising the maximum would primarily benefit those users whose
        entire path consists of high bandwidth, high delay transmission.
        Instead, this design spends the header space on a larger
        Protocol field.

  Identification:  16 bits

    An identifying value assigned by the sender to aid in assembling the
    fragments of a datagram.

  Flags:  8 bits

    Various Control Flags.

      CX: 0 = Not Congested,    1 = Congestion Experienced.
      DF: 0 = May Fragment,     1 = Don't Fragment.
      MF: 0 = Last Fragment,    1 = More Fragments.
      IT: 0 = default,          1 = Interactive Traffic.

                          2
          6   7   8   9   0   1   2    3
        +---+---+---+---+---+---+---+---+
        | C | D | M | I |   |   |   |   |
        | X | F | F | T | 0 | 0 | 0 | 0 |
        +---+---+---+---+---+---+---+---+

        The IP4 Flags field was only 3 bits.
        The DF and MF flags are located in the same position.

        The CX bit is set by any router where congestion is experienced.
        The IT bit is set by the host or router to indicate interactive
        traffic within a particular quality of service or security.
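
The flag positions in the diagram above translate into the following
bit masks, assuming the Flags octet is handled as a single integer
with bit 16 as its most significant bit.  The constant and function
names are hypothetical, and rejecting non-zero reserved bits is a
strictness assumption, not a requirement stated here.

```python
FLAG_CX = 0x80  # Congestion Experienced (bit 16)
FLAG_DF = 0x40  # Don't Fragment         (bit 17)
FLAG_MF = 0x20  # More Fragments         (bit 18)
FLAG_IT = 0x10  # Interactive Traffic    (bit 19)
RESERVED_MASK = 0x0F  # bits 20-23, shown as zero in the diagram

_FLAG_NAMES = ((FLAG_CX, "CX"), (FLAG_DF, "DF"), (FLAG_MF, "MF"), (FLAG_IT, "IT"))


def decode_flags(octet: int) -> list:
    """Return the names of the flags set in a Flags octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("Flags must fit in one octet")
    if octet & RESERVED_MASK:
        raise ValueError("reserved flag bits must be zero")  # assumption
    return [name for mask, name in _FLAG_NAMES if octet & mask]


def mark_congested(octet: int) -> int:
    """Set the CX bit, as a congested router would."""
    return octet | FLAG_CX
```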


  Time to Live:  8 bits

    This field indicates the maximum time the datagram is allowed to
    remain in the internet system.  If this field contains the value
    zero, then the datagram must be destroyed.  This field is modified
    in internet header processing.  The time is measured in units of
    seconds, but since every module that processes a datagram must
    decrease the TTL by at least one even if it processes the datagram in
    less than a second, the TTL must be thought of only as an upper
    bound on the time a datagram may exist.  The intention is to cause
    undeliverable datagrams to be discarded, and to bound the maximum
    datagram lifetime.

        The IP4 Fragment Offset field was removed, and replaced with an
        IP Option.  The TTL field was moved to a new location, where it
        is easier to decrement on most hardware.

  Protocol:  16 bits

    This field indicates the next level protocol used in the data
    portion of the internet datagram.  The values for various protocols
    are specified in "Assigned Numbers" [9].

        The size of the protocol field was increased, since there is
        concern that 255 numbers may be too few over the extended life
        of the IP protocol.

  Header Checksum:  16 bits

    A checksum on the header only.  Since some header fields change
    (e.g., time to live), this is recomputed and verified at each point
    that the internet header is processed.

    The checksum algorithm is:

      The checksum field is the 16 bit one's complement of the one's
      complement sum of all 16 bit words in the header.  For purposes of
      computing the checksum, the value of the checksum field is zero.

    This checksum is simple to compute, and experimental evidence
    indicates it is adequate, but it is provisional and may be replaced
    by a CRC procedure, depending on further experience.
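
The checksum algorithm described above can be sketched in Python as
follows.  The function name header_checksum is hypothetical, and the
caller is assumed to have zeroed the checksum field (octets 10 and 11
of the header) before computing a new value.

```python
def header_checksum(header: bytes) -> int:
    """16 bit one's complement of the one's complement sum of the header."""
    if len(header) % 2:
        raise ValueError("header length must be a whole number of 16 bit words")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    # Fold any carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can verify a header by running the same function over the
header with the transmitted checksum left in place; a result of zero
indicates the checksum matches.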

  Source Address:  32 bits

    The source System Address.

  Destination Address:  32 bits

    The destination System Address.
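
Taken together, the fixed portion of the header described above is 20
octets and can be packed with Python's struct module, as in this
sketch.  The PipeHeader type and the two function names are
hypothetical, addresses are assumed to be carried as 32 bit integers,
and the checksum is simply whatever value the caller supplies.

```python
import struct
from typing import NamedTuple

# IHV/IHL octet, Path Level, Total Length, Identification, Flags,
# Time to Live, Protocol, Header Checksum, Source, Destination.
FIXED_HEADER = struct.Struct(">BBHHBBHHII")


class PipeHeader(NamedTuple):
    ihv: int
    ihl: int
    path_level: int
    total_length: int
    identification: int
    flags: int
    ttl: int
    protocol: int
    checksum: int
    source: int
    destination: int


def pack_header(h: PipeHeader) -> bytes:
    """Serialize the fixed portion of the header in network byte order."""
    if not 0 <= h.ihv <= 0b111 or not 0 <= h.ihl <= 0b11111:
        raise ValueError("IHV is 3 bits and IHL is 5 bits")
    return FIXED_HEADER.pack(
        (h.ihv << 5) | h.ihl, h.path_level, h.total_length,
        h.identification, h.flags, h.ttl, h.protocol, h.checksum,
        h.source, h.destination,
    )


def unpack_header(data: bytes) -> PipeHeader:
    """Parse the first 20 octets of a datagram into a PipeHeader."""
    first, *rest = FIXED_HEADER.unpack_from(data)
    return PipeHeader(first >> 5, first & 0x1F, *rest)
```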


New Option Definitions

      Path

        +--------+--------+--------//--------+
        |  type  | length | path             |
        +--------+--------+--------//--------+

        Type = ??

        The Length is the number of octets in the option counting the
        type, length, and path.  The length will always be a multiple
        of 4.

        The Path is a list of areas, as described above.  The path may
        be padded with trailing zeroes to align on 32 bit boundaries.

        Must be copied on fragmentation.  This option usually appears
        twice in a datagram.  The first (destination) Path must always
        appear immediately after the fixed portion of the header.  The
        second (source) Path must always appear after the first.

        The Path option is compatible with IP4.  However, it MUST NOT be
        added to datagrams already containing the Loose Source Route or
        Strict Source Route options.
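
A possible encoding of the Path option is sketched below.  Because the
option Type has not been assigned, it is left as a parameter supplied
by the caller.  The sketch assumes one octet per area, and the function
names are hypothetical.

```python
def encode_path_option(option_type: int, areas) -> bytes:
    """Build a Path option, padded with zeros to a multiple of 4 octets."""
    body = bytes(areas)  # raises ValueError if an area is outside 0..255
    unpadded = 2 + len(body)           # type + length + path
    length = (unpadded + 3) // 4 * 4   # round up to a 32 bit boundary
    if length > 255:
        raise ValueError("path too long for a one octet length field")
    return bytes([option_type, length]) + body + bytes(length - unpadded)


def decode_path_option(data: bytes):
    """Return (type, areas) with trailing zero padding removed."""
    option_type, length = data[0], data[1]
    if length < 4 or length % 4 or length > len(data):
        raise ValueError("invalid Path option length")
    areas = list(data[2:length])
    while areas and areas[-1] == 0:
        areas.pop()
    return option_type, areas
```

Under these assumptions a two area path fits in a 4 octet option, so a
20 octet fixed header plus destination and source Path options comes to
28 octets.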


      Fragment

        +--------+--------+--------+--------+
        |  type  | length |000    Offset    |
        +--------+--------+--------+--------+

        Type = ??

        Length = 4

        000 = currently zero

        Offset

            This field indicates where in the datagram this fragment
            belongs.  The fragment offset is measured in units of 8
            octets (64 bits).  The first fragment has offset zero.

        Must be copied on fragmentation.  Appears at most once in a
        datagram.

        This option provides for fragmentation and reassembly of PIPE
        datagrams in a similar fashion to IP4.  Fragmentation handling
        requires expanding and contracting the header; adding an
        additional option at that time should not be difficult.

        Since many believe that fragmentation is evil, this option may
        not be necessary.
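
For completeness, the Fragment option layout might be encoded as in the
following sketch.  The option Type is again left to the caller because
it is unassigned, the 13 bit width of the Offset is read from the
diagram (16 bits less the three zero bits), and the function names are
hypothetical.

```python
import struct


def encode_fragment_option(option_type: int, byte_offset: int) -> bytes:
    """Build a 4 octet Fragment option from an offset given in octets."""
    if byte_offset % 8:
        raise ValueError("fragment offset must be a multiple of 8 octets")
    units = byte_offset // 8
    if not 0 <= units < (1 << 13):
        raise ValueError("fragment offset does not fit in 13 bits")
    return struct.pack(">BBH", option_type, 4, units)


def decode_fragment_option(data: bytes):
    """Return (type, offset in octets) from a Fragment option."""
    option_type, length, field = struct.unpack_from(">BBH", data)
    if length != 4:
        raise ValueError("Fragment option length must be 4")
    if field >> 13:
        raise ValueError("the three leading bits are currently zero")
    return option_type, field * 8
```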

Bill.Simpson@um.cc.umich.edu