multihomed routing domain issues

Francis Dupont Francis.Dupont@inria.fr
Mon, 26 Jan 1998 20:39:01 +0100


Here is a preliminary version of a draft about
multihomed routing domain issues (as discussed
at the last IETF meeting during 6bone BOFs' lunch-time)...

Francis.Dupont@inria.fr

Internet Engineering Task Force                         Francis Dupont
INTERNET DRAFT                                               GIE DYADE
                                                      January 25, 1998
<draft-dupont-multi-00.txt>

        Multihomed routing domain issues for IPv6 aggregatable scheme

Status of this Memo
   
   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress."

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet Drafts
   Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US  West  Coast), or munnari.oz.au (Pacific
   Rim).

   Distribution of this memo is unlimited.

Abstract

   This document describes some issues for multihomed routing domains
   using the aggregatable addressing and routing scheme. A routing domain
   is multihomed when it uses two or more upper-level providers. Most of
   these issues are not specific to IPv6 but are consequences of the
   addressing and routing scheme.

1. Introduction

   The aggregatable addressing and routing scheme [AGGR] defines an IPv6
   aggregatable global unicast address format for use in the Internet and
   the associated routing.

   The address assignment and allocation mechanism is fully hierarchical:
   a prefix of a given level (ie. of a given length) denotes all the
   destinations in the prefix, ie. it aggregates them. The customers of
   an Internet service provider are in its prefix (as a consequence a
   multihomed routing domain has several prefixes).

   The routing is standard datagram routing, hop by hop, on destination
   address only (as in IPv4). But it is prefix routing, ie. forwarding
   decisions are based on a "longest prefix match" algorithm on arbitrary
   bit boundaries without any knowledge of the internal structure of
   addresses.
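
   As an illustration only, the "longest prefix match" rule can be
   sketched in Python with the standard ipaddress module. The table
   contents, the next-hop labels and the function name are assumptions
   made for this example; they do not come from [AGGR].

```python
import ipaddress

def longest_prefix_match(table, destination):
    """Return the next hop of the longest prefix covering destination, or None."""
    addr = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop

# Hypothetical table: a default, a provider aggregate and one customer
# prefix of length 37, which does not fall on a nibble boundary.
TABLE = [
    ("::/0", "upper"),
    ("2001:db8::/32", "provider"),
    ("2001:db8:1000::/37", "customer"),
]
```

   With this table, 2001:db8:1234::1 is forwarded to the customer while
   2001:db8:1800::1 only matches the provider aggregate: the lookup
   looks at bits, not at any internal structure of the address.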



   When there are two routes for the same prefix with the same length,
   the best one is chosen by the inter-domain routing protocol [BGP]
   using:

      o policy rules;

      o shortest path, the path being the list of routing domains to cross;

      o protocol metric.
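
   A minimal sketch of this tie-break, assuming the three criteria are
   applied in the order listed. The Route fields, the "higher preference
   wins" and "lower metric wins" conventions and the function name are
   assumptions for this example; the real [BGP] decision process has
   more steps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    path: tuple          # routing domains to cross, nearest first
    preference: int = 0  # local policy, higher wins (assumed convention)
    metric: int = 0      # protocol metric, lower wins (assumed convention)

def best_route(routes):
    """Pick the best among routes for the same prefix and length."""
    # Policy first, then shortest path, then protocol metric.
    return min(routes, key=lambda r: (-r.preference, len(r.path), r.metric))
```

   For example, between Route("A/x", ("A",)) and Route("A/x", ("B", "A"))
   with equal preference, the first one is chosen because its path is
   shorter.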


   The idea behind aggregation is the bet that, in most cases, a
   single-homed Internet service provider at a given level needs to know
   (ie. have routes to) only:

      o its upper provider (ie. a shorter prefix, used as a default)
        if it is not a top-level provider;

      o its customers (ie. longer routes in its prefix);

      o some routes to other customers of its upper provider (ie.
        sibling prefixes, at the same level).

   With addresses this gives (with P1:P2/x for the concatenation of prefixes
   P1 and P2 with the length x):

      o T/t for the upper provider;

      o T:P/t+p for the provider itself;

      o T:P1/t+p1, T:P2/t+p2, ..., T:Pn/t+pn for siblings;

      o T:P:C1/t+p+c1, T:P:C2/t+p+c2, ..., T:P:Cn/t+p+cn for customers.
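
   The P1:P2/x notation can be made concrete with a small sketch that
   treats a prefix as a (value, length) pair of bits. The helper names
   and the sample bit patterns used below are hypothetical.

```python
def concat(*prefixes):
    """Concatenate (value, length) prefixes, most significant first."""
    value, length = 0, 0
    for v, n in prefixes:
        if not 0 <= v < (1 << n):
            raise ValueError("value does not fit in the given length")
        value = (value << n) | v
        length += n
    return value, length

def covers(shorter, longer):
    """True if prefix `shorter` aggregates prefix `longer`."""
    (sv, sn), (lv, ln) = shorter, longer
    return sn <= ln and (lv >> (ln - sn)) == sv
```

   With T = (0b001, 3), P = (0b10110, 5) and C = (0b11, 2), the call
   concat(T, P, C) gives a prefix of length t+p+c = 10 which is covered
   by both T/t and T:P/t+p.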

   The routing information for siblings is only needed by top-level
   providers. For any other provider it is only an optimization
   (ie. a backdoor) because any destination not in its own prefix,
   including a sibling, is reachable through the upper provider.

   Usual routing exchanges for P at prefix T:P/t+p are:

      o from the upper provider the route to T/t which can be used as
        a default (ie. <>/0);

      o from a customer the route to T:P:C/t+p+c;

      o from a sibling the route to T:Q/t+q;

      o to anybody the route for T:P/t+p (and nothing else).



   The scheme is shown below, with arrows for route (and traffic) exchange:

                        +-----+
    Upper Level         |  T  |
                        +-----+
                         |   ^
                     T/t |   | T:P/t+p
                         V   |
                       +-------+                   +-----+
                       |       |------ T:P/t+p --->|     |
    Siblings           |   P   |                   |  Q  |
                       |       |<---- T:Q/t+q -----|     |
                       +-------+                   +-----+
                        ^ | ^ |
                        | | | |
                        | | | +-------- T:P/t+p ----+
                        | | |                       |
                        | | +---- T:P:Cn/t+p+cn --+ |
                        | |                       | |
          T:P:C1/t+p+c1 | |                       | |
                        | | T:P/t+p               | |
                        | V                       | V
                      +-----+                   +-----+
                      |     |                   |     |
     Customers        | C 1 |                   | C n |
                      |     |                   |     |
                      +-----+                   +-----+

   Aggregation appears in the fact that each provider announces only the
   route to its own "aggregated" prefix and masks routes to longer
   prefixes. Upper levels should not know the details of lower levels;
   this transparency property should be kept.
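
   A sketch of this masking as an export filter, using the standard
   ipaddress module (subnet_of needs Python 3.7 or later). The function
   name and the idea of reporting "leaked" prefixes are assumptions for
   this example, not part of the scheme.

```python
import ipaddress

def announcements(own_prefix, learned):
    """Return (exported, leaked) for a provider holding own_prefix.

    Customer routes inside own_prefix are masked by the aggregate.
    Routes outside it cannot be hidden behind the aggregate, so they
    are reported separately as `leaked`.
    """
    own = ipaddress.ip_network(own_prefix)
    leaked = [p for p in learned
              if not ipaddress.ip_network(p).subnet_of(own)]
    return [own_prefix], leaked
```

   A provider holding 2001:db8::/32 with a customer 2001:db8:100::/40
   exports the aggregate only; a customer route such as
   2001:db9:100::/40, taken from another provider's prefix, shows up in
   the leaked list.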

   A top-level provider has no upper provider (ie. no default) and must
   exchange routes with all the other top-level providers (ie. full
   routing with its siblings is mandatory). In order to avoid routing
   table explosion, the length of top-level prefixes is bounded
   (therefore the number of top-level providers is bounded too).

2. Multihomed Routing Domains

   A multihomed routing domain has more than one provider, so it has
   more than one prefix (usually one prefix per provider).

   There are several reasons to be multihomed:



    o the "two coasts" case where the routing domain is split into
      sub-domains in different locations, each sub-domain using a local
      provider:

                +-----+                +-----+
                |     |                |     |
                | T w |                | T e |
                |     |                |     |
                +-----+                +-----+
                  ^ |                    ^ |
                  | |                    | |
        +---------|-|--------------------|-|--------+
        | S       | V                    | V        |
        |       +-----+                +-----+      |
        |       |     |--------------->|     |      |
        |       | S w |                | S e |      |
        |       |     |<---------------|     |      |
        |       +-----+                +-----+      |
        |                                           |
        +-------------------------------------------+

      But in fact this comes down to two routing domains with a backdoor
      between them. The extra routes can be hidden and there is no
      further issue.

    o reliable service: to be able to use another provider in
      case of a connectivity problem. Of course the purpose
      is to limit trouble to the single case where all the
      providers fail (and NOT when at least one fails!).


                +-----+                +-----+
                |     |                |     |
                | T 1 |                | T 2 |
                |     |                |     |
                +-----+                +-----+
                  ^ |                    ^ |
                  | |                    | |
                  | +--------+  +--------+ |
                  |          |  |          |
                  +--------+ |  | +--------+
                           | |  | |
                           | V  | V
                          +--------+
                          |        |
                          |   S    |
                          |        |
                          +--------+



      A given host of such a routing domain may (and should if
      reliable connectivity is needed) have two different addresses,
      one for each prefix (T1:S1:H in T1:S1/t1+s1 and T2:S2:H in
      T2:S2/t2+s2).

      This document mainly covers this case.

3. The Transparency Issue

   If a domain prefix is announced at an upper level, it has to be
   announced to this whole level.



          ^ A/x                   ^ B/x and A:S/x+y
          |                       |
        +-----+                +-----+
        |     |                |     |
        |  A  |                |  B  |
        |     |                |     |
        +-----+                +-----+
          ^ |                    ^ |
          | |                    | |
          | +--------+  +--------+ |
          |          |  |          |
          +--------+ |  | +--------+
                   | |  | |
                   | V  | V
                  +--------+
                  |        |
                  |   S    |
                  |        |
                  +--------+

   If the provider B tries to announce the prefix A:S/x+y in order to be
   able to route the traffic for S with both prefixes A:S/x+y and B:S/x+y
   then B will attract all the traffic for S because the prefix A:S/x+y
   is longer than the prefix A/x (x+y > x) so it is a better match...

   In this case the only solution is for both A and B to announce routes
   to prefixes A:S/x+y and B:S/x+y, which breaks the transparency
   property and obviously does not scale.
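
   The effect can be reproduced with a small sketch of the table seen by
   a third provider. The concrete prefixes stand in for A/x, B/x and
   A:S/x+y and are assumptions for this example, as are the names.

```python
import ipaddress

def next_hop(routes, destination):
    """Longest prefix match over (prefix, next_hop) pairs."""
    addr = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(p), hop) for p, hop in routes
               if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None

A = "2001:db8:a000::/36"    # stands for A/x
B = "2001:db8:b000::/36"    # stands for B/x
A_S = "2001:db8:a120::/44"  # stands for A:S/x+y

aggregated = [(A, "A"), (B, "B")]
leaking = aggregated + [(A_S, "B")]  # B also announces A:S/x+y
```

   With the aggregated table, traffic for 2001:db8:a120::1 goes to A.
   With the leaking table it goes to B, although A is still reachable,
   while the other customers of A (for instance 2001:db8:a340::1) are
   not affected.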

4. Mutual Backup

   There is a case where the transparency property is kept, routing
   is as reliable as possible and is optimal in almost all cases.




          ^ A/x and B/x           ^ B/x and A/x
          |                       |
        +-----+                +-----+
        |     |------ A/x ---->|     |
        |  A  |                |  B  |
        |     |<------ B/x ----|     |
        +-----+                +-----+
          ^ |            B:S/x+y ^ |
          | |            A:S/x+y | |
          | +-- A/x -+  +--------+ |
          |          |  |          |
          +--------+ |  | +- B/x --+
       A:S/x+y     | |  | |
       B:S/x+y     | V  | V
                  +--------+
                  |        |
                  |   S    |
                  |        |
                  +--------+

   For a provider T at an upper level or at the same level as providers
   A and B, the routes for the prefix A/x are not equivalent: the prefix
   A/x announced by A is direct (one element (A) in the path) and the
   prefix A/x announced by B is indirect (two elements (B and A) in the
   path). So traffic for A will go to A directly. The same thing applies
   to B.

   The prefix A:S/x+y is longer (ie. better) than the prefix A/x, so at
   A all the traffic for S goes directly to S; the same holds at B.

   S has routes for A/x and B/x and can use any provider for other
   destinations. The choice of the provider is managed by internal policy
   rules; in order to avoid asymmetrical routing, the source address
   selection should be consistent with the policy. Usually the managers
   of S ask for routes to upper levels up to the first common upper
   provider.

   If the path through A is not available then all the traffic for S,
   including traffic to or from addresses in the prefix A:S/x+y, will go
   through B.
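
   A sketch of the mutual backup as seen by a provider T and by B,
   combining longest prefix match with shortest path. The concrete
   prefixes, the table contents and the `down` parameter (neighbours T
   cannot currently reach) are assumptions for this example.

```python
import ipaddress

def select(routes, destination, down=frozenset()):
    """Longest prefix first, then shortest path, skipping unreachable neighbours."""
    addr = ipaddress.ip_address(destination)
    usable = [(ipaddress.ip_network(p), path) for p, path in routes
              if addr in ipaddress.ip_network(p) and path[0] not in down]
    if not usable:
        return None
    return max(usable, key=lambda r: (r[0].prefixlen, -len(r[1])))[1]

A, B = "2001:db8:a000::/36", "2001:db8:b000::/36"        # A/x and B/x
A_S, B_S = "2001:db8:a120::/44", "2001:db8:b120::/44"    # A:S and B:S

# T hears each aggregate twice: directly and through the backup partner.
at_T = [(A, ("A",)), (A, ("B", "A")), (B, ("B",)), (B, ("A", "B"))]
# B hears both prefixes of S from S itself, plus the aggregate of A.
at_B = [(A, ("A",)), (A_S, ("S",)), (B_S, ("S",))]
```

   In normal operation T sends traffic for 2001:db8:a120::1 to A. When T
   cannot reach A, it falls back to the two-element path through B, and
   B delivers to S directly because A:S/x+y is a longer match than A/x.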

   This case assumes a mutual backup agreement between A and B, which is
   possible if A and B are not in competition, for instance if A is a
   mission provider and B a geographical one. But it is a real constraint...

   This still works if announcements between A and B do not carry full
   prefixes (but they should include (ie. be shorter than) the prefix
   *:S/x+y). The backup will work only for a part of A and B (with a
   black hole in case of failure for customers not covered by the backup
   agreement). Unfortunately this does not work in more complex cases:




          ^ A/x and B/x             ^ B/x, A/x and C/x       ^ C/x and B/x
          |                         |                        |
        +-----+                +--------+                +-----+
        |     |--- A:S/x+y --->|        |--- B:R/x+y --->|     |
        |  A  |                |    B   |                |  C  |
        |     |<--- B:S/x+y ---|        |<--- C:R/x+y ---|     |
        +-----+                +--------+                +-----+
          ^ |            B:S/x+y ^ | ^ |            C:R/x+y ^ |
          | |            A:S/x+y | | | |            B:R/x+y | |
          | +-- A/x -+  +--------+ | | +-- B/x -+  +--------+ |
          |          |  |          | |          |  |          |
          +--------+ |  | +- B/x --+ +--------+ |  | +- C/x --+
       A:S/x+y     | |  | |           B:R/x+y | |  | |
       B:S/x+y     | V  | V           C:R/x+y | V  | V
                  +--------+                 +--------+
                  |        |                 |        |
                  |   S    |                 |    R   |
                  |        |                 |        |
                  +--------+                 +--------+

   The backup is not transitive in this case: if something goes wrong
   in the B path for S, the traffic can try to cross C, which knows
   nothing about S and will drop packets...

5. Broken Bit

   
   Consider the standard multihomed case when a link is broken:

                +-----+                +-----+
                |     |                |     |
                |  A  |                |  B  |
                |     |                |     |
                +-----+                +-----+
                  ^ |             X      ^ |
                  | |              X     | |
                  | +--------+  +---X----+ |
                  |          |  |    X     |
                  +--------+ |  | +---X----+
                           | |  | |    X
                           | V  | V     X
                          +--------+
                          |        |
                          |   S    |
                          |        |
                          +--------+
  


   If we look inside the routing domain S:

                +-----+                   +-----+
                |     |                   |     |
                |  A  |                   |  B  |
                |     |                   |     |
                +-----+                   +-----+
   +---+          ^ |                X      ^ |
   | X |          | |                 X     | |
   +---+          | +--------+     +---X----+ |
                  |          |     |    X     |
                  +--------+ |     | +---X----+
                           | |     | |    X
                           | V     | V     X
                         +----+   +----+
                    +----| RA |---| RB |----+
                    |    +----+   +----+    |
                    |      |         |      |
                    | -------------------   |
                    |           |           |
                    |         +---+         |
                    |         | R |         |
                    |         +---+         |
                    |           |           |
                    |       -------         |
                    |        |              |
                    |      +---+            |
                    |      | H |            |
                    | S    +---+            |
                    |                       |
                    +-----------------------+

   The host H has two addresses, A:S:H and B:S:H, and the path through B
   is broken.

   An external host X will use A:S:H because B:S:H does not work. The DNS
   will return both addresses, but applications should try all of them
   (on 4.4BSD-derived Unixes we have found only one standard application
   that tries only the first returned address). We can try to play with
   the address order in the DNS but the DNS caching mechanism makes this
   difficult (and it is not necessary). In conclusion new connections
   from X to H will work.
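
   The expected application behaviour ("try all of them") can be
   sketched as follows. The function name and the `connect` callable are
   assumptions for this example; in a real program the address list
   would typically come from socket.getaddrinfo() and `connect` would
   wrap a socket connect call.

```python
def connect_any(addresses, connect):
    """Try each address in order; return (address, connection) on first success."""
    errors = []
    for addr in addresses:
        try:
            return addr, connect(addr)
        except OSError as exc:
            # This address does not work (eg. its provider path is
            # broken): remember the error and try the next one.
            errors.append((addr, exc))
    raise OSError(f"all addresses failed: {errors}")
```

   With the addresses B:S:H and A:S:H returned in this order, the
   attempt on B:S:H fails and the connection is made to A:S:H, whatever
   the order chosen by the DNS.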

   For new connections from H to X the problem is to force H to choose
   the correct source address (A:S:H). The proposal is to add a "broken
   bit" to the prefix information in router advertisements in order to
   inform nodes that addresses in a given prefix should not be used. The
   border router RB knows there is a problem and should send this
   information to all the routers of S, using for instance the router
   renumbering protocol.
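
   A sketch of how a host could take the broken bit into account when
   choosing a source address. This draft does not define an encoding for
   the bit, so the PrefixInfo structure, its field names and the
   selection function are purely hypothetical.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class PrefixInfo:
    prefix: str           # prefix carried in a router advertisement
    broken: bool = False  # proposed "broken bit" (encoding not specified)

def pick_source(addresses, prefix_infos):
    """Return the first address whose prefix is not flagged broken, else None."""
    broken = [ipaddress.ip_network(p.prefix) for p in prefix_infos if p.broken]
    for a in addresses:
        if not any(ipaddress.ip_address(a) in net for net in broken):
            return a
    return None
```

   If H holds one address in each prefix and the prefix obtained from B
   is advertised with the broken bit set, pick_source skips that address
   and returns the one in A's prefix.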

   The last case, existing (ie. established before the failure)
   connections between H (using B:S:H) and X, is dealt with in the next
   section.



6. Use Of Mobility Mechanisms

   The idea is to use some mechanisms of IPv6 mobility [MOB] (home
   address and binding update but not home-agent nor (in fact) true
   mobility) in order to make critical connections resilient to provider
   failures.

                                     +---+
                  aaaaaaaaaaaaaaaaaaa| X |
                  a                  +---+
                  a                    b
                  a                    b
                +-----+                b +-----+
                |     |                b |     |
                |  A  |                bb|  B  |
                |     |                  |     |
                +-----+                  +-----+
                  ^ |               X      ^ |
                  | |                X     | |
                  | +--------+    +---X----+ |
                  |          |    |    X     |
                  +--------+ |    | +---X----+
                           | |    | |    X
                           | V    | V     X
                        +--------------+
                        |   a      b   |
                        |   a      b   |
                        |   a      b   |
                        |   a      b   |
                        |   a    +---+ |
                        |   aaaaa| H | |
                        |        +---+ |
                        | S            |
                        +--------------+

   There is a connection between H and X (using addresses B:S:H and X)
   with a security association for authentication (necessary for mobility
   and not a real constraint for a critical connection because it is easy
   to disrupt an unauthenticated connection, for instance with forged TCP
   RST packets).

   After the (used) path through B fails, the broken bit is set in the
   prefix B:S information in router advertisements, so H is informed of
   the problem.

   H uses a home address destination option carrying B:S:H in each packet
   for X in order to use A:S:H as the source address: for each router the
   source is in A's prefix, and only X replaces the source address by
   B:S:H before looking up the PCB (protocol control block) of the
   connection.



   H sends a binding update with A:S:H as the care-of address to X in a
   packet with an Authentication Header. X receives and processes it,
   sends a binding acknowledgement and uses a routing header with A:S:H
   as the (first) destination and B:S:H as the final destination.

   Summary:

     o packets from H to X:
        source = A:S:H
        destination = X
        home-address = B:S:H
        binding-update (in first packets, should be acknowledged):
            care-of = A:S:H

     o packets from X to H:
        source = X
        destination = A:S:H
        routing-header: one address = B:S:H

   While X must implement the full mobile correspondent node operation,
   H must implement only the binding management (no movement
   detection, no new care-of address acquisition, no operation with a
   home agent). In fact H does not move, it only changes its address
   choice.
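
   The packet handling summarized above can be sketched as follows. The
   Packet structure and the function names are assumptions for this
   example; authentication and the binding update exchange itself are
   left out, and the binding table of X is modelled as a plain dict from
   home address to care-of address.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Packet:
    source: str
    destination: str
    home_address: Optional[str] = None    # home address destination option
    routing_header: Optional[str] = None  # single address in the routing header

def h_send(x, care_of, home):
    """H to X: the care-of address is the source, the home address is an option."""
    return Packet(source=care_of, destination=x, home_address=home)

def x_receive(pkt):
    """X restores the home address as source before the PCB lookup."""
    if pkt.home_address is None:
        return pkt
    return replace(pkt, source=pkt.home_address, home_address=None)

def x_send(x, bindings, home):
    """X to H: go through the care-of address if a binding exists."""
    care_of = bindings.get(home)
    if care_of is None:
        return Packet(source=x, destination=home)
    return Packet(source=x, destination=care_of, routing_header=home)
```

   Routers between H and X only see A:S:H as source or destination, so
   the packets follow the working path through A, while the connection
   state on X stays keyed on B:S:H.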

7. Security Considerations

   Better reliability of Internet connectivity can only improve
   security. Critical connections should be authenticated and binding
   updates must be carried in authenticated packets (see [MOB] for the
   discussion). IPSEC is mandatory for compliant IPv6 implementations.

8. Acknowledgements

   All these ideas were discussed or found at the 40th IETF meeting in
   Washington during lunch-time 6bone BOFs. The transparency issue was
   well-known (and presented by XXX). The mutual backup scheme was built
   by the author for a regional/organization dual-homing at a G6 meeting.
   The non-transitive issue was presented by Alain Durand. The
   diversion of mobility mechanisms appeared in the discussion between
   the author and Matt Crawford who proposed the broken bit.

9. References

   [AGGR] Hinden, R., O'Dell, M. and Deering, S., "An IPv6
          Aggregatable Global Unicast Address Format", Internet
          Draft, <draft-ietf-ipngwg-unicast-aggr-02.txt>, July 1997.

   [BGP] Rekhter, Y. and Li, T., "A Border Gateway Protocol 4 (BGP-4)",
         RFC 1771, cisco Systems, March 1995.



   [MOB] Johnson, D. B., Perkins, C., "Mobility Support in IPv6",
         Internet Draft, <draft-ietf-mobileip-ipv6-04.txt>, November 1997.

Author's Address

   Francis Dupont
   GIE DYADE
   INRIA Rocquencourt
   Domaine de Voluceau
   B.P. 105
   78153 Le Chesnay CEDEX
   FRANCE

   Fax: +33 1 39 63 55 66
   EMail: Francis.Dupont@inria.fr