[6bone] [NOTIFY] XS26 service/peering outage

Petr Baudis pasky@xs26.net
Mon, 16 Dec 2002 18:03:47 +0100


Dear diary, on Mon, Dec 16, 2002 at 05:06:53PM CET, I got a letter,
where Pim van Pelt <pim@ipng.nl> told me, that...
> Petr,

Hello Pim,

> Which 'major problems' do you have with bgpd/ospf6d ? I have been
> running this successfully since the day I enabled it. I've seen successful
> adjacencies being built between Zebra boxes, Zebra/Cisco, and
> Zebra/Juniper machines.

we were encountering numerous, frequently rather subtle bugs in both ospf6d
and bgpd. Apart from random crashes at varying intervals (from a few minutes to
weeks) and topology mess-ups (the OSPFv3 tree broken into several parts not
connected to each other, BGP connections sometimes not working over the OSPFv3
connections), we experienced problems with syncing the kernel's and zebra's
routing tables, strange deadlocks caused by reading from/writing to blocking
fds, 90% of the routes suddenly being routed to eth0 (while all the peerings
were over tunnels) and a number of other problems. Also, there are some
portability problems; on FreeBSD especially, we had visibly more problems than
on e.g. Linux.

Basically, zebra does not seem to be prepared for networks which change very
dynamically (our iBGP table changes very frequently as user prefixes appear and
disappear; it's also relatively big (in the 6bone world, at least ;) because of
the user prefixes present there).

I must admit that I don't have enough of an overview to provide further
technical details about the problems; please ask Jan about them (he maintains
our version of zebra, available at http://www.xs26.net/zebra/).

> You plan to run proprietary software due to lack of support ? May we
> also know which support you are referring to and why the current set of
> routing protocols is not good enough ? I'd be interested in hearing
> your motivations.

The protocol will mainly reflect the environment, that is, tunneled connections
and very dynamic routing tables. Tunneled connections mean that as the IPv4
network routing changes, the latency changes, so the protocol supports dynamic
computation of the connection latency. This is essential for us, and if we
implemented it with OSPF, it would cause routing storms and the OSPF routing
table would never converge. Unlike OSPF, our routers will keep full state
about the connections, so they will be able to maintain the routing structure
much better and cope with the frequent changes.
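
To illustrate why a latency-derived metric needs per-connection state, here is
a minimal Python sketch. It is not the XS26 protocol (its design is not
described in this mail); it is a generic damping scheme under assumed
parameters. The class name TunnelMetric, the smoothing weight alpha and the 20%
re-advertisement threshold are all hypothetical. Each router keeps a smoothed
RTT per tunnel and only announces a new metric when the smoothed value has
drifted far enough from the last announced one, so ordinary jitter does not
trigger an update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TunnelMetric:
    """Damped latency metric for one tunnel (hypothetical, illustration only)."""
    alpha: float = 0.125                   # assumed weight of each new RTT sample
    threshold: float = 0.20                # assumed relative change needed to re-advertise
    smoothed_ms: Optional[float] = None    # running smoothed RTT
    advertised_ms: Optional[float] = None  # value last announced to neighbours

    def add_sample(self, rtt_ms: float) -> bool:
        """Record one RTT sample; return True if the metric should be re-advertised."""
        if self.smoothed_ms is None:
            # First measurement: nothing to compare against, announce it.
            self.smoothed_ms = rtt_ms
            self.advertised_ms = rtt_ms
            return True
        # Exponentially weighted moving average absorbs short-lived jitter.
        self.smoothed_ms = (1 - self.alpha) * self.smoothed_ms + self.alpha * rtt_ms
        # Relative drift from the announced value (guard against a zero base).
        base = max(self.advertised_ms, 1e-9)
        change = abs(self.smoothed_ms - self.advertised_ms) / base
        if change > self.threshold:
            self.advertised_ms = self.smoothed_ms
            return True
        return False
```

With these assumed defaults, samples jittering around 100 ms produce no
updates after the first one, while a sustained jump to 200 ms produces an
update only after a couple of samples confirm it.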

Also, reduction from OSPF+iBGP to just one protocol will reduce the number of
points of failure, simplify the routing infrastructure and maintenance of the
network.

BTW, we haven't decided on the license yet. Also, the software switch will
greatly simplify our current distributed operation and will allow us to
implement features like user BGP peering and dial-up tunnels much more simply.
Oh, I mentioned that already...

> |    Even after starting up our BGP implementation we will announce only our
> | prefix (we won't provide transit) for some time, while we will be testing it
> | carefully. We expect to proceed with re-establishment of the BGP peerings
> | slowly and carefully, as we don't want to harm connectivity of other sites or
> | pollute the global BGP table with bogus entries. Also, we will take this
> | opportunity and give full-transit only to those who will actually want it
> | explicitly.
> Good luck with your BGP implementation! I'll surely notify you if I see
> anything strange from the AS's I maintain.

Thanks!

Kind regards,

-- 
 
				Petr "Pasky" Baudis
				(and Jan "WSX" Oravec ;-)
.
> I don't know why people still want ACL's. There were noises about them for
> samba, but I've not heard anything since. Are vendors using this?
Because People Are Stupid(tm).  Because it's cheaper to put "ACL support: yes"
in the feature list under "Security" than to make sure that userland can cope
with anything more complex than  "Me Og.  Og see directory.  Directory Og's.
Nobody change it".  C.f. snake oil, P.T.Barnum and esp. LSM users
        -- Al Viro
.
Crap: http://pasky.ji.cz/