[6bone] [NOTIFY] XS26 service/peering outage

John Fraizer tvo@EnterZone.Net
Tue, 17 Dec 2002 14:04:41 -0500 (EST)


On Tue, 17 Dec 2002, Jan Oravec wrote:

> On Mon, Dec 16, 2002 at 09:26:00PM +0100, Stephane Bortzmeyer wrote:
> > On Monday 16 December 2002, at 18 h 3, 
> > Petr Baudis <pasky@xs26.net> wrote:
> > 
> > > and bgpd. Apart random crashes in various time periods (from few minutes to
> > > weeks) 
> > 
> > A funny things about distributed systems is the difference in testimonies :-) 
> > We never had a Zebra crash.
> 
> You were probably never running zebra on router with 2048 interfaces, having
> 2k static routes redistributed into BGP, 10k internal BGP routes, about 200
> prefixes in IGP and about 300 external BGP routes.

Find me, outside of 6bone, *ANY* quasi-production router, I'm talking
about on the entire planet, that has 2048 interfaces.

This sounds less like a problem with Zebra and more like a case where you
should be splitting that interface load between many routers.

> The result: CPU time of ospf6d reached sometimes ~100%, zebra was unable to
> save config files, zebra sometimes freezed for 5 minutes or so making ospf6d
> and bgpd also freeze, sometimes something crashed and so on.

I don't doubt it.  Have you tried to do the same on a 7513?  I'll bet a
dollar to a doughnut that it will croak on that interface count as well
and that the SPF calculation will take forever, delaying convergence, and
that it will burn proc like it was going out of style.

> 
> Zebra is not ready for production networks.
> 

I beg to differ.  Your "network", from what you've described, is
under-engineered.  What was the purpose again of terminating 2000+
endpoints on a single router?  You can't seriously think that any
true production (BTW: most of us consider production to be equal to
billable) network architect would put that many eggs in one basket, can
you?


> > > problems, especially on FreeBSD we had visibly more problems than on ie. Linux.
> > 
> > I do not have personal experience with Zebra on FreeBSD but, on the Zebra 
> > mailing list, you can clearly see there are far more FreeBSD users than Linux 
> > ones so I doubt that Zebra is much worse on FreeBSD.
> 
> e.g. this one FreeBSD-only bug: you create interface in the system and in
> order to zebra know about it, you have to restart zebra completely...
> imagine doing 100 such changes a day... your BGP peers won't like you :)...
> fortunatelly we have found an ugly way how to solve this...
> 

Jan, forgive me if I'm wrong, but I don't recall seeing you post about
this problem on the Zebra mailing list.  If you did, can you reference the
archive?


> 
> > > Basically, zebra looks not to be prepared for the networks which change very
> > > dynamically (our iBGP table changes very frequently as user prefixes appear and
> > > disappear; it's also relatively big (in the 6bone world, at least ;) 
> > 
> > We use Zebra for default-free routers on the IPv4 Internet. The 6 bone is a 
> > very small experiment when you compare it to the always-changing 100k routes 
> > of the IPv4 Internet.
> 
> We have 10k always-changing routes in the IPv6. BGP implementation is
> relatively good if you don't dynamically add/remove interfaces.
> 

Again, that sounds like an implementation issue in your network.

Let us assume, as an example, that you are routing out of 5 cities:

Router-1
3ffe:80e0:0000::/36

Router-2
3ffe:80e0:1000::/36

Router-3
3ffe:80e0:2000::/36

Router-4
3ffe:80e0:3000::/36

Router-5
3ffe:80e0:4000::/36


All of these routers can announce your aggregate 3ffe:80e0::/28 to their
eBGP peers while announcing only the specific "pool" /36 they assign tunnel
space from into iBGP/IGP.

An IGP route to 3ffe:80e0:0000::/36 will still attract traffic destined to
3ffe:80e0:0fff::/48, so there is no need for the more specific /48 route
in the IGP or via iBGP announcement to the other routers in the AS.
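
The pool layout above can be checked with a short sketch. It uses Python's
standard ipaddress module; the POOLS table simply restates the five example
routers, and the helper name owning_router is made up for illustration. It
only models prefix containment (which pool covers a tunnel prefix), not
anything Zebra or BGP actually does.

```python
import ipaddress

# Aggregate announced to eBGP peers (from the example above).
AGGREGATE = ipaddress.ip_network("3ffe:80e0::/28")

# Per-router "pool" /36s announced into iBGP/IGP (from the example above).
POOLS = {
    "Router-1": ipaddress.ip_network("3ffe:80e0:0000::/36"),
    "Router-2": ipaddress.ip_network("3ffe:80e0:1000::/36"),
    "Router-3": ipaddress.ip_network("3ffe:80e0:2000::/36"),
    "Router-4": ipaddress.ip_network("3ffe:80e0:3000::/36"),
    "Router-5": ipaddress.ip_network("3ffe:80e0:4000::/36"),
}


def owning_router(prefix):
    """Return the router whose pool covers `prefix`, or None.

    Hypothetical helper: picks the longest covering pool, the same way a
    longest-prefix-match lookup would choose among the /36 routes.
    """
    net = ipaddress.ip_network(prefix)
    matches = [
        (pool.prefixlen, name)
        for name, pool in POOLS.items()
        if net.subnet_of(pool)
    ]
    return max(matches)[1] if matches else None


# Every pool sits inside the aggregate, so the /28 covers all of them.
assert all(pool.subnet_of(AGGREGATE) for pool in POOLS.values())

# A user /48 is reachable via the /36 alone; no /48 route is required.
print(owning_router("3ffe:80e0:0fff::/48"))
```

A prefix that falls outside every pool (for example 3ffe:80e0:5000::/48)
returns None here, which corresponds to address space that has not yet been
handed to any router.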

If you are not assigning each router a "pool" from which you assign tunnel
space, NLA assignments, etc., you are making your network topology
much more complicated than it needs to be.

The interface problem is another story but again, as you stated, it is an
issue with FreeBSD and NOT one with Zebra.  If the OS doesn't properly
report interface changes, you can't expect Zebra to keep up.


> We are not saying zebra is bad, we just say it is not usable in our
> environment.

Actually, you did say that Zebra was bad.  You didn't use those words, but
it was abundantly clear that you were implying that it was bad.

I would like to stress that I don't know of any routing suite that is
going to be happy in the environment I'm picturing based on your
description of your network topology.  Perhaps you might look into that a
bit.


---
John Fraizer              | High-Security Datacenter Services |
President                 | Dedicated circuits 64k - 155M OC3 |
EnterZone, Inc            | Virtual, Dedicated, Colocation    |
http://www.enterzone.net/ | Network Consulting Services       |