Failure Rate Analysis

From BACnet Wiki

Why is a 1% failure rate for a BACnet MS/TP network unreasonable?

Let us consider a network that suffers a completely random 1% failure rate. This means that, on average, 1 out of every 100 packets is going to fail. Let us also assume that the poll rate for the site is a sustained aggregate (all devices combined) of 100 polls per second.

If a packet is lost due to a failure, then there are (another assumption) 3 retries. This is common.

So, in order for a device to go "offline" we need 4 failures in a row. Very unlikely, yes? Of course 1/100 x 1/100 x 1/100 x 1/100 = 1/100 000 000.
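The multiplication above can be checked with a few lines of Python. This is only a minimal sketch: the function name `offline_probability` is made up for illustration, and it assumes, as the text does, that every packet fails independently with the same probability.

```python
from fractions import Fraction


def offline_probability(failure_rate: Fraction, retries: int) -> Fraction:
    """Chance that the original packet AND every retry fail.

    Assumes failures are independent and equally likely on each attempt.
    """
    attempts = retries + 1  # the first try plus the retries
    return failure_rate ** attempts


# 1% failure rate, 3 retries -> 4 failures in a row
print(offline_probability(Fraction(1, 100), retries=3))  # 1/100000000
```

Using `Fraction` keeps the result exact, so it prints as one in a hundred million rather than as a rounded floating-point value.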

One in a hundred million...

However, this device is not alone. A university campus may easily have 4000 devices, and we have already said the poll rate for the site is 100 polls per second. Each device is therefore polled only 100 / 4000 = 0.025 times per second, so for one device to be polled 100 000 000 times will take 4000 / 100 * 100 000 000 = 4 000 000 000 seconds.

This is 4 billion seconds (roughly 127 years) for any single device, but there are 4000 devices, so let us take that into account and declare that, on average, every 4 000 000 000 / 4000 = 1 000 000 seconds one device on the site is going to go offline.

So, a failure every 1 000 000 seconds is 1 000 000 / 60 / 60 / 24 = about 11.6 days.
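The whole chain of arithmetic can be reproduced in a short Python sketch, which also makes it easy to try other failure rates or retry counts. The function and parameter names here are hypothetical, and the model is the same simplified one used above: independent failures, a fixed site-wide poll rate, and polls spread evenly over all devices.

```python
from fractions import Fraction


def seconds_between_offline_events(failure_rate, retries, site_polls_per_second, devices):
    """Average seconds between 'device offline' events anywhere on the site."""
    p_offline = failure_rate ** (retries + 1)             # all attempts fail
    per_device_rate = Fraction(site_polls_per_second, devices)  # polls/s per device
    seconds_per_device = 1 / (per_device_rate * p_offline)      # one device alone
    return seconds_per_device / devices                   # any device on the site


seconds = seconds_between_offline_events(
    Fraction(1, 100), retries=3, site_polls_per_second=100, devices=4000
)
days = float(seconds) / (60 * 60 * 24)
print(f"{float(seconds):.0f} seconds is about {days:.1f} days")
```

Note that the device count cancels out: in this model the result depends only on the site-wide poll rate and the failure probability, so a site with 400 devices polled at the same aggregate 100 polls per second would see an offline event just as often.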

Is this acceptable? For a control system? I would say not.