APPs force the Network Design, NOT the opposite

I came across a post by Ivan Pepelnjak about the madness of stretched firewalls across DCI.

Ivan calls the idea a stupidity and states:

“For those who still don’t get it: if you lose the communication between cluster members (which would happen after DCI link failure), the firewalls in one data center shut down and cut that data center off the net.”

I don’t get it. Assuming that your CCL (cluster control link) is protected by at least two links going through two edge devices, the probability of losing your data center this way is the same as for North-South connectivity alone. This holds for DCI just as it does for links to an external ISP or WAN. Is it an issue to have one DC down? Not really: this is the reason to have at least two data centers. The exceptions are when the DCI links are unstable or when the application’s orchestration cannot handle the failure. The former applies to WAN links as well. In fact, DCI can be more stable, as it may take the form of dark fibre or an owned DWDM system. The latter comes down to the application architecture, which should be stateless or work in active/active mode to cope with a DC shutdown.
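To put rough numbers on that comparison, here is a minimal sketch. The 0.1% unavailability figure is purely hypothetical, and the calculation assumes the two links fail independently, which may not hold if they share a duct or an edge device. The function name is my own, chosen for illustration.

```python
def all_links_down(unavailability: float, links: int = 2) -> float:
    """Probability that every one of `links` redundant links is down at once.

    Assumes failures are independent, which is a simplification.
    """
    return unavailability ** links


# Hypothetical figure: each individual link is unavailable 0.1% of the time.
P_LINK_DOWN = 0.001

# CCL carried over two DCI links through two edge devices.
p_ccl_lost = all_links_down(P_LINK_DOWN, links=2)

# North-South connectivity over two uplinks of the same quality.
p_north_south_lost = all_links_down(P_LINK_DOWN, links=2)

print(f"CCL lost:         {p_ccl_lost:.2e}")
print(f"North-South lost: {p_north_south_lost:.2e}")
```

With the same redundancy and the same link quality, both exposures come out equal. If the DCI runs on more stable links than the WAN, the first figure would be the smaller one.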

Ivan promotes ‘a proper application architecture’ or ‘application reengineering’. I fully agree. If applications did not rely on an L2 segment, if clusters could be stretched across L3, or if the applications were stateless, then the whole concept of L2 DCI, or L2 DCI over L3 links, could be avoided. It is not that easy. Bigger players such as Google, Facebook, some ISPs, and news portals can work out the right application architecture from scratch. But the rest of the companies still rely on applications which are stateful and require L2 to stretch clusters.




Stretched firewalls are a necessary solution for legacy application architectures. True, if someone does not need session synchronization on the FW or IPS, then active/standby mode is much more advisable. But please, stop accusing network architects, engineers, and vendors of trying to find a quick fix for the applications’ requirements. This is the root cause of the network design: application architectures drive the network design. Not the opposite.




2 replies on “APPs force the Network Design, NOT the opposite”

  1. The point is that Ivan claims it is not possible to build a resilient and redundant L2 DCI, and that every existing technology trying to do so has flaws which can lead to a catastrophic failure of both connected data centers.
    In that case every stretched cluster (including a firewall cluster) is doomed and will fail, bringing down the entire network (in the worst-case scenario).
    And no, poor application design is not a good excuse.

    1. Network design starts from the application side, not the other way around. As most applications are still stateful, network services have to be stateful as well. Unless the active/standby model is good enough, L2 has to be stretched for at least the North-South firewall. Saying that L2 DCI cannot provide good resiliency, or that a stretched cluster is doomed, simply misjudges the balance between the requirements and the risk. Don’t get me wrong. I completely agree that L3 is the sweet spot for DCI. But it is still not enough in many cases. Take the example of front-end web financial services in the active/active model. HTTPS sessions require stateful synchronization between firewall members in both data centers. Otherwise a session can be broken by a global prefix change or a network failure. Amazon or Google don’t need it, as in their case the TCP session can be re-established without losing vital information.

      The only catastrophic failure associated with L2 is a network loop, which causes a network meltdown. The stretched firewall solution does not increase the risk of a loop occurring. Ivan’s statement “if you lose the communication between cluster members (which would happen after DCI link failure), the firewalls in one data center shut down and cut that data center off the net.” is a mixture of untruth and a miscalculation of risk. And yet again, I also prefer avoiding problems by design. I would prefer to use L3, or at least the REP/G.8032 protocol, to avoid permanent loops. Still, xSTP may be necessary, and with a proper design it can also be loop-free.

      A couple of years ago I discussed with Ivan another argument against L2 DCI: excessive broadcast traffic that chokes the application traffic. For him it is a real threat. For me, not really. But even if we assume that it is a real threat, today we can use an additional level of segmentation, such as STT/VXLAN/NVGRE or micro-segmentation in VMware NSX. This lowers the volume of flooding and blocks a potential DoS attack within the data center.
