Why Wireless Mesh Networks Don't Work at Scale

Excerpted from Sha.ddih:

“After a couple of years I developed a pretty good understanding that wireless mesh networks aren’t actually a good way to build a real network. Here are a few of the reasons.

* Reason 1: Management is hard and expensive. The biggest cost for the networks I ran was actually maintaining them once they were built. It’s not just replacing hardware either: you haven’t lived until you’ve hunted down transient connectivity problems resulting from RF weirdness in urban areas. It’s hard, and even if you’re relying on volunteers to do the work to keep costs down, you’re going to spend all your time just maintaining basic connectivity. Then there are network-level issues, like traffic shaping/throttling: the wireless channel is hella bandwidth constrained, so you must do extensive shaping to ensure everyone gets fair access to your limited resources.

There is a reason the only large-scale mesh networks (e.g., Freifunk and Athens Community Wireless) are run by relatively tight-knit groups of smart, motivated people: it’s a significant undertaking even without doing things in a decentralized fashion. My colleague used to run the largest mesh network in the world (not joking!), and eventually his group switched to a carefully planned, point-to-point wireless network due to mesh’s management overhead, from both an RF and a network perspective. When you’re building large systems you want as little unpredictability as possible, and unfortunately unplanned mesh networks just don’t deliver there.

* Reason 2: Omnidirectional antennas suck. The whole idea behind a mesh network is that each node in it can see multiple other nodes, so if one goes down, or if there is interference, the mesh routing protocol can find a new path through the network. In order to achieve this, you use omnidirectional antennas. Antennas are passive devices: they just focus RF energy. Omnidirectional antennas are very inefficient, since they throw your energy (i.e., signal) in every direction, when in reality you just want your signal to reach the handful of nodes nearby. This means your signal travels a shorter distance, and thus you need a higher density of nodes. In my experience, for a small apartment building this is at least one per floor to achieve a semblance of reliability. Even if all 15,000 people on the Darknet subreddit could install and maintain 10 devices, they wouldn’t cover all of Wichita, KS, not to mention the miles of farmland between it and the next town. And, to make matters worse, omnidirectional antennas also receive interference from every direction, making the mesh network less reliable.
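To make the density point concrete, here is a rough sketch of how quickly the node count grows as usable range shrinks. It assumes each node covers an ideal disc with no overlap and no obstacles, which is far more generous than any real deployment. The area and range figures are hypothetical, not measurements from the networks described here.

```python
import math


def nodes_needed(area_km2: float, range_km: float) -> int:
    """Optimistic lower bound on nodes needed to cover an area.

    Assumes every node covers a perfect disc of radius range_km and that
    discs tile the area with no overlap (a real mesh needs overlap).
    """
    coverage_per_node_km2 = math.pi * range_km ** 2
    return math.ceil(area_km2 / coverage_per_node_km2)


# Hypothetical city-sized area and hypothetical usable ranges.
area_km2 = 400.0
for range_km in (0.4, 0.2, 0.1):
    count = nodes_needed(area_km2, range_km)
    print(f"usable range {range_km * 1000:.0f} m -> at least {count} nodes")
```

Coverage scales with the square of the range, so halving the usable range roughly quadruples the number of nodes you have to install and maintain.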

* Reason 3: Your RF tricks won’t help you here. You can get higher-gain or directional antennas, but again this won’t help. Remember, antennas are passive, only focusing energy. Thus, a higher-gain omnidirectional antenna has a radiation pattern more like a disc than a sphere, and the higher the gain, the thinner the disc gets: if your nodes are at different heights, they won’t be able to “see” each other! Directional antennas allow you to focus your RF beam directly where you want it to go, but now your node can’t communicate with as many other nodes, eliminating a key property of the mesh network.

Amplifiers do nothing here, by the way. They only boost transmit power; the real limitation is receive sensitivity. Also, amplifiers are power-hungry and expensive (and there are legal limits to their power levels). Antennas are nice because by focusing both transmitted and received RF energy, they help with both (and they use no power and are relatively cheap to build).
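The amplifier-versus-antenna argument can be checked with a simple link budget. The sketch below uses the standard free-space path loss formula and, following the text, models an amplifier as raising transmit power only. The `Node` class, the power, gain, and sensitivity values, and the 5 km distance are all hypothetical choices for illustration; real links also suffer obstruction and interference losses that this ignores.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    tx_power_dbm: float        # transmitter output power
    antenna_gain_dbi: float    # applies to both transmit and receive
    rx_sensitivity_dbm: float  # weakest signal the receiver can decode


def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (distance in km, frequency in MHz)."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44


def link_closes(sender: Node, receiver: Node,
                distance_km: float, freq_mhz: float) -> bool:
    """True if the receiver can decode the sender in free space."""
    received_dbm = (sender.tx_power_dbm
                    + sender.antenna_gain_dbi
                    + receiver.antenna_gain_dbi
                    - fspl_db(distance_km, freq_mhz))
    return received_dbm >= receiver.rx_sensitivity_dbm


def works_both_ways(a: Node, b: Node,
                    distance_km: float, freq_mhz: float) -> bool:
    """A usable link needs data one way and acknowledgements the other."""
    return (link_closes(a, b, distance_km, freq_mhz)
            and link_closes(b, a, distance_km, freq_mhz))


# Hypothetical hardware: 20 dBm radio, 2 dBi antenna, -85 dBm sensitivity.
base = Node(tx_power_dbm=20.0, antenna_gain_dbi=2.0, rx_sensitivity_dbm=-85.0)
amplified = replace(base, tx_power_dbm=30.0)            # +10 dB amplifier
better_antenna = replace(base, antenna_gain_dbi=12.0)   # +10 dB antenna gain

distance_km, freq_mhz = 5.0, 2437.0  # hypothetical 2.4 GHz link
print("amplifier, both ways:", works_both_ways(amplified, base, distance_km, freq_mhz))
print("antenna, both ways:  ", works_both_ways(better_antenna, base, distance_km, freq_mhz))
```

With these numbers the amplified node can be heard by its neighbor but cannot hear the reply, while the same 10 dB spent on antenna gain closes the link in both directions.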

* Reason 4: Single-radio equipment doesn’t work; multi-radio equipment is very expensive. This is the biggest technical reason mesh networks don’t work for Internet access. If you’re using low-cost equipment, it will only have one radio transceiver. This means your node is half-duplex: it can’t both send and receive at the same time. In addition, only one node in a given area can be transmitting at a time: if two nodes send at the same time, their signals “collide” and the receiver won’t be able to decode the message. This is true even if the senders are sending to different receivers: remember, omnidirectional antennas transmit in all directions, and pick up interference from all directions! And to make matters worse, every node is both a sender and a receiver, since every sent packet needs an acknowledgement. There are some tricks to mitigate these problems, but they are fundamental (see “hidden node problem”), especially when you have the density of nodes necessary to create a mesh network. Each of these challenges means that each node can only transmit for a small fraction of the time, and this reduces your effective bandwidth. In practice, a mesh network using single-radio equipment is unusable if a packet must travel more than three hops to its destination.
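A back-of-the-envelope model shows why hop count hurts so much on a single shared channel. The sketch below assumes every hop on the path is within interference range of the others, so only one hop can be active at a time and each packet must be retransmitted once per hop. That is a simplification, and an optimistic one: it ignores acknowledgements, collisions, backoff, and competing traffic. The 54 Mbit/s link rate is a hypothetical figure.

```python
def end_to_end_throughput(link_rate_mbps: float, hops: int) -> float:
    """Upper bound on throughput across a chain of hops sharing one channel.

    Assumes all hops interfere with each other, so the channel carries each
    packet `hops` times, one transmission at a time.
    """
    if hops < 1:
        raise ValueError("a path needs at least one hop")
    return link_rate_mbps / hops


link_rate_mbps = 54.0  # hypothetical nominal link rate
for hops in range(1, 6):
    rate = end_to_end_throughput(link_rate_mbps, hops)
    print(f"{hops} hop(s): at most {rate:.1f} Mbit/s")
```

Even under these generous assumptions the ceiling falls off as 1/hops, before any of the collision and hidden-node effects are counted.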

One solution would be to use multi-radio nodes. You would need two radios: one to transmit, one to receive. This solves the half-duplex problem, but you still have the interference issue, and if you use multiple channels to get around that problem you will quickly run out of RF spectrum, not to mention having the new problem of how to intelligently allocate spectrum to each node. This spectrum allocation task is an NP-hard scheduling problem, as is allocating non-interfering time slots for single-radio equipment. There are also challenging practical considerations, like how you efficiently implement a valid schedule once you compute it. And, because you still would need roughly the same node density as before, a network of multi-radio devices quickly becomes very expensive.
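Channel allocation can be viewed as coloring an interference graph: nodes that interfere must not share a channel. The sketch below is a greedy heuristic, not an optimal solver (finding an optimal assignment is the hard part), and it is not taken from any particular mesh project. The node names, the interference graph, and the choice of channels 1, 6, and 11 as the available set are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set


def greedy_channel_assignment(
    interference: Dict[str, Set[str]],
    channels: List[int],
) -> Dict[str, Optional[int]]:
    """Assign each node a channel not used by any interfering neighbor.

    Visits the most-constrained nodes (most neighbors) first. A node gets
    None when every channel is already taken by one of its neighbors.
    """
    assignment: Dict[str, Optional[int]] = {}
    order = sorted(interference, key=lambda n: (-len(interference[n]), n))
    for node in order:
        used = {assignment[nb] for nb in interference[node] if nb in assignment}
        free = [c for c in channels if c not in used]
        assignment[node] = free[0] if free else None
    return assignment


# Hypothetical dense cluster: four nodes that all interfere with each other.
dense_cluster = {
    "n1": {"n2", "n3", "n4"},
    "n2": {"n1", "n3", "n4"},
    "n3": {"n1", "n2", "n4"},
    "n4": {"n1", "n2", "n3"},
}
result = greedy_channel_assignment(dense_cluster, channels=[1, 6, 11])
for node, channel in sorted(result.items()):
    print(node, "->", channel if channel is not None else "no free channel")
```

With only three channels and four mutually interfering nodes, one node is left without a channel no matter how clever the allocator is, which is the spectrum-exhaustion problem in miniature.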

* Reason 5: Unplanned mesh networks break routing. Once you have a mesh network, you have to figure out how to get packets across it. There are many protocols for mesh routing, like AODV, OLSR, and BATMAN. Fundamentally they require individual nodes to communicate with each other, which not only takes up further network resources, but also means that achieving a consistent routing state (i.e., one in which packets won’t get routed into black holes or loops) is extremely difficult for all the reasons distributed systems are hard to build. The unplanned nature of a grassroots mesh network exacerbates this problem, since poor RF-level connectivity means the connectivity state between nodes changes frequently, leading to more routing overhead in the network. It’s a bad cycle.
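What a consistent routing state means can be illustrated with a small checker that follows a packet through each node's next-hop table. This is a sketch for illustration only: it is not how AODV, OLSR, or BATMAN work internally, and the table layout, node names, and `trace` function are hypothetical.

```python
from typing import Dict

# next_hop[node][destination] -> neighbor that node forwards to
RoutingState = Dict[str, Dict[str, str]]


def trace(next_hop: RoutingState, src: str, dst: str) -> str:
    """Follow a packet hop by hop and report what happens to it."""
    visited = set()
    node = src
    while node != dst:
        if node in visited:
            return "loop"        # packet came back to a node it already passed
        visited.add(node)
        nxt = next_hop.get(node, {}).get(dst)
        if nxt is None:
            return "black hole"  # node has no route and drops the packet
        node = nxt
    return "delivered"


# Consistent state: A -> B -> C -> D.
consistent = {"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "D"}}

# The C-D link flaps. C falls back to B, but B has not heard the news yet
# and still forwards to C.
stale = {"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "B"}}

# B has withdrawn its route entirely and A has not noticed.
missing = {"A": {"D": "B"}, "B": {}, "C": {"D": "D"}}

for name, state in (("consistent", consistent), ("stale", stale), ("missing", missing)):
    print(f"{name}: {trace(state, 'A', 'D')}")
```

Each table here is locally reasonable; the failures only appear when the tables are combined, which is why frequent link changes make a consistent global state so hard to maintain.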

I’m not saying mesh networks don’t work ever; the people in the wireless mesh community I’ve met are all great people doing fantastic work. What I am saying is that unplanned wireless mesh networks never work at scale.”