WireGuard® teaches simplicity and efficiency
April 15, 2021
Much technological innovation is driven by the search for simplicity. Simpler technologies are often better. This is especially true for computers, because a simpler path to the same goal saves both computational and conceptual resources. That’s why WireGuard® is so fast. Its simpler construction, which makes it safer and faster, also made it the perfect foundation for our NordLynx protocol.
The first way in which WireGuard® exhibits simplicity is that it deliberately lacks protocol and cipher agility. Unlike other protocols, it offers a narrow range of protocol compatibilities and cryptographic functions. Cipher agility drastically increases complexity because of the maintenance obligations it creates. A flaw in one of the underlying primitives requires updates at all endpoints to ensure security, and the more primitives you use, the more updates you may eventually need.
Instead of supporting an unnecessarily large number of ciphers and primitives, WireGuard® works with the selected few that ensure cryptographic safety and efficiency. It uses ChaCha20 for symmetric encryption, authenticated with Poly1305, using the AEAD construction from RFC 7539; Curve25519 for ECDH; BLAKE2s for hashing and keyed hashing, as described in RFC 7693; SipHash24 for hashtable keys; and HKDF for key derivation, as described in RFC 5869. In comparison, OpenSSL – the next most streamlined option – works with 36 algorithmic primitives.
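To make the hashing and key-derivation primitives more concrete, here is a minimal sketch using only Python's standard library. It assumes HKDF is instantiated with HMAC over BLAKE2s and an empty "info" field; the function names are illustrative and are not taken from any WireGuard® implementation.

```python
import hashlib
import hmac


def blake2s_hash(data: bytes) -> bytes:
    """Plain BLAKE2s with its default 32-byte digest."""
    return hashlib.blake2s(data).digest()


def blake2s_mac(key: bytes, data: bytes) -> bytes:
    """Keyed BLAKE2s (RFC 7693 keyed mode) truncated to a 16-byte tag."""
    return hashlib.blake2s(data, key=key, digest_size=16).digest()


def hkdf_blake2s(salt: bytes, input_key: bytes, n_outputs: int) -> list:
    """RFC 5869 HKDF with an empty info field, built on HMAC-BLAKE2s.

    Returns n_outputs derived keys of 32 bytes each.
    """
    # Extract: condense the input keying material into a pseudorandom key.
    prk = hmac.new(salt, input_key, hashlib.blake2s).digest()
    # Expand: T(i) = HMAC(prk, T(i-1) || i), starting from an empty T(0).
    outputs, previous = [], b""
    for counter in range(1, n_outputs + 1):
        previous = hmac.new(prk, previous + bytes([counter]), hashlib.blake2s).digest()
        outputs.append(previous)
    return outputs
```

Calling hkdf_blake2s(salt, shared_secret, 2) returns two independent 32-byte keys derived from one shared secret, which is the general shape of key derivation that a handshake needs.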
As a result, WireGuard® has fewer lines of code (LoC) than its competitors by orders of magnitude. This makes the protocol easier to work with and improves its computational economy. In other words, it is a simpler technology that achieves the same goals faster and just as securely.
Combination of primitives for security
Instead of starting with a large number of cryptographic primitives, WireGuard® employs the Noise framework to combine its selected few and achieve the desired security properties. Noise is a framework for crypto protocols based on Diffie-Hellman (DH) key agreement in which two parties exchange handshake messages and then derive a shared secret key after a sequence of DH operations. This key is then used to send encrypted transport messages.
The sequence and properties of the handshake messages are arranged into predefined patterns. There are 12 fundamental patterns, each named with two characters that indicate the status of the initiator's and the responder's static keys. WireGuard® uses the Noise_IK pattern, in which “I” means that the initiator's static key is immediately transmitted to the responder, despite reduced or absent identity hiding, and “K” means that the responder's static key is already known to the initiator.
Here's what you're looking at in the following Noise infographic:
CE: The Client's Ephemeral public key, which the client generates for this session.
CES: The Client-side Ephemeral-Static shared secret. This is a Diffie-Hellman shared secret that the client derives using the client's ephemeral private key and the recipient's static public key.
S: A Static public key. The client sends their static public key in the first message.
CSS: The Client-side Static-Static shared secret. This is a Diffie-Hellman shared secret that the client derives using the client's static private key and the recipient's static public key.
RE: The Recipient's Ephemeral public key, which the recipient generates for this session.
REE: The Recipient-side Ephemeral-Ephemeral shared secret. This is a Diffie-Hellman shared secret that the recipient derives using the recipient's ephemeral private key and the client's ephemeral public key.
RES: The Recipient-side Ephemeral-Static shared secret. This is a Diffie-Hellman shared secret that the recipient derives using the recipient's ephemeral private key and the client's static public key.
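The four shared secrets listed above can be sketched in code to show that both sides arrive at the same values. This is an illustration only: it assumes a toy finite-field Diffie-Hellman as a stand-in for Curve25519 (not secure), and the mix function is a hypothetical simplification of the real key-derivation step.

```python
import hashlib
import secrets

# Toy Diffie-Hellman group standing in for Curve25519. NOT secure, illustration only.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair():
    """Return (private, public) for the toy group."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def dh(private: int, public: int) -> bytes:
    """Shared secret from our private key and the peer's public key."""
    return pow(public, private, P).to_bytes(16, "big")


def mix(chaining_key: bytes, secret: bytes) -> bytes:
    """Hypothetical stand-in for the KDF: fold one DH result into the chaining key."""
    return hashlib.blake2s(chaining_key + secret).digest()


def client_key(ce_priv, cs_priv, rs_pub, re_pub) -> bytes:
    ck = b"\x00" * 32
    for secret in (
        dh(ce_priv, rs_pub),  # CES: client ephemeral x recipient static
        dh(cs_priv, rs_pub),  # CSS: client static x recipient static
        dh(ce_priv, re_pub),  # client's view of REE
        dh(cs_priv, re_pub),  # client's view of RES
    ):
        ck = mix(ck, secret)
    return ck


def recipient_key(rs_priv, re_priv, ce_pub, cs_pub) -> bytes:
    ck = b"\x00" * 32
    for secret in (
        dh(rs_priv, ce_pub),  # recipient's view of CES
        dh(rs_priv, cs_pub),  # recipient's view of CSS
        dh(re_priv, ce_pub),  # REE: recipient ephemeral x client ephemeral
        dh(re_priv, cs_pub),  # RES: recipient ephemeral x client static
    ):
        ck = mix(ck, secret)
    return ck


def run_handshake():
    """Generate fresh keys for both sides and derive each side's session key."""
    cs_priv, cs_pub = keypair()  # client static
    ce_priv, ce_pub = keypair()  # client ephemeral
    rs_priv, rs_pub = keypair()  # recipient static (known to the client in advance)
    re_priv, re_pub = keypair()  # recipient ephemeral
    return (
        client_key(ce_priv, cs_priv, rs_pub, re_pub),
        recipient_key(rs_priv, re_priv, ce_pub, cs_pub),
    )
```

Each side uses only its own private keys and the other side's public keys, yet both end up with the same chaining key, which is what lets a single message in each direction establish the transport keys.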
By using the Noise_IK pattern, WireGuard® reduces the handshake to a single round trip, meaning that only one message from the initiator (1) and one response message (2) are needed to complete it. All following messages (3, 4, etc.) benefit from sender and receiver authentication and are protected against key-compromise impersonation.
There is a theoretical risk involved in the authentication of the first packet, which could be vulnerable to a replay attack. If an adversary intercepts that message, it could be replayed at a later date to impersonate the sender. However, WireGuard® uses a timestamping primitive (TAI64N) to prevent this. The responder keeps only the most recent timestamp received per client and discards any packet whose timestamp is not newer. Even if the server restarts, a replay attack is ineffective since reconnecting clients will use newer timestamps that invalidate any previous ones.
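The "newest timestamp wins" rule can be sketched as follows. This is a simplified illustration rather than WireGuard®'s actual code: the ReplayGuard class is hypothetical, and the timestamp helper uses Unix time and ignores the leap-second offset between TAI and UTC.

```python
import struct
import time

TAI64_BASE = 2**62  # TAI64 second labels are offset by 2^62


def tai64n_now() -> bytes:
    """12-byte TAI64N-style stamp: 8-byte seconds, then 4-byte nanoseconds, big-endian.

    Simplification: based on Unix time, without the TAI-UTC leap-second offset.
    """
    ns = time.time_ns()
    return struct.pack(">QI", TAI64_BASE + ns // 1_000_000_000, ns % 1_000_000_000)


class ReplayGuard:
    """Remembers the greatest timestamp seen per peer and rejects anything not newer."""

    def __init__(self):
        self._latest = {}

    def accept(self, peer_public_key: bytes, timestamp: bytes) -> bool:
        latest = self._latest.get(peer_public_key)
        # Big-endian encoding means byte-wise comparison matches chronological order.
        if latest is not None and timestamp <= latest:
            return False  # replayed or stale handshake initiation
        self._latest[peer_public_key] = timestamp
        return True
```

Because only one timestamp is stored per peer, the check costs constant memory per client, and a captured initiation message becomes useless as soon as the genuine client sends a newer one.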
With the timestamping primitive in place, the Noise_IK pattern has no unauthenticated messages exchanged between the sender and receiver. Thus, there is no response to packets that have not yet been authenticated: the server remains silent and invisible, and the protocol remains secure.
Computing bottlenecks and DoS mitigation
One of the greatest bottlenecks for cryptographic security is that most cryptographic functions are CPU intensive. This, too, makes economy a security concern, since excessive CPU load may leave the system vulnerable to attacks in which an adversary simply overloads it. This is why WireGuard® has a fallback option of sending cookie reply packets instead of actually processing handshake messages.
Traditionally, cookie replies can be vulnerable to attacks in which the server is tricked into responding to unauthenticated requests. However, this issue can be addressed by requiring all messages to carry a message authentication code (MAC) that is keyed with the responder’s public key. This ensures that the initiator is always aware of the destination of the message. In other words, the initiator is not scanning for a recipient but already knows that the responder with the associated public key exists.
In addition, to prevent the interception of cookies, they are encrypted with an AEAD using an extended randomized nonce instead of being sent in plain text. The additional data field of the AEAD is used to bind cookie replies to initiation messages. This shields the initiator from DoS attacks that use fraudulent cookie messages to force the initiator to compute incorrect MACs.
In this way, the cookie is used as a key to create a second MAC. When the responder is overloaded, only messages carrying this additional MAC are accepted. Cookie messages constructed in this way end up smaller than either the handshake initiation or the handshake response messages, which mitigates amplification attacks.
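The two MACs described above can be sketched with keyed BLAKE2s from Python's standard library. The message layout, the rotating secret and the responder_decision helper are simplifying assumptions made for illustration, and the AEAD encryption of the cookie in transit is left out because it needs a third-party library.

```python
import hashlib
import hmac

LABEL_MAC1 = b"mac1----"


def mac(key: bytes, data: bytes) -> bytes:
    """Keyed BLAKE2s with a 16-byte tag."""
    return hashlib.blake2s(data, key=key, digest_size=16).digest()


def mac1(responder_public_key: bytes, message: bytes) -> bytes:
    """First MAC: can only be computed by someone who knows the responder's public key."""
    key = hashlib.blake2s(LABEL_MAC1 + responder_public_key).digest()
    return mac(key, message)


def make_cookie(rotating_secret: bytes, source_address: bytes) -> bytes:
    """Cookie tied to the sender's address and to a secret the responder rotates."""
    return mac(rotating_secret, source_address)


def mac2(cookie: bytes, message_with_mac1: bytes) -> bytes:
    """Second MAC: keyed with the cookie, so it proves the sender received the cookie."""
    return mac(cookie, message_with_mac1)


def responder_decision(responder_public_key, rotating_secret, source_address,
                       message, m1, m2, under_load) -> str:
    """Hypothetical decision logic for an incoming handshake message."""
    if not hmac.compare_digest(m1, mac1(responder_public_key, message)):
        return "drop"  # sender does not even know our public key: stay silent
    if not under_load:
        return "process"
    expected = mac2(make_cookie(rotating_secret, source_address), message + m1)
    if hmac.compare_digest(m2, expected):
        return "process"
    return "send_cookie_reply"  # cheap reply instead of expensive handshake processing
```

Checking the first MAC costs one hash, so the responder can stay silent toward senders that do not know its public key without doing any Curve25519 work. Under load, the second MAC ties each message to the address the cookie was issued for.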
Even when this fallback mechanism comes into play, the Noise_IK pattern remains valid, with two additional messages being transmitted.
There are two additional elements in the following cookie reply infographic:
TS1 + MAC: The first/earlier timestamp and the client's MAC.
TS2 + MAC: The second/later timestamp and the client's MAC.
Although messages 1 and 3 have an identical handshake payload, each of them is formed with a timestamp and a MAC, so message 1 cannot be replayed as message 3. This both prevents the possibility of replay attacks and mitigates DoS damage.
Dealing with the threat of quantum supremacy
Quantum supremacy is peeking over the horizon. When it becomes a more viable and accessible option to more users, it will pose a real threat to cybersecurity. There is a realistic scenario in which Shor’s algorithm, which empowers quantum computers to break public key cryptography (RSA, DSA, ECDSA, etc.), triggers a post-quantum security apocalypse.
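This makes it a good idea to look into options that do not rely solely on public keys. Some of those options will certainly rely on symmetric-key (or secret-key) algorithms. The Curve25519 Diffie-Hellman key agreement used by WireGuard® is itself a public-key technique and would be exposed to Shor's algorithm, which is why WireGuard® allows peers to mix an optional pre-shared symmetric key into the handshake. Symmetric-key cryptography is not affected by Shor's algorithm, so this additional layer is expected to hold up in the post-quantum scenario.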
It has been suggested that symmetric-key options might be vulnerable to another threat – Grover’s algorithm. However, the algorithm won’t allow quantum computers to solve NP-Complete problems in polynomial time, which means that the time it would take for a quantum computer to brute-force its way through a symmetric-key protocol would still be too long to be effective.
While other quantum algorithms (like Shor’s algorithm) provide an exponential speedup over the classical solutions, Grover's algorithm provides only a quadratic speedup. This means that a 128-bit symmetric key could be brute-forced in roughly 2^64 (instead of 2^128) iterations, a number that still grows exponentially with key length rather than polynomially.
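The arithmetic behind the quadratic speedup is easy to check. The sketch below treats Grover's algorithm as needing about the square root of the classical number of guesses and ignores constant factors and hardware costs; the function names are illustrative.

```python
def classical_iterations(key_bits: int) -> int:
    """Worst-case number of guesses for a classical brute-force search."""
    return 2 ** key_bits


def grover_iterations(key_bits: int) -> int:
    """Approximate Grover iterations: the square root of the classical search space."""
    return 2 ** (key_bits // 2)


def effective_security_bits(key_bits: int) -> int:
    """Security level left against a Grover search, in bits."""
    return key_bits // 2


for bits in (128, 256):
    print(f"{bits}-bit key: 2^{bits} classical guesses, "
          f"about 2^{effective_security_bits(bits)} Grover iterations")
```

Doubling the key length restores the original margin: a 256-bit key, the size ChaCha20 uses, still leaves about 2^128 Grover iterations.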