Libdrop: File sharing through NordVPN

Lukas Pukenis

February 6, 2024


0

The Libdrop library allows NordVPN users to share files over Meshnet. In this article, we explain how we developed our file transfer system and the role Libdrop plays in it.

hands hover over keyboard

What is Libdrop?

Libdrop is a cross-platform library developed in the Rust programming language. It is compatible with Windows, MacOS, Linux, iOS, and Android. File sharing within the NordVPN environment is facilitated by the Libdrop library, which is available as an open-source resource on GitHub.

The goal of Libdrop implementation is to allow smooth and secure file sharing between users over Meshnet. The library should be easily integrated into the NordVPN application so API users can issue transfer requests, with the rest of the processes being carried out in the library.

Libdrop protocol

The Libdrop protocol enables peer-to-peer file sharing via both IPv4 and IPv6. In this process, the sender presents files to the receiver, who then selects specific files for download. Downloads are then initiated.

The transfer is live until one of the peers goes down or the transfer is explicitly canceled by either of the peers, after which the files are no longer available for download. This provides the user with a time window where they can decide which files they want to download now and in the future while the transfer is still up.

High-level overview of communication between two peers.

High-level overview of communication between two peers.

Communication and low-level details

Let’s take a closer look at the technical details of the communication process, and how we developed our current setup.

gRPC

At first glance, it seemed evident that our easiest course of action would be to focus on the HTTP server and client because this is very easy to use and understand, as well as being a time-proven technology. We could make a REST endpoint and just proceed with a regular HTTP download.

To enhance speed and control, we opted for gRPC. Because gRPC is a binary protocol it has less overhead. It is also strongly typed, making errors harder to introduce. gRPC technology automatically generates the code needed for both the client and the server, making it an excellent fit. In fact, Libdrop was originally built on gRPC.

Initially, it was very comfortable to use — both the client and the server code just worked. We could issue a certain call via the wire and expect the appropriate function to be called on the peer.

However, as time went on, we found that debugging gRPC presented some challenges, and the "black box" nature of it began to concern us. The generated code also had little control over the socket itself because it was abstracted too far away to gain direct access. Consequently, we transitioned from gRPC to WebSockets in pursuit of a more adaptable solution.

WebSockets

Unlike gRPC, WebSockets is not strongly typed, which offers a degree of flexibility. This flexibility comes at the cost of making it easier for bugs to appear. However, there's no automatic code generation, which is a plus.

The ability to easily introduce versioning was another advantage. We just need to have the URL in the form of "ws://{addr}/drop/{version}/query." It also helped that WebSockets is a fairly easy-to-understand technology that works in tandem with HTTP so the traffic can be inspected easily as well as debugged.

Choosing WebSockets turned out to be a wise decision. It led to a reduction in code complexity and greatly enhanced our understanding of the data flow. Plus, having written the code ourselves, we felt fully in control of the system.

Simplified representation of backward compatibility between Libdrop versions.

Simplified representation of backward compatibility between Libdrop versions.

Rust and Tokio

Due to the nature of the Libdrop library’s heavy IO and event-driven architecture, the codebase contains a lot of asynchronous flows which could have been a tough problem. However, Rust’s great implementation of async alongside the Tokio library proved to be a great combination in dealing with this and avoiding potential crashes.

Rust shines because the borrow checker is really persistent about lifetimes and safety while developing because it prevents you from compiling incorrect code that breaks ownership rules.

We are also fairly safe from panics as we spend most of the time in Tokio tasks and those are executed in catch_unwind. This means that if the Tokio task panics it will simply yield an error instead of tearing the whole thread down.

Still, not every place in the codebase runs in a Tokio task, and so for those cases where a Tokio task is not involved, we tune Rust linter to detect unwrap() calls in the codebase that could potentially invoke a panic handler.

NordVPN uses Rust in numerous libraries and panics are handled in custom panic handlers. These handlers wrap the error and emit it via callback so the API user receives it and can properly log it.

API and the dilemma: To block or not to block?

We’ll now explore the choices we made around our API.

SWIG

For the API we used SWIG, which was already battle-tested and proven by libraries such as Libtelio. SWIG automatically generates FFI binding code for all target platforms, but it’s not without limitations. While it’s very easy to pass primitives such as integers and strings, higher-order structures are not that comfortable. In a compromise, we accept certain parameters as JSON strings.

JSON strings, while slightly less optimal, are a great solution to the problem. All mainstream languages know how to parse it or have a popular library ready to do so. The downsides to JSON strings are less type safety and a need for greater control to avoid breaking the conformity.

Event-driven architecture and reporting

One question that arose around the API was whether or not we should block it. Based on the API users we opted to not make the API block and communicate via events. This provides more complexity on the API design side but it provides an event-driven API and means that API users don’t need to care about threads. App developers are usually experienced in working with callbacks so this architecture suits them well.

Callbacks are used for event notification and reporting so the API user can receive reports and log them where appropriate. Events are for reporting. Both events and reports are passed on as JSON-encoded strings.

Errors are reported when the parameters to the API are incorrect or when a runtime error is encountered.

Types of events

Events are emitted for various milestones:

  • A transfer was requested.

  • The transfer was successfully queued (the API returned no error) and contains all the paths collected.

  • A file upload/download was started, finished, or failed.

  • A file upload or download progressed.

User experience and history tracking with SQLite

When considering how to track transfer records and states, our team opted for a local SQLite database that users can easily inspect.

We chose SQLite for its flexibility and cross-platform availability, and because it offers a strong query system that makes it user-friendly.

The widespread use of SQLite in various applications gave us added confidence in its reliability and performance, making it an easier choice over alternatives like JSON files or custom binary formats.

Database limitations: A read-only resource

The SQLite database does not control Libdrop's operations in any way. Its role is purely read-only. The SQLite database serves to offer our users a convenient API for accessing transfer histories and logs, without impacting the underlying functionality of Libdrop.

In cases where we fail to open or migrate the SQLite database successfully we can remove it entirely and try again. If it fails again we can then use an in-memory database that provides proper functionality while the app is alive.

Security and validation

Security in Libdrop has several key focuses:

  • Ensuring that the right sent file reaches the receiver.

  • Ensuring that a transferred file is immediately picked up and scanned by NordVPN’s Threat Protection feature.

  • Ensuring that foreign apps cannot make calls directly to the peer.

  • File validation: Ensuring integrity from start to finish

As part of our commitment to ensuring a reliable file transfer process, we take several precautions. The moment a file is selected for upload, we immediately fetch its metadata, specifically capturing its size and checksum. This information is then shared with the receiver to ensure both parties have synchronized data right from the start.

During the actual upload, we keep a close eye on the data transfer. We compare the size of the transferred data with that of the received data, allowing us to detect any inconsistencies. If a discrepancy is found, the transfer is terminated, ensuring that only accurate and complete files proceed.

At the receiving end, a fresh checksum is calculated once the correct amount of data is received. If this calculated checksum doesn't align with the initially shared checksum, the transfer is terminated. In such cases, the transfer is reported and stored as a failed transfer on both ends.

Threat Protection

In both Windows and MacOS, files often carry metadata indicating their origin. Without this information, antivirus software would need to scan each and every file for threats, which isn't efficient.

Applications regularly produce many files, the majority of which are legitimate and harmless so it’s common practice to embed specific markers within these files. This allows antivirus tools to identify and scan files faster.

On Windows and MacOS, we immediately attach these markers once files are downloaded. This ensures that the Threat Protection scanner can promptly identify and assess them, leaving no gap during which they might be accessed without a prior security check.

MacOS uses kMDItemWhereFroms while Windows uses Zone.Identifier.

Socket security

Finding the protocol and communication method used by Libdrop is straightforward. The port we use is 49111, and the address is in the format ws://{addr}/drop/ (this can all be seen in the source code provided on GitHub).

While it's true that you can bypass Libdrop by directly connecting to this URL with cURL or similar tools, this is a situation we'd like to avoid. Our aim is to maximize usability and minimize the risks for users.

Since we considered user experience, we also explored the idea of automatically accepting files from trusted peers. However, we recognized the potential risk of someone abusing this feature to spam others, and so decided against it.

To enhance security, we implemented an authorization system based on Meshnet keys. These keys are retrievable via API after successful user authentication. Since NordVPN is consistently aware of peer public keys, we're able to use this information to validate connections at the Libdrop communication level. If a user fails the authorization process, the transfer is terminated — no questions asked from the receiver side.

To accomplish this, we employ HMAC with SHA-256 and generate a shared key using the Diffie-Hellman algorithm. When initiating a connection, the NordVPN app provides the public key of the peer. Combined with the private key we already possess from the time of initializing Libdrop, we're able to calculate this shared key. Both sides of the transaction do the same, and the process is only deemed successful if the keys match.

We're aware that this system isn't bulletproof. For instance, users might find a way to exploit a Linux CLI app. However, we believe these improvements represent a significant step towards creating a safer and more reliable experience for our users.

Permissions and user access

Integrated into the NordVPN application, Libdrop operates under the constraints of user permissions as enforced by the operating system. This ensures that users can only share files to which they have ownership rights. To initiate a file transfer, a connection between peers must first be established. Enabling file sharing for a specific Meshnet peer allows one to start receiving files from that device. Disabling file-sharing permissions for a Meshnet peer will halt incoming transfers from that particular device. You can read more about file-sharing permissions here.

On the Linux platform, we faced an additional challenge because the app needed to run as root due to Libtelio's requirements. Running Libdrop as root was out of the question, as it would have unrestricted access to the entire file system. To navigate this, we set Libdrop up to run as a user process that communicates with the NordVPN daemon.

Fortunately, mobile devices didn't present the same issue, thanks to their robust sandboxing. Likewise, applications on Windows and MacOS operate with user permissions, so there were no concerns on those platforms either.

It's worth noting that Libdrop isn't designed for multi-user scenarios, as it uses a hardcoded port number, 49111. However, it can technically bind to different network interfaces without any problems.

File aggregation

To simplify the user experience and streamline integration, we designed Libdrop to automatically enumerate files in the paths provided. These paths can point to either individual files or directories, allowing for greater flexibility. This setup posed several challenges, however:

  • How can we recreate the directory hierarchy?

  • What do we do when we encounter a symlink?

  • What happens if there are too many files?

  • What are the issues with Android permissions?

Let’s take a closer look at how we overcame these challenges.

Recreating the directory hierarchy

For hierarchies, we used the same rename logic as we did with the files, but only for the root level directory. We only communicate the path with the peer starting at the root level of the provided path, meaning that if there’s a directory structure of C:\Files\Photos\Cats\Cute and the user adds C:\File\Photos then we only send Photos\*, the receiver is unaware of the C:\Files portion. This was important because, if the receiver was aware of that portion, personal details could be leaked.

Interestingly, directory separators are not cross-platform. Windows supports both \ and / while Unix-based OSs (Android, iOS, MacOS, Linux) support only /. Initially, we just communicated with the path as-is, which then produced some fun results. Sending a path, “Photos\Cats\1.jpg,” from Windows to a Linux machine would produce a file with that name instead of two directories and one file when transferring a directory.

As an easy solution, we chose the following approach: when the user sends a directory and we aggregate a path, we split it with the native path separator and then glue it back together using the universal one — /. We can then use that path going forward.

Dealing with symlinks

We decided that, when a symlink is encountered, we would return an error. This reduces the chances of possible security issues arising around certain files.

Symlinks reduce the visibility of operations, creating situations in which a user might think they are sending one set of files while in reality a different group of files are picked up.

What happens if there are too many files?

In Libdrop we allow for certain configuration values when initializing the library, ensuring that it can be flexible across multiple platforms. To help with interoperability, we decided to add two values: file limit and file depth limit.

Including these two values means that deep directories result in an error. An error is also generated when the file limit is reached. We think it’s better to be explicit than implicit, and so we’d rather generate an error than send an incomplete file transfer.

Android permission issue

Using the transfer system on Android presented us with some challenges. In order to use the POSIX file system, the API needs appropriate permissions in the application manifest. Direct file system access requires that the application is placed within a single specific category, but this was a problem because NordVPN is not just a file or backup manager.

A solution was found when we did an experiment and found that upon selecting the file in Android, it was possible to detach the file descriptor. This enabled us to use POSIX with the provided descriptor:

Testing and dogfooding

We used Python and Docker to load the compiled library and imitate conversation between two peers. This allowed us to reproduce the bugs by writing test cases, easing our concerns about bigger changes in the codebase.

The testing framework allowed us to generate scenarios quickly using a Python API where we can imitate all the actions a user might take alongside the events we would expect as a result.

Tests can’t perfectly replicate what happens in real life so we still constantly seek QA feedback alongside the relevant aggregated logs. Still, having an easy-to-use test framework proved to be very beneficial and boosted our confidence during development.

Meshnet protocol and wire safety

NordVPN's file-sharing feature is built on Meshnet, a peer-to-peer protocol. This design allows for the shortest possible data path between computers, eliminating the need for third-party cloud storage or service providers.

One caveat is that both Meshnet nodes must be online simultaneously for the transfer to take place. All traffic between Meshnet nodes, including file sharing, is authenticated and encrypted via WireGuard’s cryptography, ensuring that even Nord Security cannot access the contents of the files or the traffic being transmitted. You can read more about the Wireguard protocol here.

Thanks to Libtelio and Meshnet, Libdrop doesn’t need to use any encryption of its own because double-encryption would be unnecessary. If you’re considering implementing Libdrop into your own product, you should integrate transport layer security (TLS), which should be fairly trivial to implement.

In summary, NordVPN's File Sharing feature offers a secure, efficient, user and API-user-friendly method for peer-to-peer file transfers through the Meshnet.