A brief history of email: from protocols to the @ symbol
April 15, 2021
The history of email stretches across half a century. This fascinating technology has been evolving since the earliest days of the internet, and is still being altered and perfected years later. From its origins as the side project of a military programmer to its ongoing war with spam, there's a lot to cover.
In this article, we’ll explore the development of email and the many protocols that make it work. We'll look at how it's changed throughout the decades, and the role that junk mail and phishing played in the process. And finally, we'll examine the way email functions today in offices and workplaces around the world.
The basics: email and protocols
The “e” in “email” stands for “electronic”, distinguishing this type of mail delivery from the traditional paper-and-mailman method that preceded it. Email is now a core part of how people, businesses, and even governments communicate, but it's easy to take for granted. We rely on applications like Gmail and Outlook, and forget that a complex system is running behind the scenes: the system of email protocols.
A history of email is really the story of the many email protocols that have been created over the last half-century. In this context, a protocol is a set of rules that facilitates the transfer of information between different communication systems. These rules govern how the information is packaged, sent, received, and presented.
As the needs of email users changed throughout the decades, new protocols were developed to allow for safer, more efficient email transfer. While the SPF, DKIM, and DMARC authentication protocols now reign supreme, many others have been created over the years. For the purposes of this article, we’ll be focusing on the SMTP-based ancestors of modern email.
Digital mail appeared before computer networks even existed. In the 1960s, “using a computer” meant connecting your terminal to a giant central computer, which could supply several terminals at once. Sometimes, those terminals were located in adjacent buildings or even further away. A dial-up connection linked these distant machines to the main computer.
The first digital mail took the form of text files. Since coworkers weren’t always in the same physical space, they began to leave these files on the central computer. If Emily wanted to get a message to her coworker Luke, she could write it out and save the file as “forluke.txt”. When Luke accessed the central computer through his own terminal, he would be able to open her file.
To allow for more private conversations, local private mail systems were created. In 1965, programmers at MIT created a system that would let a computer’s users send messages amongst themselves. One user could “send” a message to another, and it would be appended to the recipient’s mailbox text file. This was a step in the right direction; email's evolution was about to begin.
The birth of email (and the origin of the @ sign)
Sending messages to other users on the same computer is one thing. But email, as we know it today, is about reaching someone on a different computer from your own. The man who made this possible was Ray Tomlinson.
Tomlinson was a programmer working on ARPANET, the US government's first computer network and a progenitor of the modern internet. In 1971, he wrote a program called SNDMSG, which would allow a piece of digital mail to be sent from one computer to another. In the process, he gave the “@” symbol a whole new meaning, using it to separate the username from the hostname.
Tomlinson created SNDMSG in his own time as a side project, and didn't realize at first how important his invention would be. Yet many of his systems and ideas would later be reused in the development of modern email.
We've seen how the first digital mail services were created, and how inter-computer mail was born. Now, as the 70s progressed, the evolution of the email protocol really got underway.
1971: People began using a new File Transfer Protocol, or FTP (RFC114), to add text to a recipient's mailbox file.
1972: Commands for sending mail (MLFL, MAIL) were incorporated into FTP (RFC385). FTP was not a particularly convenient protocol, as mail could only be sent directly to the recipient's host when it was turned on and the service was running. Still, it was more convenient than manually uploading files to the recipient.
1973: On February 23, a pivotal moment took place. This was a summit that would determine the fate of email (RFC469/RFC475). The ARPANET developers decided to continue using an FTP-like protocol, but to have it managed by a separate service that was able to send (MLTO) and even forward mail to another host, as well as determining who the sender was (FROM). The same summit also conclusively approved the use of the @ symbol (cribbed from Ray Tomlinson's SNDMSG).
Also in 1973, the header was standardized by RFC561. To this day, we still use the header elements FROM, DATE and SUBJECT.
1975: Continuing a busy decade, the mid-70s also brought us CC and BCC (RFC680).
The FTP-based email system gained an increasing number of headers (RFC561, RFC680, RFC724, RFC733), until RFC821 and RFC822 were finally codified as the Simple Mail Transfer Protocol (SMTP) in 1982. With a few minor additions from 2001 (RFC2821, RFC2822) and 2008 (RFC5321, RFC5322), it’s still in use today.
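To make those headers concrete, here is a minimal sketch of how a message with From, Date, Subject, and Cc fields can be assembled using Python's standard email package. The build_message helper and the example.com addresses are our own illustrative assumptions, not part of any RFC.

```python
from email.message import EmailMessage
from email.utils import formatdate


def build_message(sender, recipient, subject, body, cc=None):
    """Assemble a message carrying the classic header fields (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    if cc:
        # Cc recipients are listed in a single comma-separated header.
        msg["Cc"] = ", ".join(cc)
    msg["Date"] = formatdate()  # current time in the standard mail date format
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_message(
    "emily@example.com",
    "luke@example.com",
    "Lunch?",
    "Noon at the usual place.",
    cc=["sam@example.com"],
)
print(msg.as_string())
```

Printing the message shows the header block, a blank line, and then the body, which is the same basic layout the early header RFCs settled on.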
The 1982 SMTP protocol differed from previous iterations in the way that addresses were actually written: “user@host” became “user@domain”. Though the DNS service did not exist at the time, the principle of tree-type domain structures was already in place, and is used on the internet to this day.
In 1983, RFC882 codified DNS with the Mail Forwarder (MF) and Mail Destination (MD) record types. These were then replaced by a single MX record type in 1986 (RFC973). Up to that point, everyone used hosts.txt files, which listed all computers connected to the internet and had to be maintained manually or downloaded weekly from the administrator's computer on the ARPANET.
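As a rough illustration of how “user@domain” addressing and MX records fit together, the sketch below splits an address and orders the candidate mail servers for its domain. It performs no real DNS lookups: the MX_RECORDS table, the host names, and both function names are assumptions made up for this example. The one rule it relies on is that a lower MX preference number means the server should be tried first.

```python
# Hypothetical MX data standing in for real DNS answers: (preference, host).
MX_RECORDS = {
    "example.com": [(20, "backup.mail.example.com"), (10, "mail.example.com")],
}


def split_address(address):
    """Split 'user@domain' at the last '@' and lowercase the domain."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a user@domain address: {address!r}")
    return local, domain.lower()


def delivery_order(address, mx_table=MX_RECORDS):
    """Return the mail hosts for an address, most preferred first."""
    _, domain = split_address(address)
    records = mx_table.get(domain, [])
    # Lower preference values are tried first, so a plain sort is enough.
    return [host for _, host in sorted(records)]


print(delivery_order("luke@Example.com"))
```

A real mail server would ask DNS for these records instead of reading a dictionary, but the ordering logic would look much the same.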
Email had now grown from being a box on a specific computer or server to a virtual communications network that could run across multiple servers under a single domain. However, it wasn't all good news. The system had changed rapidly since its invention, and as more people and organizations began to use it, a new problem emerged.
It's time to talk about spam.
A brief history of spam
Spam can be defined as any unsolicited mass message. The practice of spamming (sending a high volume of unwanted mail to multiple people at once) has evolved alongside email through the years. Creating protocols to combat this nuisance is an ongoing effort.
The first spam email was a massive ad campaign, delivered over ARPANET, for the new DECSYSTEM-20 computer. Of course, no one called it spam back then. That name would later be derived from a sketch by British comedians Monty Python. In the early days of the internet, people referenced the Python sketch by flooding online chatrooms with the word “spam”, annoying other users.
The practice and its new name spread to Multi-User Dungeons, the first text-based MMORPGs. Then it jumped to Usenet, an early forum system, in the late 80s. Here, the terminology and the practice of spamming became more widely established, entering the mainstream.
The fight against spam and phishing
For as long as spam emails have been sent, people have searched for new ways to stop them. The worst kind of spam is phishing, a criminal tactic where mass emails are used to lure victims into scams and online frauds.
Towards the end of the 20th century, the internet began to spread beyond the academic and military circles in which it originated. Suddenly, developers became aware of a new problem: the email protocols that had been created so far had no way of verifying the identity of an email's sender. Anyone could pretend to be… well, anyone. It was a perfect platform for impersonation and fraud.
A solution was needed, if email was to survive. Between 1997 and 2003, new ideas began to emerge as to how a sender's domain could be verified. These included:
MS (Mail Sender DNS record)
MT (Mail Transmitter DNS record)
Repudiating MAIL FROM
RMX (Reverse MX)
DSP (Designated Senders Protocol)
In the end, it was an authentication method known as Sender Policy Framework (SPF) which was settled on in 2003. After numerous changes, it was documented in 2006 in RFC4408.
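To give a feel for what an SPF check involves, here is a deliberately simplified sketch. It only understands the ip4, ip6, and all mechanisms and silently skips everything else (include, a, mx, redirect, and so on), so it is not a complete SPF evaluator. The check_spf function name, the sample record, and the IP addresses are assumptions chosen for illustration.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, sender_ip):
    """Evaluate a small subset of SPF (ip4, ip6, all) for one sending IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without a qualifier means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if network.version == ip.version and ip in network:
                return QUALIFIERS[qualifier]
        # Assumption: unsupported mechanisms are skipped in this sketch.
    return "neutral"  # nothing matched


record = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.net -all"
print(check_spf(record, "192.0.2.15"))
print(check_spf(record, "198.51.100.7"))
```

With this record, a server inside 192.0.2.0/24 is authorized to send for the domain, while any other address falls through to the final "-all" and fails.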
Building more advanced mail authentication systems
In 2004, Yahoo! and Cisco combined the protocols DomainKeys and Identified Internet Mail to create DomainKeys Identified Mail, better known as DKIM. It’s a method for signing emails: the sending server signs outgoing mail with a private RSA key, and the recipient's server verifies the signature with the public key published in the sender's DNS TXT records. The protocol later added support for elliptic curve (Ed25519) signatures alongside RSA.
Neither SPF nor DKIM is a perfect protocol. For one thing, neither system can forward mail without breaking authentication. SPF suffers from this in particular, while in DKIM’s case it’s most evident in mailing lists.
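Full DKIM signing needs an RSA or Ed25519 implementation, which Python's standard library does not provide, so the sketch below covers only two of the supporting steps: hashing the message body (the value carried in the signature's bh= tag, here using what the DKIM specification calls “simple” body canonicalization) and building the DNS name where the public key is published. The function names and the sel1 selector are assumptions for illustration, and the body is assumed to already use CRLF line endings.

```python
import base64
import hashlib


def canonicalize_body_simple(body: bytes) -> bytes:
    """'Simple' body canonicalization: drop trailing empty lines, end with CRLF."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def body_hash(body: bytes) -> str:
    """Base64-encoded SHA-256 of the canonicalized body (the bh= value)."""
    digest = hashlib.sha256(canonicalize_body_simple(body)).digest()
    return base64.b64encode(digest).decode("ascii")


def dkim_key_name(selector: str, domain: str) -> str:
    """DNS name of the TXT record that holds the signer's public key."""
    return f"{selector}._domainkey.{domain}"


print(body_hash(b"Noon at the usual place.\r\n"))
print(dkim_key_name("sel1", "example.com"))
```

Because trailing blank lines are ignored, a message that picks up an extra empty line in transit still produces the same body hash. Any other change to the body produces a different one.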
A decade after the invention of SPF and DKIM, people realized that these protocols alone weren't performing two much-needed functions:
Feedback: If someone sends phishing emails in your name, you want to know about it.
Enforcement: If you’re sending signed emails, you may want receiving servers to block unsigned mail that claims to come from you.
While SPF arguably contains both of these mechanisms, they’re not really functional enough to be useful.
So DMARC was developed, and released in 2015. It builds on SPF and DKIM, enforcing the domain owner's rules for what receivers should do with mail that fails authentication (deliver it anyway, send it to spam, or reject it) and reporting statistics back to the domain owner. DMARC does not replace the other two: to this day, respectable organizations using email set up all three, SPF, DKIM, and DMARC, for their domains.
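The sketch below shows, under simplifying assumptions, how a receiving server might turn a domain's DMARC record into a decision. The policy values none, quarantine, and reject are the real DMARC policy names; the parse_dmarc and dmarc_action helpers, the action labels, and the sample record are our own illustrative choices, and the SPF and DKIM results are passed in as ready-made booleans rather than computed.

```python
# What this sketch does with a failing message under each DMARC policy.
ACTIONS = {"none": "deliver", "quarantine": "spam folder", "reject": "reject"}


def parse_dmarc(record):
    """Split a 'tag=value; tag=value' DMARC record into a dictionary."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_action(record, spf_aligned, dkim_aligned):
    """Decide what to do with a message, given its SPF and DKIM outcomes."""
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1":
        return "deliver"  # no usable policy published
    if spf_aligned or dkim_aligned:
        return "deliver"  # passing either check is enough for DMARC
    return ACTIONS.get(tags.get("p", "none").lower(), "deliver")


record = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"
print(dmarc_action(record, spf_aligned=False, dkim_aligned=True))
print(dmarc_action(record, spf_aligned=False, dkim_aligned=False))
```

The rua tag in the sample record is where the feedback side comes in: it names the address that receivers send their aggregate reports to.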
Different kinds of email
It’s not just people who send emails these days. From office printers and scanners to sophisticated automated chain letter systems, non-human activity makes up a large portion of all emails sent. Broadly, email can be divided into three categories:
Transaction emails: These are automated messages targeting one recipient, sent in response to a specific trigger. Sign up to a website? You get a confirmation. Buy something online? Here’s your invoice. Forgot your password? You’ll receive a password reset link. Typically, these emails are created on the organization's systems and sent through an intermediary (SendGrid or Mailgun, for example).
Marketing emails: Nowadays, marketing emails should only be sent if the recipient has consented to them. This is a very broad category, covering everything from simple newsletters and personalized webstore offers to automated CRM chainmail and PR press releases. Modern organizations often make use of third-party systems (such as Mailchimp or Mailerlite). These services are different from transaction email intermediaries; emails can be created using their software, and they can auto-manage recipient lists on behalf of an organization.
Emails sent by people: These emails are written by real people, and are usually sent through third-party email services like Gmail and Outlook. These premade email systems cover both sending and receiving mail, as well as spam filtering and other features (groups, mailing lists, antivirus, and more).
These email categories may overlap. A transaction email can have a marketing CTA. An automated marketing chain may be forwarded to a real person by a smart CRM. Replying to a marketing email may get you in touch with support staff, and so on.
How does email work?
In theory, most modern email systems work like this:
The employees of an organization have a Mail User Agent (MUA) on their computers (software such as Outlook or The Bat!), which they can use to write and read emails.
When an employee has written their email, the message travels using SMTP to the organization's mail server.
The server performs Mail Submission Agent (MSA) and Mail Transfer Agent (MTA) functions — in practical terms, it receives the email from the original sender, queues it, finds the recipient's MTA server using DNS MXs, and sends the email to that server over the internet.
Another (or even the same) server performs the MTA function for incoming emails, storing them, and then performs the Mail Delivery Agent (MDA) function, which allows the user’s MUA to access and retrieve their emails.
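The four steps above can be sketched as a toy, in-memory model. Nothing here touches the network: ToyMailServer, compose, and the mx_table dictionary are hypothetical stand-ins for real mail servers, a mail client, and DNS MX lookups, and the example.com and example.org addresses are placeholders.

```python
from collections import defaultdict, deque
from email.message import EmailMessage


class ToyMailServer:
    """In-memory stand-in for one organization's mail server."""

    def __init__(self, domain):
        self.domain = domain
        self.outgoing = deque()             # queue of mail waiting to be sent
        self.mailboxes = defaultdict(list)  # stored mail, keyed by address

    def submit(self, msg):
        """MSA role: accept a message from a local user's MUA and queue it."""
        self.outgoing.append(msg)

    def relay(self, mx_table):
        """MTA role: find each recipient's server and hand the message over."""
        while self.outgoing:
            msg = self.outgoing.popleft()
            recipient_domain = str(msg["To"]).rpartition("@")[2]
            mx_table[recipient_domain].receive(msg)

    def receive(self, msg):
        """Inbound MTA role: store the message in the recipient's mailbox."""
        self.mailboxes[str(msg["To"])].append(msg)

    def fetch(self, address):
        """MDA role: give stored messages to the user's MUA and empty the box."""
        return self.mailboxes.pop(address, [])


def compose(sender, recipient, subject, body):
    """MUA role: write a message."""
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = sender, recipient, subject
    msg.set_content(body)
    return msg


sender_server = ToyMailServer("example.com")
recipient_server = ToyMailServer("example.org")
mx_table = {"example.org": recipient_server}  # hypothetical stand-in for DNS MX

sender_server.submit(compose("emily@example.com", "luke@example.org", "Lunch?", "Noon?"))
sender_server.relay(mx_table)
for received in recipient_server.fetch("luke@example.org"):
    print(received["From"], "->", received["Subject"])
```

In a real deployment each method would be a separate network conversation (SMTP for submission and relay, IMAP or POP3 for retrieval), and a recipient domain missing from the lookup would cause a bounce rather than the KeyError this toy version raises.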
This is a simplified model: in practice, it’s a little more complicated than that. Often, people use webmail (they don’t have a mail program, and so they read and write email in their browsers).
On the sender’s side, there are additional MTA steps for virus scans, user authentication, rate limits, and more. On the recipient’s side, there are additional MTA servers for virus scanning and spam detection (often with recourse to third-party services). And then there are backup servers for when the primary ones fail.
The actual email model used by respectable organizations that operate online (providing services or selling something) looks more like this:
The arrows in this illustration are red, blue, and green. Red arrows indicate internal mail as it flows between people or systems in an organization and the servers or systems that send mail. This may involve various protocols.
Green arrows show SMTP messages being received from the outside. There can be only one receiving location specified in DNS MX records per domain (excluding subdomains). You can have more MX records or more servers to service them, but in reality it will be a single shared system. If multiple systems need to receive mail, it’s either forwarded to them or they retrieve it themselves from specified inboxes via IMAP or POP3.
Blue is the most interesting color in the context of this piece, since it represents sent email. One domain may have multiple outgoing mail systems that know almost nothing about each other.
From “forluke.txt” to a global system
Email has come a long way since the 60s and 70s. It completely changed how we communicate, became a keystone for the modern internet, and remains one of the defining technologies of the 21st century.
While it takes only a couple of seconds to compose a new email, there are a multitude of processes running in the background. Understanding where these systems came from and why they work so well now can help us improve them further, and reminds us to appreciate the years of work that brought us to this point.