The Email Address That Was the Internet: HOSTS.TXT and the People Who Ran It

On January 19, 1983, eighteen days after the ARPANET’s mandatory switch to TCP/IP, a text file sat on a machine called OFFICE-1 at SRI International in Menlo Park. Anyone on the network could retrieve it by FTP with a username and password printed directly in the protocol specification: GUEST, ARPA. The file was called HOSTS.TXT and it was, at that moment, the address book of the internet.

The January 19 snapshot survives. It holds approximately 780 HOST entries across ARPANET, CHAOS, and several other connected networks. Each entry is a single line: a host name, one or more network addresses, an operational status, an operating system, a hardware type, and sometimes a list of informal nicknames by which the machine was known locally. Some machines ran TOPS-20. Many ran variants of Unix on VAX hardware. A few still ran ITS or MULTICS. Reading the file is reading the whole internet: not a map of it, not a description of it, but the thing itself, in ASCII, fetched by FTP under a published guest login from a machine whose decimal address was 43.

OFFICE-1, decimal address 43

RFC 608, issued by M.D. Kudlick at SRI-ARC on January 10, 1974, specified the file precisely. The file would reside at the path <NETINFO>HOSTS.TXT on host OFFICE-1 at decimal network address 43. The FTP credentials appeared in the document itself: username GUEST, password ARPA. Not as a temporary measure pending a more secure arrangement — as the operational specification, to be read by anyone implementing a system that needed to retrieve the file.

RFC 608 listed Vint Cerf, Peter Deutsch, Jake Feinler, and Nancy Neigus as contributors. Feinler was at SRI’s Network Information Center and would run it for the next fifteen years. In 1974, the network she was helping to administer was small — research institutions, mostly, connected through government contracts and academic relationships. The GUEST/ARPA credentials were proportionate to the context. Getting onto the ARPANET required going through a vetting process before the question of FTP credentials ever arose; the public password on the file was not a security gap, it was a bureaucratic convenience for a community where the meaningful gatekeeping happened upstream.

The format RFC 608 specified was simple: a host name, a network address, and a status — SERVER for machines that provided resources to other hosts, USER for those that only connected to receive them, TIP for the terminal interface processors that gave dumb terminals network access, UNKNOWN for anything not yet classified. The design was machine-readable from the start. A program on your local machine could download the file and use it for name lookups; the whole system required no other infrastructure than FTP and a reliable place to put the file.

By October 1985, when RFC 952 superseded RFC 608, the format had grown to six explicit fields: keyword type, addresses, names, machine type, operating system, and a list of supported protocols. Host names could be up to twenty-four characters. RFC 952 was coauthored by Feinler with Ken Harrenstien and Mary Stahl. Its note about the future was direct: the NIC “will attempt to keep similar information for non-DoD networks and hosts…until intercommunicating network name servers are in place.” The DNS work was already underway; the document knew it was closing out a phase.
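The six-field layout lends itself to a very small parser, which is part of why a local program could use the file directly for name lookups. What follows is a minimal sketch, assuming colon-separated fields, comma-separated lists inside a field, and comment lines that begin with a semicolon. The class and function names (HostEntry, parse_entry, build_lookup) are hypothetical, and the sample entries are illustrative rather than lines taken from the January 1983 snapshot.

```python
from dataclasses import dataclass


@dataclass
class HostEntry:
    keyword: str           # entry type, e.g. HOST, NET, GATEWAY
    addresses: list[str]   # one or more network addresses
    names: list[str]       # official name first, then nicknames
    machine_type: str
    operating_system: str
    protocols: list[str]


def split_list(field: str) -> list[str]:
    """Split a comma-separated field, dropping whitespace and empty items."""
    return [item.strip() for item in field.split(",") if item.strip()]


def parse_entry(line: str) -> HostEntry:
    """Parse one colon-delimited entry into its six fields."""
    fields = [f.strip() for f in line.split(":")]
    # A trailing colon leaves one empty string at the end; drop it.
    if fields and fields[-1] == "":
        fields.pop()
    # Trailing fields may be omitted; pad so unpacking always works.
    fields += [""] * (6 - len(fields))
    keyword, addrs, names, machine, opsys, protos = fields[:6]
    return HostEntry(keyword, split_list(addrs), split_list(names),
                     machine, opsys, split_list(protos))


def build_lookup(lines) -> dict[str, list[str]]:
    """Map every official name and nickname of HOST entries to its addresses."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):  # assumed comment syntax
            continue
        entry = parse_entry(line)
        if entry.keyword != "HOST":
            continue
        for name in entry.names:
            table[name.upper()] = entry.addresses
    return table


# Illustrative entries only; not copied from any particular snapshot.
SAMPLE = [
    "; sample host table",
    "NET : 10.0.0.0 : ARPANET :",
    "HOST : 26.0.0.73, 10.0.0.51 : SRI-NIC.ARPA,SRI-NIC,NIC : DEC-2060 : TOPS20 : TCP/TELNET,TCP/SMTP,TCP/FTP :",
    "HOST : 10.2.0.11 : SU-TAC.ARPA,SU-TAC : C/30 : TAC : TCP :",
]

table = build_lookup(SAMPLE)
print(table["NIC"])
```

The lookup is an in-memory dictionary built from one downloaded file. Nothing else is needed on the client side, and that same simplicity is why every client had to fetch the whole file again to see a single change.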

HOSTSMASTER@SRI-NIC.ARPA

Getting onto the ARPANET required emailing an address. Feinler describes the process in her first-person account of running the NIC: “BBN would forward IMP port information to the NIC. The NIC would contact the site for other information needed for the Host Table, and it would verify that the name chosen was not already in use and that it met network guidelines.” The NIC would then add the new entry to the next update of HOSTS.TXT, and the file would propagate to any machine on the network that chose to download it.

The HOSTSMASTER@SRI-NIC.ARPA address was the intake point for that correspondence. A small team at SRI handled it: incoming requests, name verification, conflict resolution, update production. Feinler ran that team across a period when the network grew from a handful of research machines to something large enough that the administrative load on the NIC had itself become a constraint. The work was not glamorous. It was checking names against the existing list, writing back to site administrators when something didn’t meet guidelines, maintaining the file’s internal consistency, and pushing updates on a regular cycle.

The exact update frequency isn’t established from primary sources; secondary accounts give a figure of a couple of times a week, but none of the contemporaneous documents confirms it.

What the system produced was a kind of bureaucratic texture that later internet access would not have. Getting a hostname required corresponding with a person. That person’s team would make a judgment — is this name already taken? Does it meet the guidelines? If there was a conflict, the NIC would resolve it. The centralization that made HOSTS.TXT eventually unsustainable was also what made it legible: there was one place where the authoritative record lived, one team that knew who was on the network, one address you could write to and expect an answer from. The network had, in other words, a clerk.

The enhanced version

The January 19, 1983 snapshot is not the official NIC distribution. It is the locally enhanced version maintained by administrators at MIT and Stanford, and the gap between the two is itself evidence about how the official system worked.

The MIT/Stanford version added information the NIC file didn’t carry: colloquial nicknames by which machines were known to local users, and additional hardware and operating system detail beyond what the NIC required. Site administrators needed this information to run their networks day to day, and since the official file didn’t provide it, they maintained their own extended copies that did.

January 19, 1983 is eighteen days past Flag Day, the mandatory TCP/IP switchover date of January 1. The snapshot shows the network in the first weeks of operation under its new protocol. Entries span ARPANET, CHAOS, LCS, RCC, and SUnet, among others. The internet was already larger than any one network, and the file was tracking it.

The locally enhanced copy is a document about the official one. It shows what site administrators actually needed from a host table — precision, currency, and completeness that the NIC’s publication cycle couldn’t reliably supply. The workaround was already in place. The problem it was solving was already real.

The table Goodfellow kept

Geoff Goodfellow ran his own host table, and by his account it worked better than the NIC’s. He describes the method in a 2020 post to the Internet Society’s internet history mailing list. On the NCP-era network, hosts announced themselves: the Network Control Program “known as NCP…would broadcast messages called RSTs to every possible host address on the network when they booted.” Goodfellow watched for new arrivals by checking netstat — looking for hosts that showed up as numbers only, without names. When he found one, he would telnet or FTP into it and read whatever hostname the machine’s own login prompt or FTP greeting announced. He was identifying new hosts as they came online rather than waiting for BBN to notify the NIC, the NIC to complete its verification cycle, and the NIC to issue an update.

He built the table for his own use, not for the network’s. What happened was that his table “became the preferred host table used by the majority of sites on The Net until the DNS came along.” The unofficial version had out-competed the official one on the dimension that mattered most for practical day-to-day operations: it was more current. If a machine had just come onto the network, Goodfellow’s table likely knew about it before the NIC’s file did.

How Goodfellow’s unofficial table, the MIT/Stanford enhanced version, and a third list relate to one another is not resolved by the available sources. That third list is the Site Status reports published by Ellen Westheimer at BBN, which documented what BBN understood to be attached to each Interface Message Processor. Whether Westheimer’s list preceded HOSTS.TXT, complemented it, or competed with it remains an open question.

What the three lists together suggest is that the official file had developed a parallel economy well before the scaling problem became acute. Sites needed host information more current and more complete than the NIC could officially provide; the community of administrators who used the official file also maintained alternatives to it. The workarounds were not circumventions. They were evidence that the official system, whatever its other virtues, was already running behind the network it was documenting.

Proportional to the square

RFC 1034, written by Paul Mockapetris at USC/ISI and issued in November 1987, is the document that specified DNS in the form that replaced HOSTS.TXT; it superseded the first DNS specifications, RFCs 882 and 883, of November 1983. Its second section explains why replacement was necessary, and the explanation is precise: “The total network bandwidth consumed in distributing a new version by this scheme is proportional to the square of the number of hosts in the network.”

The quadratic relationship is the structural reason no operational fix could have saved the file. When the network had fifty hosts, distributing an update meant fifty machines downloading the file: fifty transfers of the file’s current size. When it had five hundred, it meant five hundred transfers of a larger file, because more hosts meant more entries. Both the number of transfers and the size of each transfer scale with the host count, so ten times the hosts meant roughly a hundred times the traffic.
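The arithmetic can be made concrete with a short sketch. It assumes that each host contributes exactly one entry, that every host downloads each new version once, and that an entry averages 80 bytes. That byte figure is a hypothetical value chosen for illustration, not a measurement of the real file, and the function name is likewise invented.

```python
BYTES_PER_ENTRY = 80  # hypothetical average entry size, for illustration only


def distribution_cost(hosts: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> int:
    """Bytes moved when every host fetches one new version of the table."""
    file_size = hosts * bytes_per_entry  # one entry per host
    return hosts * file_size             # one full transfer per host


for n in (50, 500, 5000):
    print(n, distribution_cost(n))
```

Whatever entry size is assumed, the ratio is the same: each tenfold increase in hosts multiplies the cost of one update by a hundred, before counting the fact that a larger network also changes more often.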

RFC 1034 identified two other problems alongside the bandwidth constraint. Organizations had to wait for the NIC to change HOSTS.TXT to make their changes visible to the rest of the network — the control lag that Goodfellow’s table had been working around for years, and that the MIT/Stanford enhanced version had been compensating for locally. The third problem was structural: organizations wanted “some local structure on the name space” — the ability to manage their own segment of the name hierarchy rather than routing every addition through a central file at SRI.

The file had worked correctly for a decade. It continued to function as a secondary resource even after DNS came online. But it was structurally incapable of serving what the network was becoming. Mockapetris’s solution, a distributed system of name servers organized hierarchically, with no central file that anyone had to distribute to anyone, was the replacement the math required.
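The idea of hierarchical delegation can be sketched as a toy model. This is not how a real resolver is implemented: actual DNS involves network queries, caching, and record types that are all omitted here. The Zone class, the resolve function, the example.edu name, and the 192.0.2.10 address are hypothetical, chosen only to show that each organization edits its own records and that a lookup walks from the root toward the zone that holds the answer.

```python
class Zone:
    """One administrative unit: its own records plus pointers to child zones."""

    def __init__(self, name: str):
        self.name = name
        self.records = {}      # host label -> address, managed locally
        self.delegations = {}  # child label -> Zone, managed by the child's owner

    def delegate(self, label: str) -> "Zone":
        """Hand authority for a sub-name to a new child zone and return it."""
        child = Zone(f"{label}.{self.name}" if self.name else label)
        self.delegations[label] = child
        return child


def resolve(root: Zone, fqdn: str):
    """Walk labels right to left through delegations, then look up the host.

    Returns (address or None, list of zone names visited).
    """
    labels = fqdn.lower().rstrip(".").split(".")
    zone = root
    path = [zone.name or "."]
    while len(labels) > 1 and labels[-1] in zone.delegations:
        zone = zone.delegations[labels.pop()]
        path.append(zone.name)
    host = ".".join(labels)
    return zone.records.get(host), path


root = Zone("")
edu = root.delegate("edu")
example = edu.delegate("example")

# The site adds its own host; no central file is edited or redistributed.
example.records["vax1"] = "192.0.2.10"

address, path = resolve(root, "vax1.example.edu")
print(address, path)
```

Adding a host touches only the zone that owns it, which addresses both the control lag and the wish for local structure on the name space, and no client ever needs a copy of the whole table.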

An afterthought, I believe

The naming of the top-level domains came out of the NIC. Feinler describes writing an email memo to Thomas Harris at the DCA Program Management Office, proposing that the top-level domains be organized as generic categories rather than a single .mil designation. Her proposal added .edu, .gov, .org, and .com alongside the military domain. The categories she chose were practical ones — educational institutions, government entities, non-profit organizations, commercial enterprises — which were the kinds of organizations the ARPANET had been serving throughout its operation.

On .com specifically, Feinler writes: “Adding a business or commercial category to the TLD naming scheme was an afterthought I believe.” Ken Harrenstien, who implemented the scheme, later remembered changing the proposed abbreviation from .bus to .com during implementation because it seemed like a better choice.

The people who maintained HOSTS.TXT named the internet. The team at SRI that had spent fifteen years verifying host names — checking them for uniqueness, corresponding with site administrators, resolving conflicts — proposed the system of suffixes that would organize machine names for the next four decades. The category they added as an afterthought, the one whose abbreviation changed during implementation, became the commercial internet’s primary identifying mark. The clerk who knew what was on the network named the categories that everyone who came after would use to find their way around it.

The file that replaced HOSTS.TXT is not a file. DNS is a distributed database, replicated across thousands of servers, answering billions of queries per day. No one downloads it. No one fetches it by FTP from a machine at decimal address 43 using credentials printed in the specification.

The January 19, 1983 snapshot is eighteen days past the point when the network changed protocols and started growing faster than any file could track. The file kept going. The math caught up with it eventually.