Internet and e-mail policy and practice
including Notes on Internet E-mail


Click the comments link on any story to see comments or add your own.

Subscribe to this blog

RSS feed

Home :: Email

10 Jul 2011

Email in the World's Languages (Part III) Email

In our last installments we discussed the various ways to encode non-ASCII character sets, of which UTF-8 is the winner, and some complex approaches that tried to make UTF-8 mail backward compatible with ASCII mail. After years of experiments, the perhaps surprising consensus is that if you're going to do international mail, you just do it.

[legacy and EAI mail streams]

The revisions to RFC 5335 and 5336 currently nearing completion in the IETF's EAI working group remove most of the complexity from the existing design, and simply define a new international mail stream with UTF-8 text in mail headers, and UTF-8 addresses in the SMTP session. (MIME and the 8BITMIME ESMTP option already handle UTF-8 message bodies.) A mail server announces that it can handle internationalized mail by announcing an ESMTP option, which currently has the ugly placeholder name UTF8SMTPbis, but will presumably be changed to something snappier by the time the revised RFC is issued. I'll call it EAI here.

A mail client, if it sees that a server announces EAI capability, can put the EAI flag on the MAIL FROM command in the SMTP session, to tell the server that this message is an internationalized one, and then sends the message. Any mail server that supports EAI also supports 8BITMIME, so the message can contain any UTF-8 text without special coding. The RFC also defines EAI options for the SMTP commands EXPN and VRFY, which accept and return e-mail addresses, and some new extended return codes for status messages. If a client doesn't use the EAI option, it sends legacy 2821/2822 mail, same as always. The option is set or not for each message, so each message moving through the mail system is an EAI message or a legacy message.

The RFC for internationalized mail bodies allows UTF-8 nearly anywhere that ASCII can occur, other than in a few places intended to be parsed by computers, such as Message IDs and time zones in date stamps. It defines a new MIME body type, message/global, for attached or included internationalized mail messages. And that's about it. All of the down-conversion stuff to turn EAI messages into RFC 2821/2822 compatible mail is gone.

As the diagram above illustrates, this defines a new EAI (or whatever we call it) mail stream that is parallel to the existing legacy SMTP stream. If a message has UTF-8 headers, or a UTF-8 envelope address, it has to go in the EAI stream. If it doesn't it can go in either the legacy stream or the EAI stream.

The solid diagonal arrow shows that a legacy message can always move into the EAI stream, since the spec for the latter is a superset of the spec for the former. The dotted arrow shows that an EAI message can sometimes go into the legacy stream, if an MTA looks through the message and notices that it has an all ASCII body and an all ASCII envelope, something I've nicknamed Deep Message Inspection. (It's not required to do this, just allowed.)

If a client wants to send an EAI message to a server that doesn't handle EAI, too bad, there's no standardized way to downgrade.

This doesn't mean that mail systems are forbidden to downgrade messages, just that the IETF isn't trying to define a standard way to do so. It's easy to imagine useful special cases. For example, if an MSA (the program that accepts mail from user mail programs) is configured with backup legacy addresses for its EAI users, and it noticed EAI mail from one of its users to a legacy address, it could rewrite the message's headers to replace the user's EAI address with her legacy address and MIME encode the UTF-8 header text, turning the message headers into ASCII, and send it along as a legacy message. That sort of thing may turn out to be fairly popular, but at this point, it's not well enough understood to standardize.

There are mail communities where all the users read and write languages written in non-ASCII scripts, so it may turn out that everyone's mail supports EAI, and legacy compatibility doesn't matter. Or they may treat legacy mail as a special case, perhaps by using a different user mail program.

The EAI work has defined the core of a fully internationalized mail system, allowing users to send mail using their preferred language, encoded as UTF-8, in essentially all parts of e-mail messages. The EAI spec is conceptually straightforward, and the changes required for adequate if not superb support in MTAs are fairly simple. (Deep Message Inspection is harder, but it's optional.) The details of migrating from legacy mail to EAI mail are a work in progress, as are the details of how EAI users will send mail to legacy mail users, but the imminent completion of the EAI work is a big forward step for mail in all the world's languages.

posted at: 15:40 :: permanent link to this entry :: 0 comments
Stable link is


My other sites

Who is this guy?

Airline ticket info

Taughannock Networks

Other blogs

Dave Piscitello on Ransomware
52 days ago

A keen grasp of the obvious
My high security debit card
598 days ago

Related sites

Coalition Against Unsolicited Commercial E-mail

Network Abuse Clearinghouse

© 2005-2018 John R. Levine.
CAN SPAM address harvesting notice: the operator of this website will not give, sell, or otherwise transfer addresses maintained by this website to any other party for the purposes of initiating, or enabling others to initiate, electronic mail messages.