Network Working Group N. Freed Request for Comments: 2045 Innosoft Obsoletes: 1521, 1522, 1590 N. Borenstein Category: Standards Track First Virtual November 1996 Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies Status of this Memo This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited. Abstract STD 11, RFC 822, defines a message representation protocol specifying considerable detail about US-ASCII message headers, and leaves the message content, or message body, as flat US-ASCII text. This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for (1) textual message bodies in character sets other than US-ASCII, (2) an extensible set of different formats for non-textual message bodies, (3) multi-part message bodies, and (4) textual header information in character sets other than US-ASCII. These documents are based on earlier work documented in RFC 934, STD 11, and RFC 1049, but extends and revises them. Because RFC 822 said so little about message bodies, these documents are largely orthogonal to (rather than a revision of) RFC 822. This initial document specifies the various headers used to describe the structure of MIME messages. The second document, RFC 2046, defines the general structure of the MIME media typing system and defines an initial set of media types. The third document, RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text data in Freed & Borenstein Standards Track [Page 1] RFC 2045 Internet Message Bodies November 1996 Internet mail header fields. The fourth document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities. The fifth and final document, RFC 2049, describes MIME conformance criteria as well as providing some illustrative examples of MIME message formats, acknowledgements, and the bibliography. These documents are revisions of RFCs 1521, 1522, and 1590, which themselves were revisions of RFCs 1341 and 1342. An appendix in RFC 2049 describes differences and changes from previous versions. Table of Contents 1. Introduction ......................................... 3 2. Definitions, Conventions, and Generic BNF Grammar .... 5 2.1 CRLF ................................................ 5 2.2 Character Set ....................................... 6 2.3 Message ............................................. 6 2.4 Entity .............................................. 6 2.5 Body Part ........................................... 7 2.6 Body ................................................ 7 2.7 7bit Data ........................................... 7 2.8 8bit Data ........................................... 7 2.9 Binary Data ......................................... 7 2.10 Lines .............................................. 7 3. MIME Header Fields ................................... 8 4. MIME-Version Header Field ............................ 8 5. Content-Type Header Field ............................ 10 5.1 Syntax of the Content-Type Header Field ............. 12 5.2 Content-Type Defaults ............................... 14 6. Content-Transfer-Encoding Header Field ............... 14 6.1 Content-Transfer-Encoding Syntax .................... 14 6.2 Content-Transfer-Encodings Semantics ................ 15 6.3 New Content-Transfer-Encodings ...................... 16 6.4 Interpretation and Use .............................. 16 6.5 Translating Encodings ............................... 18 6.6 Canonical Encoding Model ............................ 19 6.7 Quoted-Printable Content-Transfer-Encoding .......... 19 6.8 Base64 Content-Transfer-Encoding .................... 24 7. Content-ID Header Field .............................. 26 8. Content-Description Header Field ..................... 27 9. Additional MIME Header Fields ........................ 27 10. Summary ............................................. 27 11. Security Considerations ............................. 27 12. Authors' Addresses .................................. 28 A. Collected Grammar .................................... 29 Freed & Borenstein Standards Track [Page 2] RFC 2045 Internet Message Bodies November 1996 1. Introduction Since its publication in 1982, RFC 822 has defined the standard format of textual mail messages on the Internet. Its success has been such that the RFC 822 format has been adopted, wholly or partially, well beyond the confines of the Internet and the Internet SMTP transport defined by RFC 821. As the format has seen wider use, a number of limitations have proven increasingly restrictive for the user community. RFC 822 was intended to specify a format for text messages. As such, non-text messages, such as multimedia messages that might include audio or images, are simply not mentioned. Even in the case of text, however, RFC 822 is inadequate for the needs of mail users whose languages require the use of character sets richer than US-ASCII. Since RFC 822 does not specify mechanisms for mail containing audio, video, Asian language text, or even text in most European languages, additional specifications are needed. One of the notable limitations of RFC 821/822 based mail systems is the fact that they limit the contents of electronic mail messages to relatively short lines (e.g. 1000 characters or less [RFC-821]) of 7bit US-ASCII. This forces users to convert any non-textual data that they may wish to send into seven-bit bytes representable as printable US-ASCII characters before invoking a local mail UA (User Agent, a program with which human users send and receive mail). Examples of such encodings currently used in the Internet include pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in RFC 1421, the Andrew Toolkit Representation [ATK], and many others. The limitations of RFC 822 mail become even more apparent as gateways are designed to allow for the exchange of mail messages between RFC 822 hosts and X.400 hosts. X.400 [X400] specifies mechanisms for the inclusion of non-textual material within electronic mail messages. The current standards for the mapping of X.400 messages to RFC 822 messages specify either that X.400 non-textual material must be converted to (not encoded in) IA5Text format, or that they must be discarded, notifying the RFC 822 user that discarding has occurred. This is clearly undesirable, as information that a user may wish to receive is lost. Even though a user agent may not have the capability of dealing with the non-textual material, the user might have some mechanism external to the UA that can extract useful information from the material. Moreover, it does not allow for the fact that the message may eventually be gatewayed back into an X.400 message handling system (i.e., the X.400 message is "tunneled" through Internet mail), where the non-textual information would definitely become useful again. Freed & Borenstein Standards Track [Page 3] RFC 2045 Internet Message Bodies November 1996 This document describes several mechanisms that combine to solve most of these problems without introducing any serious incompatibilities with the existing world of RFC 822 mail. In particular, it describes: (1) A MIME-Version header field, which uses a version number to declare a message to be conformant with MIME and allows mail processing agents to distinguish between such messages and those generated by older or non-conformant software, which are presumed to lack such a field. (2) A Content-Type header field, generalized from RFC 1049, which can be used to specify the media type and subtype of data in the body of a message and to fully specify the native representation (canonical form) of such data. (3) A Content-Transfer-Encoding header field, which can be used to specify both the encoding transformation that was applied to the body and the domain of the result. Encoding transformations other than the identity transformation are usually applied to data in order to allow it to pass through mail transport mechanisms which may have data or character set limitations. (4) Two additional header fields that can be used to further describe the data in a body, the Content-ID and Content-Description header fields. All of the header fields defined in this document are subject to the general syntactic rules for header fields specified in RFC 822. In particular, all of these header fields except for Content-Disposition can include RFC 822 comments, which have no semantic content and should be ignored during MIME processing. Finally, to specify and promote interoperability, RFC 2049 provides a basic applicability statement for a subset of the above mechanisms that defines a minimal level of "conformance" with this document. HISTORICAL NOTE: Several of the mechanisms described in this set of documents may seem somewhat strange or even baroque at first reading. It is important to note that compatibility with existing standards AND robustness across existing practice were two of the highest priorities of the working group that developed this set of documents. In particular, compatibility was always favored over elegance. Freed & Borenstein Standards Track [Page 4] RFC 2045 Internet Message Bodies November 1996 Please refer to the current edition of the "Internet Official Protocol Standards" for the standardization state and status of this protocol. RFC 822 and STD 3, RFC 1123 also provide essential background for MIME since no conforming implementation of MIME can violate them. In addition, several other informational RFC documents will be of interest to the MIME implementor, in particular RFC 1344, RFC 1345, and RFC 1524. 2. Definitions, Conventions, and Generic BNF Grammar Although the mechanisms specified in this set of documents are all described in prose, most are also described formally in the augmented BNF notation of RFC 822. Implementors will need to be familiar with this notation in order to understand this set of documents, and are referred to RFC 822 for a complete explanation of the augmented BNF notation. Some of the augmented BNF in this set of documents makes named references to syntax rules defined in RFC 822. A complete formal grammar, then, is obtained by combining the collected grammar appendices in each document in this set with the BNF of RFC 822 plus the modifications to RFC 822 defined in RFC 1123 (which specifically changes the syntax for `return', `date' and `mailbox'). All numeric and octet values are given in decimal notation in this set of documents. All media type values, subtype values, and parameter names as defined are case-insensitive. However, parameter values are case-sensitive unless otherwise specified for the specific parameter. FORMATTING NOTE: Notes, such at this one, provide additional nonessential information which may be skipped by the reader without missing anything essential. The primary purpose of these non- essential notes is to convey information about the rationale of this set of documents, or to place these documents in the proper historical or evolutionary context. Such information may in particular be skipped by those who are focused entirely on building a conformant implementation, but may be of use to those who wish to understand why certain design choices were made. 2.1. CRLF The term CRLF, in this set of documents, refers to the sequence of octets corresponding to the two US-ASCII characters CR (decimal value 13) and LF (decimal value 10) which, taken together, in this order, denote a line break in RFC 822 mail. Freed & Borenstein Standards Track [Page 5] RFC 2045 Internet Message Bodies November 1996 2.2. Character Set The term "character set" is used in MIME to refer to a method of converting a sequence of octets into a sequence of characters. Note that unconditional and unambiguous conversion in the other direction is not required, in that not all characters may be representable by a given character set and a character set may provide more than one sequence of octets to represent a particular sequence of characters. This definition is intended to allow various kinds of character encodings, from simple single-table mappings such as US-ASCII to complex table switching methods such as those that use ISO 2022's techniques, to be used as character sets. However, the definition associated with a MIME character set name must fully specify the mapping to be performed. In particular, use of external profiling information to determine the exact mapping is not permitted. NOTE: The term "character set" was originally to describe such straightforward schemes as US-ASCII and ISO-8859-1 which have a simple one-to-one mapping from single octets to single characters. Multi-octet coded character sets and switching techniques make the situation more complex. For example, some communities use the term "character encoding" for what MIME calls a "character set", while using the phrase "coded character set" to denote an abstract mapping from integers (not octets) to characters. 2.3. Message The term "message", when not further qualified, means either a (complete or "top-level") RFC 822 message being transferred on a network, or a message encapsulated in a body of type "message/rfc822" or "message/partial". 2.4. Entity The term "entity", refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity. The specification of such entities is the essence of MIME. Since the contents of an entity are often called the "body", it makes sense to speak about the body of an entity. Any sort of field may be present in the header of an entity, but only those fields whose names begin with "content-" actually have any MIME-related meaning. Note that this does NOT imply thay they have no meaning at all -- an entity that is also a message has non- MIME header fields whose meanings are defined by RFC 822. Freed & Borenstein Standards Track [Page 6] RFC 2045 Internet Message Bodies November 1996 2.5. Body Part The term "body part" refers to an entity inside of a multipart entity. 2.6. Body The term "body", when not further qualified, means the body of an entity, that is, the body of either a message or of a body part. NOTE: The previous four definitions are clearly circular. This is unavoidable, since the overall structure of a MIME message is indeed recursive. 2.7. 7bit Data "7bit data" refers to data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences [RFC-821]. No octets with decimal values greater than 127 are allowed and neither are NULs (octets with decimal value 0). CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences. 2.8. 8bit Data "8bit data" refers to data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences [RFC-821]), but octets with decimal values greater than 127 may be used. As with "7bit data" CR and LF octets only occur as part of CRLF line separation sequences and no NULs are allowed. 2.9. Binary Data "Binary data" refers to data where any sequence of octets whatsoever is allowed. 2.10. Lines "Lines" are defined as sequences of octets separated by a CRLF sequences. This is consistent with both RFC 821 and RFC 822. "Lines" only refers to a unit of data in a message, which may or may not correspond to something that is actually displayed by a user agent. Freed & Borenstein Standards Track [Page 7] RFC 2045 Internet Message Bodies November 1996 3. MIME Header Fields MIME defines a number of new RFC 822 header fields that are used to describe the content of a MIME entity. These header fields occur in at least two contexts: (1) As part of a regular RFC 822 message header. (2) In a MIME body part header within a multipart construct. The formal definition of these header fields is as follows: entity-headers := [ content CRLF ] [ encoding CRLF ] [ id CRLF ] [ description CRLF ] *( MIME-extension-field CRLF ) MIME-message-headers := entity-headers fields version CRLF ; The ordering of the header ; fields implied by this BNF ; definition should be ignored. MIME-part-headers := entity-headers [ fields ] ; Any field not beginning with ; "content-" can have no defined ; meaning and may be ignored. ; The ordering of the header ; fields implied by this BNF ; definition should be ignored. The syntax of the various specific MIME header fields will be described in the following sections. 4. MIME-Version Header Field Since RFC 822 was published in 1982, there has really been only one format standard for Internet messages, and there has been little perceived need to declare the format standard in use. This document is an independent specification that complements RFC 822. Although the extensions in this document have been defined in such a way as to be compatible with RFC 822, there are still circumstances in which it might be desirable for a mail-processing agent to know whether a message was composed with the new standard in mind. Freed & Borenstein Standards Track [Page 8] RFC 2045 Internet Message Bodies November 1996 Therefore, this document defines a new header field, "MIME-Version", which is to be used to declare the version of the Internet message body format standard in use. Messages composed in accordance with this document MUST include such a header field, with the following verbatim text: MIME-Version: 1.0 The presence of this header field is an assertion that the message has been composed in compliance with this document. Since it is possible that a future document might extend the message format standard again, a formal BNF is given for the content of the MIME-Version field: version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT Thus, future format specifiers, which might replace or extend "1.0", are constrained to be two integer fields, separated by a period. If a message is received with a MIME-version value other than "1.0", it cannot be assumed to conform with this document. Note that the MIME-Version header field is required at the top level of a message. It is not required for each body part of a multipart entity. It is required for the embedded headers of a body of type "message/rfc822" or "message/partial" if and only if the embedded message is itself claimed to be MIME-conformant. It is not possible to fully specify how a mail reader that conforms with MIME as defined in this document should treat a message that might arrive in the future with some value of MIME-Version other than "1.0". It is also worth noting that version control for specific media types is not accomplished using the MIME-Version mechanism. In particular, some formats (such as application/postscript) have version numbering conventions that are internal to the media format. Where such conventions exist, MIME does nothing to supersede them. Where no such conventions exist, a MIME media type might use a "version" parameter in the content-type field if necessary. Freed & Borenstein Standards Track [Page 9] RFC 2045 Internet Message Bodies November 1996 NOTE TO IMPLEMENTORS: When checking MIME-Version values any RFC 822 comment strings that are present must be ignored. In particular, the following four MIME-Version fields are equivalent: MIME-Version: 1.0 MIME-Version: 1.0 (produced by MetaSend Vx.x) MIME-Version: (produced by MetaSend Vx.x) 1.0 MIME-Version: 1.(produced by MetaSend Vx.x)0 In the absence of a MIME-Version field, a receiving mail user agent (whether conforming to MIME requirements or not) may optionally choose to interpret the body of the message according to local conventions. Many such conventions are currently in use and it should be noted that in practice non-MIME messages can contain just about anything. It is impossible to be certain that a non-MIME mail message is actually plain text in the US-ASCII character set since it might well be a message that, using some set of nonstandard local conventions that predate MIME, includes text in another character set or non- textual data presented in a manner that cannot be automatically recognized (e.g., a uuencoded compressed UNIX tar file). 5. Content-Type Header Field The purpose of the Content-Type field is to describe the data contained in the body fully enough that the receiving user agent can pick an appropriate agent or mechanism to present the data to the user, or otherwise deal with the data in an appropriate manner. The value in this field is called a media type. HISTORICAL NOTE: The Content-Type header field was first defined in RFC 1049. RFC 1049 used a simpler and less powerful syntax, but one that is largely compatible with the mechanism given here. The Content-Type header field specifies the nature of the data in the body of an entity by giving media type and subtype identifiers, and by providing auxiliary information that may be required for certain media types. After the media type and subtype names, the remainder of the header field is simply a set of parameters, specified in an attribute=value notation. The ordering of parameters is not significant. Freed & Borenstein Standards Track [Page 10] RFC 2045 Internet Message Bodies November 1996 In general, the top-level media type is used to declare the general type of data, while the subtype specifies a specific format for that type of data. Thus, a media type of "image/xyz" is enough to tell a user agent that the data is an image, even if the user agent has no knowledge of the specific image format "xyz". Such information can be used, for example, to decide whether or not to show a user the raw data from an unrecognized subtype -- such an action might be reasonable for unrecognized subtypes of text, but not for unrecognized subtypes of image or audio. For this reason, registered subtypes of text, image, audio, and video should not contain embedded information that is really of a different type. Such compound formats should be represented using the "multipart" or "application" types. Parameters are modifiers of the media subtype, and as such do not fundamentally affect the nature of the content. The set of meaningful parameters depends on the media type and subtype. Most parameters are associated with a single specific subtype. However, a given top-level media type may define parameters which are applicable to any subtype of that type. Parameters may be required by their defining content type or subtype or they may be optional. MIME implementations must ignore any parameters whose names they do not recognize. For example, the "charset" parameter is applicable to any subtype of "text", while the "boundary" parameter is required for any subtype of the "multipart" media type. There are NO globally-meaningful parameters that apply to all media types. Truly global mechanisms are best addressed, in the MIME model, by the definiti