What is the MHTML file extension?


MHTML is an archive file format that stores the content of a webpage: the page's HTML together with the resources it links to, such as images, applets, animations and audio files. Many different applications can create MHTML files, and they can be opened in Internet Explorer, Opera, Chrome, Microsoft Word and other programs. Microsoft Windows also uses the MHTML format to record the steps behind problems observed while using an application on Windows (the Steps Recorder tool, for example, saves its reports as MHTML). An MHTML file is packaged like a plain-text email message as defined in RFC 822 (media type message/rfc822), and the complete specification of the format is given in RFC 2557.

File format specifications

MHTML (MIME HTML) is the MIME encapsulation of aggregate HTML documents. Web browsers typically create these files when saving an HTML page, encoding the page and its resources as a single MIME message.

According to RFC 2557, an aggregate document is a MIME-encoded message that contains the root resource (normally the HTML page) together with the other resources it references, such as inline pictures, style sheets and applets. The standard specifies that a body part being referenced can be identified either by its Content-ID header or by its Content-Location header.
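To make the structure concrete, here is a minimal sketch that assembles a tiny aggregate with Python's standard `email` package: a multipart/related message whose first part is the root HTML and whose second part is an image labeled with both a Content-ID and a Content-Location. The function name `build_mhtml`, the placeholder image bytes and the sample identifiers are hypothetical, chosen only for illustration; real browsers add more headers and parts than this.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html: str, image_bytes: bytes, image_location: str, image_cid: str) -> str:
    """Return a minimal MHTML aggregate as text (illustrative sketch only)."""
    # The aggregate is a multipart/related message; the 'type' parameter
    # names the media type of the root resource.
    aggregate = MIMEMultipart("related", type="text/html")

    # First body part: the root HTML resource.
    aggregate.attach(MIMEText(html, "html", "us-ascii"))

    # Next body part: a linked resource. MIMEImage base64-encodes the bytes;
    # the subtype is given explicitly so no image-type detection is needed.
    image = MIMEImage(image_bytes, "gif")
    image["Content-ID"] = f"<{image_cid}>"
    image["Content-Location"] = image_location
    aggregate.attach(image)

    # The boundary string is generated automatically on serialization.
    return aggregate.as_string()


# Hypothetical usage: b"GIF89a" stands in for real image data.
page = '<html><body><img src="fiction1/fiction2"></body></html>'
text = build_mhtml(page, b"GIF89a", "fiction1/fiction2", "97116092511xyz@foo.bar.net")
print(text)
```

Saving `text` to a file with an `.mht` or `.mhtml` extension gives a document with the layout described above, although whether a particular viewer renders such a minimal file is not guaranteed.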

MIME Content Headers

An MHTML file begins with email-style headers, and its first body part is normally the HTML code of the page. The subsequent body parts hold the additional resources; each is labeled with a URL (Uniform Resource Locator), and the content is typically encoded with base64 binary-to-text encoding. RFC 2557 defines MIME content headers so that URI (Uniform Resource Identifier) references in one body part can be resolved to resources in other body parts. These headers can occur in the heading of any message or body part.
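Because an MHTML file is a MIME message, the layout above can be inspected with Python's standard `email` parser. The sketch below walks a small hand-written sample and lists each body part's content type, its Content-Location (if any) and the size of its decoded content. The function name `list_parts` and the sample text are hypothetical; the base64 string `R0lGODlh` is simply the encoding of the six bytes `GIF89a`, standing in for image data.

```python
from email import message_from_string
from typing import List, Optional, Tuple

# Hypothetical sample: headers, a root HTML part, then a base64 resource part.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/related; boundary="boundary-example"; type="text/html"

--boundary-example
Content-Type: text/html; charset="US-ASCII"

<html><body><img src="fiction1/fiction2"></body></html>

--boundary-example
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-Location: fiction1/fiction2

R0lGODlh
--boundary-example--
"""


def list_parts(mhtml_text: str) -> List[Tuple[str, Optional[str], int]]:
    """List (content type, Content-Location, decoded size) for each body part."""
    message = message_from_string(mhtml_text)
    parts = []
    # walk() yields the multipart container first; skip it and keep the leaves.
    for part in message.walk():
        if part.is_multipart():
            continue
        # decode=True undoes the Content-Transfer-Encoding (here, base64).
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), part.get("Content-Location"), len(payload)))
    return parts


for content_type, location, size in list_parts(SAMPLE):
    print(content_type, location, size)
```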

Content-Location Header

A Content-Location header specifies a URI that labels the content of the body part in which it is placed. It can label a resource even when some or all recipients of the message cannot retrieve that resource from the URI itself. Each message or body part heading may contain at most one Content-Location header.
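The one-header-per-heading rule is easy to check mechanically. The following sketch, again using Python's standard `email` package, reports every heading (the top-level one included) that carries more than one Content-Location header. The function name `check_content_location` and the `example.com` sample documents are hypothetical; a real validator would check many other rules as well.

```python
from email import message_from_string
from typing import List


def check_content_location(mhtml_text: str) -> List[str]:
    """Report every heading that has more than one Content-Location header."""
    message = message_from_string(mhtml_text)
    problems = []
    # walk() visits the top-level heading (index 0) and then each body part.
    for index, part in enumerate(message.walk()):
        # get_all() returns None when the header is absent.
        locations = part.get_all("Content-Location") or []
        if len(locations) > 1:
            problems.append(f"part {index} has {len(locations)} Content-Location headers")
    return problems


# Hypothetical samples: one valid aggregate and one with a duplicated header.
GOOD = """\
Content-Type: multipart/related; boundary="b"

--b
Content-Type: text/html
Content-Location: http://www.example.com/index.html

<p>hi</p>
--b--
"""
BAD = GOOD.replace(
    "Content-Location: http://www.example.com/index.html",
    "Content-Location: http://www.example.com/index.html\n"
    "Content-Location: http://www.example.com/copy.html",
)

print(check_content_location(GOOD))
print(check_content_location(BAD))
```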

Content-Type: multipart/related; boundary="boundary-example";
                                  type="text/html"

--boundary-example
Content-Type: text/html; charset="US-ASCII"

... ... <IMG SRC="fiction1/fiction2"> ... ...
... ... <IMG SRC="cid:97116092811xyz@foo.bar.net"> ... ...

--boundary-example
Content-Type: image/gif
Content-ID: <97116092511xyz@foo.bar.net>
Content-Location: fiction1/fiction2

--boundary-example
Content-Type: image/gif
Content-ID: <97116092811xyz@foo.bar.net>
Content-Location: fiction1/fiction3

--boundary-example--
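In this example the root HTML reaches the first image through its Content-Location (`fiction1/fiction2`) and the second image through a `cid:` URL that matches its Content-ID. The sketch below, using Python's standard `email` package, performs that lookup on the same example. The function name `find_part` is hypothetical, and the matching is deliberately simplified: `cid:` URLs are percent-decoded and compared with the Content-ID without its angle brackets, while other references are compared with Content-Location by exact string match, with no resolution of relative URIs.

```python
from email import message_from_string
from email.message import Message
from typing import Optional
from urllib.parse import unquote

# The example aggregate from this section (the image parts have empty bodies).
EXAMPLE = """\
Content-Type: multipart/related; boundary="boundary-example";
                                  type="text/html"

--boundary-example
Content-Type: text/html; charset="US-ASCII"

... ... <IMG SRC="fiction1/fiction2"> ... ...
... ... <IMG SRC="cid:97116092811xyz@foo.bar.net"> ... ...

--boundary-example
Content-Type: image/gif
Content-ID: <97116092511xyz@foo.bar.net>
Content-Location: fiction1/fiction2

--boundary-example
Content-Type: image/gif
Content-ID: <97116092811xyz@foo.bar.net>
Content-Location: fiction1/fiction3

--boundary-example--
"""


def find_part(aggregate: Message, reference: str) -> Optional[Message]:
    """Return the body part that an HTML reference points to (simplified)."""
    is_cid = reference.lower().startswith("cid:")
    for part in aggregate.get_payload():
        if is_cid:
            # cid: URLs name a Content-ID; compare without the angle brackets.
            content_id = (part.get("Content-ID") or "").strip().strip("<>")
            if content_id == unquote(reference[4:]):
                return part
        elif part.get("Content-Location") == reference:
            # Other references are matched against Content-Location.
            return part
    return None


aggregate = message_from_string(EXAMPLE)
print(find_part(aggregate, "fiction1/fiction2")["Content-ID"])
print(find_part(aggregate, "cid:97116092811xyz@foo.bar.net")["Content-Location"])
```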
              

URIs of MHTML Aggregates

The URI of an MHTML aggregate is different from the URI of its root resource. If a Content-Location header field is used in the heading of a multipart/related body, it applies to the aggregate as a whole, not to its root part. Likewise, when a URI that refers to an MHTML aggregate is used to retrieve that aggregate, the individual resources in the retrieved set are identified by the Content-Location headers of its parts.
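The difference between the aggregate's URI and the URIs of its parts can be seen by indexing an aggregate's resources. The sketch below reads the Content-Location on the multipart/related heading as the aggregate's own URI and builds a map from each part's URI to its content type. As a simplifying assumption, a relative part location is resolved against the aggregate's Content-Location with `urljoin`; RFC 2557 has more detailed base-URI rules that this sketch does not implement. The function name `resource_index` and the `example.com` sample are hypothetical.

```python
from email import message_from_string
from typing import Dict, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical aggregate saved under one URI whose root page has another URI.
SAMPLE = """\
Content-Type: multipart/related; boundary="b"; type="text/html"
Content-Location: http://www.example.com/saved/page.mht

--b
Content-Type: text/html
Content-Location: http://www.example.com/index.html

<img src="images/logo.gif">
--b
Content-Type: image/gif
Content-Location: images/logo.gif

--b--
"""


def resource_index(mhtml_text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return the aggregate's own URI and a map of part URI -> content type."""
    aggregate = message_from_string(mhtml_text)
    # A Content-Location on the multipart/related heading labels the whole
    # aggregate, not its root (HTML) part.
    aggregate_uri = aggregate.get("Content-Location")
    index = {}
    for part in aggregate.get_payload():
        location = part.get("Content-Location")
        if location is None:
            continue
        # Simplifying assumption: resolve relative locations against the
        # aggregate's URI; absolute locations are returned unchanged.
        absolute = urljoin(aggregate_uri, location) if aggregate_uri else location
        index[absolute] = part.get_content_type()
    return aggregate_uri, index


uri, index = resource_index(SAMPLE)
print(uri)
for part_uri, content_type in index.items():
    print(part_uri, content_type)
```

Note that the aggregate's URI does not appear among the part URIs, which matches the point that an aggregate and its root resource are labeled separately.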