What is a PST file?


PST, or Personal Storage Table, is a proprietary file format used by Microsoft Outlook to store a local copy of mailbox items such as emails, contacts, calendars, and notes from an account configured in Outlook. When a user creates an account, a PST file is created automatically. PST files can also be used to archive data locally on the user's system.

PST Format Specifications

The PST file format specification is published by Microsoft, and the format is covered by free, irrevocable patent licensing under the Open Specification Promise.

Types of PST

PST files come in two types, based on the encoding used. ANSI-encoded PST is the older format, supported by Outlook 2002 and earlier versions. These files have a maximum size of 2 GB and do not support Unicode. Unicode PST files are created by versions after Outlook 2002 and have a size limit of 50 GB.

Logical Organization of PST File format

The overall structure of PST files contains three layers of data –

NDB Layer – The Node Database (NDB) layer is the lowest layer of the PST file and represents the low-level storage facilities of the format. It consists of the header, blocks, B-tree structures, file allocation information, and so on. Nodes and blocks are linked through the data BID, one of the four properties of a node reference: NID (Node ID), parent ID, data BID (Block ID), and subnode BID.
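As a rough illustration of how a node points at its data, the sketch below models the four node-reference properties as a Python dataclass and resolves a node's data block through its data BID. The names NodeEntry and node_data, the in-memory dictionaries, and the sample ID values are all hypothetical; a real PST file keeps this information in on-disk B-trees.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NodeEntry:
    """Hypothetical in-memory view of one node reference."""
    nid: int          # Node ID
    parent_nid: int   # Parent ID
    data_bid: int     # BID of the block that holds the node's data
    subnode_bid: int  # BID of the subnode block (0 used here to mean "none")


def node_data(nid: int,
              nodes: Dict[int, NodeEntry],
              blocks: Dict[int, bytes]) -> Optional[bytes]:
    """Follow NID -> data BID -> block contents; None if either lookup fails."""
    entry = nodes.get(nid)
    if entry is None:
        return None
    return blocks.get(entry.data_bid)


# Made-up sample data: one node whose data lives in block 0x104.
nodes = {0x21: NodeEntry(nid=0x21, parent_nid=0, data_bid=0x104, subnode_bid=0)}
blocks = {0x104: b"node payload"}

print(node_data(0x21, nodes, blocks))
```

The point of the indirection is that a node never stores its data directly; it only names the block that does.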

LTP Layer – The LTP (Lists, Tables, and Properties) layer provides higher-level concepts on top of the NDB layer. It contains the Property Context (PC) and the Table Context (TC). A PC is a collection of properties, and a TC is a two-dimensional matrix of properties.

To efficiently implement PC and TC, LTP layer uses two types of data structures –

Heap on Node (HN) – enables sub-allocating the data stream of a node into small, variable-sized fragments.

BTree on Heap (BTH) – provides a practical way of searching through data. A BTH is built inside an HN structure, and PCs are implemented as BTHs.
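To make the difference between a PC and a TC concrete, here is a purely conceptual sketch: a PC is modelled as a mapping from property ID to value, and a TC as a grid with one row per object and one column per property. The class TableContext, its methods, and the property IDs are hypothetical, and the sketch ignores the HN and BTH structures that hold this data on disk.

```python
from typing import Any, Dict, List

# Property Context: the properties of ONE object, keyed by property ID.
PropertyContext = Dict[int, Any]

# Made-up property IDs, for illustration only.
PROP_SUBJECT = 0x0001
PROP_SIZE = 0x0002


class TableContext:
    """Two-dimensional view: rows are objects, columns are property IDs."""

    def __init__(self, columns: List[int]) -> None:
        self.columns = columns
        self.rows: List[List[Any]] = []

    def add_row(self, pc: PropertyContext) -> None:
        # A property missing from the PC leaves an empty cell (None).
        self.rows.append([pc.get(col) for col in self.columns])

    def cell(self, row: int, prop_id: int) -> Any:
        return self.rows[row][self.columns.index(prop_id)]


table = TableContext(columns=[PROP_SUBJECT, PROP_SIZE])
table.add_row({PROP_SUBJECT: "Hello", PROP_SIZE: 120})
table.add_row({PROP_SUBJECT: "Re: Hello"})  # no size recorded for this row

print(table.cell(0, PROP_SIZE))
print(table.cell(1, PROP_SUBJECT))
```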

Messaging Layer – The Messaging layer is the top layer and builds on the concepts provided by the LTP and NDB layers. It defines message objects, folder objects, attachment objects, and properties, along with the rules that must be followed when modifying PST content.

Physical Organization of PST File format

Header Information

The header structure is located at the start of the file, at offset 0. It holds metadata about the PST file and the ROOT information needed to access the NDB layer data structures mentioned above. The header structure differs between the Unicode and ANSI versions of the PST file.

The header begins with the 4-byte word !BDN, represented by the bytes (0x21, 0x42, 0x44, 0x4E). A 2-byte magic number, SM (0x53, 0x4D), is located at offset 8 from the beginning of the file. The version field lies at offset 10 from the beginning of the file. A value of 0x17 indicates a Unicode PST file, and 0x0E or 0x0F indicates the ANSI format.
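The checks described above can be sketched in a few lines of Python. The function name pst_format is hypothetical, and the code assumes the 2-byte version field is stored little-endian; the sample headers are built in memory rather than read from a real file.

```python
import struct


def pst_format(header: bytes) -> str:
    """Classify a PST header as 'Unicode', 'ANSI', or 'unknown'."""
    if len(header) < 12:
        raise ValueError("need at least the first 12 bytes of the file")
    if header[0:4] != b"!BDN":   # 0x21 0x42 0x44 0x4E at offset 0
        raise ValueError("missing !BDN magic")
    if header[8:10] != b"SM":    # 0x53 0x4D at offset 8
        raise ValueError("missing SM magic")
    # Version word at offset 10 (little-endian is an assumption here).
    (w_ver,) = struct.unpack_from("<H", header, 10)
    if w_ver == 0x17:
        return "Unicode"
    if w_ver in (0x0E, 0x0F):
        return "ANSI"
    return "unknown"


def fake_header(w_ver: int) -> bytes:
    """Build a 12-byte stand-in header with a zeroed CRC field."""
    return b"!BDN" + b"\x00" * 4 + b"SM" + struct.pack("<H", w_ver)


print(pst_format(fake_header(0x17)))  # Unicode
print(pst_format(fake_header(0x0E)))  # ANSI
```

For a real file, the same function could be given the first 12 bytes read in binary mode.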

Fields Description
dwMagic (4 bytes) MUST be { 0x21, 0x42, 0x44, 0x4E } ("!BDN").
dwCRCPartial (4 bytes) The 32-bit CRC value of the 471 bytes of data starting from wMagicClient (offset 0x0008).
wMagicClient (2 bytes) MUST be { 0x53, 0x4D }.
wVer (2 bytes) File format version. This value MUST be 14 or 15 for an ANSI PST file and 23 for a Unicode PST file.
wVerClient (2 bytes) Client file format version. The version that corresponds to the format described in this document is 19. A new PST file created from this description should set this value to 19.
bPlatformCreate (1 byte) This value MUST be set to 0x01.
bPlatformAccess (1 byte) This value MUST be set to 0x01.
dwReserved (8 bytes)
bidUnused (8 bytes, Unicode only) Unused padding added when the Unicode PST file format was created.
bidNextP (Unicode: 8 bytes; ANSI: 4 bytes) Next page BID. Pages have a special counter for providing bidIndex values. The bidIndex value for page BIDs is taken from this counter.
bidNextB (4 bytes ANSI only) Next BID. This value is the monotonic counter that represents the BID to be assigned for the next allocated block. BID values advance in increments of 4. For more details, see section 2.2.2.2.
dwUnique (4 bytes) A monotonically increasing value that is modified every time the PST file's HEADER structure is modified. The purpose of this value is to provide a unique value and to ensure that the HEADER CRCs are different after each header modification.
rgnid[] (128 bytes) A fixed array of 32 NIDs, each corresponding to one of the 32 possible NID_TYPEs (such as NID_TYPE_NORMAL_FOLDER, NID_TYPE_SEARCH_FOLDER, NID_TYPE_NORMAL_MESSAGE, and NID_TYPE_ASSOC_MESSAGE).
qwUnused (8 bytes) Unused space; MUST be set to zero. Unicode PST file format only.
root (Unicode: 72 bytes; ANSI: 40 bytes) A ROOT structure (section 2.2.2.5).
dwAlign (4 bytes) Unused alignment bytes; MUST be set to zero. Unicode PST file format only.
rgbFM (128 bytes) Deprecated FMap. This is no longer used and MUST be filled with 0xFF. Readers SHOULD ignore the value of these bytes.
rgbFP (128 bytes) Deprecated FPMap. This is no longer used and MUST be filled with 0xFF. Readers SHOULD ignore the value of these bytes.
bSentinel (1 byte) MUST be set to 0x80.
bCryptMethod (1 byte) Indicates how the data within the PST file is encoded. MUST be set to one of the pre-defined values (NDB_CRYPT_NONE, NDB_CRYPT_PERMUTE, NDB_CRYPT_CYCLIC).
rgbReserved (2 bytes) Reserved; MUST be set to zero.
bidNextB (8 bytes, Unicode only) Next BID. This value is the monotonic counter that indicates the BID to be assigned to the next allocated block. BID values advance in increments of 4. For more details, see section 2.2.2.2.
dwCRCFull (4 bytes) The 32-bit CRC value of the 516 bytes of data starting from wMagicClient to bidNextB, inclusive. Unicode PST file format only.
ullReserved (8 bytes) Reserved; MUST be set to zero. ANSI PST file format only.
dwReserved (4 bytes) Reserved; MUST be set to zero. ANSI PST file format only.
rgbReserved2 (3 bytes)
bReserved (1 byte)
rgbReserved3 (32 bytes)
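The field sizes in the table can be turned into byte offsets by adding them up in order. The sketch below does this for the Unicode layout, assuming the Unicode fields are stored back to back in the order listed with no extra padding, that the ANSI-only fields are skipped, and that bidNextB is counted once. The names UNICODE_HEADER_FIELDS and field_offsets are hypothetical.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

# (field name, size in bytes) for the Unicode header, in table order.
UNICODE_HEADER_FIELDS: List[Tuple[str, int]] = [
    ("dwMagic", 4), ("dwCRCPartial", 4), ("wMagicClient", 2), ("wVer", 2),
    ("wVerClient", 2), ("bPlatformCreate", 1), ("bPlatformAccess", 1),
    ("dwReserved", 8), ("bidUnused", 8), ("bidNextP", 8), ("dwUnique", 4),
    ("rgnid", 128), ("qwUnused", 8), ("root", 72), ("dwAlign", 4),
    ("rgbFM", 128), ("rgbFP", 128), ("bSentinel", 1), ("bCryptMethod", 1),
    ("rgbReserved", 2), ("bidNextB", 8), ("dwCRCFull", 4),
    ("rgbReserved2", 3), ("bReserved", 1), ("rgbReserved3", 32),
]


def field_offsets(fields: List[Tuple[str, int]]) -> Dict[str, int]:
    """Map each field name to its starting offset (running sum of sizes)."""
    sizes = [size for _, size in fields]
    starts = [0] + list(accumulate(sizes))[:-1]
    return {name: start for (name, _), start in zip(fields, starts)}


offsets = field_offsets(UNICODE_HEADER_FIELDS)
total_size = sum(size for _, size in UNICODE_HEADER_FIELDS)

print(offsets["wMagicClient"])  # 8
print(offsets["wVer"])          # 10
# Span from wMagicClient up to and including bidNextB:
print(offsets["dwCRCFull"] - offsets["wMagicClient"])  # 516
print(total_size)               # 564
```

The computed offsets agree with the text: wMagicClient lands at offset 8, wVer at offset 10, and the span from wMagicClient through bidNextB is the 516 bytes that dwCRCFull is said to cover.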

Data Protection

PST files can be password-protected, so that the application asks for a password before the contents can be viewed. The password set by the user is stored in the message store. However, it can be removed with readily available tools, so it offers little real protection against access by unauthorized parties. Because the password is stored as a CRC-32 hash of the original string, it is also weak against brute-force attacks.
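The weakness is easy to see in a small experiment: a CRC-32 value has only 32 bits, and any string that produces the same value is as good as the original password. The sketch below uses Python's zlib.crc32 as a stand-in checksum (the exact CRC variant applied to PST passwords may differ) and searches short lowercase strings for a match. The function name and the sample password are hypothetical.

```python
import itertools
import string
import zlib
from typing import Optional


def find_matching_password(target: int, max_len: int = 3) -> Optional[str]:
    """Return the first lowercase string whose CRC-32 equals target."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = "".join(chars)
            if zlib.crc32(candidate.encode("ascii")) == target:
                return candidate
    return None


# Pretend this checksum is all that was stored for the password "abc".
stored = zlib.crc32(b"abc")
found = find_matching_password(stored)

print(found is not None)
print(zlib.crc32(found.encode("ascii")) == stored)
```

The search only needs to hit the stored checksum, not the original string, which is why a fast non-cryptographic checksum gives so little resistance to guessing.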