What is an Excel (XLS/XLSX) File?


Files with the XLS or XLSX extension are Excel spreadsheet files; XLS is the Excel Binary File Format. Such files are created by Microsoft Excel as well as by similar spreadsheet programs such as OpenOffice Calc or Apple Numbers. An XLS file contains one or more worksheets that store and display data in table format. It may also store mathematical functions, charts, styles, and formatting. The XLS file format was replaced by XLSX with the release of MS Excel 2007.

Brief History

The XLS file format was created by Microsoft for Excel and is also known as the Binary Interchange File Format (BIFF). It was first introduced as part of Excel for Windows in 1987. The XLS file format specification was made public for the first time in June 2008 as Revision 1. The specification has been updated since then; the latest revision noted here, Revision 8.0, was published in August 2018.

The history of the different versions of the XLS file format is as follows –

Version 7.0 (released with Office 95) – This version was faster than earlier versions, and its internals were rewritten for 32 bits.

Version 8 (released with Office 97) – VBA became the standard language, and natural language labels (since removed) were introduced in this version. The paper clip Office Assistant also appeared for the first time in this version.

Version 9 (released with Office 2000) – Only minor changes were made in this version; the clipboard could now hold multiple objects at once, which was not possible in previous versions.

Version 10 (released with Office XP) – No noticeable improvements were made in this version.

Version 11 (released with Office 2003) – This version introduced the new tables feature.

XLS File format specifications

Data in an XLS file is arranged as binary streams within a compound file, as described in [MS-CFB]. The compound file stores data in storages, streams, and substreams that contain information about the structure and content of a workbook. Every stream or substream consists of a series of binary records. Each record contains zero or more structured fields that hold workbook data.
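As a rough illustration of the compound file container, the sketch below checks whether some bytes begin with the 8-byte signature found at the start of a compound file header. The function name looks_like_compound_file is hypothetical, and passing this check only suggests a compound file; it does not prove the file is an XLS workbook.

```python
# The 8-byte signature that opens a compound file header.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(data: bytes) -> bool:
    """Return True if data begins with the compound file signature."""
    return data[:8] == CFB_SIGNATURE


# Usage with in-memory bytes; for a real file, read its first 8 bytes.
sample = CFB_SIGNATURE + b"\x00" * 504
is_cfb = looks_like_compound_file(sample)
is_zip = looks_like_compound_file(b"PK\x03\x04")  # ZIP-style header, not a compound file
```

An XLSX file would fail this check, since it is not stored as a compound file.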

Stream and Substream

A workbook is represented by the workbook stream. Every sheet in the workbook is represented by a substream of that stream. Besides worksheet substreams, there are other kinds such as the Macro Sheet Substream, Chart Sheet Substream, and Dialog Sheet Substream. Each binary stream or substream contains workbook data written as a series of binary records.
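The grouping of records into substreams can be sketched as follows. This is a minimal sketch, not a full reader: it assumes records have already been decoded into (type, data) pairs, and it assumes that a substream opens with a BOF record (type 0x0809) and closes with an EOF record (type 0x000A), a detail the text above does not spell out. The name split_substreams is hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, bytes]

BOF = 0x0809  # assumed record type that opens a substream
EOF = 0x000A  # assumed record type that closes a substream


def split_substreams(records: List[Record]) -> List[List[Record]]:
    """Group records into top-level substreams delimited by BOF and EOF."""
    substreams: List[List[Record]] = []
    current: List[Record] = []
    depth = 0  # tracks nesting, e.g. a substream embedded in another
    for record in records:
        record_type = record[0]
        if record_type == BOF:
            depth += 1
        if depth > 0:
            current.append(record)
        if record_type == EOF and depth > 0:
            depth -= 1
            if depth == 0:
                substreams.append(current)
                current = []
    return substreams


# Two made-up substreams, each holding one placeholder record.
demo_records = [
    (BOF, b""), (0x0085, b"x"), (EOF, b""),
    (BOF, b""), (0x0203, b"y"), (EOF, b""),
]
demo_substreams = split_substreams(demo_records)
```

Records that fall outside any BOF/EOF pair are ignored by this sketch.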

Record

The details about features in a workbook are stored in records, each of which is a variable-length sequence of bytes. A binary record has three main components –

Record Type: A two-byte unsigned integer that identifies the type of information in the record and determines how the record data is ordered and structured.

Record Size: A two-byte unsigned integer that gives the total size of the record data in bytes. It must be greater than or equal to 0 and less than or equal to 8224.

Record Data: The fields that correspond to the particular record type, which make up the remainder of the record. The size of the record data must be equal to the record size.
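The three components above can be read with a short loop. The following is a minimal sketch that assumes the two header integers are stored in little-endian byte order and that the whole stream is already in memory; the name iter_records is hypothetical.

```python
import struct
from typing import Iterator, Tuple

MAX_RECORD_SIZE = 8224  # upper bound on the record size given above


def iter_records(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_type, record_data) pairs from a binary stream."""
    offset = 0
    while offset < len(stream):
        if offset + 4 > len(stream):
            raise ValueError("truncated record header")
        # Two 2-byte unsigned integers (assumed little-endian): type, size.
        record_type, record_size = struct.unpack_from("<HH", stream, offset)
        if record_size > MAX_RECORD_SIZE:
            raise ValueError(f"record size {record_size} exceeds {MAX_RECORD_SIZE}")
        start = offset + 4
        end = start + record_size
        if end > len(stream):
            raise ValueError("truncated record data")
        yield record_type, stream[start:end]
        offset = end


# A hand-built stream: one record with 2 data bytes, one with none.
demo = struct.pack("<HH", 0x0001, 2) + b"\xAA\xBB" + struct.pack("<HH", 0x000A, 0)
records = list(iter_records(demo))
```

Because the size field comes before the data, a reader can skip record types it does not understand without decoding their contents.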

Cell Table

Cells are the building blocks of a worksheet; they store content such as text, formulas, and numerical data. The stored data is recorded in a data structure called the Cell Table, which holds a sequence of records. The Cell Table is organized into row blocks, each of which groups a set of rows. Each row block contains rows from the first row containing data to the last row containing data.

Every cell that contains individual cell formatting is represented by a record. Formatting linked with a cell can be taken from individual cell formatting, row formatting, column formatting, or the default cell format.
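The lookup of a cell's formatting can be pictured as a fallback chain. The sketch below assumes the precedence follows the order listed above (cell, then row, then column, then default); the function name, the dictionary layout, and the format strings are all hypothetical and unrelated to the binary layout.

```python
from typing import Dict, Tuple


def resolve_format(
    cell: Tuple[int, int],
    cell_formats: Dict[Tuple[int, int], str],
    row_formats: Dict[int, str],
    column_formats: Dict[int, str],
    default_format: str = "General",
) -> str:
    """Pick the format for a (row, column) cell using a fallback chain."""
    row, column = cell
    if cell in cell_formats:        # individual cell formatting
        return cell_formats[cell]
    if row in row_formats:          # row formatting
        return row_formats[row]
    if column in column_formats:    # column formatting
        return column_formats[column]
    return default_format           # default cell format


cell_formats = {(0, 0): "0.00"}
row_formats = {1: "dd/mm/yyyy"}
column_formats = {2: "0%"}

fmt_cell = resolve_format((0, 0), cell_formats, row_formats, column_formats)
fmt_row = resolve_format((1, 2), cell_formats, row_formats, column_formats)
fmt_column = resolve_format((5, 2), cell_formats, row_formats, column_formats)
fmt_default = resolve_format((5, 5), cell_formats, row_formats, column_formats)
```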

Formulas

A formula is a sequence of cell references, names, functions, or operators in a cell that together produce a new value. Formulas are stored in a tokenized representation called “parsed expressions.”
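To give a feel for what a tokenized formula looks like, the sketch below evaluates a small token list in postfix order, where operands come before their operator. The token shapes ("ref", "num", "op") are invented for illustration and are not the binary token layout of the file format; the name evaluate is also hypothetical.

```python
import operator
from typing import Dict, List, Tuple, Union

Token = Tuple[str, Union[str, float]]

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(tokens: List[Token], cells: Dict[str, float]) -> float:
    """Evaluate a postfix token list against a table of cell values."""
    stack: List[float] = []
    for kind, value in tokens:
        if kind == "num":
            stack.append(float(value))
        elif kind == "ref":
            stack.append(cells[str(value)])
        elif kind == "op":
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[str(value)](left, right))
        else:
            raise ValueError(f"unknown token kind: {kind}")
    if len(stack) != 1:
        raise ValueError("malformed token sequence")
    return stack[0]


# The formula =A1+B1*2 as postfix tokens: A1 B1 2 * +
tokens = [("ref", "A1"), ("ref", "B1"), ("num", 2.0), ("op", "*"), ("op", "+")]
result = evaluate(tokens, {"A1": 10.0, "B1": 4.0})  # 10 + 4 * 2 = 18.0
```

Storing the formula as tokens means the application does not have to re-parse the formula text each time it recalculates.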

Charts

Excel supports charts, graphs, and histograms produced from a specified group of cells. A chart is a graphical display of sets of data.

Metadata

Metadata provides extra data associated with a particular cell or its content.

Pivot Table

A pivot table is used to summarize the source data to get an overview of the distribution of that data. A Pivot Table has two major parts, a PivotCache and a PivotTable view.
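The split between cached source data and a view over it can be sketched with plain Python structures. This is a toy model under stated assumptions: the cache is simply a copy of the source rows and the view is a sum grouped by one field. The names build_cache and summarize, and the sample fields, are hypothetical and do not reflect how the two parts are stored in the file.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_cache(source_rows: List[Row]) -> List[Row]:
    """Snapshot the source data (the cache half of this toy model)."""
    return [dict(row) for row in source_rows]


def summarize(cache: List[Row], group_by: str, value_field: str) -> Dict[str, float]:
    """Sum value_field per distinct group_by value (the view half)."""
    totals: Dict[str, float] = defaultdict(float)
    for row in cache:
        totals[str(row[group_by])] += float(row[value_field])
    return dict(totals)


source = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 50},
    {"region": "East", "sales": 25},
]
cache = build_cache(source)
view = summarize(cache, "region", "sales")
# view == {"East": 125.0, "West": 50.0}
```

Because the view reads from the cache rather than from the source rows, several views can share one snapshot of the data.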

Styles

Cell formatting contains several sets of properties –

  • Font Properties (font color, font size, bold, italic, etc.)
  • Fill Properties (background color, pattern, gradient, foreground color, etc.)
  • Alignment Properties (left, right, center alignment, etc.)
  • Border Properties (left, right, bottom, top, color, thin, thick, etc.)
  • Number Formatting Properties (date, time, number of decimal places, etc.)
  • Protection Properties (locked, hidden, etc.)
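The property groups above can be modelled as a small set of data classes. This is only a sketch of the grouping: every field name and default value below is a placeholder chosen for illustration, not a value taken from the file format.

```python
from dataclasses import dataclass, field


@dataclass
class Font:
    color: str = "000000"
    size: int = 11
    bold: bool = False
    italic: bool = False


@dataclass
class Fill:
    background_color: str = "FFFFFF"
    pattern: str = "none"


@dataclass
class Border:
    left: str = "none"
    right: str = "none"
    top: str = "none"
    bottom: str = "none"
    color: str = "000000"


@dataclass
class Protection:
    locked: bool = True
    hidden: bool = False


@dataclass
class CellStyle:
    """One bundle holding the six property groups listed above."""
    font: Font = field(default_factory=Font)
    fill: Fill = field(default_factory=Fill)
    alignment: str = "left"
    border: Border = field(default_factory=Border)
    number_format: str = "General"
    protection: Protection = field(default_factory=Protection)


# A bold header style with two decimal places and a thin bottom border.
header = CellStyle(
    font=Font(bold=True),
    border=Border(bottom="thin"),
    number_format="0.00",
)
```

Using default_factory gives each CellStyle its own Font, Fill, Border, and Protection objects, so changing one style does not affect another.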