posted
3/13/2007

BibTeX format trivia

I made the following observations about the formatting details of BibTeX database files while writing a script to format them. I later abandoned the script in favor of a tool developed by Nelson Beebe, but the observations are still valid.

Between entries

Everything between entries is treated as a plain-text comment and ignored, so nothing there will cause trouble.

However, a sequence such as '@ singleword {' is assumed to start an entry and will cause trouble if that entry is not properly finished.

Throughout a BibTeX file, spaces and blank lines behave identically, and neither causes trouble. Below, 'blanks' means both unless otherwise noted.

Entries '@entrytype{key, fields }'

An entry can be surrounded by text, with or without spaces separating it from that text.

Obviously, 'entrytype' must be a single word (i.e., not interrupted by blanks).

The five components '@', 'entrytype', '{', 'key', and ',' can be separated from each other by blanks.

However, Vim's syntax highlighting has trouble if '@' is preceded by non-blanks on the same line, if 'entrytype' and '{' are separated, or if 'key' and ',' are separated.
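The rules for an entry header above can be captured in a single regular expression. The sketch below is a minimal illustration, not part of any real tool; the names ENTRY_START and find_entry_headers are hypothetical, and the set of characters allowed in a key is an assumption kept deliberately simple.

```python
import re

# Hypothetical pattern for an entry header '@entrytype{key,'.
# Blanks (spaces or newlines) are allowed between '@', the entry type,
# '{', the key, and ','; the entry type must be a single word.
ENTRY_START = re.compile(
    r"@\s*(?P<entrytype>\w+)\s*\{\s*(?P<key>[^\s,{}]+)\s*,"
)

def find_entry_headers(text):
    """Return (entrytype, key) pairs for every entry header in text."""
    return [(m.group("entrytype"), m.group("key"))
            for m in ENTRY_START.finditer(text)]

# Text before '@' is ignored, and blanks split the header across lines.
sample = """Some comment text @ article
  {  knuth84 ,
  title = {Literate Programming}}
"""
print(find_entry_headers(sample))  # [('article', 'knuth84')]
```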

Between fields

Fields are separated by a single ','. Blanks before and after this ',' are allowed but not required. If a field is not followed by ',', everything after it in the same entry is ignored. A ',' after the last field of an entry is optional.

Any non-blank characters other than ',' between fields cause the rest of the entry to be ignored. The LaTeX comment symbol '%' does not help.
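When splitting an entry into fields, only the commas that sit outside any '{ ... }' group or '"..."' string count as separators. The following is a rough sketch under that assumption; split_fields is a hypothetical helper that expects the text between 'key,' and the entry's closing brace.

```python
def split_fields(body):
    """Split the text after 'key,' into field strings at top-level commas.

    Commas inside {...} groups or "..." strings do not separate fields.
    A trailing comma after the last field is allowed.
    """
    fields, current = [], []
    depth = 0          # current nesting level of { }
    in_quotes = False  # inside a "..." field at brace depth 0
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            in_quotes = not in_quotes
        elif ch == "," and depth == 0 and not in_quotes:
            fields.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    last = "".join(current).strip()
    if last:  # skip the empty piece left by an optional trailing comma
        fields.append(last)
    return fields

print(split_fields('title = {A, B}, note = "x, y" , year = 1995,'))
```

The depth counter is what keeps 'title = {A, B}' in one piece; a '"' nested inside braces, such as {"}, is ignored for the same reason.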

Fields 'fieldname = fielddata'

Blanks around '=' are optional. However, Vim's syntax highlighting has trouble if '=' is not on the same line as 'fieldname'.

'fielddata' should be enclosed in properly paired double quotes or '{ ... }'. Single quotes cannot be used for this purpose.

If 'fielddata' is a single number, e.g., 1995, the enclosure is optional. A page range such as '417-432' is not a single number, so it must be enclosed.

'fielddata' can also be a single-word abbreviation, not enclosed, that has been defined with '@STRING{...}'. Abbreviations can only be used this way, that is, standing alone and unquoted as the entire data of a field. They are typically used for journal titles but work just as well for other fields.
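The three cases above (bare number, @STRING abbreviation, everything else) suggest a simple decision when writing out fielddata. This is only a sketch: format_field_data is a hypothetical name, string_names is assumed to hold abbreviation names already collected from '@STRING{...}' entries, and the comparison is case-sensitive for simplicity.

```python
import re

def format_field_data(value, string_names=frozenset()):
    """Return fielddata written as the notes above allow.

    - A single number such as 1995 needs no enclosure.
    - A single-word abbreviation defined by @STRING is written bare.
    - Everything else, including a page range like 417-432, gets braces.
    """
    if re.fullmatch(r"[0-9]+", value):
        return value
    if value in string_names:  # assumed: names gathered from @STRING entries
        return value
    return "{" + value + "}"

print(format_field_data("1995"), format_field_data("417-432"))
```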

Within a field

Blank lines, just like spaces, do not start or end paragraphs. To force a line break, use '\newline' or '\\'. To create one blank line, use '\newline\newline'; to create two blank lines, use '\newline\newline\newline'; and so forth. Of course, this should be used only in fields such as 'abstract' and 'comment', although it works in other fields as well.
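Since a blank line inside a field is just a blank, a script that copies multi-paragraph text (say, an abstract) into a field has to translate paragraph breaks itself. A minimal sketch, assuming one blank line per paragraph break is wanted; paragraphs_to_newlines is a hypothetical helper name.

```python
import re

def paragraphs_to_newlines(text):
    r"""Join the paragraphs of text for use inside a BibTeX field.

    Each paragraph break becomes '\newline\newline' (a line break plus
    one blank line); line breaks within a paragraph become single spaces.
    """
    paragraphs = [" ".join(p.split())
                  for p in re.split(r"\n\s*\n", text.strip())]
    return r"\newline\newline ".join(p for p in paragraphs if p)

print(paragraphs_to_newlines("First line\nstill first.\n\nSecond."))
```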

Math displays enclosed by '\[ ... \]' or '\begin{equation} ... \end{equation}' work as expected and do not need the help of '\newline'.

Math mode enclosed by '$ ... $' works as expected.

LaTeX commands like '\textbf{ ... }' work as expected.

'{' and '}' are grouping symbols and must be properly paired. They cannot be escaped by writing '\{', "{", or $\{$. I haven't found a way to make them ordinary characters.
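Because braces can't be escaped, a formatting script can check fielddata for balanced braces before writing it out. The sketch below assumes, in line with the observation above, that a backslash does not protect a brace, so every '{' and '}' is counted; braces_balanced is a hypothetical name.

```python
def braces_balanced(fielddata):
    """Check that every '{' in fielddata has a matching '}'.

    A preceding backslash is not treated as an escape, so '\{' still
    counts as an opening brace.
    """
    depth = 0
    for ch in fielddata:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a '}' closed a group that was never opened
                return False
    return depth == 0

print(braces_balanced(r"The {\TeX}book"), braces_balanced(r"a \{ b"))
```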

'\' and '/' are special. I haven't found a way to escape them.

'[' and ']' (not '\[' and '\]') are ordinary characters, as are '(' and ')'.

If the field uses '{' and '}' for its overall enclosure, double and single quotation marks can be used freely. Proper pairing and nesting of quotation marks is required by the rules of written language, but not by the BibTeX format or by LaTeX.

If the field is enclosed in double quotes ("), then single quotes and LaTeX quotes (`` and '') can be used freely inside it. Any other double quote (") will end the field. To escape a double quote, put it in a group: {"}.
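The escaping rule for quote-enclosed fields amounts to a single string replacement. A small sketch, with quote_field as a hypothetical helper; it assumes the input text contains no other special handling needs (balanced braces, for instance, are not checked here).

```python
def quote_field(text):
    """Enclose text in double quotes for use as fielddata.

    Any '"' inside text would end the field early, so each one is
    wrapped in a group, {"}.
    """
    return '"' + text.replace('"', '{"}') + '"'

print(quote_field('a "b" c'))  # "a {"}b{"} c"
```

With brace enclosure the replacement is unnecessary, which is one reason to prefer '{ ... }' as the overall delimiter.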

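My format recommendations

The patterns below use regular-expression notation: '^' marks the start of a line, '$' marks its end, and ' *' means any number of spaces.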

Entry

either
^@entrytype{key,$
or
^@entrytype{$
^key,$

Field

^ *fieldname *= *{
},$
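A formatter following the first entry layout and the field layout above might look like the sketch below. The names are hypothetical; fields is assumed to be a list of (fieldname, fielddata) pairs with fielddata already brace-balanced, and putting the entry's closing '}' alone on the last line is an assumption, since the recommendations don't specify it.

```python
def format_entry(entrytype, key, fields, indent=2):
    """Write one entry in the recommended layout.

    The header goes on its own line as '@entrytype{key,'; each field
    goes on its own line as 'fieldname = {fielddata},'; the entry's
    closing '}' sits alone on the last line (an assumption).
    """
    pad = " " * indent
    lines = ["@" + entrytype + "{" + key + ","]
    for name, data in fields:
        lines.append(pad + name + " = {" + data + "},")
    lines.append("}")
    return "\n".join(lines)

print(format_entry("article", "knuth84",
                   [("title", "Literate Programming"), ("year", "1984")]))
```

Keeping the trailing ',' after the last field is harmless, since it is optional, and it lets fields be reordered or appended without touching their neighbors.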