Reference Management for LaTeX Researchers
BibTeX Conversion, Done Right
Getting references into BibTeX from DOIs, ArXiv preprints, PubMed IDs, ISBNs, and export files is where most researchers lose time — and where most converters quietly produce broken output. Here's what actually goes wrong, and how to avoid it.
@article{lecun2015deep,
author = {LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey},
title = {Deep Learning},
journal = {Nature},
year = {2015},
volume = {521},
pages = {436--444}
}
% citekey: lastname + year + first significant word
What BibTeX is and why researchers use it
BibTeX is the reference management format that powers citations in LaTeX.
You store all your references in a .bib file — one entry per
paper or book — and LaTeX pulls them into your document automatically,
formatted to whichever style your journal or institution requires.
Switch from APA to IEEE to Vancouver by changing a single line.
No manual reformatting.
For anyone writing a thesis, journal paper, or conference submission in
LaTeX, a clean .bib file is non-negotiable. The problem is
that references rarely arrive in BibTeX format. They come as DOI links,
ArXiv preprint IDs, PubMed IDs, ISBNs, or bulk-exported .ris
and .nbib files from literature databases.
Converting these correctly — without silent errors that only surface when LaTeX tries to compile — is what separates a useful converter from a frustrating one.
Input formats and what to watch for in each
DOI → BibTeX
The DOI is the most reliable path to a complete BibTeX entry. A DOI like
10.1145/3442188.3445922 maps directly to a Crossref record
that includes the full author list, title, journal, volume, issue, pages,
and year. The DOI to BibTeX converter
queries the Crossref API and converts the response to BibTeX in the browser,
so the DOIs go to Crossref and nowhere else. Paste up to 50 DOIs at once; they resolve
in parallel.
One thing most converters miss: many bibliography styles convert titles
to sentence case, lowercasing every word after the first. So a title like Analyzing COVID-19 with mRNA
Techniques comes out as Analyzing covid-19 with mrna techniques,
which is silently wrong. Proper conversion wraps acronyms and initialisms in
braces: {COVID-19}, {mRNA}, {DNA},
so they can't be lowercased. Special characters need similar care:
& becomes \&, and
% becomes \%, or your file won't compile.
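The title cleanup described above can be sketched in a few lines of Python. This is a minimal illustration, not the converter's actual code: the function names are hypothetical, and the "any capital after the first letter" rule is an assumed heuristic that catches COVID-19, mRNA, and DNA but will not protect ordinary proper nouns such as Markov.

```python
import re

# Reserved LaTeX characters that commonly show up in titles and journal names.
LATEX_ESCAPES = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}

# A "word" here is a run of letters, digits, and hyphens (so COVID-19 is one word).
_WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-]*")


def protect_case(title: str) -> str:
    """Wrap words containing a capital after the first character in braces."""
    def brace(match):
        word = match.group(0)
        needs_protection = any(c.isupper() for c in word[1:])
        return "{" + word + "}" if needs_protection else word
    return _WORD.sub(brace, title)


def escape_latex(text: str) -> str:
    """Escape reserved characters unless they are already backslash-escaped."""
    return re.sub(r"(?<!\\)([&%$#_])", lambda m: LATEX_ESCAPES[m.group(1)], text)


def clean_title(title: str) -> str:
    """Brace acronyms first, then escape reserved characters."""
    return escape_latex(protect_case(title))


print(clean_title("Analyzing COVID-19 with mRNA Techniques"))
```

Proper nouns need a dictionary or manual review; a capitalisation heuristic alone cannot tell a name from the first word of a sentence.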
ArXiv → BibTeX
ArXiv IDs come in two forms: the modern format like 2106.09685
and the old-style cs.LG/0612064. Both need to produce a
correct @misc entry with eprint,
eprinttype = {arXiv}, and eprintclass fields —
the fields that biblatex uses to render the ArXiv link
correctly. A bare @article without these fields is technically
wrong for a preprint.
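As a rough sketch of what that means in practice, the Python below recognises both ID forms and emits an @misc entry with the three eprint fields. The function names and the regular expressions are assumptions made for illustration, and the metadata (authors, title, year, class) is expected to come from whatever lookup you already have.

```python
import re

# New-style IDs: YYMM.NNNN or YYMM.NNNNN, with an optional version suffix.
NEW_STYLE = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")
# Old-style IDs: archive[.SUBJECT]/YYMMNNN, with an optional version suffix.
OLD_STYLE = re.compile(r"^([a-z\-]+(?:\.[A-Z]{2})?/\d{7})(v\d+)?$")


def normalize_arxiv_id(raw: str) -> str:
    """Strip an 'arXiv:' prefix and version suffix; raise ValueError if invalid."""
    candidate = raw.strip()
    if candidate.lower().startswith("arxiv:"):
        candidate = candidate[len("arxiv:"):]
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.match(candidate)
        if match:
            return match.group(1)
    raise ValueError(f"not a recognisable arXiv ID: {raw!r}")


def arxiv_to_bibtex(key, arxiv_id, authors, title, year, primary_class):
    """Build an @misc entry carrying the eprint fields biblatex understands."""
    fields = [
        ("author", " and ".join(authors)),
        ("title", title),
        ("year", str(year)),
        ("eprint", normalize_arxiv_id(arxiv_id)),
        ("eprinttype", "arXiv"),
        ("eprintclass", primary_class),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


# Placeholder metadata, not a real record.
print(arxiv_to_bibtex("doe2021example", "arXiv:2106.09685v2",
                      ["Doe, Jane"], "An Example Title", 2021, "cs.CL"))
```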
ArXiv's API has no CORS headers, which means client-side tools can't call it directly. The ArXiv to BibTeX converter routes requests through a lightweight proxy that disk-caches results (ArXiv records are immutable per version) and applies per-IP rate limiting so it doesn't hammer ArXiv's servers.
PubMed → BibTeX
PubMed (NCBI) has no CORS support, so most tools that claim to support PMIDs either proxy through a server that sees your IDs or don't work at all. The PubMed to BibTeX converter uses Europe PMC instead — it carries the same Medline data as NCBI with proper CORS support, so IDs never leave the browser except to go there.
NLM compressed page ranges are a specific problem: PubMed records often
store pages as 436-44 meaning pages 436 through 444 in
NLM shorthand. Most converters pass this through as-is, producing a
broken page range in the output. Correct conversion expands it to
436--444. Similarly, when a bioRxiv preprint ID leaks into
the pages field (a value like 2020.03.27.20044925), the
field should be dropped rather than written as garbage data.
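A small Python sketch of both rules follows. The function name and the preprint-ID pattern are assumptions for illustration; anything the function does not recognise is passed through unchanged rather than guessed at.

```python
import re

# Purely numeric ranges such as 436-44 or 436--444 (hyphen or en dash).
_NUMERIC_RANGE = re.compile(r"^(\d+)\s*[-\u2013]+\s*(\d+)$")
# Date-stamped preprint IDs such as 2020.03.27.20044925.
_PREPRINT_ID = re.compile(r"^\d{4}\.\d{2}\.\d{2}\.\d+$")


def normalize_pages(raw):
    """Return a BibTeX pages value, or None when the field should be dropped."""
    pages = raw.strip()
    if _PREPRINT_ID.match(pages):
        return None  # a preprint ID is not a page range
    match = _NUMERIC_RANGE.match(pages)
    if not match:
        return pages  # single page, e-locator, etc.: leave as-is
    first, last = match.groups()
    # NLM shorthand drops the leading digits the last page shares with the
    # first, so borrow them back: 436-44 becomes 436--444.
    if len(last) < len(first):
        last = first[: len(first) - len(last)] + last
    return f"{first}--{last}"


for value in ["436-44", "1234-9", "2020.03.27.20044925"]:
    print(value, "->", normalize_pages(value))
```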
ISBN → BibTeX
Books are trickier than journal articles because metadata coverage varies significantly by source. Most free ISBN tools query only Open Library, which has gaps — particularly for non-English titles and recent or self-published books. A robust ISBN to BibTeX converter cascades across sources: Open Library's books API first, then Open Library search, then Google Books as a fallback. Only if all three miss does it report not found.
Open Library splits subtitles into a separate field. A title stored as
Artificial Intelligence with subtitle A Modern Approach
needs to be joined with a colon before going into the title
field, since legacy BibTeX's @book has no subtitle field. Open Library's
publish_date also comes in multiple formats — "2023",
"January 2023", "January 1, 2023" — requiring year extraction before
writing the year field.
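Both normalisations are short enough to sketch directly. The helper names below are hypothetical, and the year pattern (1500 to 2099) is an assumption chosen to avoid matching day numbers or other stray digits.

```python
import re

# First plausible four-digit year in the string.
_YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def join_title(title, subtitle=None):
    """Join title and subtitle with a colon; return the title alone if no subtitle."""
    title = title.strip()
    subtitle = (subtitle or "").strip()
    return f"{title}: {subtitle}" if subtitle else title


def extract_year(publish_date):
    """Pull the year out of strings like '2023', 'January 2023', 'January 1, 2023'."""
    match = _YEAR.search(publish_date or "")
    return match.group(1) if match else None


print(join_title("Artificial Intelligence", "A Modern Approach"))
for date in ["2023", "January 2023", "January 1, 2023"]:
    print(repr(date), "->", extract_year(date))
```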
RIS, NBIB, and EndNoteXML
Bulk exports from PubMed, Scopus, Web of Science, and Mendeley come in
.ris, .nbib, or .enw format.
RIS and EndNoteXML are well-documented and handled by parsing libraries.
NBIB is the exception: the PubMed-specific .nbib format is not
supported by most citation parsing libraries, including citation.js. It
requires a hand-rolled parser for the PMID-, TI,
FAU field structure. For RIS and EndNote
exports, the same downstream cleanup applies: braced acronyms, escaped
special characters, consistent citekeys.
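To give a sense of what a hand-rolled NBIB parser involves, here is a minimal Python sketch. It assumes the usual layout of a short uppercase tag, a hyphen, then the value, with indented continuation lines and a blank line between records; the function name is hypothetical and real exports may contain cases this does not handle.

```python
import re
from collections import defaultdict

# A tag line looks like "PMID- 123", "TI  - Some title", or "FAU - Doe, Jane".
_TAG_LINE = re.compile(r"^([A-Z]{2,4})\s*- (.*)$")


def parse_nbib(text):
    """Parse NBIB text into a list of {tag: [values]} dictionaries."""
    records = []
    current = None
    last_tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(dict(current))
            current, last_tag = None, None
            continue
        match = _TAG_LINE.match(line)
        if match:
            tag, value = match.groups()
            if current is None:
                current = defaultdict(list)
            current[tag].append(value.strip())  # repeated tags (FAU) accumulate
            last_tag = tag
        elif current is not None and last_tag and line.startswith(" "):
            # Indented line: continuation of the previous field's value.
            current[last_tag][-1] += " " + line.strip()
    if current:
        records.append(dict(current))
    return records


# Placeholder record, not real PubMed data.
SAMPLE = (
    "PMID- 12345678\n"
    "TI  - A placeholder title that wraps\n"
    "      onto a second line.\n"
    "FAU - Doe, Jane\n"
    "FAU - Roe, Richard\n"
)
print(parse_nbib(SAMPLE))
```

Keeping every field as a list makes repeated tags such as FAU straightforward; the BibTeX author field is then an " and " join over that list.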
Exporting BibTeX to other formats
Sometimes you need citations in a format other than BibTeX — for colleagues not using LaTeX, for importing into Word, or for a reference list in a web document. Common output formats and what to know about each:
BibTeX → citation styles (APA, MLA, Chicago, Harvard, IEEE, Vancouver)
Citation style rendering uses the CSL (Citation Style Language) pipeline: parse the BibTeX entry, convert to CSL-JSON internally, then render through the appropriate CSL template. The tools on The LaTeX Lab support APA, MLA, Chicago, Harvard, IEEE, AMA, and Vancouver. Vancouver in particular uses a numbered format with abbreviated author names (LeCun Y, not LeCun, Yann) that many free tools don't implement.
BibTeX → CSV and Excel
CSV export has one common failure point: encoding. Excel opens CSV files
using the system locale, which means accented characters like
Krämer get garbled unless the file has a UTF-8 BOM at the start.
Most CSV exporters omit this. The Excel output produces a real
.xlsx binary (not a renamed CSV) that opens correctly in
Excel, Numbers, and Google Sheets without any import configuration.
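The BOM fix is a one-line change in most languages. As a sketch in Python, assuming entries are plain dictionaries and using a hypothetical column list, the "utf-8-sig" codec prepends the byte order mark for you:

```python
import csv
import io

# Hypothetical column set; a real exporter would derive this from the entries.
COLUMNS = ["citekey", "author", "title", "journal", "year"]


def entries_to_csv_bytes(entries):
    """Serialise entries to CSV bytes that start with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for entry in entries:
        writer.writerow({col: entry.get(col, "") for col in COLUMNS})
    # "utf-8-sig" writes the three BOM bytes (EF BB BF) before the content.
    return buffer.getvalue().encode("utf-8-sig")


data = entries_to_csv_bytes([
    {"citekey": "kramer2020example", "author": "Kr\u00e4mer, Anna",
     "title": "An Example", "journal": "Placeholder Journal", "year": "2020"},
])
print(data[:3])  # the BOM bytes
```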
BibTeX → JSON (CSL-JSON)
The JSON output is CSL-JSON — the format natively understood by Zotero, Pandoc, Quarto, and RMarkdown. This is distinct from arbitrary JSON representations of BibTeX fields that no other tool recognises. A CSL-JSON array dropped into Zotero's import dialog round-trips cleanly.
BibLaTeX vs legacy BibTeX: Every converter on The LaTeX Lab
outputs in both dialects. Legacy BibTeX requires escaped accents
(\"u for ü). BibLaTeX with Biber accepts raw UTF-8.
Choose based on whether your document uses \bibliographystyle{}
(legacy) or the biblatex package (modern).
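For the legacy dialect, accent escaping can be sketched with Unicode decomposition. This is an illustrative Python example under stated assumptions: the function name is hypothetical, only five common accents are mapped, and characters with no simple decomposition (ß, ø, ç here) pass through unchanged, so a real converter needs a fuller table.

```python
import unicodedata

# Combining mark -> LaTeX accent macro character (a deliberately small table).
_COMBINING_TO_MACRO = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis / umlaut
}


def to_legacy_bibtex(text):
    """Replace accented letters with braced LaTeX accent commands."""
    out = []
    for char in text:
        decomposed = unicodedata.normalize("NFD", char)
        is_simple_accent = (
            len(decomposed) == 2
            and decomposed[0].isascii()
            and decomposed[1] in _COMBINING_TO_MACRO
        )
        if not is_simple_accent:
            out.append(char)
            continue
        base = decomposed[0]
        if base == "i":
            base = "\\i"  # dotless i, so the accent replaces the dot
        out.append("{\\" + _COMBINING_TO_MACRO[decomposed[1]] + base + "}")
    return "".join(out)


print(to_legacy_bibtex("Kr\u00e4mer"))
```

The outer braces keep each accent command together as a single unit inside the field value.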
Six BibTeX problems that cause silent errors
These issues won't always throw a LaTeX error — they produce wrong output that looks plausible until a reviewer catches it.
1. Bibliography styles like APA and Chicago lowercase title words
automatically. Any word that must stay capitalised (acronyms such as DNA,
COVID-19, and mRNA, proper nouns, initialisms) needs to be wrapped in
braces: {DNA}, {COVID-19}. Without this,
your compiled bibliography silently renders them in lowercase.
2. The characters & % $ # _ have special meaning in
LaTeX. If they appear in a title or journal name without being escaped
(\& \% \$ \# \_), the document won't compile,
often with an error message that points nowhere near the .bib
file. Same with unmatched braces in author names.
3. Raw UTF-8 characters like ü or é work fine in
BibLaTeX with Biber, but break in legacy BibTeX which expects
\"u and \'e. If your document uses
\bibliographystyle{}, you need the escaped form. Most
converters output only one or the other without letting you choose.
4. Crossref sometimes tags conference papers as
journal-article, which maps to @article in
BibTeX. A paper in NeurIPS or CVPR proceedings should be
@inproceedings. The difference affects how the citation
renders. A good converter lets you override the entry type per row
before copying.
5. When merging entries from multiple sources, duplicate cite keys make
BibTeX keep only the first entry and skip the rest, with nothing more
than a "Repeated entry" message in its log. The standard convention is
lastnameYearWord (e.g. lecun2015deep), with
automatic collision handling when two papers would otherwise share a key.
6. NLM's compressed page range notation stores 436 through 444
as 436-44, dropping the repeated hundreds digit. Most
converters pass this through unchanged, producing a page range that
renders as "436-44" in the bibliography instead of "436–444". Correct
handling expands it during conversion.
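The lastnameYearWord convention with collision handling can be sketched as follows. This is a minimal Python illustration, not the converters' actual algorithm: the function names, the stopword list, and the choice of letter suffixes (b, c, ...) for collisions are all assumptions, and authors are expected in "Last, First" form.

```python
import re
import unicodedata

# Words skipped when looking for the first significant title word (assumed list).
STOPWORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to", "with"}


def _ascii_lower(text):
    """Strip accents and anything that is not a lowercase letter or digit."""
    plain = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", plain.lower())


def base_key(first_author, year, title):
    """Build lastname + year + first significant title word."""
    lastname = first_author.split(",")[0]
    words = [_ascii_lower(w) for w in title.split()]
    word = next((w for w in words if w and w not in STOPWORDS), "")
    return f"{_ascii_lower(lastname)}{year}{word}"


def assign_keys(entries):
    """Assign a unique key to each (first_author, year, title) tuple, in order."""
    used = set()
    keys = []
    for first_author, year, title in entries:
        key = base_key(first_author, year, title)
        candidate, n = key, 0
        while candidate in used:
            n += 1
            candidate = key + chr(ord("a") + n)  # second use gets "b", then "c"
        used.add(candidate)
        keys.append(candidate)
    return keys


print(assign_keys([
    ("LeCun, Yann", 2015, "Deep Learning"),
    ("LeCun, Yann", 2015, "Deep Networks"),  # placeholder title to force a collision
]))
```

Letter suffixes run out after "z"; a production version would switch to numeric suffixes or a longer title fragment at that point.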
BibTeX conversion tools on The LaTeX Lab
The LaTeX Lab offers a full suite of browser-based BibTeX converters: no sign-up, nothing
uploaded to a server, 50 records per batch. Every tool applies the same
downstream cleanup: braced acronyms, escaped reserved characters,
consistent lastnameYearWord citekeys, both BibLaTeX and
legacy BibTeX output.