
Reference Management for LaTeX Researchers

BibTeX Conversion, Done Right

Getting references into BibTeX from DOIs, ArXiv preprints, PubMed IDs, ISBNs, and export files is where most researchers lose time — and where most converters quietly produce broken output. Here's what actually goes wrong, and how to avoid it.

@article{lecun2015deep,
  author = {LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey},
  title = {Deep Learning},
  journal = {Nature},
  year = {2015},
  volume = {521},
  pages = {436--444}
}
% citekey: lastname + year + first significant word

What BibTeX is and why researchers use it

BibTeX is the reference management format that powers citations in LaTeX. You store all your references in a .bib file — one entry per paper or book — and LaTeX pulls them into your document automatically, formatted to whichever style your journal or institution requires. Switch from APA to IEEE to Vancouver by changing a single line. No manual reformatting.

For anyone writing a thesis, journal paper, or conference submission in LaTeX, a clean .bib file is non-negotiable. The problem is that references rarely arrive in BibTeX format. They come as DOI links, ArXiv preprint IDs, PubMed IDs, ISBNs, or bulk-exported .ris and .nbib files from literature databases.

Converting these correctly — without silent errors that only surface when LaTeX tries to compile — is what separates a useful converter from a frustrating one.

Input formats and what to watch for in each

DOI → BibTeX

The DOI is the most reliable path to a complete BibTeX entry. A DOI like 10.1145/3442188.3445922 maps directly to a Crossref record that includes the full author list, title, journal, volume, issue, pages, and year. The DOI to BibTeX converter queries the Crossref API and converts the response to BibTeX in the browser, so the only outside request is the Crossref lookup itself. Paste up to 50 DOIs at once; they resolve in parallel.

One thing most converters miss: many BibTeX bibliography styles lowercase every title word after the first. A title like Analyzing COVID-19 with mRNA Techniques then comes out as Analyzing covid-19 with mrna techniques, which is silently wrong. Proper conversion wraps acronyms and initialisms in braces: {COVID-19}, {mRNA}, {DNA}, so they can't be lowercased. Special characters need the same care: & becomes \&, and % becomes \%, or your file won't compile.

50 DOIs at once, parallel · Braces on acronyms (DNA, COVID-19, mRNA) · Escapes & % $ # _ · Override entry type per row · Citekey: lastnameYearWord
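The brace protection described above can be sketched in a few lines of Python. This is a minimal illustration, not the converter's actual code: the name protect_title and the rule "two or more capital letters in one hyphen-separated part means acronym" are assumptions, the input is assumed to contain no braces yet, and proper nouns would still need a separate word list.

```python
import re

# Assumption: a token is an acronym when one of its hyphen-separated parts
# contains two or more capital letters (DNA, mRNA, COVID-19).
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def protect_title(title: str) -> str:
    """Wrap acronym-like tokens in braces so a style cannot lowercase them."""
    def wrap(match):
        word = match.group(0)
        most_capitals = max(
            sum(1 for ch in part if ch.isupper()) for part in word.split("-")
        )
        return "{" + word + "}" if most_capitals >= 2 else word

    return _TOKEN.sub(wrap, title)


print(protect_title("Analyzing COVID-19 with mRNA Techniques"))
# prints: Analyzing {COVID-19} with {mRNA} Techniques
```

Counting capitals per hyphen-separated part keeps ordinary hyphenated words such as State-of-the-Art unbraced while still catching COVID-19.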

ArXiv → BibTeX

ArXiv IDs come in two forms: the modern format like 2106.09685 and the old-style cs.LG/0612064. Both need to produce a correct @misc entry with eprint, eprinttype = {arXiv}, and eprintclass fields — the fields that biblatex uses to render the ArXiv link correctly. A bare @article without these fields is technically wrong for a preprint.

ArXiv's API has no CORS headers, which means client-side tools can't call it directly. The ArXiv to BibTeX converter routes requests through a lightweight proxy that disk-caches results (ArXiv records are immutable per version) and applies per-IP rate limiting so it doesn't hammer ArXiv's servers.

New-style and old-style IDs · Full URL normalization · eprint + eprinttype + eprintclass · Proxy caches by ID
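To make the ID handling and the @misc layout concrete, here is a small Python sketch. It makes no network calls. The function names, the regular expressions, and the choice to drop the version suffix are all assumptions made for illustration, not a description of the converter's internals.

```python
import re

# Assumed patterns: new-style IDs (2106.09685) and old-style IDs (cs.LG/0612064).
NEW_STYLE = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")
OLD_STYLE = re.compile(r"^([a-z\-]+(?:\.[A-Z]{2})?/\d{7})(v\d+)?$")


def normalize_arxiv_id(raw: str) -> str:
    """Strip URL prefixes, an 'arXiv:' label, '.pdf' and any version suffix."""
    text = raw.strip()
    text = re.sub(r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/", "", text, flags=re.I)
    text = re.sub(r"^arxiv:", "", text, flags=re.I)
    text = re.sub(r"\.pdf$", "", text, flags=re.I)
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.match(text)
        if match:
            return match.group(1)
    raise ValueError(f"not a recognisable arXiv ID: {raw!r}")


def arxiv_misc_entry(key, authors, title, year, arxiv_id, primary_class):
    """Build an @misc entry carrying the three eprint fields biblatex reads."""
    fields = [
        ("author", " and ".join(authors)),
        ("title", title),
        ("year", str(year)),
        ("eprint", arxiv_id),
        ("eprinttype", "arXiv"),
        ("eprintclass", primary_class),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


print(normalize_arxiv_id("https://arxiv.org/abs/2106.09685v2"))
print(normalize_arxiv_id("cs.LG/0612064"))
```

If you need to cite a specific version, keep the suffix instead of discarding it; the sketch drops it only to give one canonical key per paper.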

PubMed → BibTeX

PubMed (NCBI) has no CORS support, so most tools that claim to support PMIDs either proxy through a server that sees your IDs or don't work at all. The PubMed to BibTeX converter uses Europe PMC instead — it carries the same Medline data as NCBI with proper CORS support, so IDs never leave the browser except to go there.

NLM compressed page ranges are a specific problem: PubMed records often store pages as 436-44 meaning pages 436 through 444 in NLM shorthand. Most converters pass this through as-is, producing a broken page range in the output. Correct conversion expands it to 436--444. Similarly, when a bioRxiv preprint ID leaks into the pages field (a value like 2020.03.27.20044925), the field should be dropped rather than written as garbage data.

Europe PMC (CORS-friendly) · NLM page range expansion (436-44 → 436--444) · Full journal names, not NLM stubs · PMID + DOI when both exist · 50 at once, inline error per row
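The page-range rule is simple arithmetic on strings: borrow the missing leading digits of the end page from the start page. A hedged Python sketch follows; clean_pages and the preprint-ID pattern are hypothetical names and heuristics chosen for this example.

```python
import re
from typing import Optional

# Assumed shape of a leaked preprint ID, e.g. 2020.03.27.20044925
PREPRINT_ID = re.compile(r"\d{4}\.\d{2}\.\d{2}\.\d+")
NUMERIC_RANGE = re.compile(r"(\d+)\s*-+\s*(\d+)")


def clean_pages(pages: str) -> Optional[str]:
    """Expand NLM shorthand (436-44 -> 436--444); drop leaked preprint IDs."""
    text = pages.strip()
    if PREPRINT_ID.fullmatch(text):
        return None  # not a page number at all, so omit the field
    match = NUMERIC_RANGE.fullmatch(text)
    if match:
        start, end = match.group(1), match.group(2)
        if len(end) < len(start):
            # Borrow the leading digits that NLM left out of the end page.
            end = start[: len(start) - len(end)] + end
        return f"{start}--{end}"
    return text  # single pages and article numbers pass through unchanged


print(clean_pages("436-44"))               # prints: 436--444
print(clean_pages("2020.03.27.20044925"))  # prints: None
```

Anything the sketch does not recognise, such as e1234 or a lettered range, is passed through untouched rather than guessed at.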

ISBN → BibTeX

Books are trickier than journal articles because metadata coverage varies significantly by source. Most free ISBN tools query only Open Library, which has gaps — particularly for non-English titles and recent or self-published books. A robust ISBN to BibTeX converter cascades across sources: Open Library's books API first, then Open Library search, then Google Books as a fallback. Only if all three miss does it report not found.

Open Library splits subtitles into a separate field. A title stored as Artificial Intelligence with subtitle A Modern Approach needs to be joined with a colon before going into the title field, since classic BibTeX's @book has no subtitle field (biblatex does, but legacy styles ignore it). Open Library's publish_date also comes in multiple formats, such as "2023", "January 2023", and "January 1, 2023", so the year has to be extracted before writing the year field.

Three-source cascade · Subtitle joining · Year extraction from messy date strings · ISBN-10 and ISBN-13, with or without hyphens
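The cleanup steps for book metadata are easy to show without touching any API. In the Python sketch below every function name is hypothetical, and the sources passed to lookup_with_cascade are stand-in callables rather than real Open Library or Google Books clients.

```python
import re
from typing import Callable, Iterable, Optional


def normalize_isbn(raw: str) -> str:
    """Accept ISBN-10 or ISBN-13, with or without hyphens and spaces."""
    digits = re.sub(r"[\s\-]", "", raw).upper()
    if re.fullmatch(r"\d{9}[\dX]", digits) or re.fullmatch(r"\d{13}", digits):
        return digits
    raise ValueError(f"not an ISBN-10 or ISBN-13: {raw!r}")


def join_title(title: str, subtitle: Optional[str]) -> str:
    """Fold a separate subtitle into the single BibTeX title field."""
    if subtitle and subtitle.strip():
        return f"{title.strip()}: {subtitle.strip()}"
    return title.strip()


def extract_year(publish_date: str) -> Optional[str]:
    """Pull a four-digit year out of strings like 'January 1, 2023'."""
    match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", publish_date)
    return match.group(1) if match else None


def lookup_with_cascade(
    isbn: str, sources: Iterable[Callable[[str], Optional[dict]]]
) -> Optional[dict]:
    """Try each source in order; report a miss only when all of them fail."""
    for fetch in sources:
        record = fetch(isbn)
        if record:
            return record
    return None


print(join_title("Artificial Intelligence", "A Modern Approach"))
print(extract_year("January 1, 2023"))
```

Note that normalize_isbn checks only the shape of the identifier; validating the check digit would be a sensible next step.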

RIS, NBIB, and EndNoteXML

Bulk exports from PubMed, Scopus, Web of Science, and Mendeley come in .ris, .nbib, or .enw format. RIS and EndNoteXML are well-documented and handled by parsing libraries. NBIB is the exception — the PubMed-specific .nbib format is not supported by most citation parsing libraries, including citation.js. It requires a hand-rolled parser for the PMID-, TI, FAU field structure. For RIS and EndNote exports, the same downstream cleanup applies: braced acronyms, escaped special characters, consistent citekeys.

NBIB: custom parser (FAU, TI, PMID- fields) · RIS: tagged-line parsing (TY, AU, JF) · Multi-record files supported
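A hand-rolled NBIB parser does not have to be large. The Python sketch below is an assumption about what such a parser could look like, not the one used by the tools: it reads tagged lines such as PMID-, TI and FAU, joins indented continuation lines, and splits records on blank lines.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A tag is 2 to 4 capitals, optional padding, a hyphen, then a space.
TAG_LINE = re.compile(r"^([A-Z]{2,4})\s*-\s(.*)$")


def parse_nbib(text: str) -> List[Dict[str, List[str]]]:
    """Parse .nbib text into one dict per record, mapping tag -> list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = defaultdict(list)
    last_tag = None
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current record
            if current:
                records.append(dict(current))
                current, last_tag = defaultdict(list), None
            continue
        match = TAG_LINE.match(line)
        if match:
            last_tag = match.group(1)
            current[last_tag].append(match.group(2).strip())
        elif last_tag and line.startswith(" "):
            # Indented line: continuation of the previous field's value.
            current[last_tag][-1] += " " + line.strip()
    if current:
        records.append(dict(current))
    return records


sample = "\n".join([
    "PMID- 1",
    "TI  - A long title that",
    "      wraps onto a second line.",
    "FAU - Doe, Jane",
    "FAU - Roe, Richard",
    "",
    "PMID- 2",
    "TI  - Second record.",
])
print(parse_nbib(sample))
```

Repeated tags such as FAU accumulate in a list, which maps naturally onto BibTeX's "and"-separated author field.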

Exporting BibTeX to other formats

Sometimes you need citations in a format other than BibTeX — for colleagues not using LaTeX, for importing into Word, or for a reference list in a web document. Common output formats and what to know about each:

BibTeX → citation styles (APA, MLA, Chicago, Harvard, IEEE, Vancouver)

Citation style rendering uses the CSL (Citation Style Language) pipeline: parse the BibTeX entry, convert to CSL-JSON internally, then render through the appropriate CSL template. The tools on The LaTeX Lab support APA, MLA, Chicago, Harvard, IEEE, AMA, and Vancouver. Vancouver in particular uses a numbered format with abbreviated author names (LeCun Y, not LeCun, Yann) that many free tools don't implement.
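The Vancouver name rule is worth seeing in isolation. The Python sketch below is not the CSL pipeline; it only illustrates the "family name plus initials" transformation, with hypothetical function names, and it ignores particles such as "von" and suffixes such as "Jr".

```python
import re


def vancouver_name(bibtex_name: str) -> str:
    """Turn 'LeCun, Yann' (or 'Yann LeCun') into 'LeCun Y'."""
    if "," in bibtex_name:
        family, given = [part.strip() for part in bibtex_name.split(",", 1)]
    else:
        *given_parts, family = bibtex_name.split()
        given = " ".join(given_parts)
    # One initial per given-name part; split on spaces, hyphens and periods.
    initials = "".join(p[0].upper() for p in re.split(r"[\s\-.]+", given) if p)
    return f"{family} {initials}".strip()


def vancouver_authors(author_field: str) -> str:
    """Format a BibTeX 'A and B and C' author field as a Vancouver name list."""
    names = [n for n in author_field.split(" and ") if n.strip()]
    return ", ".join(vancouver_name(n) for n in names)


print(vancouver_authors("LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey"))
# prints: LeCun Y, Bengio Y, Hinton G
```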

BibTeX → CSV and Excel

CSV export has one common failure point: encoding. Excel opens CSV files using the system locale, which means accented characters like Krämer get garbled unless the file has a UTF-8 BOM at the start. Most CSV exporters omit this. The Excel output produces a real .xlsx binary (not a renamed CSV) that opens correctly in Excel, Numbers, and Google Sheets without any import configuration.
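In Python the BOM fix is a one-word change of codec. The sketch below assumes entries are plain dicts of BibTeX fields and uses a hypothetical helper name; the only part that matters is encoding with utf-8-sig, which prepends the three BOM bytes.

```python
import csv
import io
from typing import Dict, List


def entries_to_csv_bytes(entries: List[Dict[str, str]], columns: List[str]) -> bytes:
    """Serialise entries to CSV bytes that start with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for entry in entries:
        writer.writerow({column: entry.get(column, "") for column in columns})
    # "utf-8-sig" writes EF BB BF first, which tells Excel the file is UTF-8.
    return buffer.getvalue().encode("utf-8-sig")


data = entries_to_csv_bytes(
    [{"author": "Krämer, A.", "title": "An Example", "year": "2020"}],
    ["author", "title", "year"],
)
print(data[:3])  # prints: b'\xef\xbb\xbf'
```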

BibTeX → JSON (CSL-JSON)

The JSON output is CSL-JSON — the format natively understood by Zotero, Pandoc, Quarto, and RMarkdown. This is distinct from arbitrary JSON representations of BibTeX fields that no other tool recognises. A CSL-JSON array dropped into Zotero's import dialog round-trips cleanly.
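For a feel of what CSL-JSON looks like, here is a minimal Python mapping from parsed BibTeX fields to a CSL-JSON item. The function name and the small type table are assumptions for this example, author names are assumed to be in "Last, First" form, and only a handful of fields are covered.

```python
import json
from typing import Dict

# Assumed mapping from BibTeX entry types to CSL item types.
TYPE_MAP = {
    "article": "article-journal",
    "inproceedings": "paper-conference",
    "book": "book",
}


def bibtex_to_csl(entry_type: str, key: str, fields: Dict[str, str]) -> dict:
    """Convert one parsed BibTeX entry to a CSL-JSON item (partial field set)."""
    authors = []
    for name in fields.get("author", "").split(" and "):
        if not name.strip():
            continue
        family, _, given = name.partition(",")
        authors.append({"family": family.strip(), "given": given.strip()})
    item = {
        "id": key,
        "type": TYPE_MAP.get(entry_type.lower(), "article"),
        "title": fields.get("title", ""),
        "author": authors,
    }
    if "journal" in fields:
        item["container-title"] = fields["journal"]
    if "volume" in fields:
        item["volume"] = fields["volume"]
    if "pages" in fields:
        item["page"] = fields["pages"].replace("--", "-")
    if fields.get("year", "").isdigit():
        item["issued"] = {"date-parts": [[int(fields["year"])]]}
    return item


lecun = bibtex_to_csl("article", "lecun2015deep", {
    "author": "LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey",
    "title": "Deep Learning",
    "journal": "Nature",
    "year": "2015",
    "volume": "521",
    "pages": "436--444",
})
print(json.dumps([lecun], indent=2))
```

The nested date-parts array and the container-title key are what distinguish CSL-JSON from an ad hoc dump of BibTeX fields.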

BibLaTeX vs legacy BibTeX: Every converter on The LaTeX Lab outputs in both dialects. Legacy BibTeX requires escaped accents (\"u for ü). BibLaTeX with Biber accepts raw UTF-8. Choose based on whether your document uses \bibliographystyle{} (legacy) or the biblatex package (modern).

Six BibTeX problems that cause silent errors

These issues won't always throw a LaTeX error — they produce wrong output that looks plausible until a reviewer catches it.

Title case getting destroyed (very common)

Bibliography styles like APA and Chicago lowercase title words automatically. Any word that must stay capitalised — acronyms (DNA, COVID-19, mRNA), proper nouns, initialisms — needs to be wrapped in braces: {DNA}, {COVID-19}. Without this, your compiled bibliography silently renders them in lowercase.

Unescaped special characters (causes compile errors)

The characters & % $ # _ have special meaning in LaTeX. If they appear in a title or journal name without being escaped (\& \% \$ \# \_), the document won't compile, often with an error message that points nowhere near the .bib file. Unmatched braces in author names fail the same way.
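Escaping is mechanical, but it has to avoid double-escaping text that is already correct. A minimal Python sketch follows; escape_latex is a hypothetical name, and the sketch assumes the field contains no intentional math mode, since a title with $\alpha$ would need to be left alone.

```python
import re

# Match any of & % $ # _ that is not already preceded by a backslash.
_SPECIAL = re.compile(r"(?<!\\)([&%$#_])")


def escape_latex(value: str) -> str:
    """Backslash-escape LaTeX's reserved characters in a title or journal name."""
    return _SPECIAL.sub(r"\\\1", value)


print(escape_latex("Science & Society: 50% of snake_case"))
# prints: Science \& Society: 50\% of snake\_case
```

Apply this to text fields such as title and journal only; url and doi fields are usually typeset verbatim and should keep their raw underscores.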

Accented characters in legacy BibTeX (encoding issue)

Raw UTF-8 characters like ü or é work fine in BibLaTeX with Biber, but break in legacy BibTeX, which expects \"u and \'e. If your document uses \bibliographystyle{}, you need the escaped form. Most converters output only one or the other without letting you choose.
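Producing the legacy form can lean on Unicode decomposition: split each accented letter into a base letter and a combining mark, then emit the matching LaTeX accent command. The Python sketch below is an assumption-laden illustration with a hypothetical function name. It covers five common accents only, so characters such as ß, ø or ç pass through unchanged and would need their own table entries.

```python
import unicodedata

# Combining mark -> LaTeX accent command (a deliberately small, partial table).
ACCENT_COMMANDS = {
    "\u0300": "\\`",   # grave
    "\u0301": "\\'",   # acute
    "\u0302": "\\^",   # circumflex
    "\u0303": "\\~",   # tilde
    "\u0308": '\\"',   # diaeresis / umlaut
}


def to_legacy_bibtex(text: str) -> str:
    """Rewrite accented letters as braced LaTeX accent commands."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in ACCENT_COMMANDS:
            base, mark = decomposed[0], decomposed[1]
            out.append("{" + ACCENT_COMMANDS[mark] + base + "}")
        else:
            out.append(ch)
    return "".join(out)


print(to_legacy_bibtex("Krämer"))  # prints: Kr{\"a}mer
```

Wrapping each accent in its own braces also stops BibTeX styles from changing the case of the accented letter incorrectly.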

Wrong entry type from Crossref (incorrect citations)

Crossref sometimes tags conference papers as journal-article, which maps to @article in BibTeX. A paper in NeurIPS or CVPR proceedings should be @inproceedings. The difference affects how the citation renders. A good converter lets you override the entry type per row before copying.
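A per-row override can be as simple as a lookup table with an escape hatch. In the Python sketch below the table, the function names, and the choice of phdthesis for dissertations are assumptions. The second function handles a detail that is easy to forget: @inproceedings expects the venue in booktitle, not journal.

```python
from typing import Dict, Optional

# Assumed mapping from Crossref work types to BibTeX entry types.
CROSSREF_TO_BIBTEX = {
    "journal-article": "article",
    "proceedings-article": "inproceedings",
    "book": "book",
    "book-chapter": "incollection",
    "dissertation": "phdthesis",
    "report": "techreport",
    "posted-content": "misc",
}


def choose_entry_type(crossref_type: str, override: Optional[str] = None) -> str:
    """Use the user's override when given, otherwise map the Crossref type."""
    if override:
        return override.strip().lstrip("@").lower()
    return CROSSREF_TO_BIBTEX.get(crossref_type, "misc")


def retarget_fields(fields: Dict[str, str], entry_type: str) -> Dict[str, str]:
    """Move the venue from journal to booktitle for conference papers."""
    out = dict(fields)
    if entry_type == "inproceedings" and "journal" in out and "booktitle" not in out:
        out["booktitle"] = out.pop("journal")
    return out


entry_type = choose_entry_type("journal-article", override="@inproceedings")
print(entry_type, retarget_fields({"journal": "Example Proceedings"}, entry_type))
```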

Duplicate cite keys (silent wrong citations)

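When merging entries from multiple sources, duplicate cite keys are easy to introduce. BibTeX keeps only the first entry for a repeated key and reports the repeat in its log, where it is easy to overlook, so every citation of that key can silently point at the wrong paper. The standard convention is lastnameYearWord (e.g. lecun2015deep), with automatic collision handling when two papers would otherwise share a key.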
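The lastnameYearWord convention and one possible collision scheme look like this in Python. The stopword list, the function names, and the numeric "-2" suffix are assumptions made for the sketch; other tools append letters instead.

```python
import re
import unicodedata
from typing import List

# Assumed list of words too generic to serve as the "first significant word".
STOPWORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to", "with"}


def _ascii(text: str) -> str:
    """Strip accents so keys stay plain ASCII (Krämer -> Kramer)."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def make_citekey(first_author: str, year, title: str) -> str:
    """Build lastname + year + first significant title word, all lowercase."""
    if "," in first_author:
        family = first_author.split(",")[0]
    else:
        parts = first_author.split()
        family = parts[-1] if parts else "anon"
    family = re.sub(r"[^a-z]", "", _ascii(family).lower())
    words = re.findall(r"[a-z0-9]+", _ascii(title).lower())
    word = next((w for w in words if w not in STOPWORDS), "")
    return f"{family}{year}{word}"


def dedupe_keys(keys: List[str]) -> List[str]:
    """Give repeated keys a numeric suffix so every key in the file is unique."""
    used = set()
    result = []
    for key in keys:
        candidate, n = key, 1
        while candidate in used:
            n += 1
            candidate = f"{key}-{n}"
        used.add(candidate)
        result.append(candidate)
    return result


print(make_citekey("LeCun, Yann", 2015, "Deep Learning"))  # prints: lecun2015deep
print(dedupe_keys(["lecun2015deep", "lecun2015deep"]))
```

Checking each candidate against the set of keys already in use, rather than just counting repeats, avoids generating a suffix that collides with a key that was present from the start.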

Broken page ranges from NLM exports (PubMed-specific)

NLM's compressed page range notation stores 436 through 444 as 436-44 — dropping the repeated hundreds digit. Most converters pass this through unchanged, producing a page range that renders as "436-44" in the bibliography instead of "436–444". Correct handling expands it during conversion.

BibTeX conversion tools on The LaTeX Lab

The LaTeX Lab offers a full suite of browser-based BibTeX converters — no sign-up, nothing uploaded to a server, 50 records per batch. Every tool applies the same downstream cleanup: braced acronyms, escaped reserved characters, consistent lastnameYearWord citekeys, both BibLaTeX and legacy BibTeX output.