How should Papers export/import BibTeX libraries?

jowens

24 Mar, 2011 03:56 AM

As a frequent paper author and reviewer, I take my BibTeX quite seriously. Papers 2.0 is the first version that is a good enough candidate for exporting bibtex. I've been sending boatloads of suggestions to the Papers developers in an effort to improve it. They asked that we discuss bibtex export needs here to try to get a consensus. I haven't dealt with bibtex import, but if that's relevant, let's discuss it here too.

Here are some issues that I definitely urge the developers to fix:

  • Titles are wrapped in {{}}. They need to be wrapped in only one {}. It would be desirable if it was possible to mark characters as always capitalized, so they could get wrapped in their own {}. Example:

    Current: title = {{Using GPUs to Improve Multigrid Solver Performance on a Cluster}},

    Should be: title = {Using {GPU}s to Improve Multigrid Solver Performance on a Cluster},

    "publisher" is also wrapped in {{}}, which is totally unnecessary since it's ignored anywhere except "title".

  • Months should always be three-letter abbreviations not wrapped in anything, since those abbreviations are built into bibtex and then translate properly to other languages if necessary (and allow the bib style to select if it wants Sept., September, 9, etc.).

    Current: month = {sep},

    Should be: month = sep,

    Ideally, Papers should also support month/month (month = mar # "\slash " # apr) and day-month (month = "2~" # dec).

  • @incollection has multiple issues: no way to mark the book title (what exports as "publisher" is what I would expect to be "booktitle"); no way to mark the chapter number; no way to mark the book volume number.

  • Putting parentheses in an author name is broken. Poor "Deng, Yangdong (Steve)" gets corrected to "{Deng, Yangdong {Steve}" (note the braces, and the lack of a closing brace).

  • url is currently broken, since all fields have to be either numbers, defined abbreviations, wrapped in {}, or wrapped in "", and since it's wrapped in \url{} not {\url{}}, it breaks bibtex. See below for more on this though (what is the right thing to do).

  • Any time two initials are exported, they need to be separated. "Bethel, EW" means poor Wes's first name is "EW", which it's not. "Bethel, E.W." is also not right - separate the E. and W. with a space.

  • Support both @phdthesis and @mastersthesis.
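
A rough sketch of the three fixes in the list above that are purely mechanical: bare month macros, separated initials, and a singly-braced title with protected capitals. The function names and the `protected` argument are made up for illustration, and the rule for spotting initials is an assumption; none of this is how Papers actually implements its export.

```python
import re

# The twelve month macros predefined by the standard BibTeX styles.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def month_field(month_number):
    # Emit the month as a bare macro: no braces, no quotes.
    return "month = %s," % MONTHS[month_number - 1]

def split_initials(given):
    # Assumption: a given name made up only of capitals (with optional
    # periods) is a run of initials; anything else is returned unchanged.
    if re.fullmatch(r"(?:[A-Z]\.?){2,}", given):
        return " ".join(c + "." for c in given if c.isalpha())
    return given

def title_field(title, protected=()):
    # Wrap the whole title in ONE pair of braces, and brace only the
    # substrings the user marked as always-capitalized.
    for word in protected:
        # Skip occurrences that already sit right after an opening brace.
        title = re.sub(r"(?<!\{)" + re.escape(word),
                       lambda m: "{" + m.group(0) + "}", title)
    return "title = {%s}," % title

print(month_field(9))
print(split_initials("EW"))
print(title_field("Using GPUs to Improve Multigrid Solver Performance on a Cluster",
                  protected=["GPU"]))
```

The `protected` list stands in for whatever UI Papers might offer for marking characters as always capitalized; the point is only that the marking has to come from the user, since the exporter cannot guess it reliably.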

Things I'd like:

  • In general, importing from ACM and IEEE into bibtex loses both first names (instead replacing them with initials) and periods after initials.

  • Customizable citation formats (my field uses Lastname:YYYY:ABC, where ABC are the first three letters in the first three words in the title)

  • Fixing ACM and IEEE exports (capitalization in venue titles, weird formatting of venue titles, making the exported title match the capitalization of the title on the paper, etc.)

  • If the URL is actually a DOI, put that in the "doi" field instead. Right now Papers is (correctly) coding these as just the right half of the DOI (no http://, no dx.doi.org).
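
To make two of the wishes above concrete, here is a small sketch of a Lastname:YYYY:ABC key generator and of the "URL is really a DOI" check. I am reading "ABC" as the first letter of each of the first three title words, with no stop-word filtering; the author name, year, function names, and the example DOI and URL are placeholders, not anything Papers exposes.

```python
import re

def cite_key(last_name, year, title):
    # First letter of each of the first three words in the title.
    words = re.findall(r"[A-Za-z0-9]+", title)
    initials = "".join(w[0].upper() for w in words[:3])
    return "%s:%d:%s" % (last_name, year, initials)

def url_or_doi_field(url):
    # If the URL points at the DOI resolver, keep only the DOI itself
    # (no http://, no dx.doi.org) and emit it as a doi field.
    m = re.match(r"https?://(?:dx\.)?doi\.org/(10\.\S+)$", url)
    if m:
        return "doi = {%s}," % m.group(1)
    return "url = {%s}," % url

print(cite_key("Owens", 2011,
               "Using GPUs to Improve Multigrid Solver Performance on a Cluster"))
print(url_or_doi_field("http://dx.doi.org/10.1000/182"))
print(url_or_doi_field("http://example.com/paper.pdf"))
```

A customizable format would presumably take a template string rather than hard-coding the pattern, but the pieces it needs (name, year, title words) are the same.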

Things I'm not sure about:

  • Formatting of urls. Right now the export uses url = \url{http://...}. I know that won't work because everything has to be wrapped in a {} or a "", so "\url{http://...}" would make more sense. My feeling is that we should NOT be asking for \url{} though, instead just citing the URL without wrapping it, since most bib styles that use the url field will wrap it themselves. But this is a good topic for discussion.

  • Formatting of local_url. Right now these have %s in the titles, which appears to work correctly on my latex installation, which is good. Nice to get feedback on others' setups though. My current thought is that url should be formatted exactly as local_url is now.

There are other things that will surely bug me once I actually get started using this export to write papers. Anyway, hoping that we can get some good discussion going on what the right thing to do is so we can give guidance to the Papers developers.

  1. 32 Posted by jowens on 27 May, 2011 04:47 AM

    @joelfrederico: Boy, I feel sorry for you if a publisher is mandating a particular bib style then giving you a .bst that doesn't obey the style. If you don't get one, makebst usually does the job (at least for me).

    I gotta think that if the problem is varying bibliography requirements across papers, changing which .bst you use (a one-line change in a tex file) is WAY easier than manually editing dozens of bib entries. That's certainly my personal experience.

  2. 33 Posted by Bas on 05 Jul, 2011 11:05 AM

    Hi,
    Could you add the URL to the "Standard" Bibtex record for 'webpage' entries when exporting to Bibtex? Now I have to use "Complete" Bibtex record, and that also includes DOI and ISBN that I don't want in my Bibliography...

    Thanks!

  3. Support Staff 34 Posted by charles on 05 Jul, 2011 07:25 PM

    > Could you add the URL to the "Standard" Bibtex record for 'webpage' entries when exporting to Bibtex? Now I have to use "Complete" Bibtex record, and that also includes DOI and ISBN that I don't want in my Bibliography...

    I guess that makes sense. It should be Papers 2.0.9, barring any unforeseen issues.

    > url is currently broken, since all fields have to be either numbers, defined abbreviations, wrapped in {}, or wrapped in "", and since it's wrapped in \url{} not {\url{}}, it breaks bibtex.

    In Papers 2.0.9, we will have '{\url{http://...}}'.

    Sorry there are still issues that are not addressed in this thread, they're just harder to tackle and we'll get to them at some point. Thanks for your patience!

  4. 35 Posted by jowens on 06 Jul, 2011 04:21 AM

    > > url is currently broken, since all fields have to be either numbers, defined abbreviations, wrapped in {}, or wrapped in "", and since it's wrapped in \url{} not {\url{}}, it breaks bibtex.
    >
    > In Papers 2.0.9, we will have '{\url{http://...}}'.

    Definitely better than what's there, but I currently lean toward no
    \url wrapping at all as the right thing to do. Other opinions?

    JDO

  5. Support Staff 36 Posted by charles on 06 Jul, 2011 04:59 AM

    > Definitely better than what's there, but I currently lean toward no \url wrapping at all as the right thing to do. Other opinions?

    I realize it's a very crude approach, but this came from checking the first few hits in a Google search for 'BibTeX URL':

    http://www.tex.ac.uk/cgi-bin/texfaq2html?label=citeURL
    ftp://ftp.tex.ac.uk/tex-archive/biblio/bibtex/contrib/doc/btxFAQ.pdf
    http://stackoverflow.com/questions/1425235/how-to-add-a-url-to-a-la...

    If having it does not hurt, then we'd rather have it to support more users. What's the downside?

    Many thanks for your feedback!

  6. 37 Posted by Bas on 06 Jul, 2011 09:10 AM

    @charles: thanks for the feedback and even more thanks for fixing this so quickly! Any indication when 2.0.9 arrives?

  7. 38 Posted by patman on 06 Jul, 2011 01:35 PM

    Since you asked last time, an example for an article where the bibtex export irrecoverably messes up the title when there are mathematical formulae in it is:
    http://www.ams.org/mathscinet/search/publications.html?pg1=MR&s1=MR1327495

    The correct title for bibtex would be
    TITLE = {H\"older inequalities and sharp embeddings in function spaces of {$B^s_{pq}$} and {$F^s_{pq}$} type},

    Papers2 exports it to Bibtex as
    title = {{H{\"o}lder inequalities and sharp embeddings in function spaces of {\$}B^s{\_}pq{\$} and {\$}F^s{\_}pq{\$} type}},

    Which is bad for two reasons. The $ signs as well as the underscores get escaped, even though they are vital for correct LaTeX-rendering of the title. This could be at least partially fixed by post-processing the entry. Much worse, however, is the fact that the braces around the pq are missing! It is thus irrecoverable if the original meaning was
    B^s_{pq} (both p and q are subscript to B) or
    B^s_pq (only p is subscript, the q is a regular letter after the B).

    It would be great if this could be fixed...
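
As a sketch of the first half of this (not escaping `$` and `_` inside formulae), an exporter could split the title on `$...$` segments and escape only the text between them. The function name and the set of escaped characters are assumptions for illustration, and this does nothing about the braces that were already lost on import.

```python
import re

def texify_title(title):
    # Split on $...$ segments; with a capturing group, re.split puts the
    # math segments at the odd indices of the result.
    parts = re.split(r"(\$[^$]*\$)", title)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)  # math: pass through verbatim
        else:
            # plain text: escape a few LaTeX special characters
            out.append(re.sub(r"([&%#_])", r"\\\1", part))
    return "".join(out)

print(texify_title("function spaces of $B^s_{pq}$ and $F^s_{pq}$ type"))
print(texify_title("A_B & $x_1$"))
```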

  8. Support Staff 39 Posted by charles on 06 Jul, 2011 05:17 PM

    > Any indication when 2.0.9 arrives?

    @Bas: I can't say at this point, I am sorry. Please send me an email at charles at mekentosj dot com, I may be able to send you an intermediate build if you want to test these things.

  9. Support Staff 40 Posted by charles on 06 Jul, 2011 05:19 PM

    @jowens: sorry, maybe my previous email was not clear. Just to clarify: please let me know what the negative impact of having the '\url' command is; that would help evaluate the pros and cons of it :-)

  10. Support Staff 41 Posted by charles on 06 Jul, 2011 05:20 PM

    @patman I can't access the article on MathSciNet and I am not entirely sure how the title string looks before 'texifying' it? How does it look in the Papers inspector?

  11. 42 Posted by patman on 06 Jul, 2011 06:44 PM

    @charles In Papers2, the title of the article looks like this: Hölder inequalities and sharp embeddings in function spaces of $B^s_pq$ and $F^s_pq$ type, which is exactly what is output when the exported LaTeX is compiled - it's just not the title of the article. I'm not sure how exactly Papers2 gets the title. The article is matched with MathSciNet, which can export perfect LaTeX.

  12. Support Staff 43 Posted by charles on 06 Jul, 2011 06:56 PM

    @patman

    If I understand correctly, the title in Papers 2 starts as:

       Hölder inequalities and sharp embeddings in function spaces of $B^s_pq$ and $F^s_pq$ type
    

    And ideally it should become:

        TITLE = {H\"older inequalities and sharp embeddings in function spaces of {$B^s_{pq}$} and {$F^s_{pq}$} type}
    

    So at this stage, Papers2 would need to detect the LaTeX syntax and correct it? Or better, we should find a way to properly get the title from the repository.

    Is that a correct summary of the situation?

    Thanks!

  13. 44 Posted by patman on 06 Jul, 2011 07:11 PM

    Yeah, that's right. One thing though: Once the title is in the form that Papers saves it in (top one), then the correct title is no longer recoverable, since it's not clear if there were braces around the {pq} or not, which results in a different output.

    What I don't know is how Papers gets the title from MathSciNet - that repository gives several output options, amongst them a bibtex entry. They all seem to give back the second string (the correct LaTeX) as title, Papers2 changes that on import, though (escapes the $ and _, removes the braces, maybe more that I didn't see yet).

  14. Support Staff 45 Posted by charles on 06 Jul, 2011 07:26 PM

    @patman: Thanks very much for the follow-up. Then indeed, the only option will be to fix the import step itself, if possible. Is it possible to edit the title yourself, as an alternative for now?

  15. 46 Posted by jowens on 06 Jul, 2011 07:36 PM

    @patman: If what Papers is getting as input is what Charles said in #43, then there's no way Papers can recover the subscript braces. A good heuristic for Papers, if it wanted to guess, would be "if I see a subscript or superscript, then put braces around everything following that sub/superscript until I see whitespace/$", and that's going to be right more often than not.

    @Charles: You have to bracket special characters ({\"o} not \"o, so H{\"o}lder), see BibTeX Tips and FAQ, 2.1.4. Will get to your other queries as soon as I can.
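
The guessing heuristic for sub/superscripts described in this post can be sketched as below. One deliberate deviation, which is my assumption: the braced run also stops at the next `^` or `_`, so that `B^s_pq` becomes `B^{s}_{pq}` rather than `B^{s_pq}`. The function name is hypothetical, and the result is still only a guess at what the original title meant.

```python
import re

def brace_scripts(title):
    def fix_math(match):
        # Brace the unbraced run after each ^ or _. The run stops at
        # whitespace, $, another ^ or _, a brace, or a backslash.
        return re.sub(r"([\^_])([^\s$^_{}\\]+)", r"\1{\2}", match.group(0))
    # Only touch text inside $...$; underscores elsewhere are left alone.
    return re.sub(r"\$[^$]*\$", fix_math, title)

print(brace_scripts("function spaces of $B^s_pq$ and $F^s_pq$ type"))
```

Scripts that are already braced are left as they are, so running this over a correctly formatted title should not make it worse.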

  16. 47 Posted by John Parejko on 06 Jul, 2011 07:44 PM

    As I said above at #14, it would help if Papers could just directly import the BibTeX records from places that have them (like NASA ADS). This solves these formatting problems because the BibTeX would be exactly formatted in the preferred manner for those journals, with all the necessary fields and abbreviations, and no more. This is the sort of feature that would convince me to upgrade to Papers 2.

  17. 48 Posted by patman on 06 Jul, 2011 08:17 PM

    @jowens I'm pretty sure what Papers gets from the repository (MathSciNet in this case) is correctly formatted LaTeX - it then modifies it and saves it as said in #43. Your suggestion for guessing should of course yield the correct result most of the time. I might implement this in my bibtex post-processor.

    Also, the \"o does not necessarily have to be put in braces - it is better to do it, though, and for some other special characters it /is/ necessary. The original title-string from MathSciNet doesn't do it - and MathSciNet usually has perfect bibtex formatting.

    @John I agree, it would be best if the items in the original bibtex entry could be preserved somehow and used when exporting where appropriate. That would be fantastic, actually. I do want to stress that there are some other issues that are more urgent to address: search tokens and overall stability of the program, for example.

  18. 49 Posted by jowens on 06 Jul, 2011 08:41 PM

    @patman: My bib programs also do fine without the {}, but I gather all do not, and I think it's most sensible if Papers does the safest thing and includes the braces. (The "BibTeX Tips and FAQ" unequivocally recommends the braces.)
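
For what "includes the braces" could look like in a post-processor, here is a minimal sketch that wraps bare one-letter accent commands such as \"o in braces. The function name is hypothetical, and it only covers the simple `\<accent><letter>` form; variants like \"{o} or \c{c} are out of scope here.

```python
import re

def brace_accents(text):
    # Match a backslash, one accent character, and one letter, unless the
    # command already sits directly after an opening brace.
    return re.sub(r"(?<!\{)(\\[\"'`^~][A-Za-z])", r"{\1}", text)

print(brace_accents('H\\"older inequalities'))
```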

  19. 50 Posted by jowens on 06 Jul, 2011 10:26 PM

    @charles #36: I distinguish the references that you presented from what I'm asking for by the name of the field. Both want to know "how do I put a URL into the bibtex entry", but the references you presented are generally referring to how to do it by using "howpublished" or "note". Both of those entry fields expect text, so \url{} is appropriate to wrap URLs.

    However, Papers very definitely is exporting a URL, so the "url" field is appropriate, and we can be sure that it will contain a URL and nothing else. So then the question becomes "how do bib styles interpret a 'url' field?". If we assume bib styles wrap the URL in \url{}, then we should put it in without any wrapping (otherwise LaTeX will output a url that looks like \url{...}, and that's bad). If we assume the bib style does not wrap the url, then we need to wrap it manually, which breaks a LaTeX build if you're using weird characters in your url like an underscore. Damned if you do, damned if you don't.

    Unencumbered by what bib styles actually do, I argue that leaving out the \url{} is more elegant; it's simpler for all involved.

    So, I'm looking through the .bst files that are in TeXLive 2011, at least the ones I've heard of.

    • IEEEtran: assumes URLs are naked (not wrapped in \url); it inserts \url [makebst]
    • achemso: assumes URLs are naked (not wrapped in \url); it inserts \url
    • aichej: assumes URLs are naked (not wrapped in \url); it inserts \url [makebst]
    • amsra: appears to require \url{} wrapping by user (hard to tell, I'm not great at .bst)
    • apacite: assumes URLs are naked (not wrapped in \url); it inserts \url
    • JAmChemSoc: assumes URLs are naked (not wrapped in \url); it inserts \url [makebst]
    • jae: assumes URLs are naked (not wrapped in \url); it inserts \url [makebst]
    • abbrvnat: assumes URLs are naked (not wrapped in \url); it inserts \url
    • naturemag: assumes URLs are naked (not wrapped in \url); it inserts \url

    I also note the ones generated from "makebst" accordingly; several are.

    I conclude:

    • Most bib styles that handle a "url" field want naked (unwrapped) URLs in that field, and wrap them with \url within the style
    • makebst appears to support this interpretation also
    • Thus Papers should do the same.
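
A small sketch of what "export the URL naked" could mean in practice: strip any \url{...} wrapper (with or without an extra outer brace pair) and emit the plain URL inside the field's own braces. The function names and example URL are placeholders, not Papers internals.

```python
def naked_url(value):
    value = value.strip()
    # Drop one outer brace pair, as in {\url{...}}.
    if value.startswith("{") and value.endswith("}"):
        value = value[1:-1]
    # Drop the \url{...} wrapper itself.
    if value.startswith("\\url{") and value.endswith("}"):
        value = value[len("\\url{"):-1]
    return value

def url_field(value):
    # The field still needs its own delimiters; the bib style adds \url.
    return "url = {%s}," % naked_url(value)

print(url_field("\\url{http://example.com/a_b}"))
print(url_field("{\\url{http://example.com/a_b}}"))
print(url_field("http://example.com/a_b"))
```

All three calls print the same field, so the underscore in the URL is left for the style's \url to handle.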

  20. Support Staff 51 Posted by charles on 06 Jul, 2011 11:55 PM

    @jowens: awesome, looks like a pretty solid argument! The point that's not entirely clear to me, if you can clarify, is what happens with these bst files when you leave the \url in? Does that mean they output the whole \url{...} string instead of just the url inside the command? In other words, they output \url{\url{http://...}}?

    Also, does that mean that 'uri' should keep using the \url, then?

  21. 52 Posted by jowens on 07 Jul, 2011 12:01 AM

    Here's an IEEEtran example where I manually put url="\url{http://...}":

    https://skitch.com/jowens/fex91/biburl.pdf-1-page

    So yes, if you leave the \url in the url field, you will get an extra \url{} in the output. Undesirable.

    What is the uri issue? What are you outputting for uri currently? It's my judgment that if the field is meant to provide something that should always be entirely wrapped in \url, you should output it naked (without a \url{} wrapping it), because the bib style will provide the \url{} for you.

  22. Support Staff 53 Posted by charles on 07 Jul, 2011 12:23 AM

    > Here's an IEEEtran example where I manually put url="\url{http://...}":

    Ouch. OK, case closed, it seems :-)
    Not sure how the 'uri' is handled by some of the bst files; since we're in doubt, we'll just leave it as is for now, as a URI can sometimes be just a string, it seems.

  23. 54 Posted by jowens on 07 Jul, 2011 12:27 AM

    If it makes you feel better, there's not a single "uri" field in any bst file shipped with TeX Live 2011. So you should probably rely on "uri" to be metadata associated with the entry but never appear in an output bibliography. If URI can be a string, then using \url{} appropriately seems sensible to me.

  24. 55 Posted by Bas Bennebroek on 07 Jul, 2011 05:28 AM

    On 7 Jul 2011 at 00:27, "jowens" <[email blocked]> wrote the following:

  25. 56 Posted by Bas on 20 Jul, 2011 09:54 AM

    Hi,
    Related to my earlier question on including the URL field in webpage-entries, what about including the 'Last visited'-date as well? My university (and I think a lot more) require that you also include this last visited date in any webpage reference in the bibliography, and as Papers already includes the field, it shouldn't be too hard to also export this field in the "Standard" Bibtex export option?
    Thanks a lot!

    (PS: the comment above was mistakenly posted by me replying to an automated email from this forum; you can delete it.)

  26. Support Staff 57 Posted by charles on 20 Jul, 2011 11:17 PM

    @Bas: the field 'date-added' is exported in the complete export option, and is basically when the web page was added, so that could be a good start. There is also the 'read' field corresponding to last read date. Not sure if that is what you need?

  27. 58 Posted by Gordon Lister on 21 Jul, 2011 01:09 AM

    I'm a newbie when it comes to BibTeX so maybe this is done already.
    .... but I have several thousand PDFs and as I trawl through them on the way to writing a nature paper I want to export to a single BibTeX library that I must include when (and if) the paper is accepted. Right now - painfully - I export each selected item as I find it.

    Sure I could add it to a collection - and then do the export in one go at the end - but to get the DOI I must use "complete" and the "complete" option requires edits that I do as I go along. So it would be much appreciated if I could persuade the addition of an option to "add" to a BibTeX library.

  28. Support Staff 59 Posted by charles on 21 Jul, 2011 03:56 AM

    > Sure I could add it to a collection - and then do the export in one go at the end

    @gordon: I am sorry, at the moment this is indeed the only option. One thing we'd like to add in the future is the ability to keep a live .bib file that is always synced to your Papers library...

  29. 60 Posted by Gordon Lister on 21 Jul, 2011 04:11 AM

    Hi Charles

    I understand that things will change as you get time, and certainly my point is made only to attempt to influence what gets done when etc and how.

    It would help a lot if I could distinguish between smart collections and collections, so that I could close my list of 20 or so smart collections, and conveniently drag (without back and forth scrolling) an item from the search list into a collection.
    OR to have the active collection appear at the top of the list.

    That's an easy and practical thing that would help people with large numbers of collections

    A live .bib file is not a bad idea, but then one always needs to edit (or to select out) what is not needed so if you choose this route being able to save subsets of a .bib file linked to a collection (and to be able to determine which fields - so as to be able to easily link to bibdesk) would be a very big help.

    all the best
    gordon lister

  30. Support Staff 61 Posted by charles on 21 Jul, 2011 05:09 AM

    @Gordon: Many thanks for your feedback! Separate headings for the smart collections and live collections is something we have in mind, though I don't know if/when it will be done :-)
