7/6/2023

Pdfpen extract pages

I’ve searched around for avenues toward this, but I can’t find anything. I’m putting this request in macOS, but if it’s possible in iOS, I’m cool with that, too.

I regularly have to read PDFs (always OCRed) and find proper names and the page numbers they appear on, then export, copy and paste, or (the horror!) type the names and the page numbers into a delimited file (text, Excel, etc.). What I would like is a way to automate this.

My limited brain bandwidth has conceived of the following options, but I’m sure there are more:

(1) search the PDF (presumably using regular expressions), locate capitalized words (which of course can’t be differentiated from proper names), and export them to a delimited file with the page number they appear on

(2) search the PDF (presumably using regular expressions), locate capitalized words, and underline or highlight or in some other way differentiate them. PDF Expert can produce a delimited report from highlights and/or underlines, which is the extremely helpful option I already use for other aspects of my work.

My current (sad) workflow is as follows: Using PDF Expert on my iPad Pro, as I read the file for other purposes, I underline the proper names I come across. PDF Expert then allows me to export a report of the words I’ve underlined, along with the page number. This is wonderful, but I would very much like to automate this process as much as possible so that I can focus on my other responsibilities vis-à-vis the document in question.

Is this even possible? I would be more than happy to pay for software that would make this doable, because it would save me untold hours. For the past couple of days I’ve thought hard about somehow using Keyboard Maestro, PDF Expert, Skim, Hazel, PDFpen, Adobe, Automator, AppleScript, and/or any other combination of tools to make this happen, but I’m at a loss. I would deeply appreciate any insights you might have.

You will be looking for all the capitalized words in the document, which a regular expression can match. Such a pattern will find names like Robert, Paul, McKeskey, and VanRecklinghouse. We will assume that names are between 2 and 20 characters in length.

But, as you allude to, the pattern is not specific to actual people’s names: NATO, January, April, and words at the start of sentences will also match. You have not really told us how the documents that you work with are actually structured. But assuming the basic worst-case scenario, that these documents have the general character of a book or newspaper, false positives will dominate. Most of these regex matches will not actually be names.

Assume also (as it would be in a book or newspaper) that context is important in determining whether a word is actually a name. For example, “The Trojan War started on April 2 in Washington.” Without context, how are you to know whether Trojan, War, April, and Washington are people’s names or not?

So I would approach this as a problem of extracting the text with the page numbers from the PDF. I happen to use PDFpenPro, but I imagine that PDF Expert would have a similar feature. You have to make sure that you have the page number. In PDFpenPro, you can create headers for your PDF that contain the page number.
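The approach discussed in this thread (match capitalized words with a regular expression, keep the page each one appears on, and write a delimited file) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the exact pattern is not given in the text, so `CAPITALIZED` is a guess that follows the "2 to 20 characters" rule, and the function names `find_capitalized`, `to_csv`, and `read_pdf_pages` are hypothetical. Reading the PDF itself is shown with the third-party pypdf package as one possible option; any tool that yields text page by page would do.

```python
import csv
import io
import re

# Assumed pattern (the original expression is not given in the text):
# one capital letter followed by 1-19 more ASCII letters, so 2-20 in total.
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]{1,19}\b")


def find_capitalized(pages):
    """Return sorted, de-duplicated (word, page_number) pairs.

    `pages` is a list of strings, one per page, in page order.
    """
    hits = set()
    for number, text in enumerate(pages, start=1):
        for word in CAPITALIZED.findall(text):
            hits.add((word, number))
    return sorted(hits)


def to_csv(hits):
    """Render (word, page) pairs as comma-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "page"])
    writer.writerows(hits)
    return buffer.getvalue()


def read_pdf_pages(path):
    """Optional helper: per-page text of an OCRed PDF via pypdf."""
    from pypdf import PdfReader  # assumption: pypdf is installed

    return [page.extract_text() or "" for page in PdfReader(path).pages]


if __name__ == "__main__":
    sample = [
        "The Trojan War started on April 2 in Washington.",
        "Robert and McKeskey met NATO officials in January.",
    ]
    print(to_csv(find_capitalized(sample)), end="")
```

As the thread points out, most rows in such a file will be false positives (sentence-initial words, months, acronyms), so the output is better treated as a candidate list to review than as a finished index of names.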