Week of January 22 – Warming Up
January 23:
Finished copibooking the nearly 700-page “Canal Zone Reports Volume 3.” As I was copying, I started to wonder: would the scans be discoverable the same way as the periodicals I search when completing school assignments? That would be pretty awesome. Surely the UF library has some gnarly OCR software so that a law student could easily find the individual canal judgments within the book. OK, so I’m digging on this now. I don’t currently search for older books in my research, but I’ll bet there are loads of people who do. Even if these are OCRed, maybe they still aren’t made fully public, though; maybe that’s how libraries offer grants or special visits to their special collections. Hmm. Lots of pondering as I moved to the computer with the Photoshop license to do color corrections.
January 24:
Finished color corrections of all three items. Trained on the LIMB software today. LIMB is where I can view all the pages, like Adobe Batch, but can also deskew, crop, move, and put them in final form for patron viewing. I did the two smaller books today. The mammoth one tomorrow.
January 25:
LIMB of the big one. A colleague came to a computer next to mine and was amenable to questions.
What kind of OCR software is used?
Adobe Acrobat Pro.
Really? I’ve OCRed with it before; it doesn’t seem to grab all the letters and make a good text version of it. Perhaps it has gotten better since I OCRed with it years ago?
No, that sounds about right.
She takes the TIFF images of the books online and puts them in .pdf format on request. That was disappointing; no keyword searching within the text, then. I think I overheard that better OCR software is coming, though. I’ll ask about that soon.
Hence the difference between born-digital material and what we have now. Drat. I’m still hopeful that some fabulous OCR software is coming soon.