Can I search the text of a G-Book® like I can a traditional ebook?

To answer this question for a specific G-Book, check its listing in the store and look for the label "Search Text:".

All G-Books start out as images (scans or camera shots) of individual book or magazine pages assembled into large pdf files. So each page is essentially a photograph, not letters and words of text in a computer file, as in a text-based ebook (such as a plain-text pdf or an .epub file).

While text-based ebooks permit text searches, they usually deny you the original typefaces, layout, art, and other enjoyable aspects of the printed book. Another problem is that most text-based ebooks also started out as scans of a printed edition. Unlike a G-Book, a text-based ebook is produced by running optical character recognition (OCR) on each page to convert the image into text, after which the pictorial and graphical content is discarded. If you've done any OCR processing (or read ebooks that were "OCR'd"), you know that it's far from perfect: the results need a careful proofread to correct the errors introduced when the OCR software can't clearly read a word and guesses wrong. Because the page image is gone, all you have is the wrong, nonsensical word, and you can't find the correct one unless you have a copy of the printed book (or its G-Book version!). A text-based ebook is only as accurate as the proofreader who read every page of the scanned files (if there even was a proofreader!). In contrast, a G-Book gives you a high-resolution snapshot of every page, so you're assured of every word being just as it was typeset, including even the original typographical errors that identify certain editions and make some of them especially valuable.

Now, G-Books can also be made text-searchable by adding the text as an invisible layer behind each page's graphical image. You see the image as usual, but the pdf software can search the text behind the image and highlight the spot on the image where the matching text sits. G-Books that include this feature are listed in the store as "Search Text: Yes (searchable text layer behind page images)". As the technology improves we are producing more searchable G-Books. We will never remove the top image layer that gives the G-Book its "real" book quality, but where adding the text layer does not make the file dramatically larger, we will keep adding searchable books. There is an important caveat (which applies to most text-based ebooks as well): we don't have the resources to proofread the OCR results, so the hidden text behind the images will never be as perfect as the images. You might search for a word that appears in ten places and find it in only nine, because the tenth wasn't accurately OCR'd. Still, nine out of ten can be helpful if you're searching for specific terms, so we think it's worth adding this background text layer when it's feasible.
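For readers who like to see the idea spelled out, here is a minimal Python sketch of how a hidden text layer can drive a search. It is only an illustration, not how our files are actually produced or how any particular pdf reader works. The names Page, HiddenWord, and search, the file names, and the coordinates are all made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HiddenWord:
    """One OCR'd word plus the spot on the page image where it sits."""
    text: str      # what the OCR software thought the word was
    x: float       # left edge of the word on the page image
    y: float       # top edge of the word on the page image
    width: float
    height: float


@dataclass
class Page:
    number: int
    image_file: str                 # the scan you actually see (hypothetical name)
    hidden_words: List[HiddenWord]  # the invisible layer behind the image


def search(pages: List[Page], term: str) -> List[Tuple[int, Tuple[float, float, float, float]]]:
    """Return (page number, rectangle to highlight) for every hidden word matching term."""
    needle = term.casefold()
    hits = []
    for page in pages:
        for word in page.hidden_words:
            # Ignore case and surrounding punctuation when comparing.
            cleaned = word.text.strip('.,;:!?"\'()').casefold()
            if cleaned == needle:
                hits.append((page.number, (word.x, word.y, word.width, word.height)))
    return hits


# Two invented pages. On page 2 the OCR misread "whale" as "wha1e".
pages = [
    Page(1, "page-001.png", [
        HiddenWord("The", 80.0, 300.0, 30.0, 14.0),
        HiddenWord("whale,", 120.0, 300.0, 48.0, 14.0),
    ]),
    Page(2, "page-002.png", [
        HiddenWord("wha1e", 95.0, 410.0, 46.0, 14.0),
    ]),
]

hits = search(pages, "whale")
print(hits)  # [(1, (120.0, 300.0, 48.0, 14.0))]
```

The search finds the word on page 1 and tells the reader software where to draw the highlight on the image. It misses page 2, because the hidden text there says "wha1e" even though the page image plainly shows "whale". That is the caveat described above in miniature.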

Will you be losing a lot if your G-Book selection isn't text-searchable? We don't think so. Remember, those wonderful original printed books weren't text-searchable either, and they were still very functional. Any scholarly book that had an index (the original "text-searchable" technology) still has the index in the G-Book. You can look in the index, find the relevant page numbers, and jump to those pages. A good index provides a better search experience than simply searching for each occurrence of a term in a book. While text-based ebooks let you search for text, they break the index of the original printed book, because they remove the book's original page breaks and turn it into one long stream of text. You can look at the index of an ebook and see that a term occurred on page 317 of the printed book, but where is page 317 in a text-based ebook that scrolls from top to bottom with no page breaks, or with page breaks that are artificially created by the size of your reader and the typeface you've chosen? So, instead of using the printed book's index to find the relevant pages for a term, in a text-based ebook you'll have to step through every match of that term rather than benefit from the indexer's arrangement of the term into page ranges and subtopics. Searchable-text G-Books give you the best of both: you can search for your own terms, and you can look up those terms in the book's index and let the index help you jump to the important pages where that term or concept is discussed. With nonsearchable G-Books you'll be in the same boat as if you held the printed copy in your hands: no better, but no worse!