Start Your Remix Engines: Millions of New Public Domain Images Now on Flickr
Over 2.6 million images from books published between 1500 and 1922 are now on Flickr thanks to a Yahoo! fellow who, in a sense, reverse-scanned 600 million pages from the Internet Archive.
Via Open Culture:
Thanks to Kalev Leetaru, a Yahoo! Fellow in Residence at Georgetown University, you can now head over to a new collection at Flickr and search through an archive of 2.6 million public domain images, all extracted from books, magazines and newspapers published over a 500 year period. Eventually this archive will grow to 14.6 million images.
Via The BBC:
Mr Leetaru said digitisation projects had so far focused on words and ignored pictures.
"For all these years all the libraries have been digitising their books, but they have been putting them up as PDFs or text searchable works," he told the BBC.
"They have been focusing on the books as a collection of words. This inverts that.
"Stretching half a millennium, it’s amazing to see the total range of images and how the portrayals of things have changed over time."
Geek Speak: Traditional book scanning uses optical character recognition to extract text from books but, in doing so, more or less ignores images. Leetaru wrote a program that went back through the scans and reversed the process, favoring images over text.
Back to The BBC:
The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book.
Each Jpeg and its associated text was then posted to a new Flickr page, allowing the public to hunt through the vast catalogue using the site’s search tool.
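For the curious, the pairing step described above (each image with its caption and the paragraphs on either side) can be sketched roughly as follows. This is only an illustration, not Leetaru's actual program: the `Block` and `ImageRecord` types, the "text" / "image" / "caption" labels, and the assumption that a caption sits directly after its image are all hypothetical stand-ins for whatever the scan data really provides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """One region of a scanned page, as a hypothetical layout pass might label it."""
    kind: str     # assumed labels: "text", "image", or "caption"
    content: str  # paragraph text, caption text, or an image file name


@dataclass
class ImageRecord:
    """An extracted image together with the text that surrounds it."""
    image: str
    caption: Optional[str]
    text_before: Optional[str]
    text_after: Optional[str]


def nearest_text(blocks: List[Block], start: int, step: int) -> Optional[str]:
    """Walk from index `start` in direction `step`; return the first text paragraph."""
    i = start
    while 0 <= i < len(blocks):
        if blocks[i].kind == "text":
            return blocks[i].content
        i += step
    return None


def extract_images(blocks: List[Block]) -> List[ImageRecord]:
    """Favor images over text: keep every image block plus its nearby context."""
    records = []
    for i, block in enumerate(blocks):
        if block.kind != "image":
            continue
        # Assumption: a caption, when present, is the block right after the image.
        caption = None
        if i + 1 < len(blocks) and blocks[i + 1].kind == "caption":
            caption = blocks[i + 1].content
        records.append(ImageRecord(
            image=block.content,
            caption=caption,
            text_before=nearest_text(blocks, i - 1, -1),
            text_after=nearest_text(blocks, i + 1, 1),
        ))
    return records


# Usage: one made-up page with a single illustration.
page = [
    Block("text", "The expedition reached the coast in May."),
    Block("image", "page_0042_fig1.jpg"),
    Block("caption", "Fig. 1. The harbour at dawn."),
    Block("text", "Supplies were unloaded the following week."),
]
records = extract_images(page)
```

Each resulting record holds what a single Flickr page would need: the image itself plus searchable text drawn from around it.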
Read through to the BBC to learn more about how it was done.
Image: Partial screenshot, page 26,198 of the collection.