HathiTrust is much better than Google Books about allowing access to works that are no longer under copyright in the United States. Under US law, everything published in 1929 or earlier is currently in the public domain. But there are a lot of special cases where 20th-century works published after 1929 are also in the public domain:
https://guides.library.cornell.edu/copyright/publicdomain
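For what it's worth, the blanket cutoff is just arithmetic, and it moves forward every January 1. Here is a rough sketch, assuming the standard 95-year term for works published before 1978, which runs to the end of the calendar year. The function names are my own, and it deliberately ignores all of the special cases in the Cornell chart (non-renewal, missing notice, and so on), so treat it as an illustration of the blanket rule only.

```python
# Sketch of the US "blanket" public domain cutoff for published works.
# Assumption: the work gets the 95-year term that applies to works
# published before 1978, expiring at the end of the calendar year.
# The special cases (non-renewal, no notice, etc.) are NOT handled.

TERM_YEARS = 95
FIRST_VALID_YEAR = 2019  # the cutoff sat at 1922 from 1998 through 2018


def blanket_cutoff_year(current_year: int) -> int:
    """Latest publication year that is public domain under the blanket rule."""
    if current_year < FIRST_VALID_YEAR:
        raise ValueError("this sketch only covers 2019 and later")
    # A work published in year P is protected through the end of P + 95,
    # so it enters the public domain on January 1 of P + 96.
    return current_year - TERM_YEARS - 1


def is_public_domain_by_blanket_rule(pub_year: int, current_year: int) -> bool:
    """True if the blanket rule alone puts the work in the public domain."""
    return pub_year <= blanket_cutoff_year(current_year)


if __name__ == "__main__":
    print(blanket_cutoff_year(2025))
    print(is_public_domain_by_blanket_rule(1953, 2025))
```

A 1953 book fails this check in 2025, yet it can still be in the public domain under the more complex rules, which is exactly the gap between a blanket-rule approach and a per-work review.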
Google Books appears to follow the blanket 1929 rule, or did the last time I looked. HathiTrust has cleared the copyright status for many additional works following the more complex rules, e.g.
"Drawing Birds" by Joy Postle, 1953:
https://babel.hathitrust.org/cgi/pt?id=nyp.33433115876140&se...
Unfortunately, the Google-originated scans in HathiTrust's collection come with special restrictions. Google itself required that only people associated with the academic libraries be able to download whole books as a unit, even for works that are in the public domain:
https://hathitrust.atlassian.net/servicedesk/customer/portal...
Fortunately, members of the public can download individual page scans without any special affiliation. People have naturally written tools to automate this process so that full books can be reassembled and then uploaded to the Internet Archive or other book sites.
Google Books has a much faster and sometimes better search interface, so a common flow I use is to search Google Books for terms and then go to HathiTrust to read inside books that Google Books surfaced but won't show.
EDIT: corrected 1926 to 1929 per cxr's comment below.
This is very helpful context. I have disparaged HathiTrust in my mind for several of these public domain problems and it makes sense that it's actually a Google Books problem.
Somewhat tangential, but HathiTrust was born from what I would consider the "golden age" of technical work coming out of libraries (2002-2010). One of the unintended consequences of the dotcom crash was that falling compensation left a lot of talented software people working on what interested them rather than what simply paid the most (since the gap was much smaller).
As a result research libraries were well staffed with very technical people all genuinely interested in making software that made the world a better place. MIT's DSpace, LibraryThing, Open ILSs like Evergreen/Koha, and a huge range of quirky/innovative smaller projects that no longer exist all came out of this period.
It ended around 2010, as the GFC fallout started to hit library budgets while tech suddenly started getting really hot. Even if you loved libraries, most library devs were facing pay cuts to stay in libraries versus massive raises and other quality of life improvements for going into tech. Plus startups and tech companies in general at the time felt more inspired.
I worked at a university library for a few short years in the 2010s. Reading your comment helped me make sense of some of the experiences I had there. I still try to keep on top of some of the trends, with the vague hope of working in that field again one day.
I'm curious what some of the "quirky/innovative smaller projects that no longer exist" are, if you're inclined to go into some details. Or if you could point to a good resource on this somewhere. A lot of technology projects in the library space seem to reinvent the wheel over and over, so I think such a list would be very valuable.
And now that government funding sources like IMLS, CLIR, NEH, NARA and LoC have been nuked and/or crippled, things are unlikely to get better any time soon, especially for collaborative research projects that have no immediate commercial benefit.
We use Hathi a lot at Standard Ebooks as a source of scans to proof productions against. Archive.org has a somewhat better interface, but Hathi has a wider selection.
Try John Mark Ockerbloom's Online Books Page:
<https://onlinebooks.library.upenn.edu/>
For the books that have been manually curated, multiple collections are indexed, including HathiTrust and the Internet Archive. Search will also fall back to showing hits from the "extended shelves" if a title is not in the catalog.
Thanks for your volunteer work for Standard Ebooks!
As a nuclear power historian, this resource is unbelievably valuable. I've been using it for years and it constantly delivers the goods. It contains incredible multitudes.
Haathi means elephant in Hindi. I first thought it was an Indian site, but it is based in the US.
Curious about the connection.
There's an English saying, "an elephant never forgets." I'm guessing it's about that.
Tangential:
- https://en.wikipedia.org/wiki/Elephant_Memory_Systems
- https://i.imgur.com/vNQURE3.jpeg
You can still find the original answer, from 2008, at https://old.www.hathitrust.org/help_general.html .
My family is from Eastern KY and I had access to the HTDL and NYPL through my stint working for a public university a few years ago. It's fascinating what you can find in there! When I looked a couple of years ago, it seemed like there wasn't as much publicly available as I am seeing now.
I would use this site all the time for genealogy purposes. It’s hard to unravel how the datasets are shared, because many things here are from Google’s scanning, but IMO there are lots of things that do not appear anywhere else.
One day I needed some legal info, so I called the Library of Congress, and they sent me a link to HathiTrust with a hearing from 1980. It came to my email, and boom, I took that link and added it to Wikipedia.
All free (tax dollars, OK) and swift. It felt surreal.
This is an excellent resource! It should be more popular!
It is. It's used on a fairly regular basis nowadays in Wikipedia, for example. A decade ago one would have seen just the Internet Archive or the dreaded Google Books hyperlinks.