HathiTrust Digital Library

66 points by djoldman 4 days ago

HathiTrust is much better than Google Books about allowing access to works that are no longer under copyright in the United States. Under US law, everything published in 1929 or earlier is currently in the public domain. But there are a lot of special cases where 20th century works published after 1929 are also in the public domain:

https://guides.library.cornell.edu/copyright/publicdomain

Google Books appears to follow the blanket 1929 rule, or did the last time I looked. HathiTrust has cleared the copyright status for many additional works following the more complex rules, e.g.

"Drawing Birds" by Joy Postle, 1953:

https://babel.hathitrust.org/cgi/pt?id=nyp.33433115876140&se...
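
For what it's worth, the blanket rule is trivial to compute, which may be part of its appeal. Here is a minimal sketch, assuming the general 95-year term for works published in the US before 1978 and ignoring every special case (renewals, notice requirements, and so on). The function names are made up for illustration and are not from any library.

```python
# Hypothetical helpers illustrating the blanket rule only.
# Assumption: a flat 95-year term, with none of the special cases
# (non-renewal, missing notice, etc.) that the Cornell guide covers.
TERM_YEARS = 95


def blanket_cutoff(current_year: int) -> int:
    """Latest publication year that is public domain under the blanket rule."""
    # A work published in year Y is protected through the end of Y + 95,
    # so it enters the public domain on January 1 of Y + 96.
    return current_year - TERM_YEARS - 1


def passes_blanket_rule(pub_year: int, current_year: int) -> bool:
    """True if the publication year alone puts the work in the public domain."""
    return pub_year <= blanket_cutoff(current_year)


print(blanket_cutoff(2025))             # 1929
print(passes_blanket_rule(1953, 2025))  # False
```

A 1953 book like "Drawing Birds" fails this test on year alone, so it takes the extra per-work clearance that HathiTrust does to open it up.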

Unfortunately, the Google-originated scans in HathiTrust's collection come with special restrictions. Google itself required that only people associated with the academic libraries could download whole books as a unit, even for works that are in the public domain:

https://hathitrust.atlassian.net/servicedesk/customer/portal...

Fortunately, members of the public can download individual page scans without any special affiliation. People have naturally written tools to automate this process so that full books can be reassembled and then uploaded to the Internet Archive or other book sites.

Google Books has a much faster and sometimes better search interface, so a common flow I use is to search Google Books for terms and then go to HathiTrust to read inside books that Google Books surfaced but won't show.

EDIT: corrected 1926 to 1929 per cxr's comment below.

billbrown a day ago

This is very helpful context. I have disparaged HathiTrust in my mind for several of these public domain problems and it makes sense that it's actually a Google Books problem.

roadside_picnic a day ago

Somewhat tangential, but HathiTrust was born from what I would consider the "golden age" of technical work coming out of libraries (2002-2010). One of the unintended consequences of the dotcom crash was that compensation falling meant that there were a lot of talented software people working on what interested them rather than what simply paid the most (since the gap was much smaller).

As a result research libraries were well staffed with very technical people all genuinely interested in making software that made the world a better place. MIT's DSpace, LibraryThing, Open ILSs like Evergreen/Koha, and a huge range of quirky/innovative smaller projects that no longer exist all came out of this period.

It ended around 2010, as the GFC fallout started to hit library budgets while tech suddenly started getting really hot. Even if you loved libraries, most library devs were facing pay cuts to stay in libraries versus massive raises and other quality of life improvements for going into tech. Plus startups and tech companies in general at the time felt more inspired.

sadcodemonkey a day ago

I worked at a university library for a few short years in the 2010s. Reading your comment helped me make sense of some of the experiences I had there. I still try to keep on top of some of the trends, with the vague hope of working in that field again one day.

I'm curious what some of the "quirky/innovative smaller projects that no longer exist" are, if you're inclined to go into some details. Or if you could point to a good resource on this somewhere. A lot of technology projects in the library space seem to reinvent the wheel over and over, so I think such a list is very valuable.

geephroh a day ago

And now that government funding sources like IMLS, CLIR, NEH, NARA and LoC have been nuked and/or crippled, things are unlikely to get better any time soon, especially for collaborative research projects that have no immediate commercial benefit.

robin_reala a day ago

We use Hathi a lot at Standard Ebooks as a source of scans to proof productions against. Archive.org has a somewhat better interface, but Hathi has a wider selection.

cxr a day ago

Try John Mark Ockerbloom's Online Books Page:

https://onlinebooks.library.upenn.edu/

For the books that have been manually curated, multiple collections are indexed, including HathiTrust and the Internet Archive. Search will also fall back to showing hits from the "extended shelves" if a title is not in the catalog.

shervinafshar a day ago

Thanks for your volunteer work for Standard Ebooks!

acidburnNSA a day ago

As a nuclear power historian, this resource is unbelievably valuable. I've been using it for years and it constantly delivers the goods. It contains incredible multitudes.

dilawar a day ago

Haathi means elephant in Hindi. At first I thought it was an Indian site, but it is based in the US.

Curious about the connection.

pyuser583 a day ago

There's an English saying, "an elephant never forgets." I'm guessing it's about that.

shervinafshar a day ago

Tangential:

https://en.wikipedia.org/wiki/Elephant_Memory_Systems

https://i.imgur.com/vNQURE3.jpeg

JdeBP a day ago

You can still find the original answer, from 2008, at https://old.www.hathitrust.org/help_general.html .

leetrout a day ago

My family is from Eastern KY and I had access to the HTDL and NYPL through my stint working for a public university a few years ago. It's fascinating what you can find in there! When I looked a couple of years ago, it seemed like there wasn't as much publicly available as what I am seeing now.

apaprocki a day ago

I would use this site all the time for genealogy purposes. It’s hard to unravel how the datasets are shared, because many things here are from Google’s scanning, but IMO there are lots of things that do not appear anywhere else.

TZubiri a day ago

One day I needed some legal info, so I called the Library of Congress, and they sent me a link to HathiTrust with a hearing from 1980. Sent to my email, boom, I took that link and added it to Wikipedia.

All free (tax dollars ok) and swift, felt surreal.

pyuser583 a day ago

This is an excellent resource! It should be more popular!

JdeBP a day ago

It is. It's used on a fairly regular basis nowadays in Wikipedia, for example. A decade ago one would have seen just the Internet Archive or the dreaded Google Books hyperlinks.