Tech Roundtable: Jon Fackrell, Wilford Woodruff Papers
Speakers: Bethany Caye Radcliff, Jon Fackrell, Ben and Sara Brumfield
Bethany Caye Radcliff: Jon is a librarian and software engineer who loves solving problems. The project that Jon is presenting today is the Wilford Woodruff Papers Foundation and how they are using FromThePage to transcribe thousands of documents written by Wilford Woodruff. These documents include journals, autobiographies, letters, and discourses.
After the documents are transcribed on FromThePage, they're published on a custom web platform where the scanned documents can be searched and viewed side by side with the transcript. And I will turn it over to Jon to tell you more about this.
Jon Fackrell: I have a quick introduction that I wanted to share, about the Wilford Woodruff Papers and about Wilford Woodruff himself. He lived from 1807 to 1898 and was an American religious leader who served as the fourth president of The Church of Jesus Christ of Latter-day Saints, from 1889 until his death.
So during his life, he wrote over 11,000 pages in journals and day books, and he wrote over 13,000 letters and received another 17,000 letters. The foundation has gathered, I think the number is just over 6,000 of those letters, and they're trying to find more of them. And then as Bethany said, it also includes discourses, autobiographies, and other records that he wrote, and we have quite a few of those. It's kind of an interesting story about a family member who found them in their basement, and we actually provided them to the church and the church scanned them. The foundation was organized in March of 2020, and we worked with the church to get the scans of all the documents. But the problem is transcribing them and making the text searchable and available to more people. So that's what the mission of the foundation is today, and they have about a 10-year timeframe for when they'd like to have all of these documents transcribed and ready.
So that's where FromThePage comes in. We are crowdsourcing the transcription, and we have over 120 volunteers who work on transcribing documents. In order to get all of this content on a platform where it can be searched and browsed by people, and even looked at by researchers, we developed a custom platform for displaying the images alongside the transcripts.
And I'm just gonna open this up so I can share some of the features. The custom platform's back end is built on Laravel and written in PHP, and we host it on Reclaim Hosting. On the front end, we're using Tailwind CSS for the custom design, and the viewer itself is an OpenSeadragon viewer that's been customized. So you can zoom in, you can rotate things, you can download the original file.
One of the key features that we're using FromThePage for is tagging the people, places, and dates that are mentioned in the documents. As we export the items from FromThePage and import them into the website, we parse the tags that our transcribers add, and we've created things like biographies. So, this letter is to Phebe, who is Wilford's wife. When we click on the word "friend" right there, it's actually tagged with her full name and we can see a bio of who she is. If we click on "view more," we go to a page that has the bio, footnotes or references, and then all of the pages where she is mentioned. So whether it's in letters or journals or anything, we can go in and see each of those documents. You can see the tags in the transcription itself, and we also collect them and put them just below the transcription so people can easily see the tagged people and places.
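The tag collection described above can be illustrated with a small sketch. The site itself is written in PHP, so this Python version is not the foundation's code. It assumes the tags arrive as wiki-style markup of the form [[Full Name|text as written]], and the names extract_subjects and build_mentions_index, the page ids, and the sample sentences are all made up for illustration.

```python
import re
from collections import defaultdict

# Assumed tag syntax: [[Canonical Name|text as written]] or [[Canonical Name]].
TAG_PATTERN = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def extract_subjects(transcript):
    """Return the tagged canonical names in order of first appearance, without duplicates."""
    seen = []
    for match in TAG_PATTERN.finditer(transcript):
        name = match.group(1).strip()
        if name not in seen:
            seen.append(name)
    return seen


def build_mentions_index(pages):
    """Map each tagged name to the ids of the pages that mention it."""
    index = defaultdict(list)
    for page_id, transcript in pages.items():
        for name in extract_subjects(transcript):
            index[name].append(page_id)
    return dict(index)


# Made-up sample pages, not real transcripts.
pages = {
    "letter-1/page-1": "My dear [[Phebe Woodruff|friend]], I arrived in [[Liverpool]].",
    "journal-3/page-12": "Wrote a letter to [[Phebe Woodruff|Phebe]] today.",
}
mentions = build_mentions_index(pages)
```

In this sketch, extract_subjects would feed the list shown below each transcription, and mentions would back the "all of the pages where she is mentioned" view on a biography page.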
I talked about the technology that we're using, and I mentioned that we harvest the documents from the IIIF API. We have document sets that are published quarterly, so there's a lot of planning that goes into what's going to be transcribed and getting help from the volunteers. We push those out to the website once they're flagged as ready to go. We also run a background process weekly, so for anything that's published on the website, we'll just run this background script to update the documents so that if any changes are made, they're imported and we see those updates regularly. You can also trigger those updates manually if we know there's a change that needs to be made.
I just wanted to talk a little bit about the structure that we've created on the site. This is a letter. For a letter, we'll have a container that contains all of the pages, and then we link the images to that. Pretty straightforward, but that goes into how we wrote the code. So this is the code that would be used to import the containers. For each letter or journal, we go through and grab all of the manifests from the API, update or create them, and then just log that we've done that. And the next screen I wanted to show is what we've done to actually go through and create or update each page, grab the transcript, and then attach the image.
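The code shown on screen is not reproduced in this transcript. As a rough sketch of the import pattern described (one container per manifest, one page per canvas, update or create), here is a Python version that works on an in-memory manifest. It assumes the manifests follow the IIIF Presentation API 2 layout (sequences, canvases, images). The plain dicts standing in for database tables, the helper names, the example.org URLs, and the fetch_transcript callback are all hypothetical. The talk does not say how the transcript is retrieved, so that step is passed in as a function.

```python
def update_or_create(table, key, values):
    """Insert a record under key or update the existing one. Returns True if it was created."""
    created = key not in table
    table.setdefault(key, {}).update(values)
    return created


def import_manifest(manifest, items, pages, fetch_transcript):
    """Upsert one container (letter, journal, ...) and its pages from a manifest dict."""
    item_id = manifest["@id"]
    update_or_create(items, item_id, {"title": manifest.get("label", "")})
    canvases = manifest["sequences"][0]["canvases"]
    for order, canvas in enumerate(canvases, start=1):
        update_or_create(pages, canvas["@id"], {
            "item_id": item_id,
            "order": order,
            # Hypothetical callback: how transcripts are fetched is an assumption.
            "transcript": fetch_transcript(canvas["@id"]),
            "image_url": canvas["images"][0]["resource"]["@id"],
        })
    return len(canvases)


# Made-up manifest with two pages, shaped like IIIF Presentation 2.
sample_manifest = {
    "@id": "https://example.org/manifest/letter-1",
    "label": "Letter to Phebe",
    "sequences": [{"canvases": [
        {"@id": "https://example.org/canvas/1",
         "images": [{"resource": {"@id": "https://example.org/img/1.jpg"}}]},
        {"@id": "https://example.org/canvas/2",
         "images": [{"resource": {"@id": "https://example.org/img/2.jpg"}}]},
    ]}],
}

items, pages = {}, {}
import_manifest(sample_manifest, items, pages, lambda canvas_id: "first draft")
# A later run, such as the weekly job, picks up corrections without duplicating records.
import_manifest(sample_manifest, items, pages, lambda canvas_id: "corrected")
```

Keying every record on the manifest or canvas identifier is what makes the job safe to rerun weekly or on demand: a second pass overwrites the stored transcript rather than creating a duplicate page.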
So there's not a lot of code here to just go through and process all of those. And then you can see we have a method here that takes all of those tags and converts them into a format that our website will understand, so that it can link them all up and go to the URLs on our website. So I think that's everything.
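A tag-conversion step like the one described might look roughly like the following Python sketch. It is not the method from the slides. It assumes wiki-style tags of the form [[Full Name|text as written]], and the function names, the /subjects/ URL prefix, and the sample sentence are hypothetical.

```python
import html
import re

# Assumed tag syntax: [[Canonical Name|text as written]] or [[Canonical Name]].
TAG_PATTERN = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def slugify(name):
    """Lowercase a name and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def link_subjects(transcript, base_url="/subjects/"):
    """Replace each tag with an HTML link to the subject's page on the site."""
    def replace(match):
        name = match.group(1).strip()
        shown = (match.group(2) or name).strip()
        return '<a href="{}{}" title="{}">{}</a>'.format(
            base_url, slugify(name), html.escape(name, quote=True), html.escape(shown)
        )
    return TAG_PATTERN.sub(replace, transcript)


linked = link_subjects("My dear [[Phebe Woodruff|friend]], I am well.")
```

The sketch escapes only the text inside each tag. A real importer would also need to escape or sanitize the rest of the transcript before rendering it as HTML.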
Ben and Sara Brumfield: Super impressive. Wow. Yeah. Yeah. These are all.
Jon: Oh, thank you.
Sara: Yeah. Jon, is your code, are your scripts open source or not?
Jon: I think they are open source. That's one thing that we'd like to do in the future: now that we've got it up and running and working, we'd like to make it more available and maybe pull out some of the scripts so that they could be used outside of the website. But right now it's deep within the Laravel website that you would find this script.