Mississippi State University

Industry
Team Size: 2
Researcher at MSU


“Placeholder – Project in Next Phase…


We worked with MSU to scrape images from the Mississippi Department of Archives and History (MDAH), digitize the text in those images, and link every student in the records so that each student's school journey could be traced.

Tech Stack

Python via CLI
Amazon Textract

MDAH Challenges

We had to scrape 211,000 images recorded between 1885 and 1957.

The challenge of scraping this many images was making sure our IPs didn't get blocked by the website, so we had to create a custom system.
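
One common way to avoid getting blocked is to space out requests and back off after a failure. The sketch below is only an illustration of that idea, not the custom system we built: the function names, the User-Agent string, the delays, and the retry count are all assumptions.

```python
import random
import time
import urllib.error
import urllib.request


def http_get(url, timeout=30):
    """Fetch raw bytes with the standard library (hypothetical default fetcher)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "research-archive-fetcher"}  # assumed value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def backoff_delays(base_delay, max_retries):
    """Exponential waits: base, 2 * base, 4 * base, ..."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]


def fetch_politely(url, fetch=http_get, base_delay=1.0, max_retries=4,
                   sleep=time.sleep, jitter=random.random):
    """Try a URL up to max_retries times, waiting longer after each failure.

    Returns the downloaded bytes, or None if every attempt failed.
    """
    for delay in backoff_delays(base_delay, max_retries):
        try:
            return fetch(url)
        except (urllib.error.URLError, TimeoutError):
            # Random jitter keeps retries from arriving on a fixed rhythm.
            sleep(delay + jitter())
    return None
```

The fetch and sleep functions are passed in as parameters so the retry logic can be exercised without touching the network.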

We had to digitize the text within 211,000 images, some of them handwritten and dating back as far as 1885.

Digitizing handwriting is hard enough, but doing so with handwriting from the 1800s and early 1900s is even harder because handwriting styles have changed considerably since then.
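
Because old handwriting is read less reliably, it can help to separate confident OCR lines from ones that need a human look. Here is a rough sketch of that step with Amazon Textract, not our actual pipeline: the 80% threshold and the function names are assumptions, and the live call needs boto3 and AWS credentials.

```python
def split_by_confidence(response, min_confidence=80.0):
    """Split Textract LINE blocks into accepted text and text needing review.

    `response` is the dict returned by Textract's DetectDocumentText call.
    The threshold is an assumed value (Textract reports confidence as 0-100).
    """
    accepted, needs_review = [], []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks
        target = accepted if block.get("Confidence", 0.0) >= min_confidence else needs_review
        target.append(block.get("Text", ""))
    return accepted, needs_review


def digitize_image(image_bytes, min_confidence=80.0):
    """Send one image to Textract and split the lines it finds."""
    import boto3  # imported here so the parsing helper works without AWS set up

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return split_by_confidence(response, min_confidence)
```

Lines that land in the review list could then be checked by hand rather than trusted as-is.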

We had to link students and their relatives reliably across the years.

We had to create algorithms that generate a probability score for whether people within the same county are siblings or other relatives.
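
To give a feel for what such a score can look like, here is a toy sketch. It is a simple heuristic and not the algorithm we used: the record fields, the 20-year age window, and the way the two signals are multiplied are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class StudentRecord:
    """Hypothetical record layout; the real records may hold different fields."""
    first_name: str
    surname: str
    county: str
    birth_year: int


def sibling_score(a, b, max_age_gap=20):
    """Return a 0-1 score for how likely two records describe siblings.

    Only people in the same county are compared. The score multiplies
    surname similarity by a factor that shrinks as the age gap grows.
    """
    if a.county.lower() != b.county.lower():
        return 0.0
    # Fuzzy matching tolerates OCR slips such as "Smith" vs "Smyth".
    name_similarity = SequenceMatcher(None, a.surname.lower(), b.surname.lower()).ratio()
    age_gap = abs(a.birth_year - b.birth_year)
    age_factor = max(0.0, 1.0 - age_gap / max_age_gap)
    return name_similarity * age_factor
```

Pairs scoring above a chosen cutoff would be treated as candidate relatives and checked against other evidence in the records.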
