Level Up: Perl and Workflows
By Mark Ciotola
First published on February 15, 2020. Last updated on February 15, 2020.
- Students will write a brief Perl program to parse a sample document.
Parsing a block of text means scanning it for a particular group of characters, then flagging that group or changing it. This is an important skill in both literary research and professional programming.
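In Perl, the two halves of that definition line up with two operators: the match operator m// finds and flags text, and the substitution operator s/// changes it. Here is a minimal sketch using a made-up sample sentence (the text and the [NAME] placeholder are only for illustration):

```perl
use strict;
use warnings;

# A short sample "document" held in a string (made-up text).
my $text = "Dear Jean Doe, thank you for your letter. Regards, Sam";

# Find and flag: m// tests whether the string "Jean Doe" occurs.
if ($text =~ m/Jean Doe/) {
    print "Found 'Jean Doe'\n";
}

# Find and change: copy the text, then s/// replaces every match
# (the /g flag means "global") with new text.
(my $changed = $text) =~ s/Jean Doe/[NAME]/g;
print "$changed\n";
```

The original $text is left untouched. Only the copy in $changed is modified.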
Searching a word processing document for a short group of characters (also called a “string”) is easy, because most word processors have a built-in Find command for exactly this. Sometimes, however, you will need a more complicated search, or you will need to search many documents efficiently.
Let’s examine an example. Suppose you are looking for all references to the name Jean Doe in a large collection of digitized letters and public records. Here is a simple way to do it:
- Open a document.
- Search for “Jean Doe” and mark the position of each match.
- Close the document.
- Repeat until all documents have been searched.
- Export a report of all matches found.
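The five steps above can be sketched in Perl roughly as follows. This is a minimal sketch that makes some assumptions. Each document is treated as a plain-text file named on the command line. The script name find_name.pl is hypothetical, and find_positions is a helper written just for this example. Each match is reported as a file name, line number and column.

```perl
use strict;
use warnings;

# Assumption: documents are plain-text files passed on the command line,
# e.g.  perl find_name.pl letters/*.txt   (find_name.pl is a made-up name)
my $target = 'Jean Doe';
my @report;    # one entry per match: [file, line number, column]

for my $file (@ARGV) {
    # Open the document (skip it with a warning if it cannot be read)
    open(my $fh, '<', $file) or do { warn "Cannot open $file: $!\n"; next };

    # Search each line and mark the position of every match
    while (my $line = <$fh>) {
        my @cols = find_positions($line, $target);
        push @report, [$file, $., $_] for @cols;    # $. is the current line number
    }

    # Close the document; the for loop repeats until all files are searched
    close($fh);
}

# Export the report (here it is simply printed to the screen)
print "$_->[0]: line $_->[1], column $_->[2]\n" for @report;
print scalar(@report), " match(es) found\n";

# Hypothetical helper: return the 1-based column of every occurrence
# of $needle in $haystack, using Perl's built-in index().
sub find_positions {
    my ($haystack, $needle) = @_;
    my @positions;
    my $pos = index($haystack, $needle);
    while ($pos != -1) {
        push @positions, $pos + 1;
        $pos = index($haystack, $needle, $pos + 1);
    }
    return @positions;
}
```

Because index() does an exact, case-sensitive comparison, this version only finds the name spelled exactly “Jean Doe”. It will miss every variant discussed below.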
Easy enough, sort of. The catch is that names often get misspelled or translated.
Jean could be spelled “Gene” or translated as “Jeanne” or “John”, so you might have to search for those and similar variants as well. There might also be spaces or hyphens in the middle of the name, so you might also need to search for “Je an”. And what if Jean Doe appears only as a stylized signature? Then you might have to run an image recognition search. You can only do so much: how important the search is and what resources you have will dictate your level of effort. Either way, this is certainly not an exact science!
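A regular expression can cover several of these text variants in a single search. The sketch below is illustrative only. The list of accepted first names is an assumed, incomplete sample, and loose() is a hypothetical helper written for this example. No text pattern can find a signature that exists only as an image.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern for a word that tolerates one stray
# space or hyphen between any two letters (so "Jean" also matches "Je an").
sub loose {
    my ($word) = @_;
    return join '[\s-]?', map { quotemeta($_) } split //, $word;
}

# Spellings and translations to accept (an assumed, incomplete list).
my @first_names = ('Jean', 'Jeanne', 'Gene', 'John');
my $first = join '|', map { loose($_) } @first_names;

# /i ignores upper/lower case; the \b at the end stops "Doe" from
# matching inside a longer word such as "Does".
my $name_re = qr/\b(?:$first)\s+Doe\b/i;

my @samples = ('Letter to Gene Doe', 'signed Je an Doe', 'JEANNE DOE, 1902',
               'Jane Doe', 'John Does');
for my $s (@samples) {
    printf "%-20s %s\n", $s, ($s =~ $name_re ? 'match' : 'no match');
}
```

Each variant you add widens the net, but it also raises the chance of a false match (a different John Doe, for instance), so the results still need a human check.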