Understanding Document Assembly in Ephesoft Transact

by Luis Colorado, Software Engineer at Zia Consulting

Ephesoft Transact is designed to streamline administration and operations. Part of its feature set is document classification: Transact looks at each page being processed and determines not only the document type but also whether the page is a first, middle, or last page of that document type. Some of these topics are not explained very well in the documentation, and the classification of first, middle, and last pages is one of them.

Transact makes many smart assumptions when it is learning how to classify new documents, but it can’t read your mind (will we one day get that far with AI?).

When you are training Transact to classify a new multi-page format, it gets it right most of the time. If you feed it one page, it classifies it as a first_page (brilliant!).

If you feed it two pages, it classifies the first page as a first_page and the second page as a last_page (that’s smart!).

Note about naming conventions: in this document we refer to the location of a page as “first page” or “last page,” and we use the underscore (_) when we are talking about an Ephesoft page type, like “first_page.” Soon you will learn that the actual location and the Ephesoft page type are not necessarily the same thing. Another thing: we use lower case for the page types, but Ephesoft frequently uses proper case, like “First_Page.”

Conveniently enough, if you train it with a three-page form, it will distinguish (you guessed it) the first, middle, and last page.

However, when a form has more than three pages, things start to get a bit surprising.

 

That’s right! Transact classifies every page between the first and the last as…(drum roll) middle pages! That means that more than one page can be classified as a middle_page. In fact, each of the three page types (first, middle, and last) can contain any number of sample pages.
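The default behavior described so far can be summarized in a few lines of Python. This is only a sketch of the rule as we observed it while training, not Ephesoft code; the function name `default_page_types` is made up for this illustration.

```python
def default_page_types(page_count: int) -> list[str]:
    """Label the pages of one training sample the way the article describes.

    Hypothetical helper: it mirrors the observed behavior, not Transact's
    actual implementation.
    """
    if page_count < 1:
        raise ValueError("a sample needs at least one page")
    if page_count == 1:
        # A single page is always learned as a first_page.
        return ["first_page"]
    # Everything between the first and the last page becomes a middle_page.
    middle = ["middle_page"] * (page_count - 2)
    return ["first_page", *middle, "last_page"]


for n in (1, 2, 3, 5):
    print(n, default_page_types(n))
```

For a five-page sample this yields one first_page, three middle_page entries, and one last_page, which is the "surprising" case above.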

However, this is where Transact stops looking so smart. What if we have two alternative ways to end a document? There is no way in the user interface to tell Ephesoft that a given page should be classified as a First_Page or a Last_Page.

How do we make Transact classify that middle page as a Last_Page?

Why is Page Classification Important?

It is important to get page classification right to make sure that documents are assembled, separated, and scored correctly. For example, if Transact incorrectly classifies a middle_page as a first_page, it will truncate the document and treat that page as the start of a new document. The Document Assembler Plugin documentation provides the technical details.

Documents that don’t have unique content on each page may be a problem. For instance, a phone bill, a credit card statement, or a spreadsheet printout carries the same header and footer information on every page. In those cases there are some advanced techniques that can be used. Contact Zia for more information.

Documents Without a Last Page

It may sound strange, but in some cases we may want to define a document that does not have a last page. For example, let’s suppose that we have a type of invoice that is usually only one page long, but sometimes it may be longer. In this example, we have a form that has a first and a middle page, but no last page.

Unfortunately, if we train Transact with an invoice format with two pages (first and optional middle), Transact will learn it as First_Page and Last_Page, instead. That means that the invoices with more than one page would get truncated when scanned by Transact.

For example, consider the following invoice that has three pages:

  • Invoice 1, page 1
  • Invoice 1, page 2
  • Invoice 1, page 3

Our invoice would be incorrectly separated by Transact as the following:

  • Document 1:
    • Invoice 1, page 1 ← This is recognized as a first_page
    • Invoice 1, page 2 ← This is recognized as a last_page, so Transact thinks that this is the end of the invoice.
  • Document 2:
    • Invoice 1, page 3 ← This is recognized as a last_page

We need to train Transact so the first page of our invoice is classified as a first_page, and the second page as a middle_page.
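To see why the page type matters here, the sketch below replays the invoice example with a deliberately simplified separation rule: a first_page opens a new document and a last_page closes the current one. That rule is our assumption for illustration only; the real Document Assembler plugin is more involved (see its documentation), and `split_into_documents` is a hypothetical name.

```python
def split_into_documents(page_types):
    """Group a stream of classified pages into documents.

    Simplified rule (an assumption, not the plugin's real logic): a
    first_page opens a new document and a last_page closes the current one.
    """
    documents, current = [], []
    for number, page_type in enumerate(page_types, start=1):
        if page_type == "first_page" and current:
            # A new first_page cuts off whatever document was still open.
            documents.append(current)
            current = []
        current.append((number, page_type))
        if page_type == "last_page":
            documents.append(current)
            current = []
    if current:
        # Pages left over at the end of the stream form the final document.
        documents.append(current)
    return documents


# Trained from a two-page sample: the second page was learned as last_page,
# so pages 2 and 3 of a longer invoice are both recognized as last_page.
as_learned = ["first_page", "last_page", "last_page"]

# What we want after reclassifying the second sample page as middle_page.
as_intended = ["first_page", "middle_page", "middle_page"]

print(split_into_documents(as_learned))
print(split_into_documents(as_intended))
```

With the learned labels the three pages come out as two documents (pages 1 and 2, then page 3 alone), matching the incorrect separation listed above. With the intended labels they stay together as one document.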

Let’s learn how to reclassify the first, middle, and last page.

Alternate Endings: Reclassifying First_, Middle_, or Last_Page

Let’s suppose that you have a three-page contract, but the last page has two different formats. For example, in some states the last page should include some extra text, or maybe we want the branch manager’s signature included in some contracts.

Our example three-page contract is delivered to Ephesoft as a four-page document which we want to be classified as shown below. Note that we have two versions of the last page: the standard page and the page with the branch manager’s signature:

  • Page 1 of the contract (borrower’s information and conditions): first_page
  • Page 2 of the contract (more conditions): middle_page
  • Page 3 of the contract, version 1 (some additional data and signatures): last_page
  • Page 3 of the contract, version 2 (this version of the last page includes additional language and the branch manager signature): last_page

However, when we drop our sample into the learning section, we notice that Transact classified the pages as shown below:

  • Page 1 of the contract (borrower’s information and conditions): first_page
  • Page 2 of the contract (more conditions): middle_page
  • Page 3 of the contract, version 1 (some additional data and signatures): middle_page
  • Page 3 of the contract, version 2 (this version of the last page includes additional language and the branch manager signature): last_page

How do we convince Transact that there are two (or more) alternate end pages in our form? The trick is to use the folder management screen to move the files manually to their corresponding section.

In our case, we’ll move the file 03 from the middle_page section to the last_page, and then we will use the “Learn Files” button to retrain Transact.
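If you have many samples, it may help to work out which ones need to move before you touch any folders. The sketch below simply compares the page types Transact learned with the ones we wanted, using the two lists from the contract example. The function name `samples_to_move` is hypothetical, and the sample numbers are just positions in the list.

```python
def samples_to_move(learned, desired):
    """Return (sample_number, from_type, to_type) for every mismatch."""
    if len(learned) != len(desired):
        raise ValueError("both lists must describe the same sample pages")
    return [
        (number, have, want)
        for number, (have, want) in enumerate(zip(learned, desired), start=1)
        if have != want
    ]


# Page types as Transact learned them from the four-page sample.
learned = ["first_page", "middle_page", "middle_page", "last_page"]
# Page types we actually want (two alternate last pages).
desired = ["first_page", "middle_page", "last_page", "last_page"]

print(samples_to_move(learned, desired))
# → [(3, 'middle_page', 'last_page')]
```

Only the third sample differs, which is the file we move from the middle_page section to the last_page section.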

The following video shows how to do it:

 

Step-by-Step Instructions

In case you missed any of the steps in the video, or you come back to this article to revisit them, this is what we did to reclassify page 3, version 1, of our sample contract as a Last_Page:

  1. Go to the folder management screen, which is one of the options available to the administrator.
  2. In SharedFolders find your batch class (for example BCA) and then open the folder lucene-search-classification-sample.
  3. Open the folder for the document type (for example, LoanApp1). You will notice three folders named with the document type plus a suffix like _first_page, _middle_page, or _last_page.
  4. Open the folder for the middle_page samples.
  5. Notice that most of the files contain a sequence number according to order of the pages. For example, you will see two files with the number 0003 embedded in the name. If you can’t see the full name of the files (and the number 0003), maximize your browser to full screen, or drag the right side of the column Name to the right until you can see the full name.
  6. Mark the checkboxes for the two files with the number 0003, and then click on the Cut command at the top.
  7. Open the folder …_last_page, and click on the Paste command at the top. This moves page 3, version 1, to the last_page classification folder, but we are not done yet.
  8. Go back to the batch class management screen and open your batch class so you can see your forms.
  9. Select the checkbox next to the document type and click on the Learn Files command at the top.
  10. Optional step: feed both versions of the document and confirm that they are classified correctly.
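For readers who prefer scripting, steps 2 through 7 amount to moving files between two folders. The sketch below does that with Python's standard library. It makes several assumptions: you have direct filesystem access to the shared folder, the page-type folders are named with the document type plus the suffix (for example, LoanApp1_middle_page), and the sample files carry the sequence number in their names. We only did this through the folder management screen, so treat the script as an untested alternative, and remember that you still need Learn Files afterwards (steps 8 and 9).

```python
import shutil
from pathlib import Path


def move_samples(doc_type_dir: Path, doc_type: str, marker: str,
                 source: str = "middle_page",
                 target: str = "last_page") -> list[Path]:
    """Move every sample file whose name contains `marker` between
    page-type folders. Folder naming is an assumption (see above)."""
    source_dir = doc_type_dir / f"{doc_type}_{source}"
    target_dir = doc_type_dir / f"{doc_type}_{target}"
    if not source_dir.is_dir() or not target_dir.is_dir():
        raise FileNotFoundError("expected page-type folders were not found")
    moved = []
    for sample in sorted(source_dir.iterdir()):
        if sample.is_file() and marker in sample.name:
            destination = target_dir / sample.name
            shutil.move(str(sample), str(destination))
            moved.append(destination)
    return moved


# Example call (hypothetical path; adjust to your own installation):
# shared = Path("SharedFolders/BCA/lucene-search-classification-sample")
# move_samples(shared / "LoanApp1", "LoanApp1", marker="0003")
```

The function returns the new locations of the moved files, so you can confirm that both files for page 3 ended up in the last_page folder before retraining.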

Conclusion

In this article we learned how to ensure that documents are classified and separated correctly even when the form has two or more different versions of the last page. Transact automatically classifies the first and last pages of a sample as first_page and last_page, and everything in between as middle_page. That behavior is not correct when a form has two or more versions of its first or last page, but we can rearrange the sample pages using the folder management screen and then retrain Transact.

As you may have already guessed, the same steps described here apply not only to the last pages, but can be used also when there is more than one version of the first page of a document.

For additional information, contact us today.
