Hyperscaler PaaS platforms versus the Oracle PaaS platform: which to choose?

6 March 2022

Recent years have seen the emergence of generalist PaaS platforms such as AWS, Azure and Google and, more recently, the PaaS platforms of major software publishers such as Oracle.

In addition, there is an abundant supply of open source building blocks.

This gives companies a very wide choice to meet their specific application needs, but it also complicates the decision. Under pressure from suppliers promoting their own solutions, CIOs find it difficult to see clearly.

Using one particular service offered by these platforms, optical character recognition (OCR), this article highlights the main characteristics of three approaches and shows that the choice between them ultimately comes down to a "make or buy" decision.

In cross-functional business processes, the original medium is often paper, an image or a PDF (e.g. an order received from a customer, a supplier’s invoice or an acceptance report). However, the company’s management applications work on structured data, and without structured data to feed them they would be useless. Text must therefore be extracted from its original medium before it can be passed from one interface to another within the management information system. Text recognition tools, known as OCR, solve this problem.

These tools extract the required data in a format that can be used by business applications. To do this, they proceed in two steps:

Recognition and extraction of characters from the source document
Semantic analysis of the generated file to identify relevant fields
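As an illustration of the second step, here is a minimal Python sketch that maps raw recognised text to named fields using regular expressions. The sample text, the field names and the patterns are all assumptions made for the example; real documents would need patterns tuned to each layout, and this is not the method used by any of the platforms discussed.

```python
import re

# Hypothetical raw text, as step 1 (character recognition) might return it.
RAW_TEXT = """ACME Supplies Ltd
Invoice No: INV-2022-0042
Date: 06/03/2022
Total due: 1,250.00 EUR"""

# Illustrative patterns only; each one captures the value of a single field.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{2}/\d{2}/\d{4})"),
    "total": re.compile(r"Total due:\s*([\d,]+\.\d{2})"),
}


def extract_fields(text):
    """Step 2: turn raw OCR text into the structured fields an application expects."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        # A missing field is reported as None rather than guessed.
        fields[name] = match.group(1) if match else None
    return fields


fields = extract_fields(RAW_TEXT)
print(fields)
```

With a turnkey service this mapping is done by the provider; with open source or generalist OCR it is the part the company has to write and maintain itself.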

So which solution should you choose?

Here we set out the characteristics of the different solutions to help you decide which one best suits your use case.

Among OCR solutions, we can distinguish between open source solutions (hosted on a local server), OCR services offered by generalist PaaS platforms (Azure, AWS, Google, etc.), and OCR services offered by the PaaS platforms of software package publishers or specialist business solution publishers such as Oracle.

Generalist PaaS platforms (cloud OCR for raw data extraction)

This solution is a remote tool that requires a connection to the cloud: the scanned document is sent to the cloud OCR service, which responds with a JSON file. This format is richer than a plain text file because it contains not only the characters but also the position of the text on the page, in the form of bounding boxes whose coordinates can be manipulated. This makes it a much more accurate tool than local OCR, with good performance even on blurry or badly scanned documents. Some versions provide another useful piece of information: a character detection confidence index, which allows the probability of error to be judged. Layout information is arranged hierarchically at different levels of precision, for example paragraph, line, word and character.

Several APIs implement this solution, such as Google Vision and Microsoft Azure. Both offer a good level of performance, with data extraction in less than 5 seconds. Other APIs, such as Amazon Rekognition, perform less well.
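To make the value of this structure concrete, here is a short Python sketch that walks a hierarchical response, flags words with a low confidence index and selects words by bounding box. The JSON layout and the key names (`paragraphs`, `lines`, `words`, `confidence`, `box`) are hypothetical and do not reproduce the schema of any particular provider; each real API has its own format.

```python
import json

# Hypothetical response; key names are illustrative, not a provider's schema.
# Each box is [left, top, right, bottom] in pixels.
RESPONSE = json.loads("""
{
  "paragraphs": [
    {
      "lines": [
        {
          "words": [
            {"text": "Invoice", "confidence": 0.99, "box": [40, 30, 120, 50]},
            {"text": "N0:", "confidence": 0.62, "box": [125, 30, 160, 50]},
            {"text": "INV-42", "confidence": 0.97, "box": [165, 30, 240, 50]}
          ]
        }
      ]
    }
  ]
}
""")


def iter_words(response):
    """Walk the paragraph > line > word hierarchy and yield every word."""
    for paragraph in response["paragraphs"]:
        for line in paragraph["lines"]:
            yield from line["words"]


def low_confidence_words(response, threshold=0.80):
    """Return the text of words whose confidence index is below the threshold."""
    return [w["text"] for w in iter_words(response) if w["confidence"] < threshold]


def words_in_region(response, left, top, right, bottom):
    """Return the text of words whose box lies entirely inside the region."""
    hits = []
    for w in iter_words(response):
        x0, y0, x1, y1 = w["box"]
        if x0 >= left and y0 >= top and x1 <= right and y1 <= bottom:
            hits.append(w["text"])
    return hits


suspect = low_confidence_words(RESPONSE)
header = words_in_region(RESPONSE, 100, 0, 300, 60)
print(suspect, header)
```

Neither operation is possible with a plain text file, which carries no positions and no confidence values.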

PaaS platforms of software publishers (turnkey cloud OCR)

The turnkey cloud solution directly returns the specific fields requested. Some APIs also offer machine learning based on neural networks, which improves performance; this is the case with Nanonets and Verify. Other APIs require no training and can directly extract the information from the fields with the same reliability as the generalist PaaS platforms.

This solution is very efficient, with results similar to those of generalist PaaS platforms, but it has a drawback. Although it requires no integration effort, the code is not accessible: in the event of a bug, the IT team cannot fix it directly and only the publisher can intervene.

Generalist PaaS platforms (cloud OCR for raw data extraction)

For this solution, there is a non-negligible fixed integration cost to develop the in-house script that retrieves the desired data, plus a variable cost billed by the cloud service according to the number of documents processed (relatively affordable: around $1.50 per 1,000 documents scanned).

Thus, when the volume of documents is high, this solution is less expensive than the PaaS platforms of software publishers, for equivalent performance.

PaaS platforms of software publishers (“turnkey” cloud OCR)

For this solution, since there is no integration effort, the only cost is the usage billing, which depends on the number of documents processed.

Thus, for a low volume, the PaaS platform of a software publisher is both the least expensive and the most efficient solution. For a high volume, on the other hand, it is the most expensive of all.
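The two cost structures can be compared with a simple break-even calculation, sketched below in Python. Only the generalist price of about $1.50 per 1,000 documents comes from this article; the integration cost and the turnkey price per document are hypothetical figures chosen purely for illustration and should be replaced with real quotes.

```python
# Generalist PaaS: variable price taken from the figure quoted above.
GENERALIST_PRICE_PER_DOC = 1.50 / 1000

# Hypothetical values, for illustration only.
INTEGRATION_COST = 20_000.0    # one-off cost of developing the in-house script
TURNKEY_PRICE_PER_DOC = 0.10   # usage price of the turnkey service


def generalist_cost(documents):
    """Fixed integration cost plus a low variable cost per document."""
    return INTEGRATION_COST + documents * GENERALIST_PRICE_PER_DOC


def turnkey_cost(documents):
    """No integration cost, but a higher variable cost per document."""
    return documents * TURNKEY_PRICE_PER_DOC


def break_even_volume():
    """Volume at which both options cost the same."""
    return INTEGRATION_COST / (TURNKEY_PRICE_PER_DOC - GENERALIST_PRICE_PER_DOC)


volume = break_even_volume()
for n in (10_000, round(volume), 1_000_000):
    print(f"{n:>9} docs: generalist ${generalist_cost(n):,.2f}, turnkey ${turnkey_cost(n):,.2f}")
```

Below the break-even volume the turnkey service is cheaper; above it, the fixed integration cost of the generalist option is amortised and that option becomes the cheaper one. Maintenance costs are ignored in this sketch.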

It should also be noted that if the intended use falls outside the field for which the platform was designed, the semantic analysis offered by the publisher no longer adds value and the platform loses its advantage over generalist PaaS platforms. For example, a platform designed to recognise supplier invoices can save a company a lot of time on that task, but loses its advantage if the company wants to use it to recognise cash register receipts.

In summary, the choice between an open source solution, a generalist PaaS platform service and a software publisher’s PaaS platform service comes down to a "make or buy" decision.

Open source solutions have a low entry cost but require a major integration effort on the part of the company, with maintenance and operation of the solution being completely at the company’s expense. In our opinion, this approach can be justified for very specific needs or large volumes that allow the investment costs to be amortised.

The PaaS services of software publishers such as Oracle, on the other hand, have a much higher cost per use but are turnkey solutions that require little integration effort on the part of the company as well as little maintenance and operating effort because these are taken care of by the service provider. These services are relevant when one or more of the following criteria are met: need to implement the solution quickly, moderate volumes, fairly standard use cases covered by the platform.

The PaaS services of generalist providers fall between the two: they relieve the company of the raw recognition component for a fairly moderate cost per use. On the other hand, the company will have to invest in an integration effort to handle the semantic analysis (through in-house development or possibly by using additional services offered by these platforms). This approach is often a good compromise because it allows the company to benefit from the scale and robustness of the generalist PaaS services and to concentrate on its own specific component, which is generally semantic analysis.


