We increasingly feel that we live in times of fake news and alternative truths. Cases in which intellectual property is stolen or misused also appear frequently in the sources we read every day. More precisely, these cases involve fake documents: forged certifications, degrees, transactions, contracts, and other records. In many cases, sensitive information is contained in unstructured data, which is not automatically identified and protected.

Structured data stored in databases can be secured relatively easily, because access can be restricted according to strict guidelines. Unstructured data, by contrast, is spread throughout an organization: it exists anywhere users access or create content. This makes it harder to identify who has access to unstructured data and who is using it, to track the flow of unstructured data through an audit trail, and to communicate how unstructured data should be managed and protected.

The problem is:

Unstructured data cannot be processed and analyzed using conventional tools and methods. Examples of unstructured data include text, video, audio, mobile activity, social media activity, generic imagery – the list goes on and on.

Up to now:

Information systems worldwide have tried to protect their documents and make them trustworthy by applying cryptographic mechanisms, such as digital signatures. This mechanism guarantees integrity (the document has not been tampered with), authenticity (the owner can be easily verified), and non-repudiation (the owner cannot deny ownership). These are all necessary properties of a system that stores, manages, and protects documents. Real-life business processes, however, require very long-term functionality, something that digital signatures cannot offer without a high degree of technical and procedural complexity, with the additional disadvantage of relying heavily on central authorities.

One possible solution to all these issues is the introduction of blockchain in Document Management Systems (DMS). The blockchain can be considered a distributed ledger, or database, containing a continuously growing list of records, called blocks, each linked to the previous one and secured by cryptographic algorithms (hash functions). The underlying mechanisms of the blockchain rely heavily on cryptographic and mathematical tools.
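The way hash functions secure a list of blocks can be illustrated with a minimal sketch. It assumes a single in-memory list rather than a distributed ledger, and the names `block_hash`, `append_block`, and `is_valid` are hypothetical, chosen only for this illustration. Each block stores the hash of its predecessor, so editing an earlier block invalidates the chain.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Return the SHA-256 hex digest of a block's contents."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that references the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["payload"])
        if block["hash"] != recomputed:
            return False
    return True


chain = []
append_block(chain, "record A")
append_block(chain, "record B")
print(is_valid(chain))  # True

chain[0]["payload"] = "record A (edited)"  # simulate tampering
print(is_valid(chain))  # False
```

A real blockchain adds distribution across many nodes and a consensus mechanism on top of this linking; the sketch shows only the tamper-evidence property.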

The main goal of this project is to provide the architecture of a document management system, currently under development, that is backed by a blockchain-based solution for document content verification. The main features of the system include a verification tool and a statistics module.
The idea is presented in the following main steps:
(1) unstructured information is extracted from documents in PDF format;
(2) information is converted to a structured form resulting in a compact table including necessary fields from the documents;
(3) the table data is encrypted and stored in the blockchain.
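
The steps above can be sketched roughly as follows, under several loudly stated assumptions. Step (1) is assumed already done, so the input is plain text taken from a PDF. The field labels in `FIELDS` and the functions `to_record`, `fingerprint`, and `verify` are hypothetical. For step (3), the sketch does not encrypt anything: as a stand-in it computes a SHA-256 digest of the structured record, which is enough to show how content verification could work, but it is not the project's encryption scheme.

```python
import hashlib
import json
import re

# Hypothetical field labels; a real document type would define its own.
FIELDS = ("Holder", "Degree", "Issued")


def to_record(text):
    """Step 2: pull 'Label: value' lines from extracted text into a dict."""
    record = {}
    for field in FIELDS:
        match = re.search(rf"^{field}:\s*(.+)$", text, flags=re.MULTILINE)
        record[field.lower()] = match.group(1).strip() if match else None
    return record


def fingerprint(record):
    """Stand-in for step 3: digest of the canonical JSON form of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(text, stored_digest):
    """Re-derive the record from a presented document and compare digests."""
    return fingerprint(to_record(text)) == stored_digest


# Step 1 is assumed done: this string stands for text extracted from a PDF.
original = "Holder: Jane Doe\nDegree: MSc Computer Science\nIssued: 2021-06-30\n"
stored = fingerprint(to_record(original))

forged = original.replace("MSc", "PhD")
print(verify(original, stored))  # True
print(verify(forged, stored))    # False
```

Serializing the record with sorted keys keeps the digest stable regardless of the order in which fields were extracted, so the same document content always yields the same value to compare against the stored one.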

