Convert paper ledger to digital form using OCR in Python

  • Course Code: B9RS106
  • University: Dublin Business School
  • Country: Ireland

Aims, Objectives & Rationale

1.    Outline your research aims, objectives and research questions.


•    Is it possible to convert a paper bill or paper ledger into digital form?


•    Computer Vision using OpenCV

•    Coding with Google Colab

•    Text Detection using YOLO

•    OCR using Tesseract    

Research Questions

•    Can we convert the data from paper to digital form?
•    Will deep learning help to reduce the manual labour of converting physical bills to digital form?
•    Is it possible to automate the conversion process?

2.    What is the academic/scientific rationale for proposing to conduct this research study? Provide references to relevant empirical, conceptual, and theoretical literature.

•    People have relied on paper invoices for a very long time, but these days most records have become digital, and so have invoices. Reconciling digital invoices is still a laborious job, as it requires employees to spend hours browsing through many invoices and noting details down in a ledger.
•    But what if this could be automated, saving a business those human hours? It is possible thanks to data science tools such as YOLO and Tesseract, which one can use to build OCR in Python. OCR stands for optical character recognition, and this project explains how to build an OCR system from scratch in Python.

•    Digitized images are usually represented as a two-dimensional (2D) array of pixel values. Each pixel value, which contributes to the color scheme of the image, is influenced by factors such as light intensity. The visual scene is projected onto a surface, where receptors (natural or artificial) produce values that depend on the intensity of the incident light.

•    These concepts are, however, hard to implement. Forming an image loses information, because a three-dimensional (3D) scene is collapsed into a two-dimensional image. Several other factors make image recognition and image processing hard, such as noise in the image (pixel values that differ sharply from their surrounding pixels) and the mapping from scene to image.

•    In recent years, starting with the ImageNet Large Scale Visual Recognition Challenge (ILSVRC, 2015), computers have performed better than humans on the image classification task. In 2016, a faster object detector, YOLO, was proposed for object detection in real-time situations. Our motivation is to apply YOLO to the task of detecting URL links within an image scene. We will also compare the speed and accuracy of this approach with OCR software.
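To make the pixel-array view and the noise problem described above concrete, here is a minimal sketch in plain Python. It is illustrative only: the function names `to_grayscale` and `median_filter` are hypothetical, the luminance weights are the commonly used ITU-R BT.601 ones, and a median filter is just one common way to suppress isolated noisy pixels. The project itself would normally rely on OpenCV for this kind of processing.

```python
from statistics import median


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels into a 2D grid of intensities (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def median_filter(gray, size=3):
    """Replace each pixel with the median of its size x size neighbourhood.

    Border pixels use only the neighbours that lie inside the image.
    """
    height, width = len(gray), len(gray[0])
    radius = size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(int(median(window)))
        result.append(row)
    return result


# A flat grey 3x3 patch with one noisy pixel in the centre.
noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 100, 100],
]
smoothed = median_filter(noisy)
print(smoothed)
```

The noisy centre value is far from its neighbours, so the median of its neighbourhood replaces it with the surrounding intensity.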


Building OCR from Scratch in Python


This machine learning project deals with training the YOLO object detection model on a dataset of digital invoices. The model is trained to identify three essential classes from the invoices: Invoice Number, Billing Date, and Total Amount. After that, you will use Tesseract to perform OCR in Python.

Tech Stack
Language: Python
Object detection: YOLO v4
Text Recognition: Tesseract OCR
Environment: Google Colab


Any business that currently goes through all its bills manually to jot them down in a ledger can use this project.

Topics Covered in this YOLO-OCR Project

The sections below describe in detail the data science tools and techniques that you will use to implement the solution of this project.

Computer Vision using OpenCV

OpenCV is one of the most popular computer vision and image processing libraries available in Python. Every image must be processed before it is passed to the YOLO object detection model, and you will use OpenCV for that purpose. You will also rely on various OpenCV functions to visualize the testing results of the YOLO model.

Coding with Google Colab

Google Colab is a notebook service hosted by Google in the cloud that lets its users write and run Python programs. In this YOLO character recognition project, you will learn to use Colab notebooks to implement the complete solution.

You will learn how to set up the Darknet framework for training the YOLO v4 model, execute terminal commands in Colab notebooks, and do many more exciting tasks.

Colab provides access to Graphics Processing Units (GPUs), which perform this training much faster than a CPU. You will also learn how to change the runtime in Colab and set it to GPU for faster execution.
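As a rough illustration of the runtime check, the sketch below tests whether a GPU is visible to the notebook. It assumes an NVIDIA GPU runtime, where the `nvidia-smi` tool is available; the helper name `gpu_available` is hypothetical. In Colab the runtime is switched under "Runtime > Change runtime type" by selecting GPU as the hardware accelerator.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if `nvidia-smi` exists and runs without error."""
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.returncode == 0


if gpu_available():
    print("GPU runtime detected")
else:
    print("No GPU detected; switch the runtime type to GPU before training")
```

In a notebook cell, the same check can be done interactively with the terminal command `!nvidia-smi`.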


Text Detection using YOLO

YOLO v4 is an object detection model developed by Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao where YOLO stands for ‘You only look once’. Its quirky name comes from the algorithm identifying all the objects in an image by looking at it only once.

This single pass is one of the primary reasons why the algorithm can detect objects faster than the R-CNN family of algorithms.

This project will use the YOLO algorithm to build a custom OCR with Python. The reason for building a custom model is that the pre-trained YOLO weights only know how to identify the 80 predefined classes of the COCO dataset.

Thus, this project will guide you through transfer learning to create a YOLO text-detection model using the invoices dataset.

As specified already, this custom OCR system will identify three objects in the invoice images (Invoice Number, Billing Date, and Total Amount) and draw a bounding box around each one once it has been identified.
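The bounding boxes mentioned above eventually have to be turned into pixel regions that can be cropped and passed on for text recognition. The sketch below assumes the detector reports each box as a centre point plus width and height in pixels; that format, the helper name `center_box_to_corners`, and the sample detections are assumptions for illustration, not output from the trained model.

```python
def center_box_to_corners(cx, cy, w, h, image_w, image_h):
    """Convert a (centre x, centre y, width, height) box to corner coordinates.

    The result is clamped to the image so it can be used directly for cropping.
    """
    left = max(0, round(cx - w / 2))
    top = max(0, round(cy - h / 2))
    right = min(image_w, round(cx + w / 2))
    bottom = min(image_h, round(cy + h / 2))
    return left, top, right, bottom


# Hypothetical detections: (class name, confidence, (cx, cy, w, h)).
detections = [
    ("Invoice_Number", 0.97, (320.0, 80.0, 200.0, 40.0)),
    ("Total_Amount", 0.91, (10.0, 10.0, 40.0, 40.0)),
]

for name, confidence, box in detections:
    corners = center_box_to_corners(*box, image_w=640, image_h=480)
    print(name, confidence, corners)
```

Clamping matters for boxes near the page edge, such as the second sample, where part of the predicted box falls outside the image.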

OCR using Tesseract

With YOLO, the system will locate the vital text classes on the invoices, but to decode the information in that text, one must use Optical Character Recognition (OCR).

Tesseract is an open-source OCR engine that reads the text in an image and converts it into digital data. In this project, you will learn how to use Tesseract OCR to create a custom OCR in Python.


Step-by-Step Instructions on How to Build Python OCR

Here is a step-by-step guide on building OCR from scratch in Python:

1.    Setting up and Installing YOLOv4
After downloading AlexeyAB's well-known Darknet repository, we will adjust the Makefile to enable OpenCV and GPU support and then build Darknet.
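The Makefile adjustment can be sketched as a small text substitution. The snippet below assumes the Makefile contains lines of the form `GPU=0` and `OPENCV=0`, as in the AlexeyAB Darknet repository; the helper name `enable_makefile_flags` is hypothetical, and the sample text stands in for the real file.

```python
import re


def enable_makefile_flags(makefile_text, names=("OPENCV", "GPU")):
    """Turn each `NAME=0` line into `NAME=1`, leaving all other lines untouched."""
    for name in names:
        makefile_text = re.sub(
            rf"^{name}=0$", f"{name}=1", makefile_text, flags=re.MULTILINE
        )
    return makefile_text


# Stand-in for the first lines of the real Makefile.
sample = "GPU=0\nCUDNN=0\nOPENCV=0\n"
patched = enable_makefile_flags(sample)
print(patched)

# In a Colab notebook the surrounding steps would typically be:
#   !git clone https://github.com/AlexeyAB/darknet
#   (read darknet/Makefile, patch it as above, write it back)
#   !make   (run inside the darknet directory)
```

Anchoring the pattern to whole lines keeps similarly named settings from being changed by accident.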

2.    Downloading pre-trained YOLOv4 weights

YOLOv4 has already been trained on the COCO dataset, with 80 classes that it can predict. We will use these pre-trained weights to see how the model performs on some test images.

3.    Creating display functions to display the predicted class.

Here, you will learn how to use OpenCV for visualizing object detection results of the YOLO model.
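A display helper of this kind might look like the sketch below. It assumes `opencv-python` and `matplotlib` are installed and that each detection has already been converted to corner coordinates; the function names and the tuple layout are hypothetical. Matplotlib is used for display because it renders inline in notebooks.

```python
def label_text(class_name, confidence):
    """Build the caption drawn above a box, e.g. 'Total_Amount 0.93'."""
    return f"{class_name} {confidence:.2f}"


def draw_detections(image, detections):
    """Draw boxes and captions on a BGR image (as returned by cv2.imread).

    detections: iterable of (name, confidence, (left, top, right, bottom)).
    """
    import cv2  # imported here so the caption helper works without OpenCV

    green = (0, 255, 0)
    for name, confidence, (left, top, right, bottom) in detections:
        cv2.rectangle(image, (left, top), (right, bottom), green, 2)
        cv2.putText(
            image,
            label_text(name, confidence),
            (left, max(top - 5, 15)),  # keep the caption inside the image
            cv2.FONT_HERSHEY_SIMPLEX,
            0.6,
            green,
            2,
        )
    return image


def show(image):
    """Display a BGR image inline; OpenCV stores BGR, matplotlib expects RGB."""
    import cv2
    import matplotlib.pyplot as plt

    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.axis("off")
    plt.show()


print(label_text("Total_Amount", 0.9312))
```

A typical call would be `show(draw_detections(cv2.imread(path), detections))`, where `path` points to a test invoice image.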

4.    Data collection and Labeling with LabelImg

This YOLO OCR project aims to train YOLO to learn three new classes, so you will create a new dataset for training and validation. You will create this dataset with the help of the LabelImg tool, which is used to annotate each image with the three classes; YOLO will then use these annotations during training.
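For reference, annotations saved in YOLO format consist of one text line per box: a class index followed by the box centre, width, and height, all normalized by the image size. The sketch below shows that conversion from pixel corners; the helper name `to_yolo_line`, the class index, and the sample coordinates are hypothetical.

```python
def to_yolo_line(class_id, left, top, right, bottom, image_w, image_h):
    """Format one box as '<class> <x_center> <y_center> <width> <height>'.

    All four box values are fractions of the image width or height.
    """
    x_center = (left + right) / 2 / image_w
    y_center = (top + bottom) / 2 / image_h
    width = (right - left) / image_w
    height = (bottom - top) / image_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A hypothetical 'Total Amount' box (class index 2) on an 800 x 1000 scan.
line = to_yolo_line(2, left=400, top=700, right=600, bottom=740,
                    image_w=800, image_h=1000)
print(line)
```

Because the values are normalized, the same label file stays valid if the image is later resized.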

5.    Configuring Files for Training

This step involves configuring the custom .cfg, obj.data, obj.names, train.txt and test.txt files:

•    Configuring all the needed variables in the config file based on the number of classes
•    Creating the obj.names file, which lists the classes to be detected
•    Creating the obj.data file
•    Configuring train.txt and test.txt
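The class-dependent settings can be computed rather than edited by hand. The sketch below follows the rules of thumb given in the AlexeyAB Darknet documentation for custom training (filters = (classes + 5) x 3 before each YOLO layer, max_batches = classes x 2000 with a floor of 6000, steps at 80% and 90% of max_batches). The function names, the class names with underscores, and the `data/` and `backup/` paths are assumptions.

```python
def yolo_cfg_values(num_classes, num_train_images=0):
    """Return the .cfg values that depend on the number of classes."""
    # max_batches should not be below 6000 or the number of training images.
    max_batches = max(6000, num_classes * 2000, num_train_images)
    return {
        "classes": num_classes,
        "filters": (num_classes + 5) * 3,
        "max_batches": max_batches,
        "steps": (max_batches * 8 // 10, max_batches * 9 // 10),
    }


def obj_names_text(class_names):
    """obj.names: one class name per line, in class-index order."""
    return "\n".join(class_names) + "\n"


def obj_data_text(num_classes, data_dir="data", backup_dir="backup"):
    """obj.data: points Darknet at the image lists, names file and output dir."""
    return (
        f"classes = {num_classes}\n"
        f"train = {data_dir}/train.txt\n"
        f"valid = {data_dir}/test.txt\n"
        f"names = {data_dir}/obj.names\n"
        f"backup = {backup_dir}/\n"
    )


classes = ["Invoice_Number", "Billing_Date", "Total_Amount"]
print(yolo_cfg_values(len(classes)))
print(obj_names_text(classes))
print(obj_data_text(len(classes)))
```

The train.txt and test.txt files are plain lists with one image path per line, so they can be written the same way once the dataset has been split.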

6.    Download pre-trained weights for the convolutional layers

YOLO's object detection model has already been trained on the COCO dataset for 80 different classes. One can download these weights and then fine-tune them on a custom dataset. The great part about this is that even with relatively few data points, by just adding a couple of layers of learning on top of the existing ones, the model can learn and adapt to the new classes.
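The downloaded convolutional weights feed directly into the training and evaluation steps that follow. The sketch below only builds the Darknet command lines as argument lists; the download URL and the file names `cfg/yolo-obj.cfg` and `backup/yolo-obj_last.weights` are assumptions based on the AlexeyAB repository's conventions and should be checked against the actual setup.

```python
import subprocess
import urllib.request

# Assumed location of the pre-trained convolutional weights (verify before use).
CONV_WEIGHTS_URL = (
    "https://github.com/AlexeyAB/darknet/releases/download/"
    "darknet_yolo_v3_optimal/yolov4.conv.137"
)


def download_conv_weights(target="yolov4.conv.137"):
    """Fetch the pre-trained convolutional weights (a large file)."""
    urllib.request.urlretrieve(CONV_WEIGHTS_URL, target)
    return target


def train_command(data_file, cfg_file, weights_file):
    """Command that fine-tunes the detector, reporting mAP during training."""
    return ["./darknet", "detector", "train",
            data_file, cfg_file, weights_file, "-dont_show", "-map"]


def map_command(data_file, cfg_file, weights_file):
    """Command that evaluates trained weights using mean Average Precision."""
    return ["./darknet", "detector", "map", data_file, cfg_file, weights_file]


train = train_command("data/obj.data", "cfg/yolo-obj.cfg", "yolov4.conv.137")
evaluate = map_command("data/obj.data", "cfg/yolo-obj.cfg",
                       "backup/yolo-obj_last.weights")
print(" ".join(train))
print(" ".join(evaluate))

# To actually run a command from the darknet directory:
#   subprocess.run(train, check=True)
```

Keeping the commands as lists makes it easy to pass them to `subprocess.run` without shell quoting issues.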

7.    Training Custom Object Detector
8.    Evaluating the model using mean Average Precision (mAP)
9.    Predicting image classes and saving the coordinates separately
10.    Detecting text from the predicted class

•    Importing pytesseract and setting the environment variable for the English trained data (on Windows only; on Unix it is already set)
•    Getting the list of predicted files from the directory
•    Using the Tesseract pre-trained LSTM model to extract the text
•    Fine-tuning the LSTM model (note that fine-tuning is only required if the extracted text does not match the text shown in the image)
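The first three sub-steps above can be sketched as follows. This assumes the `pytesseract` and `Pillow` packages and a Tesseract 4 or later installation; the function names, the Windows paths passed by the caller, and the choice of `--psm 7` (treat the crop as a single text line) are assumptions. The `--oem 1` option selects the LSTM engine.

```python
import os


def configure_tesseract(tesseract_exe=None, tessdata_dir=None):
    """Point pytesseract at the Tesseract binary and language data (Windows)."""
    if tessdata_dir:
        os.environ["TESSDATA_PREFIX"] = tessdata_dir
    if tesseract_exe:
        import pytesseract

        pytesseract.pytesseract.tesseract_cmd = tesseract_exe


def list_crops(directory, extensions=(".jpg", ".png")):
    """Return the cropped prediction images found in a directory, sorted."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(extensions)
    )


def clean_text(raw):
    """Collapse whitespace, including the trailing form feed Tesseract emits."""
    return " ".join(raw.split())


def read_text(image_path):
    """Run the pre-trained English LSTM model on one cropped field."""
    import pytesseract
    from PIL import Image

    raw = pytesseract.image_to_string(
        Image.open(image_path), lang="eng", config="--oem 1 --psm 7"
    )
    return clean_text(raw)


print(clean_text("  INV-0042 \n\x0c"))
```

A typical loop would call `read_text(os.path.join(directory, name))` for each name returned by `list_crops(directory)` and store the result next to the predicted class.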

Disclaimer: The reference papers given by DigiAssignmentHelp.com serve as model papers for students and are not to be presented as it is. These papers are intended to be used for reference & research purposes only.
Copyright © 2022 DigiAssignmentHelp.com. All rights reserved.