Purpose
As of 2019, the World Health Organization estimates that at least 2.2 billion people worldwide live with some form of vision impairment. This large population faces significant barriers to reading, as the vast majority of written content is never made available in Braille or other accessible formats.
Read4me is a project developed as a science fair entry to address the shortage of accessible reading materials for individuals with visual impairments, dyslexia, or illiteracy. Using Optical Character Recognition (OCR) and Natural Language Processing (NLP), the system, composed of a wearable device and an Android application, converts printed text into audio, broadening access to educational resources for these groups.

Context and Recognition
The project gained national acclaim by securing first place at the National Science Fair in Tunisia and went on to represent the country at the Intel International Science and Engineering Fair (ISEF) alongside roughly 1,800 finalists from around the world. There it won a Fourth Grand Award in the Embedded Systems category, making me the first Tunisian ever to receive such an honor at ISEF.
Project Overview
Leveraging a blend of Internet of Things (IoT), Machine Learning, and Cloud Computing technologies, Read4me translates printed text into spoken language, offering a multilingual, user-centric experience.

Alternatives
Some alternatives to Read4me, such as Microsoft's Seeing AI, are geared toward navigation and are not optimized for reading books. MIT's FingerReader, although promising, was still at the research-prototype stage and suffered from accuracy issues in user trials.
Project Components
The Read4me system combines a Raspberry Pi camera for image capture with Google's Cloud Vision API for Optical Character Recognition (OCR). Machine Learning algorithms correct and translate the extracted text, which is then relayed through a Firebase database to an Android application for audio output. The user controls the system through voice commands or an infrared remote.
Prototyping and Development
The Raspberry Pi camera captures an image of the text upon receiving a signal from the infrared remote control or a voice command. OCR processing follows, with Google's Cloud Vision API extracting the text. Machine Learning algorithms then correct and translate this text, making it available in real time on an Android application, which reads it aloud using Text-to-Speech technology.

Workflow:
- Signal Initiation: The Raspberry Pi waits for a user signal, triggered either by pressing a button on the infrared remote control or by a voice command.
- Text Capture and OCR: Upon receiving a signal, the Raspberry Pi takes an image and applies OCR to detect text. Initially, Tesseract, an open-source OCR engine, was used for text recognition, but it was later replaced by Google's Cloud Vision API for its superior accuracy and use of Convolutional Neural Networks (a sketch of this capture-and-OCR step follows the list).
- Text Correction: Post-OCR, textual inaccuracies are corrected using a naive approach based on Python's NLTK: words that are not proper names and do not exist in the dictionary are replaced (one such naive approach is sketched after this list).
- Translation: The extracted text can be translated into the user's preferred language using Google's Translation API, which is built on deep neural networks with Long Short-Term Memory (LSTM) units.
- Data Transmission: The Raspberry Pi writes the processed text to a Firebase real-time database, from which the Android app fetches it (translation and this hand-off are sketched together below).
- Text-to-Speech: The Android application utilizes a Text-to-Speech engine to read the text aloud to the user.
- User Interactions: The Android application was further enhanced with "meaning search" and "natural conversation" capabilities through Dialogflow, enabling users to control the app by voice (a minimal intent-detection sketch closes the set of examples below).
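
To make the capture-and-OCR step concrete, here is a minimal sketch of the Raspberry Pi side. It assumes the `picamera` and `gpiozero` libraries and a recent `google-cloud-vision` client with credentials already configured; the GPIO button and pin number stand in for the actual infrared receiver, which is not reproduced here.

```python
# Sketch: wait for a trigger, photograph the page, run OCR.
# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key.
from gpiozero import Button          # stand-in for the infrared receiver
from picamera import PiCamera
from google.cloud import vision

trigger = Button(17)                 # hypothetical GPIO pin
camera = PiCamera()
ocr_client = vision.ImageAnnotatorClient()

def capture_and_read(path="/tmp/page.jpg"):
    """Capture one frame and return the text Cloud Vision detects in it."""
    camera.capture(path)
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = ocr_client.text_detection(image=image)
    annotations = response.text_annotations
    # The first annotation aggregates all text detected on the page.
    return annotations[0].description if annotations else ""

trigger.wait_for_press()             # block until the user signals
print(capture_and_read())
```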
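The text-correction step can be illustrated with a naive NLTK-based pass. This is an illustrative reconstruction rather than the exact code used: it treats capitalized tokens as proper names and replaces any out-of-vocabulary word with its nearest dictionary neighbor by edit distance.

```python
# Sketch: naive post-OCR spell correction with NLTK.
# Assumes the "words" corpus has been downloaded: nltk.download("words")
import nltk
from nltk.corpus import words

VOCAB = set(w.lower() for w in words.words())

def correct(token):
    """Return token unchanged if plausible, else its closest dictionary word."""
    if token[0].isupper() or token.lower() in VOCAB:
        return token                      # proper name or known word
    # Narrow the search to words of similar length to keep it tractable.
    candidates = [w for w in VOCAB if abs(len(w) - len(token)) <= 1]
    return min(candidates, key=lambda w: nltk.edit_distance(token, w))

print(" ".join(correct(t) for t in "the quikc brown fox".split()))
```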
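Translation and the hand-off to the app might look roughly as follows. The sketch uses the `google-cloud-translate` v2 client and Firebase's Realtime Database REST endpoint via `requests`; the database URL and node name are hypothetical placeholders, and authentication is omitted for brevity.

```python
# Sketch: translate the corrected text and push it to Firebase so the
# Android app can pick it up.
import requests
from google.cloud import translate_v2 as translate

translator = translate.Client()

def translate_text(text, target="fr"):
    """Translate text into the user's preferred language."""
    result = translator.translate(text, target_language=target)
    return result["translatedText"]

def publish(text, db_url="https://read4me-demo.firebaseio.com"):
    """Write the processed text where the Android app listens for it.
    A real database would require security rules and an auth token."""
    resp = requests.put(f"{db_url}/latest_text.json", json={"text": text})
    resp.raise_for_status()

publish(translate_text("Hello, reader"))
```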
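Finally, the natural-conversation feature builds on Dialogflow intent detection. In the actual system this runs from the Android app, but the equivalent call is easiest to sketch with the Python client; the project ID, session ID, and agent intents here are hypothetical.

```python
# Sketch: send a user utterance to a Dialogflow agent and read back
# the matched intent and its fulfillment text.
from google.cloud import dialogflow

def detect_intent(text, project_id="read4me-demo", session_id="user-1"):
    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)
    query_input = dialogflow.QueryInput(
        text=dialogflow.TextInput(text=text, language_code="en-US")
    )
    response = client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    result = response.query_result
    return result.intent.display_name, result.fulfillment_text

print(detect_intent("What does 'serendipity' mean?"))
```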
Testing and Results
The OCR component initially employed the Tesseract engine but later transitioned to Google's Cloud Vision API, which improved accuracy and reduced latency. Feedback from five visually impaired students at Al Nour High School was invaluable to the project's iterative development.
Limitations
- Dependency on internet connectivity.
- Limited to detecting text, not chapters or author information.
- Spell check and voice commands are confined to English.
Significance
While the primary target audience is individuals with visual impairments, the Read4me device also serves those with dyslexia or illiteracy. Its translation feature further extends its utility to language learning and cross-cultural appreciation.
Skills Acquired
The development and realization of Read4me allowed me to acquire a multi-faceted set of skills, both technical and soft, that have broad applications in fields such as Software Engineering, Data Science, and Embedded Systems. Below is a breakdown:
- Optical Character Recognition (OCR): Gained hands-on experience in utilizing Google’s Cloud Vision API and Tesseract for text extraction. Developed an understanding of the underlying machine learning algorithms, particularly Convolutional Neural Networks.
- Natural Language Processing (NLP): Employed Python's Natural Language Toolkit (NLTK) for text correction. Integrated Dialogflow to enable voice-based user interactions, learning about the architecture and operation of Deep Neural Networks in NLP.
- Machine Translation: Acquired a nuanced understanding of Long Short-Term Memory Networks (LSTMs) while implementing Google's Translation API for multi-language support.
- Firebase Real-Time Database: Mastered the implementation of a real-time database, learning crucial aspects of data storage, retrieval, and real-time syncing between hardware and mobile application.
- Android Development: Built a fully-functional Android application integrated with the hardware. Familiarized myself with Android's built-in speech recognition algorithms and Text-to-Speech capabilities.
- Embedded Systems: Learned to work with Raspberry Pi and Arduino for hardware control, interfacing the Raspberry Pi camera and infrared receiver with the computing unit. Developed competencies in hardware-software integration.
- Data Preprocessing: Applied image preprocessing techniques such as adaptive thresholding to improve OCR results (see the sketch after this list), acquiring skills that are particularly relevant for Data Science and Machine Learning applications.
- HTTP Protocols: Gained practical experience in implementing HTTP requests for API interactions, a skill essential for backend development and microservices architecture.
- Version Control: Adopted Git for source code management, learning best practices in version control that are fundamental in collaborative software development.
- Project Management: Utilized agile methodologies for iterative development and testing, learning to manage timelines, set achievable milestones, and adapt to changes in project requirements.
- User Testing: Conducted user testing sessions and feedback loops, gaining insights into the human-centered design process and user experience design.
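
To illustrate the preprocessing mentioned in the list above, a typical adaptive-thresholding pass with OpenCV looks like the sketch below; the blur and threshold parameters are common defaults, not the exact values tuned for the device.

```python
# Sketch: binarize a captured page before OCR. Adaptive thresholding
# handles uneven lighting better than a single global threshold.
import cv2

def preprocess(path, out_path="/tmp/page_bw.jpg"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.medianBlur(gray, 3)            # suppress sensor noise
    binary = cv2.adaptiveThreshold(
        gray, 255,
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,       # threshold from a local Gaussian mean
        cv2.THRESH_BINARY,
        blockSize=11,                         # size of the local neighborhood
        C=2,                                  # constant subtracted from the mean
    )
    cv2.imwrite(out_path, binary)
    return out_path
```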
Reflections
The Read4me project transcended its technical objectives to become a comprehensive learning experience. It gave me my first opportunity to travel to the United States and to engage with a community of forward-thinking engineers, researchers, and some of the most intellectually driven young people from around the globe.
Coming from Tunisia, where resources in my Science Club were constrained, I initially harbored reservations about competing on a global stage. However, this experience taught me that the path to a successful project is often paved with patience and iterative development. Such an approach not only illuminates areas requiring improvement but also unveils key opportunities that can fundamentally elevate the scope and impact of a project.
This journey significantly altered my perspective on what can be achieved with determination, strategic planning, and a focus on continuous improvement, lessons that I intend to carry forward in my future endeavors.