keywords: Deep Learning, Tesseract, Optical Character Recognition, OCR

Tesseract

Improving the quality of the output

Origin:
https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md

How to improve recognition accuracy for Chinese text

Example using Tesseract through the pytesseract wrapper:

import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

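# Path to the Tesseract executable (adjust for your installation)
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Load and preprocess the image
image = Image.open('path/to/your/image.jpg')
image = image.convert('L')  # convert to grayscale
enhancer = ImageEnhance.Contrast(image)
image = enhancer.enhance(2)  # increase contrast (enhancement factor 2)
image = image.filter(ImageFilter.MedianFilter())  # median filter (3x3 by default) to remove noise
image = image.point(lambda x: 0 if x < 140 else 255)  # binarize with a fixed threshold of 140

# Recognize Chinese text with Tesseract
text = pytesseract.image_to_string(image, lang='chi_sim')  # requires the Simplified Chinese traineddata (chi_sim)

# Print the recognition result
print(text)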

Origin:
https://blog.csdn.net/r081r096/article/details/136923044
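
Two further tweaks are often tried with `chi_sim`. The first is choosing the page segmentation mode explicitly, for example `--psm 6`, which assumes a single uniform block of text. The second is removing the spaces Tesseract often inserts between Chinese characters. The helpers `build_config` and `strip_cjk_spaces` are hypothetical and not part of pytesseract. The regex only covers the CJK Unified Ideographs block (U+4E00 to U+9FFF), so characters from the extension blocks are left alone.

```python
import re

# Hypothetical helpers, not part of pytesseract.

# CJK Unified Ideographs (U+4E00..U+9FFF); extension blocks are not covered.
_CJK = r'[\u4e00-\u9fff]'
# Runs of spaces, tabs or ideographic spaces that sit between two Han characters
_GAP_BETWEEN_CJK = re.compile(rf'(?<={_CJK})[ \t\u3000]+(?={_CJK})')


def build_config(psm: int = 6, oem: int = 1) -> str:
    """Return a Tesseract option string.

    --psm 6 assumes a single uniform block of text; --oem 1 selects the
    LSTM (neural network) engine only.
    """
    return f'--oem {oem} --psm {psm}'


def strip_cjk_spaces(text: str) -> str:
    """Remove whitespace between adjacent Chinese characters, keeping
    spaces in Latin text and line breaks intact."""
    return _GAP_BETWEEN_CJK.sub('', text)
```

In the script above, this would become `text = pytesseract.image_to_string(image, lang='chi_sim', config=build_config())` followed by `print(strip_cjk_spaces(text))`.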

OpenCV

Image Thresholding (Binarization)
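import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

# Load the image as single-channel grayscale; imread returns None on failure
img = cv.imread('sudoku.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img = cv.medianBlur(img, 5)  # 5x5 median filter to suppress noise

# Global threshold: pixels above 127 become 255, all others 0
ret, th1 = cv.threshold(img, 127, 255, cv.THRESH_BINARY)
# Adaptive thresholds: each pixel is compared with the mean (or Gaussian-weighted
# sum) of its 11x11 neighbourhood minus the constant C = 2
th2 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_MEAN_C,
                           cv.THRESH_BINARY, 11, 2)
th3 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv.THRESH_BINARY, 11, 2)

titles = ['Original Image', 'Global Thresholding (v = 127)',
          'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']
images = [img, th1, th2, th3]

# Show the four results in a 2x2 grid without axis ticks
for i in range(4):
    plt.subplot(2, 2, i + 1)
    plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([])
    plt.yticks([])
plt.show()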

Origin:
https://docs.opencv.org/4.11.0/d7/d4d/tutorial_py_thresholding.html


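Military History of the Warring States: The Battle of Maling between Qi and Wei
The Battle of Maling is a classic case in Chinese military history of laying an ambush to destroy the enemy. In this campaign Sun Bin exploited Pang Juan's weaknesses, created a false impression and lured him into the trap, so that Qi kept the initiative throughout.
In 343 BC, seeking to recoup its losses from the Battle of Guiling, Wei sent troops against Han. King Wei of Qi waited until Wei and Han had worn each other down, then attacked Wei with Tian Pan as commanding general, Tian Ying as deputy general and Sun Bin as military adviser. Wei sent Crown Prince Shen to resist, and his army suffered a crushing defeat at Maling. Tian Pan used Sun Bin's "reducing the cooking stoves" stratagem to lure Wei's general Pang Juan into the trap; Pang Juan gave chase as far as Maling, ran into the ambush and was killed. The Qi army pressed its advantage and captured Crown Prince Shen, but did not wipe out the Wei army entirely.
This battle severely sapped Wei's strength, and Wei lost its position as the dominant state. The Battle of Maling also became a famous example in the military history of ancient China.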