| Field | Value |
|---|---|
| Tag | 🤖, 💻, CommandLine, OCR |
| Date | 2021-08-10 |
| DocType | Product |
| Hierarchy | NonRoot |
| TimeStamp | |
| Product | Type: IT; Link: https://github.com/tesseract-ocr/tesseract/; Value: Free |
| CollapseMetaTable | true |
| banner | !IMG_1971.jpg |
| banner_icon | 🖨 |

Parent:: Applications


name Edit Product parameters
type command
action MetaEdit: Run MetaEdit
id EditMetaData

^button-TesseractMDEdit

name Save
type command
action Save current file
id Save

^button-TesseractSave

Tesseract

title: Summary
collapse: open
Command-line OCR engine that extracts text from images

style: number


Resource

Link to resource


Script

  1. Convert PDF to image (FileJuicer)

  2. In Terminal, run

~~~
tesseract [input] [output] -l fra
~~~

Language guide
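
The `-l fra` option only works if the French language data is installed. As a minimal sketch, a check like the one below could be run first. The helper name `has_lang` is hypothetical (it is not part of tesseract); it relies only on `tesseract --list-langs`, which prints the installed language codes one per line.

```shell
# Hypothetical helper: succeed if the given language code is installed.
# stderr is merged in because some older tesseract builds print the list there.
has_lang() {
  tesseract --list-langs 2>&1 | grep -qxF -- "$1"
}
```

Usage: `has_lang fra || echo "French language data is missing"`.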

  1. In Terminal, to OCR a batch of files, run
~~~
for i in Part_1_to_p_27-*.jpg; do echo "$i"; tesseract "$i" "$(basename "$i" .jpg)" -l fra; done
~~~
  1. In Terminal, to move the txt files, run
~~~
mv *.txt /users/mel/documents/lebv.org/website/resource/contents/booklet/text
~~~
  1. In Terminal, to concatenate the txt files in page order, run
~~~
cat Part_1_to_p_27-{1..30}.txt >> sorted_combined.txt 2>/dev/null
~~~
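
The batch and concatenation steps above can also be wrapped in two small functions. This is a sketch under assumptions, not the original workflow: the function names `ocr_batch` and `concat_pages` and the `OCR_CMD` variable are made up for illustration (`OCR_CMD` is not a tesseract option; it only allows a dry run with `OCR_CMD=echo`). Unlike `2>/dev/null`, the concatenation reports missing pages, and it truncates the output file so a rerun does not append the text twice.

```shell
# OCR each .jpg given as an argument; tesseract appends ".txt" to the output base.
ocr_batch() {
  local lang="$1"; shift
  local img
  for img in "$@"; do
    [ -e "$img" ] || continue                 # skip an unmatched glob pattern
    "${OCR_CMD:-tesseract}" "$img" "$(basename "$img" .jpg)" -l "$lang"
  done
}

# Concatenate PREFIX<first>.txt .. PREFIX<last>.txt in numeric order into OUT.
concat_pages() {
  local prefix="$1" n="$2" last="$3" out="$4"
  : > "$out"                                  # start from an empty output file
  while [ "$n" -le "$last" ]; do
    if [ -f "${prefix}${n}.txt" ]; then
      cat "${prefix}${n}.txt" >> "$out"
    else
      echo "missing: ${prefix}${n}.txt" >&2   # report gaps instead of hiding them
    fi
    n=$((n + 1))
  done
}
```

Usage: `ocr_batch fra Part_1_to_p_27-*.jpg`, then `concat_pages Part_1_to_p_27- 1 30 sorted_combined.txt`.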