---
Tag: ["Admin", "Computer", "CommandLine", "OCR"]
Date: 2021-08-10
DocType: "Product"
Hierarchy: "NonRoot"
TimeStamp:
Product:
  Type: "IT"
  Link: "https://github.com/tesseract-ocr/tesseract/"
  Value: "Free"
CollapseMetaTable: true
banner: "![[IMG_1971.jpg]]"
banner_icon: 🖨
---

Parent:: [[Applications]]

---

```button
name Edit Product parameters
type command
action MetaEdit: Run MetaEdit
id EditMetaData
```
^button-TesseractMDEdit

```button
name Save
type command
action Save current file
id Save
```
^button-TesseractSave

# Tesseract

```ad-abstract
title: Summary
collapse: open
Open-source command-line OCR engine that reads text from images.
```

```toc
style: number
```

---

### Resource

[Link to resource](https://github.com/tesseract-ocr/tesseract)

---

### Script

1. Convert the PDF pages to images (FileJuicer).
2. In Terminal, OCR a single image. `-l fra` selects French, and Tesseract writes the text to `[output].txt`:
```ad-command
~~~
tesseract [input] [output] -l fra
~~~
```
See the [Language guide](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages) for other language codes.
3. In Terminal, OCR a batch of images, writing one .txt file per image:
```ad-command
~~~
for I in Part_1_to_p_27-*.jpg; do echo "$I"; tesseract "$I" "$(basename "$I" .jpg)" -l fra; done
~~~
```
4. In Terminal, move the .txt files to the destination folder:
```ad-command
~~~
mv *.txt /users/mel/documents/lebv.org/website/resource/contents/booklet/text
~~~
```
5. In Terminal, from the folder that now holds the .txt files, concatenate them in page order. The brace expansion `{1..30}` lists the pages numerically (a `Part_1_to_p_27-*.txt` glob would put `-10` before `-2`), `2>/dev/null` hides errors for page numbers that do not exist, and `>>` appends, so delete `sorted_combined.txt` before re-running:
```ad-command
~~~
cat Part_1_to_p_27-{1..30}.txt >> sorted_combined.txt 2>/dev/null
~~~
```
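The batch OCR and concatenation steps above can be wrapped in two reusable shell functions. This is a minimal sketch, not part of the original workflow: the function names `ocr_batch` and `concat_pages` are hypothetical, and it assumes bash, page images named `<prefix>-<n>.jpg` with unpadded page numbers (as in `Part_1_to_p_27-*.jpg`), and French (`-l fra`) as the default language.

```shell
# Hypothetical helpers (not from the original note); assumes bash and
# page files named <prefix>-<n>.jpg / <prefix>-<n>.txt with unpadded numbers.

# OCR every <prefix>-*.jpg in the current folder.
# Tesseract appends .txt to the output base name, so page.jpg -> page.txt.
ocr_batch() {
  local prefix=$1 lang=${2:-fra}   # language code, French by default
  local img
  for img in "$prefix"-*.jpg; do
    # bash leaves an unmatched pattern as-is; skip it if no image exists
    [ -e "$img" ] || continue
    echo "$img"
    tesseract "$img" "$(basename "$img" .jpg)" -l "$lang"
  done
}

# Concatenate <prefix>-1.txt, <prefix>-2.txt, ... in numeric page order
# into <out>, stopping at the first missing page number.
concat_pages() {
  local prefix=$1 out=$2 n=1
  : > "$out"                       # truncate, so re-runs do not duplicate text
  while [ -f "$prefix-$n.txt" ]; do
    cat "$prefix-$n.txt" >> "$out"
    n=$((n + 1))
  done
}
```

For example, `ocr_batch Part_1_to_p_27` followed by `concat_pages Part_1_to_p_27 sorted_combined.txt`. Unlike the `{1..30}` version, `concat_pages` does not need the page count and overwrites its output, but it stops at the first gap, so a missing page silently truncates the result.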