---
Tag: ["Admin", "Computer", "CommandLine", "OCR"]
Date: 2021-08-10
DocType: "Product"
Hierarchy: "NonRoot"
TimeStamp:
ProductType: "IT"
SourceLink: "https://github.com/tesseract-ocr/tesseract/"
PriceValue: "Free"
---

Parent:: [[Applications]]

---

```button
name Edit Product parameters
type command
action MetaEdit: Run MetaEdit
id EditMetaData
```
^button-TesseractMDEdit

```button
name Save
type command
action Save current file
id Save
```
^button-TesseractSave

# Tesseract

```ad-abstract
title: Summary
collapse: open
Command-line OCR engine that extracts text from images
```

```toc
style: number
```

---

### Resource

[Link to resource](https://github.com/tesseract-ocr/tesseract)

---

### Script

1. Convert the PDF to images (FileJuicer).
2. In Terminal, run `tesseract [input] [output] -l fra`. Here `-l fra` selects French, and Tesseract appends `.txt` to the output name. See the [Language guide](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages).
3. In Terminal, to run a batch of files: `for I in Part_1_to_p_27-*.jpg; do echo "$I"; tesseract "$I" "$(basename "$I" .jpg)" -l fra; done`
4. In Terminal, to move the txt files: `mv *.txt /users/mel/documents/lebv.org/website/resource/contents/booklet/text`
5. In Terminal, to concatenate the txt files in page order (run it in the folder that now holds them): `cat Part_1_to_p_27-{1..30}.txt >> sorted_combined.txt 2>/dev/null`
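
The batch and concatenation steps above can be sketched as two reusable shell functions. This is a minimal sketch, not part of Tesseract: the function names `ocr_batch` and `combine_pages` and their arguments are hypothetical, and it assumes pages are named `PREFIX-N.jpg` as in the steps above. Unlike the one-liners, it writes each `.txt` next to its image and starts the combined file from empty instead of appending.

```shell
# Sketch only: ocr_batch and combine_pages are hypothetical helper names.

# OCR every JPEG named PREFIX-*.jpg; each result lands in PREFIX-N.txt.
# usage: ocr_batch PREFIX [LANG]
ocr_batch() {
  local prefix="$1"
  local lang="${2:-fra}"
  local img
  for img in "$prefix"-*.jpg; do
    [ -e "$img" ] || continue        # skip when the glob matches nothing
    echo "$img"
    # tesseract appends .txt to the output base name itself
    tesseract "$img" "${img%.jpg}" -l "$lang"
  done
}

# Concatenate PREFIX-1.txt .. PREFIX-LAST.txt in numeric page order.
# A plain *.txt glob would sort page 10 before page 2.
# usage: combine_pages PREFIX LAST OUTFILE
combine_pages() {
  local prefix="$1"
  local last="$2"
  local out="$3"
  local n=1
  : > "$out"                         # start from an empty output file
  while [ "$n" -le "$last" ]; do
    # missing pages are skipped silently
    if [ -f "$prefix-$n.txt" ]; then
      cat "$prefix-$n.txt" >> "$out"
    fi
    n=$((n + 1))
  done
}
```

For the booklet above, the calls would be `ocr_batch Part_1_to_p_27 fra` followed by `combine_pages Part_1_to_p_27 30 sorted_combined.txt`.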