---

Tag: ["Admin", "Computer", "CommandLine", "OCR"]
Date: 2021-08-10
DocType: "Product"
Hierarchy: "NonRoot"
TimeStamp:
Product:
 Type: "IT"
 Link: "https://github.com/tesseract-ocr/tesseract/"
 Value: "Free"

---

Parent:: [[Applications]]

---

&emsp;

```button
name Edit Product parameters
type command
action MetaEdit: Run MetaEdit
id EditMetaData
```
^button-TesseractMDEdit

```button
name Save
type command
action Save current file
id Save
```
^button-TesseractSave

&emsp;

# Tesseract

&emsp;

```ad-abstract
title: Summary
collapse: open
Open-source OCR engine that extracts text from images on the command line
```

&emsp;

```toc
style: number
```

&emsp;

---

### Resource

[Link to resource](https://github.com/tesseract-ocr/tesseract)

&emsp;

---

&emsp;

### Script

&emsp;

1. Convert the PDF to images (FileJuicer)


2. In Terminal, run OCR on one image (`-l fra` selects French; Tesseract adds `.txt` to the output name)

`tesseract [input] [output] -l fra`
[Language guide](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages)

3. In Terminal, to run a batch of files

`for I in Part_1_to_p_27-*.jpg; do echo "$I"; tesseract "$I" "$(basename "$I" .jpg)" -l fra; done`

4. In Terminal, to move txt files

`mv *.txt /users/mel/documents/lebv.org/website/resource/contents/booklet/text`

5. In Terminal, to concatenate the txt files in page order (missing page numbers are skipped silently)

`cat Part_1_to_p_27-{1..30}.txt >> sorted_combined.txt 2>/dev/null`
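
&emsp;

Steps 3 and 5 above can be wrapped in two small shell functions, which makes the file prefix, language and page count easy to change. This is only a sketch: the function names `ocr_batch` and `combine_pages` are made up for illustration, and it assumes `tesseract` is on the PATH with the `fra` language data installed.

```shell
# Hypothetical helper: OCR every image matching <prefix>*.jpg.
# Each result is written next to its image as <name>.txt.
ocr_batch() {
    prefix=$1
    lang=${2:-fra}                      # default language: French
    for img in "$prefix"*.jpg; do
        [ -e "$img" ] || continue       # the glob matched nothing
        echo "$img"
        tesseract "$img" "${img%.jpg}" -l "$lang"
    done
}

# Hypothetical helper: append pages 1..<last> to <out> in numeric order.
# Missing pages are skipped instead of raising an error.
combine_pages() {
    prefix=$1
    last=$2
    out=$3
    n=1
    while [ "$n" -le "$last" ]; do
        if [ -f "$prefix$n.txt" ]; then
            cat "$prefix$n.txt" >> "$out"
        fi
        n=$((n + 1))
    done
}

# Usage, with the file names from the steps above:
#   ocr_batch Part_1_to_p_27- fra
#   combine_pages Part_1_to_p_27- 30 sorted_combined.txt
```

Counting from 1 up to the last page keeps page 2 ahead of page 10, which a plain `*.txt` glob would not do. Like the `>>` in step 5, `combine_pages` appends, so delete the output file before a second run.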