---
Tag: ["🤖", "💻", "CommandLine", "OCR"]
Date: 2021-08-10
DocType: "Product"
Hierarchy: "NonRoot"
TimeStamp:
Product:
Type: "IT"
Link: "https://github.com/tesseract-ocr/tesseract/"
Value: "Free"
CollapseMetaTable: true
banner: "![[IMG_1971.jpg]]"
banner_icon: 🖨
---
Parent:: [[Applications]]
---
 
```button
name Edit Product parameters
type command
action MetaEdit: Run MetaEdit
id EditMetaData
```
^button-TesseractMDEdit
```button
name Save
type command
action Save current file
id Save
```
^button-TesseractSave
 
# Tesseract
 
```ad-abstract
title: Summary
collapse: open
Command-line OCR engine that extracts text from images
```
 
```toc
style: number
```
 
---
### Resource
[Link to resource](https://github.com/tesseract-ocr/tesseract)
 
---
 
### Script
 
1. Convert PDF to image (FileJuicer)
2. In Terminal, run OCR on a single image (`-l fra` selects French; Tesseract appends `.txt` to the output name)
```ad-command
~~~
tesseract [input] [output] -l fra
~~~
```
[Language guide](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages)
3. In Terminal, run OCR on a batch of files
```ad-command
~~~
for I in Part_1_to_p_27-*.jpg; do echo "$I"; tesseract "$I" "$(basename "$I" .jpg)" -l fra; done
~~~
```
4. In Terminal, move the txt files
```ad-command
~~~
mv *.txt /users/mel/documents/lebv.org/website/resource/contents/booklet/text
~~~
```
5. In Terminal, concatenate the txt files in page order
```ad-command
~~~
cat Part_1_to_p_27-{1..30}.txt >> sorted_combined.txt 2>/dev/null
~~~
```
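
The batch and concatenate steps above could be wrapped in one helper. This is only a sketch: the function name `ocr_booklet`, its argument order, and the defaults are hypothetical, and it assumes it is run from the folder containing the page images. Unlike step 5, it starts the combined file empty instead of appending to an existing one.

```shell
#!/bin/sh
# Hypothetical helper: ocr_booklet PREFIX LAST_PAGE [LANG] [OUTPUT]
# The body runs in a subshell ( ... ) so its variables stay private.
ocr_booklet() (
  prefix="$1"                         # e.g. Part_1_to_p_27-
  last="$2"                           # highest page number, e.g. 30
  lang="${3:-fra}"                    # Tesseract language code, French by default
  combined="${4:-sorted_combined.txt}"

  # Batch OCR: one .txt per page image (Tesseract appends .txt itself).
  for img in "$prefix"*.jpg; do
    [ -e "$img" ] || continue         # glob matched nothing
    echo "$img"
    tesseract "$img" "$(basename "$img" .jpg)" -l "$lang" || return 1
  done

  # Concatenate in numeric page order, skipping missing pages.
  : > "$combined"                     # start from an empty file
  n=1
  while [ "$n" -le "$last" ]; do
    page="${prefix}${n}.txt"
    if [ -f "$page" ]; then cat "$page" >> "$combined"; fi
    n=$((n + 1))
  done
)
```

For the booklet above, the call would be `ocr_booklet Part_1_to_p_27- 30`. Counting up to the last page number, rather than relying on `*.txt`, keeps page 10 after page 2.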