---

Tag: ["Admin", "Computer", "CommandLine", "OCR"]
Date: 2021-08-10
DocType: "Product"
Hierarchy: "NonRoot"
TimeStamp:
ProductType: "IT"
SourceLink: "https://github.com/tesseract-ocr/tesseract/"
PriceValue: "Free"

---

Parent:: [[Applications]]

---

 

```button
name Edit Product parameters
type command
action MetaEdit: Run MetaEdit
id EditMetaData
```
^button-TesseractMDEdit

```button
name Save
type command
action Save current file
id Save
```
^button-TesseractSave

 

# Tesseract

 

```ad-abstract
title: Summary
collapse: open
Open-source OCR engine that extracts text from images
```

 

```toc
style: number
```

 

---

### Resource

[Link to resource](https://github.com/tesseract-ocr/tesseract)

 

---

 

### Script

 

1. Convert the PDF to images (FileJuicer)

2. In Terminal, run OCR on a single image (`-l fra` selects French; `[output]` is the output base name, to which Tesseract appends `.txt`)

`tesseract [input] [output] -l fra`
[Language guide](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages)

3. In Terminal, to run a batch of files

`for i in Part_1_to_p_27-*.jpg; do echo "$i"; tesseract "$i" "$(basename "$i" .jpg)" -l fra; done`

4. In Terminal, to move the txt files

`mv *.txt /users/mel/documents/lebv.org/website/resource/contents/booklet/text`

5. In Terminal, from the folder that now holds the txt files, to concatenate them in page order

`cat Part_1_to_p_27-{1..30}.txt >> sorted_combined.txt 2>/dev/null`
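
The batch and concatenation steps above can also be wrapped in two small shell functions. This is a minimal sketch rather than the workflow recorded in this note: the function names `ocr_batch` and `combine_pages` are hypothetical, and it assumes `tesseract` is on the `PATH` with the requested language data installed. Unlike the `cat ... >>` one-liner, `combine_pages` starts from an empty output file, so running it twice does not duplicate pages.

```shell
#!/bin/sh
# Sketch only: ocr_batch and combine_pages are hypothetical helper names.

# OCR every image named "<prefix>-*.jpg" in the current directory.
# $1 = file prefix, $2 = Tesseract language code (defaults to fra).
ocr_batch() {
  ob_prefix=$1
  ob_lang=${2:-fra}
  for ob_img in "$ob_prefix"-*.jpg; do
    [ -e "$ob_img" ] || continue          # glob matched nothing
    echo "$ob_img"
    # The output base is the image name without .jpg; Tesseract appends .txt
    tesseract "$ob_img" "${ob_img%.jpg}" -l "$ob_lang"
  done
}

# Concatenate "<prefix>-1.txt" to "<prefix>-<last>.txt" in numeric page order.
# $1 = file prefix, $2 = last page number, $3 = output file.
# Pages that do not exist are skipped silently.
combine_pages() {
  cp_prefix=$1
  cp_last=$2
  cp_out=$3
  : > "$cp_out"                           # start from an empty output file
  cp_n=1
  while [ "$cp_n" -le "$cp_last" ]; do
    if [ -f "$cp_prefix-$cp_n.txt" ]; then
      cat "$cp_prefix-$cp_n.txt" >> "$cp_out"
    fi
    cp_n=$((cp_n + 1))
  done
}
```

Example usage, with the prefix and page count taken from the steps above:

`ocr_batch Part_1_to_p_27 fra`
`combine_pages Part_1_to_p_27 30 sorted_combined.txt`

Counting from 1 up to the last page keeps page 10 after page 9, which a plain `*.txt` glob would not.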