Difference between revisions of "Dictionary"
From CCIL
Revision as of 03:51, 17 May 2017
Goal
The goal of this tutorial is to create a simple dictionary: a database of words from one or more languages. It is built in a very simple manner. We supply some text (in PDF, TXT or any other popular format) to a pipeline, which parses it and inserts each word into a database exactly once.
What do we have to do?
- Parse text which comes in an arbitrary format
- Insert all tokens from it that satisfy the 'word' criteria into a database, with no duplicates
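The two steps above can be sketched roughly as follows. This is only a minimal illustration, not the tutorial's actual implementation (the sections below are still TBA). It assumes the input has already been converted to plain text, that a 'word' is any run of Unicode letters, and that words are stored lowercased in SQLite. The function names, the `words` table and the regular expression are all hypothetical choices made for this example.

```python
import re
import sqlite3

# Assumed 'word' criterion: a run of Unicode letters (no digits, no underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def extract_words(text):
    """Return the lowercased tokens in `text` that satisfy the assumed criterion."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]


def insert_words(conn, words):
    """Insert words into the hypothetical `words` table, skipping duplicates."""
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
    # The PRIMARY KEY makes each word unique; OR IGNORE silently skips repeats.
    conn.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)",
        [(word,) for word in words],
    )
    conn.commit()


def stored_words(conn):
    """Return all stored words in alphabetical order."""
    return [row[0] for row in conn.execute("SELECT word FROM words ORDER BY word")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
    insert_words(conn, extract_words("The cat saw the other cat, 42 times!"))
    print(stored_words(conn))
```

Letting the database enforce uniqueness through the primary key avoids a separate lookup before every insert. Note that this simple criterion splits contractions such as "don't" into two tokens, so a real pipeline would likely need a more careful definition of a word.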
Setup
TBA
Parsing the text
TBA
Insert into database
TBA
Further steps
TBA