Difference between revisions of "Dictionary"
From CCIL
Revision as of 03:57, 17 May 2017
Goal
The goal of this tutorial is to create a simple dictionary: a database of words from one or more languages. The approach is simple: we supply some text to the pipeline (as PDF, TXT, or any other popular format), and the pipeline parses it and inserts each word into a database exactly once.
What do we have to do?
- Parse text which comes in an arbitrary format
- Insert every token from it that satisfies the 'word' criteria into a database, with no duplicates
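The two steps above can be sketched outside the pipeline in plain Python. This is only an illustration of the idea, not the pipeline's own implementation: the 'word' criterion (a run of letters, lowercased), the SQLite storage, the table name `words`, and the function names are all assumptions made for the example.

```python
import re
import sqlite3

# Assumed 'word' criterion: a run of letters only (no digits, punctuation or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")

def extract_words(text):
    """Return the lowercased tokens of text that satisfy the assumed 'word' criterion."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]

def insert_words(conn, words):
    """Store words; the primary key plus INSERT OR IGNORE silently skips duplicates."""
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
    conn.executemany("INSERT OR IGNORE INTO words (word) VALUES (?)",
                     [(word,) for word in words])
    conn.commit()

def stored_words(conn):
    """Return all stored words in alphabetical order."""
    return [row[0] for row in conn.execute("SELECT word FROM words ORDER BY word")]

# Usage with an in-memory database (hypothetical sample text).
conn = sqlite3.connect(":memory:")
insert_words(conn, extract_words("The cat saw the other cat, 2 times."))
print(stored_words(conn))  # ['cat', 'other', 'saw', 'the', 'times']
```

Leaving uniqueness to the database (a primary key on the word column) rather than to the parsing code means the same text can be fed in repeatedly without creating duplicates.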
Setup
First, we need to set up a context. It has a very simple structure; for the purposes of this tutorial we will name it "dictionary":
context/apps
 \- dictionary
     |- languages
     |   \- en
     |       \- source.pdf
     \- context.properties
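For convenience, the directory skeleton can also be created with a short script. The following is a minimal sketch, assuming the context directory carries the same name as the context ("dictionary"); the helper name `create_context` is hypothetical, `context.properties` is created as an empty placeholder because its contents are not described here, and `source.pdf` still has to be copied into the language directory by hand.

```python
import tempfile
from pathlib import Path

def create_context(root, name="dictionary", language="en"):
    """Create the context skeleton under root/context/apps and return its path."""
    context_dir = Path(root) / "context" / "apps" / name
    # languages/<language>/ is where the source text (e.g. source.pdf) goes.
    (context_dir / "languages" / language).mkdir(parents=True, exist_ok=True)
    # Empty placeholder; the required properties are not covered by this sketch.
    (context_dir / "context.properties").touch()
    return context_dir

# Usage: build the layout in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    ctx = create_context(tmp)
    for path in sorted(ctx.rglob("*")):
        print(path.relative_to(tmp).as_posix())
```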
Parsing the text
TBA
Insert into database
TBA
Further steps
TBA