Difference between revisions of "Dictionary"

From CCIL

Revision as of 03:57, 17 May 2017

Goal

The goal of this tutorial is to create a simple dictionary: a database of words from one or more specific languages. It is built in a very simple manner: we supply some text to the pipeline (in PDF, TXT or any other popular format), and the pipeline parses it and inserts each word into a database exactly once.


What do we have to do?

  1. Parse text which comes in an arbitrary format
  2. Insert every token from it that satisfies the 'word' criteria into a database, with no duplicates
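
The two steps above can be illustrated outside the pipeline with a minimal standalone sketch. It is not the pipeline's own API: it assumes the text has already been extracted to a plain string, uses Python's built-in sqlite3 module as a stand-in database, and treats a 'word' as a run of letters (optionally joined by an apostrophe or hyphen). The names WORD_RE, extract_words, insert_words, stored_words and the words table are hypothetical and chosen only for this illustration.

```python
import re
import sqlite3

# Assumed 'word' criteria: one or more Unicode letters, optionally joined
# by a single apostrophe or hyphen (e.g. "cat's", "well-known").
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def extract_words(text):
    """Return every token in the text that satisfies the assumed criteria, lowercased."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]


def insert_words(conn, language, words):
    """Insert words for a language; the primary key silently rejects duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "language TEXT NOT NULL, word TEXT NOT NULL, "
        "PRIMARY KEY (language, word))"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO words (language, word) VALUES (?, ?)",
            [(language, word) for word in words],
        )


def stored_words(conn, language):
    """Return the stored words for a language in sorted order."""
    rows = conn.execute(
        "SELECT word FROM words WHERE language = ? ORDER BY word", (language,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    insert_words(conn, "en", extract_words("The cat's 2 cats, the CAT."))
    print(stored_words(conn, "en"))
```

Uniqueness is enforced by the database rather than by the calling code, so feeding the same text twice leaves the table unchanged.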

Setup

First, we need to set up a context. Its structure is very simple; for the purposes of this tutorial we will name it "dictionary":

context/apps
  \- dictionary
     |- languages
     |    \- en
     |        \- source.pdf
     \- context.properties
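
As a convenience, the layout above could be created with a short script. This is only a sketch under stated assumptions: the helper name create_context and its parameters are hypothetical, context.properties is created as an empty placeholder because its contents are not described here, and source.pdf is not created since it is the text you supply yourself.

```python
from pathlib import Path


def create_context(root, name="dictionary", languages=("en",)):
    """Create the context directory tree under root and return the app directory."""
    app_dir = Path(root) / "context" / "apps" / name
    for lang in languages:
        # One folder per language; the source document goes inside it.
        (app_dir / "languages" / lang).mkdir(parents=True, exist_ok=True)
    # Empty placeholder; the required properties are not specified in this tutorial.
    (app_dir / "context.properties").touch()
    return app_dir


if __name__ == "__main__":
    print(create_context("."))
```

After running it, copy your source document into the languages/en folder as source.pdf.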

Parsing the text

TBA

Insert into database

TBA

Further steps

TBA