Difference between revisions of "Dictionary"
From CCIL
Revision as of 04:10, 17 May 2017
Goal
The goal of this tutorial is to create a simple dictionary: a database of words from one or more languages. It is built in a very simple manner. We supply some text to the pipeline (in PDF, TXT or any other popular format), which parses it and inserts each distinct word into a database.
What do we have to do?
- Parse text which comes in an arbitrary format
- Insert every token from it that satisfies the 'word' criteria into a database, with no duplicates
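To make the two steps concrete before the pipeline itself is described, here is a minimal sketch of the same idea with ordinary shell tools. It is not how CCIL does it: `extract_words` is a hypothetical helper, the input is assumed to be plain text already (a PDF would have to be converted first), and the 'word' criteria is assumed to be "a run of letters", with case folded to lowercase. Which characters count as letters depends on the current locale.

```shell
#!/bin/bash
# Hypothetical helper (not part of CCIL): read plain text on stdin and
# print each distinct word once, one per line.
# Steps: split on anything that is not a letter, fold to lowercase,
# drop empty lines, then sort and remove duplicates.
extract_words() {
    tr -cs '[:alpha:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        grep -v '^$' |
        sort -u
}

printf 'The cat saw the other cat.\n' | extract_words
```

For the sample sentence this prints `cat`, `other`, `saw` and `the`, each on its own line. The `sort -u` step plays the role of the "no duplicates" rule that the database will enforce.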
Setup
First, we need to set up a context. Its structure is very simple; for the purposes of this tutorial we will name it "dictionary":
context
 \- apps
     \- dictionary
         |- languages
         |   \- en
         |       \- source.pdf
         \- context.properties
You can use any file in place of source.pdf; it is just an ordinary text downloaded from the Internet. The more words it contains, the better.
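The layout above can be created by hand or with a few shell commands. The sketch below is only an illustration: `create_context` is a hypothetical helper, and context.properties is created as an empty placeholder because its expected contents are not described in this tutorial.

```shell
#!/bin/bash
# Hypothetical helper (not part of CCIL): create the "dictionary" context
# skeleton under the base directory given as the first argument.
create_context() {
    local base="$1"
    local app="$base/context/apps/dictionary"

    # One directory per language; this tutorial only uses "en".
    mkdir -p "$app/languages/en"

    # Empty placeholder: the required properties are an open assumption here.
    touch "$app/context.properties"
}

# Use the directory passed on the command line, or the current one by default.
create_context "${1:-.}"

# Afterwards, copy your own text into place, for example:
# cp /path/to/some-text.pdf ./context/apps/dictionary/languages/en/source.pdf
```

Running it twice is harmless, since `mkdir -p` and `touch` both accept paths that already exist.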
Project
TBA
Startup script
tutorials-dictionary-app.sh
#!/bin/bash
# CCIL_HOME is the parent of the directory the script is started from.
CCIL_HOME=$(dirname "$PWD")
CCIL_CONTEXT="$CCIL_HOME/context"

echo -------------------
echo "CCIL_HOME = $CCIL_HOME"
echo -------------------

# Launch the console application; extra arguments are passed through via "$@".
java -cp "$CCIL_HOME/lib/*:$CCIL_HOME/config:$CCIL_HOME/launcher/*" \
  -Dserver.config.file=tutorials-dictionary-app.ttl \
  "-Dserver.home.dir=$CCIL_HOME" \
  -Xmx1024M \
  "-Dserver.context.dir=$CCIL_CONTEXT" \
  -Dserver.jmx.enabled=false \
  net.ccil.execution.CcilConsoleApp -execute -root "$CCIL_HOME/context/apps" "$@"
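Note that the script takes CCIL_HOME from the parent of the current working directory, so it has to be started from a directory directly below the CCIL installation. The sketch below demonstrates this with a throwaway directory; the `ccil` and `bin` names are hypothetical and only stand in for a real installation.

```shell
#!/bin/bash
# Hypothetical layout: an installation directory "ccil" with a "bin"
# subdirectory that holds the startup script.
install_dir="$(mktemp -d)/ccil"
mkdir -p "$install_dir/bin"

# When started from bin/, the parent directory is the installation root.
cd "$install_dir/bin" || exit 1
CCIL_HOME=$(dirname "$PWD")

echo "CCIL_HOME = $CCIL_HOME"
echo "context   = $CCIL_HOME/context"
```

If the startup script is launched from somewhere else, CCIL_HOME points at the wrong directory and the `lib`, `config`, `launcher` and `context` paths built from it will not be found.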
Parsing the text
TBA
Insert into database
TBA
Further steps
TBA