Dictionary

From CCIL
Revision as of 04:10, 17 May 2017

Goal

The goal of this tutorial is to create a simple dictionary: a database of words from one or more languages. The approach is straightforward: we supply some text to the pipeline (in PDF, TXT or any other popular format), and the pipeline parses it and inserts each distinct word into a database exactly once.


What do we have to do?

  1. Parse text that arrives in an arbitrary format
  2. Insert every token from it that satisfies the 'word' criteria into a database, with no duplicates
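
The two steps can be illustrated outside CCIL with standard Unix tools. This is only a rough sketch of the idea, not the pipeline itself: the function name extract_words is hypothetical, the 'word' criteria are assumed to mean "a run of alphabetic characters, compared case-insensitively", and a sorted list on standard output stands in for the database.

```shell
#!/bin/bash
# Hypothetical helper, not part of CCIL: reads plain text on stdin and
# prints each distinct word once, lowercased, one per line.
extract_words() {
  tr -cs '[:alpha:]' '\n' |      # turn every run of non-letters into one newline
    tr '[:upper:]' '[:lower:]' | # fold case so "The" and "the" count once
    sed '/^$/d' |                # drop the empty line left by leading punctuation
    LC_ALL=C sort -u             # keep a single copy of each word
}

printf 'The cat saw the other cat.\n' | extract_words
```

For the sample sentence this prints cat, other, saw and the, each on its own line. Binary formats such as PDF would first have to be converted to plain text, which is the job of the parsing step.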

Setup

First, we need to set up a context. Its structure is very simple; for the purposes of this tutorial we will name it "dictionary":

context
\- apps
   \- dictionary
      |- languages
      |  \- en
      |     \- source.pdf
      \- context.properties

You can use any file in place of source.pdf; here it is just an ordinary text downloaded from the Internet. The more words it contains, the better.
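
As a convenience, the layout above could be created from the shell roughly as follows. This is a sketch under assumptions: it builds the tree relative to the current directory, the APP_DIR variable is only illustrative, and context.properties is created empty because its contents are not described here.

```shell
#!/bin/bash
# Create the tutorial's context layout under the current directory.
APP_DIR=context/apps/dictionary   # illustrative variable, not a CCIL setting

mkdir -p "$APP_DIR/languages/en"     # "en" is the language shown in the tree
touch "$APP_DIR/context.properties"  # empty placeholder (contents not covered here)

# Copy in the text to be parsed; the source path is a placeholder:
# cp /path/to/your/text.pdf "$APP_DIR/languages/en/source.pdf"

find context | sort   # list what was created
```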

Project

TBA

Startup script

tutorials-dictionary-app.sh

#!/bin/bash
# CCIL_HOME is the parent of the current directory, so run this script
# from a directory located directly under the CCIL installation.
CCIL_HOME=$(dirname "$PWD")
CCIL_CONTEXT="$CCIL_HOME/context"

echo -------------------
echo "CCIL_HOME = $CCIL_HOME"
echo -------------------

java -cp "$CCIL_HOME/lib/*:$CCIL_HOME/config:$CCIL_HOME/launcher/*" \
  -Dserver.config.file=tutorials-dictionary-app.ttl \
  -Dserver.home.dir="$CCIL_HOME" -Dserver.context.dir="$CCIL_CONTEXT" \
  -Dserver.jmx.enabled=false -Xmx1024M \
  net.ccil.execution.CcilConsoleApp -execute -root "$CCIL_HOME/context/apps" "$@"

Parsing the text

TBA

Insert into database

TBA

Further steps

TBA