News in the argp module
(Did I mention that I am maintaining it? Well, I am, since the day before yesterday.)
I have fixed a nasty core dump that occurred when the name of an OPTION_DOC option was set to NULL. In addition, the updated argp never attempts to translate an empty help string. Finally, the new option flag OPTION_NO_TRANS prevents translation of the name field of an OPTION_DOC option. The latter is already used by tar.
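For illustration, here is a minimal sketch of an option table that mixes ordinary options with an OPTION_DOC entry whose name is protected from translation. The program, its options, the `settings` struct and the `parse_example` helper are all hypothetical. OPTION_NO_TRANS comes from the argp shipped with gnulib; the sketch assumes its value is 0x20 and defines it only if the system `<argp.h>` lacks it.

```c
#include <argp.h>
#include <stddef.h>

/* Fallback for <argp.h> headers without OPTION_NO_TRANS.
   Assumption: 0x20 is the value used by gnulib's argp.h. */
#ifndef OPTION_NO_TRANS
# define OPTION_NO_TRANS 0x20
#endif

/* Hypothetical program state filled in by the parser. */
struct settings {
  int verbose;
  const char *file;
};

static struct argp_option options[] = {
  { "verbose", 'v', NULL, 0, "Produce verbose output", 0 },
  { "file", 'f', "NAME", 0, "Operate on archive NAME", 0 },
  /* Documentation-only entry: argp never parses it, it only appears in
     --help. OPTION_NO_TRANS keeps the literal name "-l" untranslated. */
  { "-l", 0, NULL, OPTION_DOC | OPTION_NO_TRANS,
    "Legacy spelling, shown in --help only (hypothetical)", 0 },
  { NULL, 0, NULL, 0, NULL, 0 }
};

static error_t
parse_opt (int key, char *arg, struct argp_state *state)
{
  struct settings *s = state->input;
  switch (key)
    {
    case 'v':
      s->verbose = 1;
      break;
    case 'f':
      s->file = arg;
      break;
    default:
      return ARGP_ERR_UNKNOWN;  /* let argp handle special keys */
    }
  return 0;
}

static struct argp argp = {
  options, parse_opt, NULL, "Hypothetical argp demo", NULL, NULL, NULL
};

/* Parses ARGV into S; returns the argp_parse error code. */
int
parse_example (int argc, char **argv, struct settings *s)
{
  s->verbose = 0;
  s->file = NULL;
  return argp_parse (&argp, argc, argv, 0, NULL, s);
}
```

Running `demo -v --file archive.tar` through `parse_example` sets both fields; the `-l` entry only affects the help output.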
Dictionary structure
I have started rewriting the dictionaries in the Ellinika project. The idea is to use XML instead of the definition language I have been using so far. That language was designed to be simple, short (I hate typing) and suitable for describing dictionary entries. Its parser is written in C and is therefore fairly fast. At present, the dictionary created with it contains 1150 entries, and the process of creating it has confirmed that the dictionary structure is right and that the input language is generally suitable for the purpose. However, it has also exposed some drawbacks of the language.
The principal drawback is that a dictionary entry may currently contain only one part of speech 1), i.e. the assumed entry structure is:

    (key part-of-speech articles)

However, some words belong to several parts of speech at once and change their meaning with the part of speech. For example, κρυώνω used as a transitive verb means to refrigerate, whereas used as an intransitive verb it means to feel cold, to freeze. Greek has a fairly large number of such words (verbs in particular).
So I have decided to redesign the input language; but instead of simply fixing the existing language, I have chosen to rewrite the dictionary sources entirely in XML.
External representation
In the new definition language each entry is represented as follows:
    <NODE>
      <K>string</K>+
      [<F>string</F>]
      <P ID="string">
        <M>string</M>+
        <A>string</A>*
        <X>string</X>*
        <T ID="string" />*
        <X>string</X>*
      </P>+
      <X>string</X>*
    </NODE>
(as usual, optional elements are enclosed in brackets, * means zero or more occurrences of the element, and + means one or more occurrences of the element).
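To make the occurrence markers concrete, here is a small sketch that checks how many times an element appears against its marker. The function name and the codes '1' (no marker, exactly once) and '?' (bracketed, i.e. optional) are my own labels for this illustration, not part of the dictionary tools.

```c
#include <stdbool.h>
#include <stddef.h>

/* Occurrence rules of the grammar notation:
   '1' - no marker: the element appears exactly once (hypothetical code),
   '?' - element in brackets: zero or one occurrence (hypothetical code),
   '*' - zero or more occurrences,
   '+' - one or more occurrences. */
bool
occurrences_ok (char marker, size_t count)
{
  switch (marker)
    {
    case '1':
      return count == 1;
    case '?':
      return count <= 1;
    case '*':
      return true;
    case '+':
      return count >= 1;
    default:
      return false;  /* unknown marker */
    }
}
```

For instance, a NODE with two K elements and no F passes (`occurrences_ok('+', 2)` and `occurrences_ok('?', 0)` are both true), while a NODE without any P fails, since `occurrences_ok('+', 0)` is false.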
Elements have the following meaning:
NODE
- Starts the definition of a dictionary entry.
K
- Introduces the dictionary key, i.e. the word of the source language that is explained by this entry. There may be several keys if the notion in question has several synonyms.
F
- Introduces grammatical forms of the key, whenever these are not formed by standard rules. In the future I expect to write a proper verb conjugator; this field will then probably mark a reference to, or an invocation of, it.
P
- Part of speech and meanings associated with it (see below). Attribute ID introduces the name (usually abbreviated) of the part of speech.
M
- Translation of the word (M stands for Meaning).
A
- Antonym.
X
- Cross-reference for this entry. Usually this is a reference to a synonym or some semantically related key.
T
- Topic or group this entry pertains to. Attribute ID identifies the topic. When many entries pertain to the same topic, their definitions can be enclosed in a <T ID="name"> ... </T> construct.
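One way to picture the new entry layout in memory is a set of C structures mirroring the elements above, filled in with the κρυώνω example. This is only an illustration: the actual dictionary translator is written in Scheme and has its own representation. The struct and field names and the part-of-speech IDs "vt" and "vi" are hypothetical, and the A, X and T elements are left out for brevity.

```c
#include <stddef.h>
#include <string.h>

/* One <P> element: a part of speech with its own meanings. */
struct pos_entry {
  const char *id;               /* ID attribute, e.g. "vt" (assumed) */
  const char *const *meanings;  /* <M> elements, at least one */
  size_t n_meanings;
};

/* One <NODE> element (A, X and T omitted). */
struct dict_node {
  const char *const *keys;      /* <K> elements, at least one */
  size_t n_keys;
  const char *forms;            /* optional <F>, NULL if absent */
  const struct pos_entry *pos;  /* <P> elements, at least one */
  size_t n_pos;
};

static const char *const kryono_keys[] = { "κρυώνω" };
static const char *const kryono_vt[] = { "to refrigerate" };
static const char *const kryono_vi[] = { "to feel cold", "to freeze" };

/* Same key, two parts of speech, each with its own meanings. */
static const struct pos_entry kryono_pos[] = {
  { "vt", kryono_vt, 1 },
  { "vi", kryono_vi, 2 },
};

const struct dict_node kryono = { kryono_keys, 1, NULL, kryono_pos, 2 };

/* Returns the <P> element with the given ID, or NULL if the entry lacks it. */
const struct pos_entry *
find_pos (const struct dict_node *node, const char *id)
{
  for (size_t i = 0; i < node->n_pos; i++)
    if (strcmp (node->pos[i].id, id) == 0)
      return &node->pos[i];
  return NULL;
}
```

The old single-part-of-speech entry corresponds to the special case `n_pos == 1`; allowing an array of P elements is what lets one key carry both verb senses.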
There are two special forms of this notation. One is a shortcut for words that have only one part of speech (as I said, I hate typing, so I try to save as much of it as possible):
    <NODE>
      <K>string</K>+
      [<F>string</F>]
      <P>string</P>
      <M>string</M>*
      <A>string</A>*
      <X>string</X>*
      <T ID="string" />*
    </NODE>
The other introduces an entry that is a reference to another entry in the dictionary:
    <NODE>
      <K>string</K>+
      [<F>string</F>]
      <P>string</P>
      <X>string</X>*
      <T ID="string" />*
    </NODE>
This is useful for pairs such as "ο ποταμός" and "το ποτάμι", both meaning "river" but having different genders. The special form <X /> means a reference to the immediately preceding node definition that has at least one translation (M element).
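The rule for the bare <X /> form amounts to a backward search over the entries read so far. A sketch under my own assumptions: entries are kept in an array in file order, and each one records only its first key and the number of M elements it carries. The `entry` type and the `resolve_bare_ref` function are hypothetical.

```c
#include <stddef.h>

/* Minimal view of a parsed entry: its first key and how many <M> it has. */
struct entry {
  const char *key;
  size_t n_meanings;
};

/* Resolves a bare <X /> found in entries[index]: returns the index of the
   nearest preceding entry with at least one translation, or -1 if none. */
long
resolve_bare_ref (const struct entry *entries, size_t index)
{
  for (size_t i = index; i > 0; i--)
    if (entries[i - 1].n_meanings > 0)
      return (long) (i - 1);
  return -1;
}
```

With "ο ποταμός" (one M) at index 0 and "το ποτάμι" (only <X />) at index 1, `resolve_bare_ref(entries, 1)` returns 0. Another reference-only entry at index 2 would also resolve to 0, because index 1 has no M element and is skipped.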
Examples
A working example can be found here.
Internal representation
The initial version of the dictionary translator is already available. See its header comment for a short description of the internal representation (in Scheme, of course).
1) The term part of speech is used here in a broader sense: e.g. for the purposes of the dictionary, transitive and intransitive verbs are regarded as different parts of speech. I may have to introduce a finer granularity here (e.g. part-of-speech/subpart or something similar), but for now I am not sure it will be worth the effort.
But seek the road which makes death a fulfillment.
Dictionary support is ready
The Ellinika site dictionary has been completely rewritten in XML (see this entry for more info). The project now uses gamma instead of guile-sql + guile-texinfo.
In html (sic!) it looks like this:

    <hr>
    <center>
    [ <?guile
      (let ((env (current-image-neighbors)))
        ;; (car env): previous image, (cdr env): next image
        (if (car env)
            (begin
              (display "<a href=\"")
              (display (image-url (gallery-id) (car env)))
              (display "\">")
              (display "Poprzednie zdjęcie")   ; "Previous photo"
              (display "</a> | ")))
        (if (cdr env)
            (begin
              (display "<a href=\"")
              (display (image-url (gallery-id) (cdr env)))
              (display "\">")
              (display "Następne zdjęcie")     ; "Next photo"
              (display "</a> | "))))
    ?>

It's clear now what I'm driving at, right? Exactly: I want to modify mod_guile so that Apache understands <?guile ... ?> just as it understands <?php ... ?> :))
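Before any evaluation, a handler for such pages has to split them into literal HTML and the code between <?guile and ?>. Here is a minimal sketch of that step in C. The callback interface and the name `scan_guile_blocks` are hypothetical and not part of mod_guile; the function only locates the regions and evaluates nothing.

```c
#include <stddef.h>
#include <string.h>

/* Called for each piece of the page; is_code is nonzero for the text
   between "<?guile" and "?>", zero for literal HTML. */
typedef void (*chunk_fn) (const char *text, size_t len, int is_code,
                          void *data);

/* Splits PAGE into HTML and Guile chunks. Returns 0 on success, -1 if a
   "<?guile" block is not terminated by "?>". */
int
scan_guile_blocks (const char *page, chunk_fn fn, void *data)
{
  static const char open_tag[] = "<?guile";
  static const char close_tag[] = "?>";
  const char *p = page;

  for (;;)
    {
      const char *start = strstr (p, open_tag);
      if (start == NULL)
        {
          if (*p)
            fn (p, strlen (p), 0, data);          /* trailing HTML */
          return 0;
        }
      if (start > p)
        fn (p, (size_t) (start - p), 0, data);    /* HTML before the block */
      const char *code = start + sizeof open_tag - 1;
      const char *end = strstr (code, close_tag);
      if (end == NULL)
        return -1;                                /* unterminated block */
      fn (code, (size_t) (end - code), 1, data);  /* Guile code */
      p = end + sizeof close_tag - 1;
    }
}
```

A real module would pass each code chunk to the Guile interpreter and capture what `display` writes; this sketch leaves that part out.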