The system was tested on a small sample of 50 questions from the given subject domain. An answer was accepted if at least one of the options returned by the system could be found in the source encyclopedias. The options were ranked by accuracy: the most accurate answer, scored at 100%, was at the top of the list, and less accurate answers followed below it (for example, in a list of 25 answers the 1st place scores 100%, the 2nd 96%, the 3rd 92%, and so on down to 4% for the 25th place).
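In the 25-answer example, accuracy falls by 4 percentage points (100/25) per position, so position $r$ in a list of $N$ answers appears to score $(N - r + 1)/N \times 100\%$. The sketch below encodes this reading. The linear formula is inferred from the example rather than stated explicitly, and the function name `rank_accuracy` is hypothetical.

```python
def rank_accuracy(rank: int, n_answers: int) -> float:
    """Accuracy (in percent) of the answer at 1-based position `rank`
    in a ranked list of `n_answers` answers.

    Assumption: accuracy decreases linearly by 100 / n_answers points
    per position, as inferred from the 25-answer example.
    """
    if n_answers < 1 or not 1 <= rank <= n_answers:
        raise ValueError("rank must be between 1 and n_answers")
    return 100.0 * (n_answers - rank + 1) / n_answers


# Reproduce the 25-answer example: 100, 96, 92, ..., 4
scores = [rank_accuracy(r, 25) for r in range(1, 26)]
print(scores[0], scores[1], scores[2], scores[-1])  # 100.0 96.0 92.0 4.0
```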
The table below presents several examples of the system's output on the etiquette encyclopedias, compared with standard full-text search over the same sources. The figure in brackets is the score assigned by the system.
| Question | Answer | Other answers from full-text search |
| --- | --- | --- |
| Which gifts can you give to your friends or acquaintances? | You shouldn't give expensive gifts to your friends or acquaintances. [0.7] | Objects associated with illnesses, pharmaceutical goods, drugs, and hygiene products are not suitable as gifts. [0.6] Normally, a bouquet of flowers, a bottle of wine, or a box of chocolates are presented as gifts. [0.05] |
| How is it better to address people? | You should address a shop assistant in a polite manner, whether you like the local service or not. [0.7] When there are several people in the room, you should greet the hostess first, then other women, and after that, the host and other men. [0.6] | If someone's behavior seems suspicious to them, they will approach this person and ask him to show his pockets. [0.1] If the hostess doesn't say any right words and you know that she has been distracted by something, or she has just forgotten about it, then, after 6-7 people are served, you can start eating. [0.1] |
| What shouldn't be specified on business cards? | Business cards written in any other language than East Slavic languages shouldn't indicate a middle name of a holder, as most countries don't have such a notion. [0.7] | Whatever letter it is, business or friendly, you should specify the address and date. [0.3] |
On average, answers were received for 60% of the questions, and the mean relative accuracy, computed over the received answers, was 80%. The main problem was response speed, since the DBpedia datasets were searched through the public DBpedia Spark service, which was under heavy load. Deploying a local DBpedia mirror would have increased the speed, but it would have required significant hardware costs.
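The two summary figures can be read as the share of questions that received an accepted answer and the mean accuracy over those answered questions only. The sketch below computes both under an assumption the text does not spell out: a question's accuracy is taken to be the rank-based accuracy of its highest-ranked answer found in the sources, with the linear formula inferred from the 25-answer example. The names `rank_accuracy`, `question_accuracy`, and `evaluate`, the boolean-list data layout, and the toy data are all hypothetical and illustrative, not the paper's results.

```python
from statistics import mean
from typing import Optional, Sequence


def rank_accuracy(rank: int, n_answers: int) -> float:
    # Assumed linear rank-based accuracy: 100% at rank 1,
    # 100 / n_answers % at the last rank.
    return 100.0 * (n_answers - rank + 1) / n_answers


def question_accuracy(found_flags: Sequence[bool]) -> Optional[float]:
    """found_flags[i] is True if the (i+1)-th ranked answer was found in the
    source encyclopedias. Returns the accuracy of the highest-ranked found
    answer, or None if no answer was found (question not answered)."""
    for i, found in enumerate(found_flags):
        if found:
            return rank_accuracy(i + 1, len(found_flags))
    return None


def evaluate(results: Sequence[Sequence[bool]]) -> tuple[float, float]:
    """Return (percent of questions answered,
    mean accuracy over answered questions only)."""
    accuracies = [question_accuracy(flags) for flags in results]
    answered = [a for a in accuracies if a is not None]
    answered_pct = 100.0 * len(answered) / len(results)
    # If nothing was answered, report 0.0 instead of failing on an empty mean.
    mean_acc = mean(answered) if answered else 0.0
    return answered_pct, mean_acc


# Illustrative toy data (4 questions), not the paper's 50-question sample.
toy = [
    [True, False],                       # 1st of 2 found -> 100%
    [False, True],                       # 2nd of 2 found -> 50%
    [False, False, False],               # nothing found  -> not answered
    [False, False, True, False, False],  # 3rd of 5 found -> 60%
]
print(evaluate(toy))  # (75.0, 70.0)
```

Averaging only over answered questions keeps the two figures independent: the first measures coverage, the second measures ranking quality when the system succeeds.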
Unfortunately, a comparison of NLP tools for English and for East Slavic languages does not favour the latter. In particular, the parsers, especially MaltParser, are extremely resource-intensive, perform poorly in multithreaded mode, and are difficult to run on a cluster. Knowledge bases such as WordNet.ru contain obvious errors. Although many problems have been fixed, considerable work is still required.
The materials and tools used are listed below: