Over the last few years, I have spent a lot of time learning foreign languages such as Esperanto, Spanish and German.
After a while, I came up with the idea of applying this knowledge to computer science.
When I decided this, I was completely new to Computational Linguistics (CL) and Natural Language Processing (NLP).
However, after reading a number of articles I picked up some basic ideas.
What I am gonna do
To dive into CL/NLP I’ve decided to implement a Toki Pona -> English translator from scratch.
It’s interesting to see which issues I will face and how I will solve them.
It will make me go through a number of stages of language processing:
Language detection (I want to distinguish Toki Pona from other languages)
Morphological analysis (will actually be skipped because of the simplicity of Toki Pona)
Syntax tree conversion
Generation of final translation with respect to English grammar.
Anyway, this list is not strict, and probably it will be modified in the future.
What I am not gonna do
There are many tools and libraries that already exist in Ruby for NLP.
I am not gonna use any of them here, nor cover them in the articles.
If you need something like that, please take a look at ruby-nlp.
It’s a document that gathers a variety of NLP tools implemented in Ruby.
A few days ago my colleague Arthur Shagall, while reviewing my
code, suggested that I use the Lazy Object pattern to postpone some calculations at load time.
I hadn’t heard about the pattern before, and even googling it didn’t give me much information.
So I have decided to write this article to cover the topic.
Lazy Object allows you to postpone a calculation until the moment when its
result is actually used. That may help you speed up the booting of the application.
It is pretty simple. We create a proxy object that takes a calculation
block as its property and executes it on the first method call.
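The idea can be sketched in a few lines of Ruby. This is a minimal illustration, not necessarily the implementation the article arrives at: the class name `LazyObject` and the helper `__target__` are hypothetical. It inherits from `BasicObject` so that almost every call falls through to `method_missing`.

```ruby
# Hypothetical proxy: holds a block and runs it on the first method call.
class LazyObject < BasicObject
  def initialize(&block)
    @block = block
    @evaluated = false
  end

  # Runs the block once and remembers the result.
  def __target__
    unless @evaluated
      @target = @block.call
      @evaluated = true
    end
    @target
  end

  # Every method not defined on BasicObject is forwarded to the result.
  def method_missing(name, *args, &block)
    __target__.__send__(name, *args, &block)
  end
end

calls = 0
numbers = LazyObject.new { calls += 1; [3, 1, 2] }
# Nothing has been calculated yet: calls is still 0 here.
sorted = numbers.sort # the block runs now, on the first method call
size = numbers.size   # the cached result is reused, the block is not run again
```

Note that the few methods `BasicObject` defines itself, such as `==` and `!`, are not forwarded in this sketch.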
The problem I’m trying to solve in this article is the comparison of two
audio files. We’ll figure out how to verify that they sound similar.
I was developing an application that deals with audio processing and
I had to write a test to verify that the resulting audio file matches one
from the fixtures. Well, I decided to compare the audio binaries like this:
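A byte-for-byte check of that kind might look like the sketch below. The helper name `same_binary?` is hypothetical; it relies only on `FileUtils.compare_file` from Ruby’s standard library, and the file names in the comment are the ones mentioned in the text.

```ruby
require "fileutils"

# Hypothetical helper: true only when both files are byte-for-byte identical.
def same_binary?(path_a, path_b)
  FileUtils.compare_file(path_a, path_b)
end

# Intended use in a test, assuming both files exist:
#   same_binary?("outcome.mp3", "fixture.mp3")
```

Such a check passes only for identical bytes, so two files that sound the same but were encoded differently will fail it.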
But soon my colleagues let me know I had broken the build. It turned out
that outcome.mp3 generated on their MacBooks didn’t match fixture.mp3
generated on my Linux laptop, despite the fact that both sounded
exactly the same. Probably we had different codecs.
So I had to come up with a better idea.
apt-cache search ^fonts- - find all packages whose names start with fonts-;
sed 's/^\(fonts-[^ ]*\).*$/\1/' - filter output to get only package names;
xargs apt-get install - pass package names to apt-get install to install them.
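To make the filtering step concrete, here is the same substitution expressed in Ruby. The method name `font_package_name` is hypothetical and the sample line only imitates the `name - description` shape of `apt-cache search` output; its description text is made up.

```ruby
# Mirrors the sed expression s/^\(fonts-[^ ]*\).*$/\1/:
# keep the leading "fonts-..." word and drop the rest of the line.
def font_package_name(line)
  line.sub(/^(fonts-[^ ]*).*$/, '\1')
end

# Illustrative input line, not real apt-cache output.
sample = "fonts-dejavu - example description"
puts font_package_name(sample) # prints "fonts-dejavu"
```

As with sed, a line that does not start with fonts- is returned unchanged.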
Now you have more than 1500 fonts, but it’s hard to pick the one you need, because
it’s hard to look through all of them. Luckily, there are special tools for previewing
fonts, and one of them is called fontmatrix. Let’s install it:
A few days ago I was writing a Ruby wrapper for the SoX
command line tool. To reduce disk IO I wanted to use process substitution.
It’s a cool shell feature which allows you to use a command’s output as an input file for another command.
It’s pretty useful if the second command doesn’t work with standard input or you need
to pass more than one input.
Let me show the classic example (works in bash and zsh):
cat <(echo 'Saluton!') <(echo 'Kiel vi fartas?')
# => Saluton!
# => Kiel vi fartas?
So the expression <(echo 'Saluton!') is treated like a file that contains the line Saluton!.
Under the hood, bash (or zsh) creates a pipe to which the output of echo 'Saluton!' is written and exposes it as a file path (a /dev/fd entry or a named pipe, depending on the system).
Then that path is passed to the cat command.
You can see it:
echo <(echo 'Saluton!') # => /dev/fd/63
So I wanted to use it in Ruby:
cmd = "cat <(echo 'Saluton!') <(echo 'Kiel vi fartas?')"
system(cmd)
But unfortunately it doesn’t work:
The problem is that Ruby’s system method and backticks use sh,
not your current shell (which in my case is bash).
system "echo $0" # => sh
In shells, $0 points to the current script, or to the interpreter if you’re running it interactively.
Fortunately there is a way to create a workaround to run bash:
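One common approach, shown here as a sketch rather than as the exact code from the original post, is to invoke bash explicitly and hand it the command line via `-c`. The helper name `bash_capture` is hypothetical, and the code assumes a `bash` executable is available on `PATH`.

```ruby
require "open3"

# Hypothetical helper: run a command line through bash instead of /bin/sh,
# so bash-only syntax such as <(...) is understood.
def bash_capture(cmd)
  stdout, status = Open3.capture2("bash", "-c", cmd)
  raise "bash exited with status #{status.exitstatus}" unless status.success?
  stdout
end

cmd = "cat <(echo 'Saluton!') <(echo 'Kiel vi fartas?')"
puts bash_capture(cmd)
```

When the output does not need to be captured, `system("bash", "-c", cmd)` follows the same idea: passing several arguments makes Ruby start bash directly instead of going through sh.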
Sometimes ActiveRecord is not enough to meet complicated validation needs.
At TMXCredit we’ve created Themis, an
ActiveRecord extension which helps to organize validations in a better way and adds
some flexibility. Here I’m gonna describe some problems which Themis solves; after that
I’ll take a brief look at possible alternative solutions.
Themis allows you to extract duplicated validations into a module for reuse.
Usually Rails applications are small enough that you don’t need it. But sometimes
The next example is pretty flat (in real life you would probably use STI or composition to
represent the Doctor and Patient models), but it illustrates where Themis could be useful.
You see that both models have the same validation for first_name, last_name and email.
Themis allows you to fix the duplication problem by extracting common validations into a module:
# Module with common validations.
module PersonValidation
  extend Themis::Validation

  validates :first_name, :last_name, :email, :presence => true
end

class Doctor < ActiveRecord::Base
  # import validation of first_name, last_name, email
  include PersonValidation

  validates :diploma, :presence => true
end

class Patient < ActiveRecord::Base
  include PersonValidation

  validates :age, :presence => true
end
As you know, I have started learning Esperanto and have even managed to wear out a few people’s patience with it, yet most people still ask why I am doing this and what for. One could conclude from this that either I am insane, or I understand something that others cannot.
So I decided to write a short article and try to explain myself.