Automated language detection

A book through a looking glass
The KDE spell checking library.

A long time ago (2006), a cool guy named Jacob Rideout started work on automated language detection in Sonnet, the KDE spell checking framework. Unfortunately he never finished it, and while Jakub Stachowski gave it another shot in 2009, it never got merged into a released version of Sonnet.

But early last year, I got tired of Quassel constantly underlining everything in red in all my Norwegian IRC channels, and decided to finish the code, clean it up and get it merged. In the process, I seem to have turned into the maintainer of Sonnet.

At a very high level, the language detection scheme currently works like this:

First, it looks at all the characters in the sentence whose language it wants to guess. Thanks to QChar, we can easily find which writing system (script) each character belongs to, and that allows us to filter out a bunch of languages. The resulting list of possible languages is sorted by the longest substring in each script; this means that if you write a sentence in one script (for example Latin) and then include a single word in, say, Cyrillic, it will consider Latin-script languages first.
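The script-filtering step can be sketched roughly as follows. This is a minimal Python illustration, not Sonnet's code: Python has no QChar::script(), so it assumes the first word of a character's Unicode name (e.g. "LATIN" or "CYRILLIC") is a good enough stand-in, and the `SCRIPT_LANGUAGES` table is a tiny hypothetical example. Spaces and punctuation are skipped without ending a run, so a whole Latin sentence counts as one long run.

```python
import unicodedata
from typing import Dict, List, Optional

# Hypothetical script -> language table; Sonnet's real data is far larger.
SCRIPT_LANGUAGES: Dict[str, List[str]] = {
    "LATIN": ["en", "nb", "de"],
    "CYRILLIC": ["ru", "uk"],
    "GREEK": ["el"],
}


def script_of(ch: str) -> Optional[str]:
    """Rough stand-in for QChar::script(): first word of the Unicode name."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None


def longest_runs(text: str) -> Dict[str, int]:
    """Length of the longest run of letters in each script."""
    best: Dict[str, int] = {}
    current: Optional[str] = None
    length = 0
    for ch in text:
        script = script_of(ch)
        if script is None:
            continue  # spaces/punctuation neither extend nor end a run
        if script == current:
            length += 1
        else:
            current, length = script, 1
        best[script] = max(best.get(script, 0), length)
    return best


def candidate_languages(text: str) -> List[str]:
    """Candidate languages, scripts with the longest run first."""
    runs = longest_runs(text)
    ordered = sorted(runs, key=lambda s: runs[s], reverse=True)
    candidates: List[str] = []
    for script in ordered:
        candidates.extend(SCRIPT_LANGUAGES.get(script, []))
    return candidates
```

For "Hei på deg, привет" the Latin run (8 letters) is longer than the Cyrillic one (6), so the Latin-script candidates come first.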

Unfortunately, some scripts, like the Latin script I use in this blog post, are used by a bunch of languages, which means we need a way to efficiently guess which language a string is in. The idea, which Mr. Rideout originally borrowed from a Perl script named “languid”, is to generate a list of the most common three-letter sequences (trigrams) for each language, and then compare the trigrams in the string against those lists to guess which language it most likely is.
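The trigram comparison can be sketched as below. This is a hedged Python illustration of the classic "out-of-place" ranking: each profile lists trigrams from most to least frequent, and the distance sums how far each of the sample's trigrams sits from its rank in a language's profile, with a fixed penalty when it is missing. It is not necessarily Sonnet's exact scoring; the profile size `MAX_GRAMS = 300` and the one-sentence toy training texts are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List

MAX_GRAMS = 300  # assumed profile size, also used as the "not found" penalty


def trigram_profile(text: str) -> List[str]:
    """Trigrams of `text`, most frequent first (ties keep first-seen order)."""
    # Lowercase, collapse whitespace and pad so word edges form trigrams too.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    return [gram for gram, _ in counts.most_common(MAX_GRAMS)]


def distance(sample: List[str], model: List[str]) -> int:
    """Out-of-place distance between a sample profile and a language model."""
    rank = {gram: i for i, gram in enumerate(model)}
    total = 0
    for i, gram in enumerate(sample):
        total += abs(i - rank[gram]) if gram in rank else MAX_GRAMS
    return total


def guess_language(text: str, models: Dict[str, List[str]]) -> str:
    """Language whose trigram model is closest to the text."""
    sample = trigram_profile(text)
    return min(models, key=lambda lang: distance(sample, models[lang]))


# Toy models built from single sentences; real models use large corpora.
MODELS = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog and then the dog sleeps"),
    "nb": trigram_profile("den raske brune reven hopper over den late hunden og så sover hunden"),
}
```

With these toy models, `guess_language("the dog and the fox", MODELS)` picks "en", since almost all of its trigrams appear in the English profile and almost none in the Norwegian one.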

Finally, if we still haven’t been able to narrow it down to a good guess, we fall back to brute force: every word is checked against every available dictionary, and the dictionary that recognizes the most words is used.
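The brute-force fallback boils down to counting hits per dictionary. In this minimal sketch, plain Python sets of words are hypothetical stand-ins for the real spell-checking backends, and ties keep whichever dictionary was checked first.

```python
import re
from typing import Dict, Optional, Set


def best_dictionary(text: str, dictionaries: Dict[str, Set[str]]) -> Optional[str]:
    """Return the language whose dictionary recognizes the most words.

    Returns None if no dictionary recognizes any word.
    """
    words = re.findall(r"\w+", text.lower())
    best_lang: Optional[str] = None
    best_hits = 0
    for lang, dictionary in dictionaries.items():
        hits = sum(1 for word in words if word in dictionary)
        if hits > best_hits:  # strict ">" keeps the first dictionary on ties
            best_lang, best_hits = lang, hits
    return best_lang


# Hypothetical toy dictionaries standing in for real spell checkers.
DICTIONARIES = {
    "en": {"i", "like", "cheese", "and", "bread"},
    "nb": {"jeg", "liker", "ost", "og", "brød"},
}
```

This is the slowest step, since every word is looked up in every dictionary, which is why it only runs when the cheaper script and trigram checks are inconclusive.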

This is only available in the Frameworks (5) version of Sonnet, however, so if you want this, port your applications to Qt5 and Frameworks. :-D

Other cool stuff I plan on implementing is grammar checking, readability scoring and text completion. For grammar checking I plan on using linkgrammar for the languages it supports (unfortunately not many, but it isn’t that hard to add support for new languages), and on re-using the data files from LanguageTool for OpenOffice: reusing the XML files as-is as much as possible, and rewriting the Java snippets in JavaScript. I also want to use this language detection in Baloo, so that it can automatically tag the language of the files it indexes (it was originally integrated into Strigi as well).

Mangonel 1.1, and more

Mangonel

In KDE3 times there was an awesome, minimalist launcher named Katapult. In KDE4 we have Mangonel.

I have just released version 1.1 of it. It is mostly a bug-fix release that tweaks the prioritization, but it also has a couple of new features (a configurable launch notification and a config UI for autostarting).

Also, I just pushed a nice new feature for KF5, namely retrying in KIO. If copying fails for some reason, it now asks whether you want to retry, in addition to offering the quasi-useless (IMHO) skip/autoskip options. But that’s coming in KF5, so it won’t be user visible in the immediate future.

For the next release of Konsole I also recently pushed a couple of features: support for Ctrl-clicking URLs to open them, and making it tell you which applications are actually running if you try to close it:

new konsole feature

Hacking on Frameworks in Arch Linux

Everyone loves Frameworks.
The Frameworks crowd goes wild. Taken by myself.

Just a short note in case anyone missed this little piece of awesomeness: Andrea Scarpino, the Arch Linux developer responsible for packaging KDE and Qt related stuff, has a nice little repository containing packages of everything needed for hacking on KF5: https://bbs.archlinux.org/viewtopic.php?id=164647

How he finds time to do this between packaging regular KDE releases, hacking on Frameworks, and having a normal life, I don’t know, but I’m very thankful, as it saves me a lot of time.

Filelight

filelight
Filelight’s new progress animation.

So, I finally sat down and hacked a bit on Filelight. As well as fixing the unreported bugs I could find (like it crashing very often when you zoomed, because I forgot to reset the focused element, or it not updating properly when you deleted something), I also applied a patch from Dave Vasilevsky that had been lingering in Bugzilla for way too long. It fixes the caching: Filelight now won’t rescan something you have already scanned, and the cache no longer grows infinitely large.

I also implemented a fancy progress widget (pulsating, fading colors and rotating), which the screenshot doesn’t really do justice.

I also fixed other bugs reported to Bugzilla, both minor and major (like the ignore list being ignored).

Lastly, I replaced the old and ugly icon with a new and snazzy one made by Sune Vuorela (based on the hard drive icon made by Pinheiro):

hi64-app-filelight

So look forward to the next release of Filelight in KDE 4.11!

IRC commit notifications

CIA.vc is dead, long live Pursuivant!

When CIA.vc died, KDE lost its fancy IRC notifications for commits, which kind of sucked. I initially hacked up a quick and ugly replacement using rbot, exim and a really quick and dirty Python script, which just posted all commits to the #kde-commits IRC channel. But thanks to the magnificent fellow Ben Cooksley of the KDE sysadmin team, we now have a proper solution, based on irker and some custom magic whipped up by Ben! Together with Ohloh, I believe we now have everything we lost with CIA.vc, and more!

So now you will hopefully see an IRC bot named pursuivant-### (where ### is a semi-random string of numbers) pop up in a KDE IRC channel near you, spewing commit messages like there’s no tomorrow. To get the bot to join your IRC channel, or to get it to notify about a particular repository, either ask the sysadmin team (for example in the #kde-sysadmin channel) or edit the rules file in git yourself. The entire solution is available in kde:sysadmin/irc-notifications, and the rules live in gateway/notifications.cfg. The rules support full PCRE regular expressions.
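Purely as an illustration of regex-based routing, here is a hypothetical Python sketch that sends a repository's commits to every channel whose pattern matches its name. The rule list, the channel names and the choice to match on repository names are assumptions; it does not reproduce the actual syntax of gateway/notifications.cfg, and Python's `re` module is not full PCRE, although it covers the common subset used here.

```python
import re
from typing import List, Tuple

# Hypothetical routing rules: (regex on repository name, IRC channel).
RULES: List[Tuple[str, str]] = [
    (r".*", "#kde-commits"),                         # every commit goes here
    (r"^sysadmin/", "#kde-sysadmin"),                # anything under sysadmin/
    (r"^(konsole|filelight)$", "#example-utils"),    # hypothetical channel
]


def channels_for(repository: str) -> List[str]:
    """Return every channel whose rule matches the repository name."""
    return [channel for pattern, channel in RULES if re.search(pattern, repository)]
```

Because every matching rule contributes a channel, one commit can be announced in several places, e.g. both the catch-all #kde-commits and a project-specific channel.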

The cancer that is killing free software

Herp derp.

From this: http://debarshiray.wordpress.com/2012/10/06/goa-why-it-is-the-way-it-is/

“However, it is a bit more complicated than that. MeeGo (and now Ubuntu’s) SSO has a DBus daemon written using Qt. We have been down this road before, most recently with Maliit. The daemon would have had to be rewritten without Qt using GObject for it to be a part of GNOME.”

Honestly, do I need to write anything? No reasoning or explanations, just a knee-jerk “it’s written in Qt, we can’t touch this”. When did this turn into a political battle? We shouldn’t be working against each other here if someone has great technology we can use (as long as it is freely available, properly licensed, works well, etc.).

I think this is one of the reasons KDE has thrived as well as it has, even without the commercial backing that Gnome has enjoyed: less politics and more fun. I don’t think I’ve ever seen a piece of software being denied inclusion in KDE because it had non-kosher dependencies.

The comments also have some gold, though, like this:

“But whether that means GNOME should accept Qt is not a decision that is for me or a single individual to make.”

So Gnome is now a corporate, hierarchical software organization where individual developers have no say? What happened to meritocracy? And where the flying fuck do thoughts like the need to “accept Qt” come from?

Honestly, I like Gnome (though I haven’t used it as my main desktop in some years), I like the Gnome developers, and I think they make a lot of great technology which helps free software in general (and KDE in particular). That’s why it boggles my mind that they sometimes come up with these outbursts of irrationality.

/rant

Why am I a freetard?

picture of two freetard trollcats
Freetard trollcats.

After having to fix a freshly installed Ubuntu on a friend’s laptop today (she was missing drivers for her integrated Intel graphics chip; no idea how Ubuntu managed to screw that up), I thought silently to myself: “why the fuck?”. Why the fuck do I continue to use and support this broken crap? But then I ignored those thoughts while I was trying to figure out why KLauncher was hanging, why Apache was whining about NameVirtualHost and VirtualHost mismatches, and other fun stuff.

Then later, while I was waiting for my dinner to finish, I was browsing reddit.com, and saw this post near the top of the front page. In case you’re not in the mood to read some random rant, it’s basically a list of pretty serious bugs and annoyances in the software that the author uses every day. The TL;DR from it is probably this quote: “I work for Microsoft, have my personal life in Google, use Apple devices to access it and it all sucks.”

And then a thought crystallized for me. I have never been able to pin-point exactly why I keep coming back to the cozy mess that is the free software ecosystem. What happened is that I caught myself thinking “why doesn’t he just sit down and fix it?”. I’m so used to just launching gdb when something hangs, peeking through the source code when I come across unexpected behaviour, or crawling through a stack trace when something crashes. For all its annoying quirks and bugs, when I have the source code I can fix everything from minor annoyances, like highlights not disappearing, to crashes when I log in. And not only can I look into how it works; if I can’t figure out the problem myself, I can jump on IRC with, or fire off a mail to, whoever is responsible for the application in question.

That thought, “I can always fix it”, converged with what he wrote: “it all sucks”. The grand point is that while no software is perfect, and there are bugs and problems in everything, in every layer of the stack, with free software I am empowered and encouraged to improve it myself, either directly or simply by supporting whoever fixes it for me.

Another, separate, issue from that blog post comes from this quote: “Alone or in a crowd, no one cares.” With free software you (almost) always have a community around the software in question. Even if the original author doesn’t respond it’s usually pretty easy to find someone else who cares about the software.

I’ll just round this off with a quote from a friend and fellow KDE hacker, Eike Hein, which I think captures the very essence of this blog post: “I like using open source because I like having the assurance that I can always fix it if I just learn enough, and that there’s nothing blocking my path toward learning it other than time constraints”.

Soundtrack: Miss Murder’s Personal Jesus