Automated language detection

A book through a looking glass — The KDE spell checking library.

A long time ago (2006) a cool guy named Jacob Rideout started work on automated language detection in the KDE spell checking framework Sonnet. Unfortunately he never finished it, and while Jakub Stachowski gave it another shot in 2009, it never got merged into a released version of Sonnet.

But early last year, I got tired of Quassel giving me constant red underlining in all my Norwegian IRC channels, and decided to finish the code, clean it up, and get it merged. In the process I seem to have turned into the maintainer of Sonnet.

At a very high level, the language detection scheme currently works like this:

First, it looks at all the characters in the sentence it wants to guess the language for. Thanks to QChar, we can easily find which writing system/script each character belongs to, and that allows us to filter out a bunch of languages. The resulting list of possible languages is sorted by longest substring; this means that if you write a sentence in one script (for example Latin) and include a single word in another (for example Cyrillic), it will consider the languages written in Latin script first.
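Here is a minimal sketch of that first step in Python rather than Qt. It is not Sonnet's actual code: the function names are made up for illustration, the first word of the Unicode character name stands in for what QChar reports as the script, and I assume that spaces and punctuation do not break a run.

```python
import unicodedata

def script_of(ch):
    # Approximation (assumption): the first word of the Unicode character
    # name, e.g. "LATIN" or "CYRILLIC", stands in for the real script property.
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_by_longest_run(text):
    """Return the scripts found in text, longest unbroken run of letters first."""
    longest = {}
    current, length = None, 0
    for ch in text:
        if not ch.isalpha():
            continue  # spaces, digits and punctuation neither extend nor break a run
        script = script_of(ch)
        if script == current:
            length += 1
        else:
            current, length = script, 1  # the script changed, start a new run
        longest[current] = max(longest.get(current, 0), length)
    return sorted(longest, key=longest.get, reverse=True)

print(scripts_by_longest_run("this is a sentence with one слово in it"))
```

Each script in the resulting list would then map to the set of languages written in it, and that is the candidate list the later steps work through.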

Unfortunately, a script like Latin (which I use here in this blog post) is shared by a bunch of languages, which means we need a way to efficiently guess which language a string is written in. The idea, which Mr. Rideout originally borrowed from a Perl script named “languid”, is to generate a list of the most common three-letter strings (trigrams) in each language, and then use that to guess which language the string most likely is.
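To make the trigram idea concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the tiny hand-picked trigram sets, the names, and the scoring, which simply counts how many of the text's trigrams are in each language's set. The real models are generated from large amounts of text per language, and the real comparison may well be smarter than this.

```python
from collections import Counter

# Hypothetical, hand-picked "common trigram" sets; a real profile would be
# generated from a large text corpus per language.
COMMON_TRIGRAMS = {
    "en": {" th", "the", "he ", "ing", "ng ", " an", "and", "nd ", " of", "of ", "ion"},
    "nb": {" de", "det", "et ", " og", "og ", "er ", "en ", " je", "jeg", "eg ",
           " ik", "ikk", "kke", "ke "},
}

def trigrams(text):
    """Count every three-character window, with word boundaries marked by spaces."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def rank_languages(text, profiles=COMMON_TRIGRAMS):
    """Return language codes, best match first, by number of trigram hits."""
    counts = trigrams(text)
    scores = {
        lang: sum(n for gram, n in counts.items() if gram in common)
        for lang, common in profiles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print(rank_languages("jeg vet ikke"))
print(rank_languages("the king and the thing"))
```

Padding with spaces means that word starts and endings (like " th" or "et ") count as trigrams too, which is where a lot of the language-specific signal is.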

Finally, if we haven’t been able to narrow it down to a good guess so far, we resort to brute force: we simply check all words against all available dictionaries, and the dictionary that recognizes the most words is used.
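The brute-force fallback is simple enough to sketch in a few lines of Python. The plain word sets below are stand-ins I made up for the real spell-checking dictionaries, and the function name is hypothetical.

```python
def best_dictionary(words, dictionaries):
    """Return the language whose dictionary recognizes the most words.

    dictionaries maps a language code to a set of known lowercase words,
    which is a stand-in (assumption) for asking a real spell checker.
    """
    scores = {
        lang: sum(1 for word in words if word.lower() in known)
        for lang, known in dictionaries.items()
    }
    # On a tie, max() keeps the first language in the mapping.
    return max(scores, key=scores.get)

# Tiny made-up dictionaries, just for the example.
dictionaries = {
    "en": {"i", "am", "a", "man", "and", "that", "is", "good"},
    "nb": {"jeg", "er", "en", "mann", "og", "det", "bra"},
}
print(best_dictionary("Jeg er en mann".split(), dictionaries))
```

This is the expensive path, since every word is checked against every dictionary, which is why it only makes sense as a last resort.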

This is only available in the Frameworks (5) version of Sonnet, however, so if you want this, port your applications to Qt5 and Frameworks. :-D

Other cool stuff I plan on implementing is grammar checking, readability scoring and text completion. For grammar checking I plan on using linkgrammar for the languages it supports (unfortunately not many, but it isn’t that hard to create support for new languages), and re-using the data files from the LanguageTool for OpenOffice: keep the XML files as they are as much as possible, and rewrite the Java snippets in JavaScript. I also want to use this language detection in Baloo, so that it can automatically tag the language of the files it indexes (it was originally integrated into Strigi as well).

Mangonel 1.1, and more

In KDE3 times there was an awesome, simple launcher named Katapult. In KDE4 we have Mangonel.

I have just released version 1.1 of it. It is mostly a bug-fix release that tweaks the prioritization, but it also has a couple of new features (configurable launch notification, config UI for autostarting).

Also, I just pushed a nice new feature for KF5, namely retrying in KIO. So if copying for some reason fails, it now asks you if you want to retry, in addition to the quasi-useless (IMHO) skip/autoskip options. But that’s coming in KF5, so not something that’s user visible in the immediate future.

For the next release of Konsole I also recently pushed a couple of features: support for CTRL-clicking on URLs to open them, and a prompt that tells you which applications are actually running if you try to close it.

Hacking on Frameworks in Arch Linux

Everyone loves Frameworks. — The Frameworks crowd goes wild. Taken by myself.

Just a short note in case anyone missed this little piece of awesomeness: Andrea Scarpino, who is the Arch Linux developer responsible for packaging KDE and Qt related stuff in Arch Linux, has a nice little repository containing packages of the stuff needed for hacking on KF5: https://bbs.archlinux.org/viewtopic.php?id=164647

How he finds time for doing this between packaging normal KDE releases, hacking on frameworks, and having a normal life I don’t know, but I’m very thankful as it saves me a lot of time.

Filelight

So, I finally sat down and hacked a bit on Filelight. As well as fixing whatever bugs I could find that no one had reported (like crashing very often when you zoomed, because I forgot to reset the focused element, or not updating properly when you deleted something), I also applied a patch from Dave Vasilevsky that had been lingering in the bugzilla for way too long. It fixes the caching: now it won’t rescan if you have already scanned something, and it won’t grow the cache infinitely large.

I also implemented a fancy progress widget (pulsating, fading colors and rotating), which the screenshot doesn’t really do justice.

I also fixed other minor and major (like ignoring the ignore list) bugs reported to bugzilla.

Lastly, I replaced the old and ugly icon with a new and snazzy one made by Sune Vuorela (which is based on the harddrive icon made by Pinheiro).

So look forward to the next release of Filelight in KDE 4.11!

IRC commit notifications

When CIA.vc died KDE lost its fancy IRC notifications for commits, which kind of sucked. I initially hacked up a quick and ugly replacement using rbot, exim and a really quick and dirty Python script, which just posted all commits to the #kde-commits IRC channel. But now, thanks to the magnificent fellow Ben Cooksley of the KDE sysadmin team, we have a proper solution, based on irker and some custom magic whipped up by Ben! Together with Ohloh, I believe we now have what we lost with CIA.vc, and more!

So now you will hopefully see an IRC bot named pursuivant-### (where ### is a semi-random string of numbers) pop up in a KDE IRC channel near you, spewing commit messages like there was no tomorrow. To get the bot to join your IRC channel, or to get it to notify about a particular repository, either ask the sysadmin team (for example in the #kde-sysadmin channel) or edit the rules file in git yourself. The entire solution is available in kde:sysadmin/irc-notifications; the rules live in gateway/notifications.cfg. It supports full PCRE regular expressions.

The cancer that is killing free software

From this: http://debarshiray.wordpress.com/2012/10/06/goa-why-it-is-the-way-it-is/

“However, it is a bit more complicated than that. MeeGo (and now Ubuntu’s) SSO has a DBus daemon written using Qt. We have been down this road before, most recently with Maliit. The daemon would have had to be rewritten without Qt using GObject for it to be a part of GNOME.”

Honestly, do I need to write anything? No reasoning or explanations, just knee-jerk “it’s written in Qt, we can’t touch this”. When did this turn into a political battle? We shouldn’t be working against each other here; if someone has great technology, we can use it (as long as it is freely available and licensed properly, works well, etc.).

I think this is one of the reasons KDE has thrived as well as it has, even without the commercial backing that Gnome has enjoyed; less politics and more fun. I don’t think I’ve ever seen a piece of software getting denied inclusion in KDE because it had non-kosher dependencies.

The comments also have some gold, though, like this:

“But whether that means GNOME should accept Qt is not a decision that is for me or a single individual to make.”

So Gnome is now a corporate, hierarchical software organization, where individual developers have no say? What happened to meritocracies? And where the flying fuck do thoughts like the need to “accept Qt” come from?

Honestly, I like Gnome (though I haven’t used it as my main desktop in some years), I like the Gnome developers, and I think they make a lot of great technology which helps free software in general (and KDE in particular). That’s why it boggles my mind that they sometimes come out with these outbursts of irrationality.

/rant

Why am I a freetard?

picture of two freetard trollcats — Freetard trollcats.

After having to fix the fresh Ubuntu installation on a friend’s laptop today (she was missing drivers for her integrated Intel graphics chip, no idea how Ubuntu managed to screw that up), I thought silently to myself: “why the fuck?”. Why the fuck do I continue to use and support this broken crap? But then I ignored those thoughts while I was trying to figure out why KLauncher was hanging, Apache was whining about NameVirtualHost and VirtualHost mismatches, and other fun stuff.

Then later, while I was waiting for my dinner to finish, I was browsing reddit.com, and saw this post near the top of the front page. In case you’re not in the mood to read some random rant, it’s basically a list of pretty serious bugs and annoyances in the software that the author uses every day. The TL;DR from it is probably this quote: “I work for Microsoft, have my personal life in Google, use Apple devices to access it and it all sucks.”

And then a thought crystallized for me. I have never been able to pinpoint exactly why I keep coming back to the cozy mess that is the free software ecosystem. What happened is that I caught myself thinking “why doesn’t he just sit down and fix it?”. I’m so used to just launching gdb when something hangs, peeking through the source code when I come across unexpected behaviour, or crawling through a stack trace when something crashes. For all its annoying quirks and bugs, when I have the source code I can fix everything from minor annoyances like highlights not disappearing, to crashes when I log in. And not only can I look into how it works; if I can’t figure out the problem myself I can jump on IRC with, or fire off a mail to, whoever is responsible for the application in question.

That thought, “I can always fix it”, converged with what he wrote: “it all sucks”. The grand point is that no software is perfect; there are bugs and problems in everything, in every layer of the stack. But with free software I am empowered and encouraged to improve it myself, either directly or simply by supporting whoever fixes it for me.

Another, separate, issue from that blog post comes from this quote: “Alone or in a crowd, no one cares.” With free software you (almost) always have a community around the software in question. Even if the original author doesn’t respond it’s usually pretty easy to find someone else who cares about the software.

I’ll just round this off with a quote from a friend and fellow KDE hacker, Eike Hein, which I think captures the very essence of this blog post: “I like using open source because I like having the assurance that I can always fix it if I just learn enough, and that there’s nothing blocking my path toward learning it other than time constraints”.

Soundtrack: Miss Murder’s Personal Jesus

Phonon-VLC 0.6 released

We have done it again; a new Phonon-VLC release is ready with plenty of rewrites, bug fixes and other stuff.

An overview of what has been done:

OpenGL surface painting now works properly (used in for example Gwenview)
Fixed the hue adjustment
Fixes to errors found by the static code analysis tool Krazy
Fixes for compatibility with newer VLC versions
Blu-Ray™ support
QML support
Lots of fixes for crashes, memory leaks, etc.

As usual it is available through the KDE infrastructure:

http://download.kde.org/stable/phonon/phonon-backend-vlc/0.6.0/src/phonon-backend-vlc-0.6.0.tar.xz

From the KDE Multimedia team; enjoy!

KFileMon

A picture of a car wreck — A perfectly fine and stable car.

So, after finally getting around to fixing that annoying Plasma Task Manager bug (and I must say, I can’t wait until that applet is ported to QML; the animation/state change code in there is a mess), I noticed a discussion on IRC about file notification APIs and how there seem to be no good in-kernel solutions at the moment. And after reading Vishesh’s blog post I got an idea for how to maybe solve it using LD_PRELOAD.

The idea is that we have an itsy bitsy library that we inject into every process by setting LD_PRELOAD in the startkde script. This way the dynamic linker uses our custom functions instead of the normal ones. So when we LD_PRELOAD a library with a custom rename() function (the standard C library rename() function is used for moving files), applications call our function instead of the one in glibc. Our library in turn tries to connect over a local Unix socket to a central process, and reports the old and new filename/path, as well as the current working directory (in case we are working with relative paths). Finally we call the normal glibc rename() function, which does what it does best: actually moving the file.
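To show why the working directory has to be sent along, here is a small Python sketch of what the central process could do with one report. The injected library itself would be C and is not shown here. The message layout (three NUL-separated fields) and the function names are assumptions made up for this example, not what the proof of concept actually sends.

```python
import os
import posixpath

def encode_report(cwd, old, new):
    """Pack one rename report as cwd, old path and new path, separated by NUL bytes.

    Hypothetical wire format; NUL works as a separator because it cannot
    appear inside a POSIX path.
    """
    return b"\0".join(os.fsencode(path) for path in (cwd, old, new))

def decode_report(message):
    """Unpack a report and return (old, new) as absolute, normalized paths."""
    cwd, old, new = (os.fsdecode(field) for field in message.split(b"\0"))

    def resolve(path):
        # join() leaves absolute paths alone and prefixes relative ones with
        # the reporting process's cwd. normpath() is purely textual, so it
        # does not follow symlinks.
        return posixpath.normpath(posixpath.join(cwd, path))

    return resolve(old), resolve(new)

report = encode_report("/home/user", "notes.txt", "/tmp/notes.txt")
print(decode_report(report))
```

Without the working directory, a report like rename("notes.txt", "old/notes.txt") would be useless to the central process, since it has no idea which directory the calling process was in.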

So, I thought why not, and quickly wrote a quick and dirty proof of concept. It is available in git: http://quickgit.kde.org/index.php?p=scratch%2Fsandsmark%2Fkfilemon.git&a=summary

While it might initially seem like a fragile solution, I think it should be pretty reliable in practice (funny pictures aside). It might be crazy enough to actually work. The only case it won’t catch is applications launched outside of a KDE session moving files around, but they have no business moving files around in users’ home folders anyway (and a solution to this is to set LD_PRELOAD even earlier, for example in /etc/profile.d/ somewhere).

So I thought I’d blog about it, so I can get some useful criticism and learn why it shouldn’t be done this way.

Reddit notifier for Plasma

Snoo, the Reddit mascot.

So, scratching an itch some time ago (July 12, according to my timestamp), I went ahead and wrote a notifier for Reddit based on the simple but awesome Facebook notifier. Then it hit me that maybe there were other Reddit and KDE users that would appreciate this, so I uploaded it to my server.

Download

You can download it from here: http://mts.ms/redditnotifier.plasmoid

To use it

First add it to the panel or wherever you want it.
Right click on it and choose “Reddit Notifier Settings”.
Go here and get the URL for the feed you’d like to monitor (I use the “everything” feed), and paste it into the appropriate field.

UPDATE

I put up the source on GitHub, and uploaded it to kde-apps.org so that it can be easily installed directly from Plasma/GHNS.

sandsmark's blog

yay me