A last waltz with OldMonk

December 25, 2012

Some people are just larger than life, and sometimes “larger” in multiple senses of the word. Raj “OldMonk” Mathur has left all of us who loved him unutterably bereaved. However, he would think it remiss of us if we did not use this opportunity to engage in the puns and black humour that he loved. Here is to you, Raju! A small paean of affection, friendship, love, and respect, but also an irreverent look at what I knew of you. I believe that you would want nothing less.

So many memories come flooding back from the few days that I knew him, but one among them is when someone brought up that formulaic self-help book: “Who will cry when you die”. Raj’s immediate answer was “I hope nobody! Why would I want to make anyone sad?”. There is a depth in that apparently-flippant remark that characterised many of Raj’s words. So, to take his own words at their value, we should choose to laugh at the joy that he did bring us, rather than mourn the immediate sorrow of his passing. Lifting him on to the pyre on his last voyage was one of the toughest loads that I have ever shouldered, but at the same time I could not help but grin at the sudden madcap thought that were Raju able to speak he would probably have chimed in with “अबे ! गिरा मत देना !” (“Hey! Don’t drop me!”).

Where does one start remembering a wizard of his times? Many of us have already paid homage to his technical prowess, but he went much beyond that in his uncompromising attitude towards promoting the free and open sharing of knowledge and ideas. This was not something that he just espoused, but what he lived and breathed. I have never heard him waver from his fundamental stance that not sharing knowledge was unethical. We stand not just on the shoulders of such giants, but on their very flesh and bones.

Personal anecdotes could fill entire volumes about his basic orneriness^W kindness. While he could be a real pain in the posterior at times, there are very few people in this world that I came to trust more than him. Trips to Manali will now forever need a seat to be left empty, just in case Raj decides to suddenly show up, complete with leather jacket, cigarette, and laptop loaded with an eclectic music collection. Whenever we next have an ILUG-D party, I will need to remember to cancel that extra case of soda, and honey will gingerly need to get along without lemon teas. Who will care enough to feed us babushkas now that OldMonk is no longer with us?

So long, Raju. Mine eyes seem to well up in spite of the fact that I can almost hear you laugh at what you would probably term as maudlin sentimentality. If I weep, it is just at the sheer injustice of a world that does not seem to allow one to still be “Permanently out to lunch” from the hereafter. Not all the OldMonk in the world can fill the void that you left in our lives.

Why I will be at FOSS.in/2008

There has been much verbiage expounded at the new directions taken by foss.in this year. While, in my opinion, there is much to be alarmed at in the direction that foss.in is seemingly heading in, I cannot help but feel that some of the most vehement opponents already had their axes sharpened. One can almost see the spittle flying over the remoteness of a glowing screen that brings the words of others to one.

From a personal perspective, I am tied foremost to community-driven events, and organisations, namely Freed.in, the Indian Linux Users’ Group, Delhi (ILUG-D), and Sarai, in no particular order. I have been a participant in FOSS.IN (or whatever name it passed under then), since 2005, and actively helped out at the event last year, but I do not have a dog in this particular fight. My goal is to build bridges, and not cultivate enemies. To quote the incomparable Emily Dickinson:

I had no time to hate, because
The grave would hinder me,
And life was not so ample I
Could finish enmity.
...

I must also acknowledge the fact that some of the people most actively involved in foss.in are either also part of Freed.in or have provided valuable input to the event. At the same time, this is not a weak-minded acquiescence with all aspects of foss.in. I would love to see the resurgence of a Bangalore Linux Users’ Group, and a community-driven event in Bangalore. It is also up to us to prove that a community-driven event in India can be as successful as foss.in. This means you. At least to my mind, this is possible—Freed.in being what I am putting my energies into—but it certainly remains to be proven. Unless one is expending effort in this regard, in my opinion, blatherings about foss.in 2008 seem to fall into the category of the Arabian saying: The dogs bark, but the caravan passes by.

So, philosophical musings aside (as the rallying cry goes, show me the code), the reason that I will be at foss.in 2008, and am pushing people in IndLinux to be there, is the idea that we should set aside what we think is the right way to proceed, and engage with foss.in on their terms. IndLinux can have other meetings, on our own terms, and we will indeed do so in the near future, but at least for now, we will also be at the largest, and most successful FOSS event in India, personal misgivings firmly set aside for the moment. We are there with a serious purpose, and will address serious issues of software related to Indic computing, including:

  • Indic sorting
  • Spell-checking enhancements for Indian languages
  • Machine translation
  • Optical character recognition (OCR)

Bengaluru, or bust!

Subtitled: Why I will never take up an academic position in Indian science

This is a difficult piece for me to write, as it requires me to try and be objective about making the decision to walk away from a large piece of my career, and life. I hope to do a series of posts giving a personal description of why I chose to abandon a career in science, and also an analysis of what ails experimental science in India, and some suggestions on how this might be fixed. If the overall tone comes across as angry, it is because I am angry. Angry at the wasted talent, the wasted lives. Angry for what could have been.

The normal caveats apply. If it helps you sleep any better, you are most welcome to apply the usual dismissals. The common way to wave away such comments is to say that the person in question just could not hack it. Who knows, it might even be true. Google up my name (Gora Mohanty), and the subject of gamma ray astronomy, talk to any of my associates, and draw your own conclusions. I offer absolutely no apology for claiming that I was a good, if maybe not great, experimental physicist. I am also strongly of the opinion that information technology is the area where it is currently the easiest to do world-class work in India. This is, of course, a very broad area, and the term could also reasonably be used for many areas of scientific research.

I will also note that none of this should be construed a priori as a slam on people I have worked with. I know many excellent Indian physicists who have chosen to remain in the system, and try to change it from within. To each his cup of poison.

To my mind, there are two main reasons behind the failure of experimental science in India, and funnily enough, neither of them has to do with a lack of money. These are: (a) An abject failure of the educational system to inculcate the scientific method in students, and (b) The ad-hoc separation of research, and teaching institutes.

Why Bitu cannot do science

Like most other subjects, science is taught at the primary levels, during the formative years, by rote memorisation. We claim to take pride in the fact that most of our students take quote science unquote courses, and, yes, it is also true that the average fifth standard student in India can parrot to you Newton’s three laws down to the last comma in the textbook. Do an experiment yourself, after they quote you Newton’s law of gravitation, ask them why does an apple fall down to the Earth, and not the Earth up to the apple. My favourite answer to this, mind you, from someone who consistently stands first or second in their primary-level classes, is that gravitation is a property of heavenly bodies. Now, where else could they have learnt this wonderful excuse, except from having had it forced down their throat. And, in case you think that you have not lost out from having been brought up in the same system I offer you a simple challenge: Explain to a lay person in 15-20 minutes, without using any math, why the second law of thermodynamics should intuitively be true.

The problem is that the educational system in India is largely in chaos. People point to the IITs/IIMs as some kind of emblem of India’s brain power, but Prof. C. N. Rao’s comment about how the average mid-level university in the USA does better research work than any IIT was, if anything, kind to the IITs. Even taking the IITs, and IIMs at their own recognition, they cater only to a minuscule fraction of the students in India. Besides, by the time students reach that level of higher education, it is already too late. This point was driven home to me as a graduate student in the USA. The average incoming Indian student is definitely much better equipped than his American counterpart in terms of theory and the mathematical tools of the trade, but the outstanding American students, typically people who have learnt physics by tinkering with things, are on a different plane. Richard Feynman, for example, was a uniquely American genius. This is not to put down the outstanding Indian scientists who have succeeded in spite of the system, but how long can we survive on the basis of 5-sigma events?

What is scary is that I do not see any great eagerness on the part of the educational establishment to bring about the sweeping reforms that would be needed. Though things are definitely changing at the top, it is very much the status quo in most parts of the country. For the most part, the only people that I see who are really anguished about the endemic failures in the system are people who do grassroots science teaching, and popularisation. I recounted my pet story about Newton’s law of gravitation to a senior member of the Orissa physical society (who shall go unnamed), only to be waved away with: “Oh, no, no! This might have been the situation in your days, but not now. Besides, it might be the case for ICSE, but certainly not for our CBSE students”. I suppose that there are none so blind as those who refuse to see. Anybody who is at all interested in the reform of science education in India should look into something called the Hoshangabad Science Teaching Project, and Ekalavya, and the reasons behind its failure. This was India’s chance at a reform on the scale of what happened for US science education after the Sputnik scare, and we missed our chance at it. No matter, I suppose. There will be another astrologer, another charlatan, another godman, another cricket tamasha, another Bollywood item number which will ease our sorrows.

Tied to this is the attitude of all too many professors for whom their students are to be looked down upon, and treated as something just a shade above domestic help. Spare me your outrage about how you do not do this. That could well be true, and if so, more power to you, but you only have to take a look around yourself. I will believe that things have changed when undergraduates in the average Indian institution do not feel compelled to address random visitors like me, as “sir”.

Finally, there is the race to the bottom these days in the various entrance exams, something which is actively encouraged by parents. I guess that Taare Zameen Pe was a nice picture, but when it comes to my kid, he (most definitely he, and not she: Uske liye to wheatish complexion ka ad aa jayega akhbaron mein, that is, for her there will be the “wheatish complexion” ad in the newspapers) will be the IIT topper. Kids nowadays do not have a normal life outside of preparing for exams. There are a myriad institutions now in India whose main focus is on preparing students for the IIT-JEE. Oh, and by the way, you also get a +2 degree along the way. If this is what things have come to, I say shut down the IITs: They are doing more harm than good.

Let a thousand flowers bloom

There is no doubt that there are world-class scientific institutions in India; the Tata Institute of Fundamental Research (TIFR) being usually the first thing that comes to most people’s minds. As opposed to that, most universities outside maybe of the main metropolitan areas, are starved for funding, facilities, quality teachers, and any possibility of being involved in research as a student. This is a very dangerous situation, as a constant inflow of fresh blood in the form of brash new students eager to take on the world is what a research system needs. The artificial divorce between research and teaching institutions hurts both, by reinforcing the horribly pejorative impression that those who can’t, teach, and by ensuring that research institutions are perpetually starved of competent manpower. After all, if people are not shown how to do research, the best that one can hope from them is that they will look up experiments in textbooks and journals, and faithfully copy them. And, yes, we will claim that it is ground-breaking because it is the first time that this experiment was done in India.

More to follow. Bouquets, and brickbats are invited. If this article does not have you seething, one way or the other, it is because you did not read it carefully enough.

Freed.in. Quo vadis?

September 2, 2007

The annual event of ILUG-Delhi has recently changed names from Freedel, to Freed, with the goal of refocusing the event from catering just to Delhi, and to a certain in-crowd of free/open source software (FOSS). The change of name was prompted by our public relations consultant (a first for us!), Rajesh Lalwani of BlogWorks who volunteered his time to help us out. The major change of perspective is that we no longer see ourselves as restricted to Delhi, or even to India, or to any particular ghetto that people may want to restrict us to.

Since a lot of words have been bandied about over the change of name, I felt that it was important to jot down a few thoughts on this issue, much as I feel that personal blogs are, by and large, a means of self-aggrandisement.

First and foremost, the change of focus for the event is a logical outcome of internal changes within the Indian Linux Users Group, Delhi (ILUG-Delhi), currently the main group organising the event. We had long, often heated, discussions about where we wanted ILUG-Delhi, and the annual event to be, and came out of those with a stronger sense of purpose. Mainly, we have felt it necessary to become more open, and more inclusive than we have been in the past.

So, here is a mini-FAQ about Freed.in:

  • Why the change of name?
    The obvious reason is to not have the name be entirely specific to Delhi. We are trying to broaden our horizons, so that the event is not just about ILUG-Delhi, but about freedom in technology, and software. Get Freed, it fits.
  • Are you trying to take over the world? Or, India?
    Yes, but not so that you would notice. Seriously, though, we have no aim of trying to “take over” any organisation without their express wish to get involved. While we do invite anyone to take part, it can be entirely on your terms, and your willingness to be a part of the event.
  • Is Freed in competition with Linux-Asia? FOSS.in?
    Of course not. We have a very definite focus to our event, and while other events might be doing wonderful things, the three things that we are most interested in are Freed, then Freed, and lastly again, Freed. We will join hands with anybody who shares our goals, no matter what past history has been.
  • OK, then why yet another FOSS event in India?
    To the best of my, admittedly meagre, knowledge, we are the premier event in India that is of the FOSS community, by the FOSS community, and for the FOSS community. While the organisation of this year’s event is still somewhat chaotic, we have taken a big step forward, and I look forward to more of an active participation from all FOSS groups in future events. So, if you believe that FOSS should be done from grassroot efforts, please come work with us, no matter where you live. I cannot stress it strongly enough that all successes and failures of the event are those of the community, and not of individuals, something that has been reinforced for me personally over the past few months.
    Send not to ask for whom the bell tolls,
    It tolls for thee.
  • How do I participate?
    If you want to speak at Freed 2007, please register at http://conf.freed.in
    If you want to volunteer to help out, please send mail to volunteers@freed.in

Introduction

Spell checking software has always had the aura of black magic, which is probably why another meaning of spell is the enchantment cast by a magician. The best-known, and most widely used open-source spell-checking engine is aspell, and thanks to the needs of some folk at an Indian language search site, we had a chance to delve into the guts of aspell, in order to customise it for Hindi.

Relevant aspell features

aspell has many useful features that made it well-suited to our task:

  • Unicode (UTF-8) support: Via translation to an internal 8-bit format.
  • Support for phonetic features: Invaluable for Indian languages, which are spelt phonetically.
  • Other features: Allow tailoring to new languages. Includes items like affix rules, replacement tables, keyboard layout specifications, etc.
  • Good documentation: Was important in trying to figure out the internal details of aspell.
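
To make the first point a little more concrete, here is a minimal sketch of how UTF-8 text could be squeezed into an 8-bit internal format. It is not aspell's actual character table: the mapping below is an assumption that simply relies on the Unicode Devanagari block (U+0900 to U+097F) having exactly 128 code points, so that it fits into the upper half of a byte. The names to_internal and from_internal are made up for this illustration.

```python
# Hypothetical 8-bit mapping for Devanagari text. This is an illustration
# only; aspell's real charset tables are not reproduced here.
DEVANAGARI_START = 0x0900  # first code point of the Unicode Devanagari block
DEVANAGARI_END = 0x097F    # last code point of the block (128 code points)


def to_internal(text: str) -> bytes:
    """Map ASCII to itself and Devanagari to the byte range 0x80-0xFF."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif DEVANAGARI_START <= cp <= DEVANAGARI_END:
            out.append(cp - DEVANAGARI_START + 0x80)
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return bytes(out)


def from_internal(data: bytes) -> str:
    """Invert to_internal: bytes below 0x80 are ASCII, the rest Devanagari."""
    return "".join(
        chr(b) if b < 0x80 else chr(b - 0x80 + DEVANAGARI_START) for b in data
    )


word = "नदी"
encoded = to_internal(word)
print(len(word.encode("utf-8")), "bytes in UTF-8,", len(encoded), "bytes internally")
```

The pay-off of such a scheme is that every character occupies exactly one byte, which keeps the edit-distance and sounds-like machinery simple.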

Customisation for Hindi

It turns out that no immediate changes were required to the aspell code. Instead, it was sufficient to customise various aspell features to Hindi. The chief of these were:

  • Phonetic rules: There are various aspell options for including phonetic information about a language, ranging from simple methods which do not work as well, to complex techniques which take a fair amount of time, but with corresponding returns in terms of the quality of the spell-checking. As phonetic information is expected to be of importance for Indian languages, we use the comprehensive, table-driven phonetic code mechanism. The rules allow specifying, for example, that क sounds like ख, क like क्, कि like की, कु like कू, etc. This turns out to be the single most important factor for improving the performance for Hindi.
  • Optimisation of internal settings: Various aspell quantities, such as the sounds-like weight, the costs for edit-distance calculations, etc., were presumably optimised for English, and we did an extensive study to re-optimise them for Hindi.
  • Affix rules: aspell allows affixes (prefixes, and suffixes) to be automatically applied to words. Thus, for example, storing नदी in the dictionary also covers the plural नदियाँ, if the appropriate affix rule ( ी -> ि_याँ ) is included.
  • Replacement tables: Besides phonetic rules, a one-to-one replacement table can also be used to handle common mis-spellings.
  • Keyboard layout specifications: A class of mis-spelled words arise from typographical errors made due to the proximity of keys on a keyboard, such as “scsn” instead of “scan”. These errors can be given priority if the layout of the actual keyboard in use is provided to aspell.
  • Run-together words: It is possible to check for words that have been accidentally run together, such as “catbird” in place of “cat bird”.
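
The phonetic and affix ideas above can be sketched in a few lines of Python. This is a toy model under loud assumptions, and not aspell's rule syntax or implementation: the SOUNDS_LIKE table contains only the four equivalences quoted above, PLURAL_RULE is a (strip, add) pair standing in for the नदी to नदियाँ rule, and the function names are invented for the example.

```python
# Illustrative sounds-like table built only from the examples in the text;
# a real Hindi phonetic table would be far larger.
SOUNDS_LIKE = {
    "ख": "क",  # aspirated kha is treated like ka
    "ी": "ि",  # long i vowel sign is treated like short i
    "ू": "ु",  # long u vowel sign is treated like short u
    "्": "",   # the halant is dropped, so that क् matches क
}

# Hypothetical affix rule as a (strip, add) pair: नदी -> नदियाँ
PLURAL_RULE = ("ी", "ियाँ")


def sounds_like_key(word):
    """Reduce a word to a key shared by similar-sounding spellings."""
    return "".join(SOUNDS_LIKE.get(ch, ch) for ch in word)


def expand(stem, rule):
    """Return the stem plus the form generated by the affix rule, if it applies."""
    strip, add = rule
    forms = {stem}
    if stem.endswith(strip):
        forms.add(stem[: len(stem) - len(strip)] + add)
    return forms


def phonetic_candidates(misspelling, dictionary):
    """Dictionary words whose sounds-like key equals that of the mis-spelling."""
    key = sounds_like_key(misspelling)
    return [word for word in dictionary if sounds_like_key(word) == key]


# One stored stem covers both singular and plural.
dictionary = sorted(expand("नदी", PLURAL_RULE) | {"किताब"})
print(dictionary)
print(phonetic_candidates("नदि", dictionary))    # short-i mis-spelling of नदी
print(phonetic_candidates("कीताब", dictionary))  # long-i mis-spelling of किताब
```

A real engine would then rank these candidates, for instance by a weighted edit distance, rather than treat all phonetic matches as equal.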

Performance summary

Based on the above work, the Hindi spell-checker now performs on par with, or better than, the English equivalent, which is quite remarkable, considering that the original developer of aspell has no knowledge of Indian languages. A performance metric for a spell-checker could be to take a sample list of mis-spellings, feed them to the spell-checker, and check:

  • Whether the known correct word is suggested?
  • If so, what is its position in the replacement list?

The table below shows such a comparison for Hindi against the default English engine, and against the best (but much slower) English engine. Caveat: other factors, such as the extent and quality of the dictionary, and the comprehensiveness of the sample word list, factor into this, and too much should not be read into the actual numbers.

Category     Hindi   Default Eng.   Best Eng.
Not found    5%      6%             2%
1            71%     59%            60%
1-5          91%     86%            83%
1-10         94%     91%            90%
Any          95%     94%            98%
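
For anyone wanting to repeat this kind of measurement, the bookkeeping could look roughly like the sketch below. It assumes that the categories are cumulative (“1-5” meaning that the correct word appears anywhere in the first five suggestions), which is how the numbers above read. The suggest argument is a stand-in for whatever spell-checker is being tested, and the tiny English sample is made up purely to exercise the code; it is not the data behind the table.

```python
def rank_of(correct, suggestions):
    """Return the 1-based position of the correct word, or None if absent."""
    if correct in suggestions:
        return suggestions.index(correct) + 1
    return None


def summarise(samples, suggest):
    """Tabulate where the known correct word lands in each suggestion list.

    samples is a list of (mis-spelling, correct word) pairs, and suggest is
    any callable that returns a ranked list of suggestions for a word.
    """
    ranks = [rank_of(correct, suggest(wrong)) for wrong, correct in samples]
    total = len(ranks)

    def percent(count):
        return 100.0 * count / total

    def found_within(limit):
        return sum(1 for r in ranks if r is not None and r <= limit)

    found = sum(1 for r in ranks if r is not None)
    return {
        "Not found": percent(total - found),
        "1": percent(found_within(1)),
        "1-5": percent(found_within(5)),
        "1-10": percent(found_within(10)),
        "Any": percent(found),
    }


# Made-up sample data and a toy suggester, for illustration only.
toy_suggestions = {
    "scsn": ["scan", "scsi"],
    "teh": ["tea", "the"],
    "wrod": ["rod", "wood", "world", "wrote", "trod", "word"],
    "xyzzy": [],
}
samples = [("scsn", "scan"), ("teh", "the"), ("wrod", "word"), ("xyzzy", "magic")]

for category, value in summarise(samples, lambda w: toy_suggestions.get(w, [])).items():
    print(f"{category:10s} {value:5.1f}%")
```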

Other work

Various other tasks were taken up as part of this project, all of which have been, or will be released as open-source:

  • Patches for aspell have been submitted that provide hooks to internal quantities to allow for tuning the performance to a new language.
  • aspell has bindings only in C. With the use of SWIG, we have put together bindings in a variety of programming languages. The bindings operate firstly as low-level wrappers around the C functions. More natural, class-based interfaces are then built around this low-level code in programming languages that provide support for classes. This is currently available for Python, Perl, and C#, and will be released soon after some clean-up. A separate write-up on this work is also being prepared.
  • Testing framework: Based on the SWIG bindings for C#, we built a GUI testing framework that allows easy access to aspell internals. This was done in Mono.NET, and works cross-platform across Linux, and Microsoft Windows (can also be used under Mac OSX, Solaris, etc.) using the Mono runtime. A separate write-up on this work is being prepared.
  • OpenOffice aspell plugin: The OpenOffice office suite currently uses Hunspell as its default spell-checker, but an aspell plugin would be very useful. This is being worked on.
  • A comprehensive Hindi dictionary, including affix rules, is under preparation.
  • We plan to apply the knowledge gleaned from the Hindi work to other Indian languages. In particular, we have started working with Prof. G.S. Lehal of Punjab University on Punjabi.
  • While aspell functions pretty well for Hindi, a morphological spell-checker that can use contextual information (e.g., the gender of the noun can be used to narrow down the modified spelling of the associated verb), will also be of value. An engine like this could also become a more general-purpose grammar analyser.