I recently came across this link through someone on Twitter - http://translationparty.com/#10071202 . The page takes a sentence in English, translates it into Japanese, and re-translates it back into English until two successive translations give the same sentence. In other words, the process goes on till a ‘translation equilibrium’ is reached. Please try the link and have a look at the results you get. It surely is hilarious, but as I played around, an interesting coincidence struck me.
It closely resembles the game of Chinese whispers we play in gatherings. For the uninitiated, Chinese whispers is a game wherein you whisper a sentence into your neighbor’s ear, who in turn whispers it to their neighbor, and so on; when it completes one full round, the original sentence is compared with the version that traveled through people’s ears. Invariably, the original sentence gets so distorted that the result is hilarious. So what I did was take a sentence from the Chinese whispers game (found through googling, of course!) and send it through Translation Party. “Goofy Grinning Gophers gobbled gigantic grapes while juggling” was the sentence I sent across, and this is the result - http://translationparty.com/#10079415
- Goofy Grinning Gophers gobbled gigantic grapes while juggling
- Gophers while juggling a goofy grin and gobbled a giant grape
- Meanwhile, juggling a goofy Gophers grinning and laugh and gobbled’s giant grape
- On the other hand, grinning, goofy Gophers and juggling huge grapes of laughter and gobbled
- On the other hand, grinning, drinking giant grape in the goofy Gophers laughter Maru and juggling
- Grin and drink giant grape in the goofy Gophers laughter Maru and juggling, meanwhile,
- The goofy Gophers laughter circled in giant grape and juggling, while drinking, and grin
- Goofy Gophers laughter giant grapes, juggling, drinking a smile and while Maru
- Juggling, smile, goofy Gophers laughter giant grape drink - equilibrium…
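Out of curiosity, the translate-until-equilibrium loop itself is easy to sketch. Here is a toy version in Python - `round_trip` is a made-up stand-in for an EN→JA→EN machine translation pass (a real version would call a translation service), and its degradation rule is invented purely to show the fixed-point behavior:

```python
def round_trip(sentence):
    # Toy stand-in for EN->JA->EN translation: each pass lowercases the
    # text and drops a word the "translator" cannot handle, so repeated
    # passes steadily lose information, like a whisper down the line.
    words = sentence.lower().split()
    return " ".join(words[:-1]) if len(words) > 3 else " ".join(words)

def translation_party(sentence, max_rounds=50):
    """Iterate round-trip translation until two successive translations
    agree, i.e. until a 'translation equilibrium' is reached."""
    history = [sentence]
    for _ in range(max_rounds):
        nxt = round_trip(history[-1])
        if nxt == history[-1]:  # equilibrium: no further change
            break
        history.append(nxt)
    return history
```

The interesting part is that the loop always terminates here because the toy channel only ever shrinks the sentence; the real site relies on the translator eventually becoming self-consistent, which is not guaranteed.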
It would be really interesting to compare a transcript of a Chinese whispers game in progress with the translationparty.com output. Both fit the noisy channel assumption, but do they really correlate, and if so, how strongly? Can they even be compared, or is it like comparing apples with oranges? Would a cognitive study of the game of Chinese whispers lead to better translation techniques? These thoughts may be outright crazy offshoots of a tired brain’s attempt to hallucinate research ideas. But hopefully this craziness really has a spark that can ignite some innovation in the future.
A few days back I was reading an interesting paper by Pat Langley, “The Changing Science of Machine Learning”. It is a fascinating look at the humble beginnings of machine learning, where it has reached now, and what has been missed along the way. Some of the cobwebs in my mind have been cleared, thanks to him. I really agree with him when he says that machine learning has drifted into data-analysis mumbo-jumbo, ignoring the challenges of reasoning and problem solving. Another point he makes is that only papers with mathematical formalization are thought to carry weight at a premier machine learning conference. I always thought I might have to change lanes at some point w.r.t. my research, which I shall explain shortly.
My work is related to integrating knowledge sources to perform different tasks, devising an intelligent mechanism for the integration instead of an ad-hoc approach. The problem statement may be unclear, but that is not the point I am trying to make. I was always under the impression that this problem lacks mathematical rigor, and that I might have to switch to something more mathematically challenging. My pitfall was that I mixed up ‘mathematically challenging’ with ‘challenging problems’ and almost forgot that challenging problems are a superset of mathematically challenging problems.
The other point I wish to make is the interesting cycle of ‘history repeats itself’, albeit in a different way. In the 1980s the emphasis in machine learning was on knowledge-based approaches, which gave way to statistics-based approaches in the 90s. The main reason for this was the bottleneck in acquiring knowledge sources. Now, especially in text-related machine learning tasks, knowledge-based approaches have come to the fore again in the late 2000s, thanks to the seminal work done by Evgeniy Gabrilovich and others in using Wikipedia as a background knowledge resource.
So my observation is that irrespective of the path we take to solve a problem, we might think we are changing lanes in the short term, but we may actually be converging towards something more general or unified. I talk in terms of text processing, but this may be applicable to many other areas. Solving a text processing problem may make us switch lanes, trying out different methods from machine learning, natural language processing, or more specialized techniques like case-based reasoning, but the final goal is to build an intelligent system chasing the elusive dream of ‘artificial intelligence’. We may change lanes, but the paths eventually converge.
Let me start off with a quote to set the tone for the rest of the article and also explain the title.
“Do not use statistics the way a drunkard uses a streetlight - for support rather than for illumination.”
There has been quite a disturbing trend among researchers who use a suite of statistical tools to cover up lacunae in their results and experimentation. The basic purpose of statistics is to provide insight into regularities that exist in data and to test the efficiency of a model. But unfortunately, many people use it to cover their inefficiencies with statistical jargon. The rest of this article gives an intuitive understanding of statistics and probability, keeping mathematical rigor at bay.
No description of statistics is complete without probability. First of all, what is probability? Maybe it is a means to measure uncertainty. OK, then what is uncertainty? I am not so certain myself. But here is what I think. There are mainly two types of uncertainty:
- Uncertainty in Time
- Uncertainty in Measurement.
Uncertainty w.r.t. time means that at this instant we are not able to judge the outcome of an event, but given time we can have a clear idea. In the famous coin-tossing experiment, all you have to do is wait till the experiment is completed to get the outcome you are waiting for. The study of this sort of uncertainty is probability theory. What about uncertainty in measurement? Consider that you have just received a greenish-yellow T-shirt. Even if you stare at it for an infinite amount of time, you will never be able to tell whether it is completely green or completely yellow. The uncertainty lies not with time but with the imprecise measuring instrument. The study of such uncertainty comes under the auspices of possibility theory, also called fuzzy theory. There is still a raging debate in the community over whether we need two different schools of thought to study uncertainty. Once we have an idea of which kind of uncertainty pertains to our experiments, we can do the relevant analysis.
II. SCHOOLS OF STATISTICS
What is probability? Before we go into the different schools of thought, let us revisit what probability means. Even to date, there has been no consensus on the right definition of probability. Is it a subjective degree of belief or an objective long-run frequency? Consider the coin-tossing experiment - an unbiased coin tossed 1000 times. Subjective belief tells you that the probability of heads is 0.5, but if at the end of 1000 tosses there are 600 heads and 400 tails, will the probability change based on the objective frequency? This is something to ponder.
Bayesian Statistics
Bayesian statistics is the oldest school of statistics, founded by Rev. Thomas Bayes with Bayes’ theorem. Without going into the math, let us look at an intuitive definition. It is all about taking the result of an experiment, which is the likelihood, combining it with personal knowledge about the phenomenon, called the prior, and obtaining the result, called the posterior probability. The prior is where we can inject experience or previous belief about the experiment. This means that for different priors, we get different posterior probabilities. Another way is to generate the prior from the data itself, where the subjectivity disappears. Again, the problem of subjective vs. objective comes into play. Though Bayesian analysis was earlier kept aside because it was very difficult to do by hand, it has found new interest with the advent of computers. Moreover, the interest in finding patterns in large data, à la machine learning and data mining, has led to a resurgence and new avenues for application.
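To make the prior-likelihood-posterior story concrete, here is a minimal sketch for the 1000-toss coin example, assuming a Beta prior (the conjugate prior for coin tosses; the function name is mine):

```python
def posterior_mean(alpha, beta, heads, tails):
    """Mean of the posterior over the heads probability.

    Prior: Beta(alpha, beta). After observing `heads` heads and
    `tails` tails, the posterior is Beta(alpha + heads, beta + tails),
    whose mean is computed below.
    """
    return (alpha + heads) / (alpha + beta + heads + tails)

# A vague uniform prior Beta(1, 1): 600 heads in 1000 tosses pulls
# the estimate toward the observed frequency 0.6.
vague = posterior_mean(1, 1, 600, 400)

# A strong "fair coin" prior Beta(500, 500): the same data moves the
# estimate only part of the way, to 0.55 - different prior, different
# posterior, which is exactly the subjectivity discussed above.
confident = posterior_mean(500, 500, 600, 400)
```

Note how the same 600/400 data yields different answers under different priors, while generating the prior from data itself would remove that subjectivity.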
Frequentist
The statistics generally taught at college level is frequentist statistics. Very briefly, it involves proposing a null hypothesis (H0), saying there is no relationship or effect, and an alternative hypothesis (H1) that contradicts it. Next, an alpha level, or threshold, is fixed and a statistical test is performed. A p-value is calculated, and if the p-value is smaller than the alpha level, the results are said to be “statistically significant”. The obvious problem with this is that prior information is not taken into account. It only tells us the probability of the data given H0 or H1, while the interesting thing to observe is the probability of H0 or H1 given the data. So in frequentist statistics the model remains the same while the data keeps changing, whereas in Bayesian statistics the model keeps changing while the data remains the same. There is also an information-theoretic view of statistics based on ideas from thermodynamics, but I refrain from going into that for the sake of brevity.
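For comparison, the frequentist recipe on the same 600-heads-in-1000-tosses coin can be sketched as an exact one-sided binomial test (a toy illustration; the function name is mine):

```python
from math import comb

def binomial_p_value(n, k, p0=0.5):
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p0).

    H0: the coin is fair (heads probability = p0).
    The p-value is the probability, *given H0*, of seeing k or more
    heads - note it says nothing about P(H0 | data).
    """
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i)
               for i in range(k, n + 1))

# Fix alpha = 0.05 and test 600 heads in 1000 tosses.
alpha = 0.05
p = binomial_p_value(1000, 600)
significant = p < alpha  # "statistically significant" in the recipe above
```

Here the model (fair coin) is fixed and we ask how surprising the data is under it - the mirror image of the Bayesian update, where the data is fixed and the model (the posterior) moves.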
So the concluding note is this. Know when and where statistics needs to be applied. Remember that these are tools to gain new insight rather than to prop up your results. It is important to know the fundamentals carefully, so that when a majority of researchers apply them in the right way, the idiom “Lies. Damn Lies. Statistics.” can be changed into something more respectful, which the field deserves.
Firstly, apologies for the absolute laziness and inertia on my part in not updating this blog at all. After starting it, I was totally bogged down by the fact that there are so many good technical blogs on AI and NLP - so what should I write about? Slowly, I have come to realize that keeping a research blog is like keeping a research and learning log: keep a log of the things that have excited you and that you want the world to know about. So here I begin with a small article on our winning hack in the science category at the university HackU! event conducted by Yahoo! R&D at IIT Madras.
The name of our hack was Tweets of Interest. The motivation is simple. We follow many users on Twitter because their interests match ours. But that does not mean they always tweet interesting things. If you follow a researcher on Twitter, a tweet about their recent paper interests you more than a tweet about an adventure they recently undertook. Wouldn’t it be nice if there were an application that polls your timeline, takes your interests into account, and mails ‘interesting’ tweets to you?
The app has a GUI that takes your Twitter user id and a comma-separated list of words that specify your interests. I won’t go much into the implementation details because they are pretty straightforward. It was in Python (and I fell in love with it :) ), and the other components - the mail server, extracting tweets using the Twitter API, the OAuth authentication, and so on - are all solved problems with sufficient documentation available. Since it was a 24-hour implementation, you can imagine the dirtiest programming practices that came to the fore to deliver the app in the quickest time. Instead, let me dwell on the science behind this hack (and before I forget, let me mention that it was in the science category).
Instead of a naive keyword-based approach involving searching hashtags or exact matching of words, we tried a more semantic approach: LSA, or Latent Semantic Analysis. This is a well-known algorithm in the field of IR, so I won’t dwell on it either (and no, I am not an escapist). Each tweet is a document, and the words in the tweets form the term-document matrix. For each word in a tweet, we augment it by adding synonyms/synsets with the help of WordNet to get an expanded term set. From the term-document matrix, we apply LSA to get the concept space, and the tweets are projected onto this space, along with the query, which is nothing but the interests that the user specifies. A simple cosine similarity then enables a ranking of the tweets, and the top tweets are mailed and waiting for you in your inbox :). The approach worked quite well: a query with the interests given as google,papers returned tweets that talked about search.
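Here is a minimal sketch of the ranking pipeline described above, assuming plain bag-of-words counts and leaving out the WordNet expansion step (the function name and example tweets are mine, not from the actual hack):

```python
import numpy as np

def lsa_rank(tweets, query, k=2):
    """Rank tweets against a query in a k-dimensional LSA concept space.

    Builds a term-document matrix over the tweets, reduces it with a
    truncated SVD, folds the query into the same concept space, and
    ranks tweets by cosine similarity. Returns tweet indices, best first.
    """
    vocab = sorted({w for t in tweets for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix: rows are terms, columns are tweets.
    A = np.zeros((len(vocab), len(tweets)))
    for j, t in enumerate(tweets):
        for w in t.lower().split():
            A[index[w], j] += 1

    # Truncated SVD gives the k-dimensional concept space.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    docs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one tweet in concept space

    # Fold the query (the user's interests) into the concept space.
    q = np.zeros(len(vocab))
    for w in query.lower().split():
        if w in index:
            q[index[w]] += 1
    q_k = U[:, :k].T @ q

    # Cosine similarity in the reduced space; small epsilon guards
    # against zero-norm vectors.
    sims = [float(d @ q_k / (np.linalg.norm(d) * np.linalg.norm(q_k) + 1e-12))
            for d in docs]
    return sorted(range(len(tweets)), key=lambda j: -sims[j])
```

Because the comparison happens in concept space rather than raw term space, a tweet can rank well by sharing latent structure with the query even without exact word matches - which is the point of going beyond keywords.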
But there is still a lot of work to be done, for this approach may not work for real-world tweets that mention named entities. For example, a tweet about ECIR will be completely missed, for the app has no idea what ECIR is. Adding world knowledge may lead to a better system, and that is work in progress.
I hope my first word will not be my last and that I continue posting my thoughts on research. Most of them may be crazy ideas that come up in odd circumstances and are completely unfeasible. But I believe the process is more important than the product. The process of reaching the idea - the introspection, the analysis, the intuition, and of course the creativity - is much more important than the idea itself.
Research has mainly five steps: Observe, Interpret, Infer, Conclude, and Recommend. Usually, for any hypothesis, we look at an observation and see if the hypothesis holds. Sometimes the background knowledge along with the hypothesis explains an observation. But there are times when we come across an observation that cannot be explained by the hypothesis and background knowledge together. Then the background knowledge may need to be changed to support the hypothesis and explain the observation. The ingenuity lies in spotting the particular observation that violates the existing hypothesis. Hence a researcher needs to be alert for that observation which starts the chain of events leading to their thesis or paper.