Saturday, May 16, 2009

RankBoost and DCG

So, in the end I did poorly in the Yandex contest I've already posted about: my best DCG on the test set is 4.198, which is not too good. I think there were two major problems: lack of time and, unfortunately, lack of good ideas. Still, I want to share some things I learned while participating in it.

One of the ranking algorithms I tried was RankBoost with binary weak rankers, originally proposed by Freund, Iyer, Schapire, and Singer. A binary ranker on a raw feature separates documents with a high value of that feature from documents with a low value of the same feature. To also be able to separate, for example, documents whose feature value lies somewhere around 0.5 from all the others, I ran additional experiments using ranking features that are functions of other features. For that purpose I chose a truncated Gaussian with mean 0.5 and a multimodal sine-based function on [0,1]:
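
As a rough sketch of what such transformed features might look like: the width `sigma`, the number of sine periods `k`, and the function names are assumptions made for illustration, not the values used in the experiments.

```python
import math

def truncated_gaussian(x, mean=0.5, sigma=0.15):
    """Gaussian bump centred at `mean`, cut to zero outside [0, 1].

    sigma=0.15 is an assumed width, chosen only for illustration.
    """
    if x < 0.0 or x > 1.0:
        return 0.0
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def multimodal_sine(x, k=3):
    """Sine-based function with k peaks on [0, 1] and values in [0, 1].

    k=3 is an assumed number of modes.
    """
    return math.sin(k * math.pi * x) ** 2

def transformed_features(x):
    """Original feature value plus its two transformed versions."""
    return [x, truncated_gaussian(x), multimodal_sine(x)]
```

A threshold ranker on `truncated_gaussian(x)` fires only for documents whose original feature value is close to 0.5, which a single threshold on `x` itself cannot express.
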
Here are plots of the experiment results in terms of the RankBoost performance value and Yandex DCG:

As for me, I've drawn 3 conclusions:
  1. Using only the original features for creating weak rankers sucks.
  2. Using weak rankers based on functions of features is slightly better.
  3. The whole approach still sucks in terms of DCG.
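
For reference, the RankBoost loop with binary threshold weak rankers can be sketched roughly as below. This is a simplified illustration, not the code used in the contest: the names `rankboost` and `score`, the exhaustive threshold search, and the absence of a default value for missing features are all assumptions.

```python
import math

def rankboost(X, pairs, rounds=10):
    """Simplified RankBoost with binary threshold weak rankers.

    X     : list of feature vectors (lists of floats)
    pairs : list of (worse, better) index pairs into X
    Returns a list of (alpha, feature_index, threshold) triples.
    """
    weights = [1.0 / len(pairs)] * len(pairs)  # distribution over pairs
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustive search over h(x) = 1 if x[f] > theta else 0.
        for f in range(len(X[0])):
            for theta in sorted({x[f] for x in X}):
                h = [1.0 if x[f] > theta else 0.0 for x in X]
                r = sum(w * (h[b] - h[a]) for w, (a, b) in zip(weights, pairs))
                if best is None or abs(r) > abs(best[0]):
                    best = (r, f, theta, h)
        r, f, theta, h = best
        if abs(r) < 1e-12:
            break  # no weak ranker does better than chance
        r = max(min(r, 1.0 - 1e-9), -1.0 + 1e-9)  # keep the log finite
        alpha = 0.5 * math.log((1.0 + r) / (1.0 - r))  # negative if r < 0
        model.append((alpha, f, theta))
        # Down-weight correctly ordered pairs, up-weight misordered ones.
        weights = [w * math.exp(alpha * (h[a] - h[b]))
                   for w, (a, b) in zip(weights, pairs)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def score(model, x):
    """Final ranking score: weighted sum of the binary weak rankers."""
    return sum(alpha * (1.0 if x[f] > theta else 0.0)
               for alpha, f, theta in model)
```

Documents are then ranked within each query by descending `score`.
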
I should have tried RankBoost with the concave learners proposed there. Oh, I forgot to mention: Yandex DCG for a query can be calculated like this:

The final DCG value is then obtained by summing the DCGs of all the queries and dividing the sum by the number of queries.
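
A minimal sketch of the per-query DCG and the averaging step. The gain `2^rel - 1` and the `log2(position + 1)` discount are the common textbook form and are an assumption here; the contest's exact gain and discount may differ. The function names are illustrative.

```python
import math

def dcg(relevances, gain=lambda rel: 2.0 ** rel - 1.0):
    """DCG of one query; `relevances` are listed in ranked order.

    Assumed form: sum of gain(rel_i) / log2(i + 1) for positions i = 1, 2, ...
    """
    return sum(gain(rel) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def mean_dcg(per_query_relevances):
    """Sum of per-query DCGs divided by the number of queries."""
    return sum(dcg(r) for r in per_query_relevances) / len(per_query_relevances)
```

Here `per_query_relevances` holds one list per query, each containing the true relevance labels of that query's documents sorted by the model's predicted score, best first.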

2 comments:

  1. Hello,
    I do not understand your use of feature transformations. With negative alphas, the boosting process of RankBoost is able, if necessary, to approximate any function of the provided features.

    Tanguy

  2. Alex Gorodilov (from Yandex), who took 1st (?) unofficial place in the rating, used the GBM package for R. Nothing to invent =)

    Learning to rank is a mature branch of ML, and it isn't productive to just take and try some general ML techniques: the field has its own state of the art.
