ECIR 2016 Paper and Presentation

I recently presented a paper entitled Efficient AUC Optimization for Information Ranking Applications at the European Conference on Information Retrieval (ECIR) in Padua, Italy.

In the paper, we derive gradient approximations for optimizing area under an ROC curve (AUC) and multi-class AUC using the LambdaMART algorithm. Here are slides for the talk, which give an overview of the paper and a brief review of Learning to Rank and LambdaMART:

The goal was to expand LambdaMART to optimize AUC in order to add to the portfolio of metrics that the algorithm is currently used for. The approach was to derive a “λ-gradient” analogous to those that have been defined for other metrics such as NDCG and Mean Average Precision.

This in turn required an efficient way of computing a key quantity, namely the change in the AUC from swapping two items in the ranked list. One contribution of the paper is a simple, efficient formula for computing this quantity (along with a proof), which appears in the paper as:

delta_auc_theorem
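To make the "change in AUC from a swap" quantity concrete, here is a minimal Python sketch that computes it the slow way, by recomputing AUC before and after the swap. This is not the formula from the paper; it is a brute-force baseline under the assumption of binary labels (1 = relevant, 0 = not relevant) in a list ordered from top rank to bottom. The function names `auc` and `delta_auc_brute_force` are made up for this illustration.

```python
def auc(labels):
    """AUC of a ranked list of binary labels (index 0 is the top rank).

    Counts the (positive, negative) pairs in which the positive item is
    ranked above the negative item, divided by the total number of pairs.
    """
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    correct_pairs = 0
    negatives_below = num_neg  # negatives not yet passed while scanning down
    for label in labels:
        if label == 1:
            correct_pairs += negatives_below
        else:
            negatives_below -= 1
    return correct_pairs / (num_pos * num_neg)


def delta_auc_brute_force(labels, i, j):
    """Change in AUC from swapping the items at ranks i and j (recomputed)."""
    swapped = list(labels)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return auc(swapped) - auc(labels)


ranking = [1, 0, 1, 1, 0, 0]                   # 3 positives, 3 negatives
change = delta_auc_brute_force(ranking, 0, 4)  # swap a positive with a lower negative
```

In this toy ranking, counting the affected pairs by hand gives a change of -(j - i)/(mn) with m = n = 3, i = 0, and j = 4, which is what the brute-force version returns. Recomputing AUC for every candidate pair is far too slow inside a training loop, which is why a closed-form expression matters; see the paper for the exact statement and proof.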

The paper also contains experimental results measuring the performance of LambdaMART using the derived gradients.

Coltrane, a jazzy Haskell web framework

I’m excited to open-source Coltrane, a minimal web framework for Haskell. Coltrane is inspired by Ruby’s Sinatra framework, and is named after the legendary jazz saxophonist John Coltrane.

The framework lets you create simple Haskell web applications with just a few lines of code. Yuanfeng Peng and I made Coltrane as a final class project, and now it’ll live in the open on GitHub. Coltrane is also now on Hackage, and can be installed with the cabal package manager using the command:

 cabal install coltrane

Check out the GitHub page for more information, documentation, and code examples – and maybe even put on some Coltrane while you’re at it.

Story of a Side Project I : Organizing Running News

This is the first of a series of posts about the evolution of one of my projects, feedlier. feedlier is a simple, flexible news aggregator, which was created out of a need for a clean, organized, and customizable way of reading news headlines. feedlier started out as a site focused on running news, called “RunFeed”, and then was generalized to support any news genre. This post is about the initial motivations for creating the site.

Running News

As a runner, I often find myself browsing the internet for running-related news. I like to stay up to date on results, hear about interesting developments in the running world, and generally immerse myself in the world of running. About a year ago, I encountered a few problems and annoyances that motivated me to create a way of organizing running-related news.

Distractions, Bright Colors, and Interesting Geometry

LetsRun.com is one of the most popular sites for staying up to date in the running world; the site’s content is updated daily, draws from a variety of sources, is maintained by people who are extremely passionate about the sport, and has an active community aspect.

One of the highlights of LetsRun is the sheer amount of curated running news that lands on its homepage. For instance, today their front page contains news from IAAF, Yahoo Sports, a “Week in Review” summary, a running video, a quote of the day, “News from Indiana”, NCAA news, Recommended Reads, “Mo Farah News”, Marathon Results, a message board summary…and that’s just a third of the page! This sounds great – I’ll just head to LetsRun, and surely I’ll be caught up for the next week! Right?

Well, maybe not. LetsRun’s extreme wealth of content is, in my opinion, also its biggest weakness. As I’m trying to read a headline under the “News From Down Under” section, I have a photo gallery flashing to my left, and a running video playing to my right. Just reading one headline requires a laser-like focus with all the distractions in the periphery.

Image

videos, photos, ads, colors, and headlines

And wait, what is “News From Down Under” and why is it even at the top of the page, while “Entertaining / Interesting Reads” is hidden towards the bottom, for every user? I’m also interested in catching up on NCAA XC News, but why do I have to search through a sea of differently-sized boxes, bright blue hyperlinks, yellow backgrounds, and walls of text to find it?

Of course, people consume news differently, and I am sure that many people find the LetsRun format satisfying. This isn’t supposed to be a rant against LetsRun – I firmly believe that the site has a positive impact on the running community, and I have a lot of respect for the amount of work that they do to promote running, to keep the site up to date, and to generally act as an online running ‘hub’.

But I found that when I was on the front page, I was doing optical gymnastics and searching for content more often than I was actually reading headlines. As a result, I would end up in the forums instead of getting caught up on news. Moreover, the site didn’t include niche sources, such as HepsTrack, so seeing the running headlines for the day required navigating through multiple web sites.

RunFeed

So, in an effort to centralize running-related news in an easily digestible form, I came up with the idea for “RunFeed”. In order to solve the frustrations that I had with LetsRun and the fragmentation of running news across multiple sites, I knew that the solution had to fulfill two principles:

  1. Simplicity. Catching up on running news, in my opinion, should be easy. It shouldn’t require navigating through multiple pages, searching through a disorganized site, or maintaining a superhuman focus. It should be obvious where the news is coming from, on a single page, and intuitive to use.
  2. Elegance. The news should be organized and neat. A user shouldn’t be distracted by inconsistent formatting, color schemes, or multiple content types.

I envisioned a site that simply displayed headlines from running sites across the web, in a neatly organized fashion. Each site would have its own feed of news headlines, and every feed would be laid out the same way. Around that time I saw a post on Hacker News about Skim Feed, a site that greatly influenced my idea of what a neat and organized running news site would look like. With these ideas in mind, I set out to do the implementation.

Coffee and Coding

The simple, single-page design that I had in mind was a perfect opportunity to use Sinatra, a minimalistic Ruby web framework. Indeed, Sinatra’s minimalism seemed to be a good fit for the minimalism I had in mind for RunFeed. I used the Feedzirra gem for parsing RSS feeds from the various running sites in order to get the news headlines and links. Since LetsRun didn’t have an RSS feed, I also wrote a scraper using Nokogiri. After a few cups of coffee and a weekend of hacking at Lovers and Madmen, the site was up! Here’s a screenshot of the original RunFeed, the site that would eventually turn into feedlier.com:

Image
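For readers curious what the feed-parsing step looks like, here is a rough sketch. The original site used Ruby with the Feedzirra gem; this stand-in uses Python's standard-library `xml.etree.ElementTree` instead, so it illustrates the idea rather than the actual RunFeed code. The sample feed, the `parse_headlines` function name, and the example.com URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document standing in for a real running site's feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Running Site</title>
    <item>
      <title>Marathon results</title>
      <link>http://example.com/marathon</link>
    </item>
    <item>
      <title>NCAA XC preview</title>
      <link>http://example.com/ncaa-xc</link>
    </item>
  </channel>
</rss>"""


def parse_headlines(rss_text):
    """Return (source_name, [(headline, link), ...]) from RSS 2.0 text."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    source = channel.findtext("title", default="")
    headlines = [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]
    return source, headlines


source, headlines = parse_headlines(SAMPLE_RSS)
for title, link in headlines:
    print(f"{source}: {title} ({link})")
```

Each source ends up as a name plus a list of (headline, link) pairs, which is all a page of consistently formatted feeds needs to render.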

So that’s the story of how the initial idea of feedlier got started. A few reflections:

1. Solve a specific case first, then generalize later.

By making a site for the specific case of running news, I set down a framework that I could generalize and expand upon. Make sure you can solve one thing well before trying to solve everything. Trying to do too much at once could, ironically, lead to not getting anything out the door at all.

2. Merging interests keeps it interesting.

One thing that fueled me to make this in my free time was the fact that it combined web development and running. The combination of two things I’m passionate about made it exciting and interesting.

3. Invent a very slightly modified wheel.

There were existing sites for viewing running news, and existing feed readers. It would’ve been easy to settle for an existing solution, but ultimately, they didn’t fulfill the exact uses I had in mind. Creating “another” feed reading site turned out to be a good learning experience, and resulted in something that exactly solves a specific need.

4. Make an itch scratcher.

As Eric S. Raymond wrote in The Cathedral and the Bazaar, “Every good work of software starts by scratching a developer’s personal itch”. The hope, of course, is that the site can generalize to also benefit others by ‘scratching’ many people’s ‘itches’.

With RunFeed up and running, the next step was thinking about how the site could become more flexible, customizable, and generalized. I hope to continue with more posts about how feedlier evolved, as well as some of its technical details!