Combining Stack Overflow RSS, OData and API to query


In my opinion, Stack Overflow holds a ton of knowledge and is a great place to learn new tricks. There are some really smart people in the SO community, and I try to learn new things there whenever I find time.

I subscribe to RSS feeds for new questions on particular topics. For example, here is the one for F# on Stack Overflow: http://stackoverflow.com/feeds/tag/f%23. The advantage of the RSS feed is that I get to see new questions; the drawback is that I have to navigate to the site to look for answers. AFAIK Stacky (the Stack Overflow API client) does not provide a mechanism for querying new questions by tag.

It was easy to combine the two to solve my problem: with the RSS feed I can discover new questions, and with Stacky I can get their answers. I use LINQPad as a scratchpad, so it was easy to write up something quick.


// LINQPad "C# Program" query. It needs the System.Xml, System.ServiceModel.Syndication and
// System.Text.RegularExpressions namespaces, plus a reference to the Stacky library.
void Main()
{
    var reader = XmlReader.Create("http://stackoverflow.com/feeds/tag/f%23");
    var feed = SyndicationFeed.Load<SyndicationFeed>(reader);
    var length = "http://stackoverflow.com/questions/".Length;

    var client = new StackyClient("1.0", File.ReadAllText(@"c:\temp\so.txt"), HostSite.StackOverflow, new UrlClient(), new JsonProtocol());

    // Each feed entry id looks like http://stackoverflow.com/questions/<id>/<title>,
    // so the question id is the text between the prefix and the next slash.
    var feedItems = (from item in feed.Items
                     let nextOccurrence = item.Id.IndexOf("/", length)
                     let id = Convert.ToInt32(item.Id.Substring(length, nextOccurrence - length))
                     select new { Id = id, Title = item.Title.Text, Body = item.Summary.Text.StripHTML() }).ToList();

    var answers = client.GetQuestionAnswers(feedItems.Select(y => y.Id), new AnswerOptions() { IncludeBody = true });

    // The latest F# feed questions and their accepted answers
    // (question.Body was already stripped of HTML above).
    var qa = from question in feedItems
             join answer in answers on question.Id equals answer.QuestionId
             where answer.Accepted == true
             select new { Title = question.Title, Question = question.Body, Answer = answer.Body.StripHTML() };
    qa.Dump();
}

public static class Extensions
{
    // Crude tag stripper: removes anything between '<' and the next '>'.
    public static string StripHTML(this string s)
    {
        return Regex.Replace(s, @"<(.|\n)*?>", string.Empty);
    }
}
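The id extraction above relies on a fixed prefix length and an IndexOf call. As a minimal alternative sketch, the same id can be pulled out with Uri.Segments, which also lets the code reject entries that do not look like question URLs. The FeedIds class and its QuestionId method are names made up for this illustration, and the sample URL in the comments is not a real question.

```csharp
using System;

public static class FeedIds
{
    // Returns the numeric question id from a feed entry id such as
    // "http://stackoverflow.com/questions/12345/some-title" (sample URL),
    // or null if the URL does not have that shape.
    public static int? QuestionId(string entryId)
    {
        if (!Uri.TryCreate(entryId, UriKind.Absolute, out var uri))
            return null;

        // Segments for the sample URL: "/", "questions/", "12345/", "some-title"
        var segments = uri.Segments;
        if (segments.Length < 3 || segments[1] != "questions/")
            return null;

        return int.TryParse(segments[2].TrimEnd('/'), out var id) ? id : (int?)null;
    }
}
```

In the query this would replace the two let clauses with something like `let id = FeedIds.QuestionId(item.Id)`, followed by `where id != null`.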

If you have been following F# and functional programming, you probably know Tomas. I would also like to read what he has been answering. Again, Stacky does not provide an API to look up a user by name. This is where the SO OData feed comes in handy, and LINQPad handles OData very well. Here is the code to get Tomas's user id via OData and then use Stacky to fetch his answers along with the questions they belong to.

// Users comes from the SO OData connection; client is the StackyClient created above.
var tomas = Users.Where(u => u.DisplayName.StartsWith("Tomas Pet")).First().Id;
var tomasQA = from ans in client.GetUsersAnswers(tomas, new AnswerOptions() { IncludeBody = true })
              select new { Title = ans.Title,
                           Question = client.GetQuestion(ans.QuestionId, true, false).Body.StripHTML(),
                           Answer = ans.Body.StripHTML() };
tomasQA.Dump();

Using Tech-Ed OData to download videos


I wanted to watch the TechEd 2010 videos, but I did not want to go to the site and download each file manually for offline viewing. I was also only interested in Dev sessions at level 300 or 400. Thanks to the TechEd OData feed at http://odata.msteched.com/sessions.svc/, I could write three statements in LINQPad and have the videos downloaded using wget.

File.Delete(@"C:\temp\download.txt");

Sessions
.Where (s => (s.Level.StartsWith("400") ||  s.Level.StartsWith("300") ) && s.Code.StartsWith("DEV"))
.Take(10)
.ToList()
.Select (s => @"http://ecn.channel9.msdn.com/o9/te/NorthAmerica/2010/mp4/" + s.Code + ".mp4" )
.Run(s => File.AppendAllText(@"C:\temp\download.txt",s + Environment.NewLine));

Util.Cmd(@"wget.exe -b -i c:\Temp\download.txt",true);

I forgot to mention that the Run extension method comes from Reactive Extensions.
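For readers who do not have Rx referenced, here is a rough sketch of the same filter-and-write step using only LINQ to Objects and File.WriteAllLines. The DownloadList class is a made-up name, and the (Code, Level) tuples are a stand-in for the OData Sessions entity, so treat the shape of the data as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DownloadList
{
    // Base URL taken from the query above.
    const string BaseUrl = "http://ecn.channel9.msdn.com/o9/te/NorthAmerica/2010/mp4/";

    // Builds one download URL per DEV session at level 300 or 400.
    // The (Code, Level) tuples stand in for the OData Sessions entity.
    public static List<string> Urls(IEnumerable<(string Code, string Level)> sessions)
    {
        return sessions
            .Where(s => s.Code.StartsWith("DEV", StringComparison.Ordinal) &&
                        (s.Level.StartsWith("300", StringComparison.Ordinal) ||
                         s.Level.StartsWith("400", StringComparison.Ordinal)))
            .Select(s => BaseUrl + s.Code + ".mp4")
            .ToList();
    }

    // Writes the whole list in one call and overwrites any previous file,
    // so neither File.Delete nor a per-item append is needed.
    public static void Save(string path, IEnumerable<string> urls)
    {
        File.WriteAllLines(path, urls);
    }
}
```

Writing the file in one call also means an interrupted run cannot leave a half-appended list behind from an earlier attempt.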

Using LINQPad and PLINQ to grep for files


I use LINQPad as my primary dev tool for iterative code development. Today I had to search the source tree on my hard disk for certain keywords that were stored in a database. There is a big impedance mismatch between the database and the rest of the world: I would have to fetch the data from the DB and then use either PowerShell or cmd to look for it. That is fine for one keyword, but it becomes a chore when there are several. I remembered a cool parallel grep example from PFX. Here is that code, modified to take optional parameters and return a typed result.


public class GrepResult
{
    public string File { get; set; }
    public int Line { get; set; }
    public string Text { get; set; }
}

public static class Extension
{
    public static IEnumerable<GrepResult> Grep(string regexString, IEnumerable<string> wildcards, bool ignoreCase = true, bool recursive = false)
    {
        var regex = new ThreadLocal<Regex>(() =>
            new Regex(regexString, RegexOptions.Compiled | (ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None)));

        var files = from wc in wildcards
                    let dirName = Path.GetDirectoryName(wc)
                    let fileName = Path.GetFileName(wc)
                    from file in Directory.EnumerateFiles(
                        String.IsNullOrWhiteSpace(dirName) ? "." : dirName,
                        String.IsNullOrWhiteSpace(fileName) ? "*.*" : fileName,
                        recursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly)
                    select file;

        var matches = from file in files.AsParallel().AsOrdered().WithMergeOptions(ParallelMergeOptions.NotBuffered)
                      from line in File.ReadLines(file).Zip(Enumerable.Range(1, int.MaxValue), (s, i) => new { Num = i, Text = s, File = file })
                      where regex.Value.IsMatch(line.Text)
                      select line;

        foreach (var line in matches)
        {
            yield return new GrepResult() { File = line.File, Line = line.Num, Text = line.Text };
        }
    }
}

I compile this into a DLL and add a reference to it in LINQPad. Here is the code that actually uses it:


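static void Main()
{
    Environment.SetEnvironmentVariable("LinqQueries", @"c:\Users\naveen\Documents\LINQPad Queries\*.*");
    var linqPath = Environment.GetEnvironmentVariable("LinqQueries");

    // ConfigEntries is a table on the LINQPad database connection; each Name is a keyword to search for.
    var keywords = from i in ConfigEntries
                   select i.Name;

    // Pull the keywords into memory first, then grep for each one.
    keywords.ToList().Select(k => Extension.Grep(k, new[] { linqPath })).Dump();
}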

I usually set environment variables for my projects so that I don’t have to type paths manually (here I am setting one in code just for demonstration). The code above gets the values from the database and searches for those keywords on my hard disk. I could have used Findstr via the Util class within LINQPad, but the PLINQ implementation can use all four of my cores. I search code very often, so this is going to be handy.
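To show the core of the PLINQ part without touching the file system, here is a small sketch that greps an in-memory list of lines. It numbers lines with the indexed Select overload instead of Zip, and it shares one Regex across threads (Regex matching is thread-safe) rather than using ThreadLocal as the code above does. The LineGrep class is a made-up name for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineGrep
{
    // Returns (1-based line number, text) for every line that matches the pattern.
    public static List<(int Line, string Text)> Matches(
        IEnumerable<string> lines, string pattern, bool ignoreCase = true)
    {
        var options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        var regex = new Regex(pattern, options);

        return lines
            .Select((text, index) => (Line: index + 1, Text: text)) // number lines before going parallel
            .AsParallel()
            .AsOrdered()                                            // keep results in source order
            .Where(l => regex.IsMatch(l.Text))
            .ToList();
    }
}
```

A call such as `LineGrep.Matches(File.ReadLines(path), "keyword")` would apply it to one file; for small inputs the parallel overhead probably outweighs the gain, so this pays off mainly on large trees.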

Using LINQ and Reactive Extensions to overcome limitations in OData query operators


I was pleased to learn that Netflix has an OData API. The practical reason, obviously, is that I can use it to find the movies I want to watch. As I mentioned in my previous post, I will be using LINQPad 4 for querying because of its built-in support for OData as well as Rx.

One thing I discovered after playing around with OData is that not every LINQ query operator is available in OData. For example, the Netflix API supports only four operators:

  1. Filter
  2. Skip
  3. Take
  4. Orderby

Also, each request returns at most 20 rows. So to get 40 rows, my first request returns the first 20, and in my next request I have to skip those 20 and take the next 20. These are some of the limitations.

Here is what I wanted from Netflix: movie listings with an average rating greater than 3.5, ordered by release year descending, and grouped by whether they are available for instant watch. That way I can have one queue for movies I want to watch online and another for movies I have to request by mail (the ones not available for instant watch). Here is the query to do that:


 var movies = from counter in (from e in Enumerable.Range(0,400) where e%20  == 0 select e).ToObservable()
 from movieTitle in Titles.Where (t => t.AverageRating > 3.5).OrderByDescending (t => t.ReleaseYear).Skip(counter).Take(20).ToObservable()
 select movieTitle;

var moviesILikeToWatch = from counter in movies
 group counter by counter.Instant.Available into g
 select g;
moviesILikeToWatch.Dump();

The first “from counter” query builds the skip part. As I mentioned, each request returns only 20 rows and I wanted 400, so I used Enumerable.Range to generate a sequence of offsets to skip by in the next query. I could very well have used a for loop to build this, but that is not what I want; I want to try to write terse code. These are the actual calls to the Netflix OData API:

http://odata.netflix.com/Catalog/Titles()?$filter=AverageRating gt 3.5&$orderby=ReleaseYear desc&$skip=0&$top=20
http://odata.netflix.com/Catalog/Titles()?$filter=AverageRating gt 3.5&$orderby=ReleaseYear desc&$skip=20&$top=20
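The skip offsets behind calls like these can be sketched on their own. The snippet below computes the offsets directly as page index times page size (instead of filtering Range(0, 400) with a modulo) and formats them into URLs of the same shape as the two shown above. The Paging class and its method names are made up for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Skip offsets 0, pageSize, 2 * pageSize, ... covering `total` rows.
    public static IEnumerable<int> Offsets(int total, int pageSize)
    {
        var pages = (total + pageSize - 1) / pageSize; // round up to whole pages
        return Enumerable.Range(0, pages).Select(page => page * pageSize);
    }

    // Request URLs in the same shape as the two calls shown above.
    public static List<string> Urls(int total, int pageSize = 20)
    {
        const string query = "http://odata.netflix.com/Catalog/Titles()" +
                             "?$filter=AverageRating gt 3.5&$orderby=ReleaseYear desc";
        return Offsets(total, pageSize)
            .Select(skip => $"{query}&$skip={skip}&$top={pageSize}")
            .ToList();
    }
}
```

With a total of 400 and a page size of 20 this yields 20 requests, from $skip=0 up to $skip=380.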

In the picture below, LINQPad makes 20 calls to Netflix to get the 400 movie listings.

The next line in the first query, “from movieTitle”, is a simple LINQ query that gets movies matching the filter criteria, along with the skip and take. The second query is separate because the OData API doesn’t provide a groupby operator; if I included the grouping in the first query, LINQPad would try to convert it into an OData request, which would fail. So essentially I am getting all the data from the server and then grouping it locally.
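The same server-then-local split can be sketched without Rx. In plain LINQ, AsEnumerable() marks the point where the query provider stops translating and LINQ to Objects takes over, so the filter stays translatable while the grouping runs in memory. SampleTitle and LocalGrouping are made-up names, and the flat InstantAvailable property is a simplification of the Instant.Available path used in the query above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Netflix Title entity.
public class SampleTitle
{
    public string Name { get; set; }
    public double AverageRating { get; set; }
    public bool InstantAvailable { get; set; }
}

public static class LocalGrouping
{
    // Key true = available for instant watch, key false = mail only.
    public static ILookup<bool, string> ByInstantAvailability(IQueryable<SampleTitle> titles)
    {
        return titles
            .Where(t => t.AverageRating > 3.5)  // still part of the provider's query
            .AsEnumerable()                     // everything after this runs locally
            .ToLookup(t => t.InstantAvailable, t => t.Name);
    }
}
```

The result can be read as two queues: `lookup[true]` for instant watch and `lookup[false]` for mail.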

This wouldn’t have been possible without OData.

Using OData, LINQPad and Reactive Extensions (Rx) to query Stack Overflow


I saw this cool post from Scott Hanselman on creating an OData API for Stack Overflow. I use LINQPad more often than anything else, and when I am not very busy I also look for unanswered questions on Stack Overflow. I have been playing around with Reactive Extensions, and FYI, LINQPad 4.0 supports Rx. So I thought: how cool would it be to watch for unanswered “windbg” questions on Stack Overflow so that I could answer them? Here is the query:

var windbgQuestions = from time in Observable.Interval(TimeSpan.FromMinutes(1))
                      from post in Posts.ToObservable()
                      where post.AnswerCount == 0 && post.Tags.Contains("windbg")
                      select post.Body;
windbgQuestions.Dump();

This would essentially keep querying Stack Overflow once a minute, if Stack Overflow were to implement OData, and I wouldn’t have to launch an application to look for unanswered questions.

I know this will not work right now. But how cool is it to combine these frameworks and write very succinct code to get what we want, without having to jump through hoops?
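As a rough sketch of the same polling idea without Rx, the loop below re-runs a fetch delegate at a fixed interval and reports each unanswered post once. Everything here is an assumption for illustration: SamplePost stands in for the OData Posts entity, the fetch delegate stands in for the remote query, and the de-duplication by Id is an addition that the Rx query above does not have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the OData Posts entity.
public class SamplePost
{
    public int Id { get; set; }
    public int AnswerCount { get; set; }
    public string Tags { get; set; }
    public string Body { get; set; }
}

public static class UnansweredPoller
{
    // Same filter as the where clause in the query above.
    public static IEnumerable<SamplePost> Unanswered(IEnumerable<SamplePost> posts, string tag)
    {
        return posts.Where(p => p.AnswerCount == 0 && p.Tags.Contains(tag));
    }

    // Calls fetch once per interval until cancelled; onNew sees each post Id only once.
    public static async Task PollAsync(
        Func<IEnumerable<SamplePost>> fetch, string tag, TimeSpan interval,
        Action<SamplePost> onNew, CancellationToken token)
    {
        var seen = new HashSet<int>();
        while (!token.IsCancellationRequested)
        {
            foreach (var post in Unanswered(fetch(), tag))
                if (seen.Add(post.Id))
                    onNew(post);

            try { await Task.Delay(interval, token); }
            catch (OperationCanceledException) { break; } // cancelled while waiting
        }
    }
}
```

Unlike Observable.Interval, which waits one interval before its first tick, this loop fetches immediately and then waits.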
