Debian Developer Database Statistics

I am currently writing a school paper titled “Free and Open Source Software in Switzerland”, so I am interested in the FOSS community here. I decided to compile some statistics from the Debian Developer Database; my Ruby script for that is below. It queries the web interface, parses the results, and counts the developers per country. I plan to relate these counts to the population of each country.

#!/usr/bin/env ruby

# requires libopenssl-ruby1.8
require 'net/https'

class DDDB #Debian Developer Database

private_class_method :new

public
    def self.get_developer_count(country = 'any')
        html = get_html_for_country(country)
        if html =~ /Number of entries matched: <b>([0-9]+)<\/b>/
            return $1.to_i
        end
        return 0;
    end

    def self.get_developers(country = 'any')
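        # Captures (homepage URL or nil, name, mail address) for each entry.
        # The literal "+" and "(" must be escaped inside the pattern.
        q = '<font size=\+1>(?:<a href="(.*?)">)?(.*?)?(?:</a>)?'
        q+= '</font> \(uid=.*?login:</b></td><td> '
        q+= '<a href="mailto:([a-z0-9.@]+)"'
        html = get_html_for_country(country)
        html.scan(/#{q}/m)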
    end
    
    def self.get_countries
        http = Net::HTTP.new("db.debian.org", 443)
        http.use_ssl = true
        http.start { |http|
            res = http.get('/') 
            res.body.scan(/<option value="([a-z]{2})">(.*$)/)
        }
    end

private
    @@data = {}

    def self.get_html_for_country(country)
        if !@@data.has_key? country 
            @@data[country] = get_html_for_query(
                "country=#{country == 'any' ? '' : country}"
            )
        end
        @@data[country];
    end

    def self.get_html_for_query(query)
        http = Net::HTTP.new("db.debian.org", 443)
        http.use_ssl = true
        http.start { |http|
            res = http.post(
                '/search.cgi', 
                "#{query}&dosearch=Search..."
            )
            return res.body
        }
    end
end

total = 0
data = {}

DDDB.get_countries().each do |code,country|
    count = DDDB.get_developer_count(code)
    if count > 0
        data[code] = { 'country' => country, 'count' => count }
        total += count
    end
end

print "Total Debian developers: ", DDDB.get_developer_count(), "\n"
print "Total Debian developers with country specified: ", total, "\n"

data.each_value do |entry|
    print entry['country'].ljust(35), entry['count'].to_s.rjust(5),' ',
        "%02.2f%%" % (entry['count'].to_f*100/total), "\n"
end

# The script stops here; remove this line to also run the listing below.
exit 0

DDDB.get_developers('ch').each { |www,name,mail|
    print name.ljust(35),(" <"+mail+"> ").ljust(35)
    print www if !www.nil?
    print "\n"
}
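
The comparison with population that I mention at the top could look roughly like the sketch below. The helper name per_million is my own, and the country codes and figures are placeholders for illustration only, not real counts or populations.

```ruby
# Developers per million inhabitants for each country code.
# counts:      { code => number of developers }
# populations: { code => number of inhabitants }
def per_million(counts, populations)
  counts.each_with_object({}) do |(code, count), result|
    population = populations[code]
    # Skip countries without a usable population figure.
    next if population.nil? || population.zero?
    result[code] = count * 1_000_000.0 / population
  end
end

# Placeholder figures for illustration only, not real statistics.
counts      = { 'aa' => 10, 'bb' => 30 }
populations = { 'aa' => 2_000_000, 'bb' => 60_000_000 }

# Print the countries sorted by rate, highest first.
per_million(counts, populations)
  .sort_by { |_code, rate| -rate }
  .each { |code, rate| puts format('%-4s %6.2f', code, rate) }
```

In the real script, counts would be filled from DDDB.get_developer_count and populations from whatever population table is at hand.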

Oh well. After writing the scraping script above, I found out that I could simply have queried the LDAP directory directly. Sigh.
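
For the record, a rough sketch of what the LDAP route might look like, shelling out to the OpenLDAP ldapsearch tool. I have not verified any of this against the live directory: the server URI, the base DN, and the use of the c attribute for the country are assumptions, and the helper names are made up for this example.

```ruby
require 'open3'

# Assumed connection details; check them against the db.debian.org documentation.
LDAP_URI  = 'ldap://db.debian.org'
LDAP_BASE = 'dc=debian,dc=org'

# Build the argument list for an anonymous ldapsearch on one country code.
# -x: simple authentication, -LLL: plain LDIF without comments or version line.
def ldapsearch_command(country)
  ['ldapsearch', '-x', '-LLL', '-H', LDAP_URI, '-b', LDAP_BASE,
   "(c=#{country})", 'uid']
end

# Count the entries in LDIF output: every entry starts with a "dn:" line.
def count_entries(ldif)
  ldif.lines.count { |line| line.start_with?('dn:') }
end

# Run the search and return the number of matching entries.
def count_developers(country)
  output, status = Open3.capture2(*ldapsearch_command(country))
  raise "ldapsearch failed for #{country}" unless status.success?
  count_entries(output)
end
```

Calling count_developers('ch') would then replace the whole HTML scraping step for one country, provided the directory allows anonymous searches and ldapsearch is installed.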