Debian Developer Database Statistics
I am currently writing a paper for school titled “Free and OpenSource Software in Switzerland”, so I am interested in the FOSS community here in Switzerland. I decided to compile some statistics from the Debian Developer Database; below is my Ruby script for that purpose. It simply queries the web interface, parses the results and counts the developers per country. I plan to set these numbers in relation to the population of the corresponding countries.
#!/usr/bin/env ruby
# requires libopenssl-ruby1.8
require 'net/https'
class DDDB # Debian Developer Database
  private_class_method :new

  public

  # Number of developers listed for a country ('any' = no country filter).
  def self.get_developer_count(country = 'any')
    html = get_html_for_country(country)
    if html =~ /Number of entries matched: <b>([0-9]+)<\/b>/
      return $1.to_i
    end
    return 0
  end
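
  # Returns [homepage, name, mail] triples for a country.
  def self.get_developers(country = 'any')
    # '+' and '(' are escaped so they match literally in the regex
    q = '<font size=\+1>(?:<a href="(.*?)">)?(.*?)?(?:</a>)?'
    q += '</font> \(uid=.*?login:</b></td><td> '
    q += '<a href="mailto:([a-z0-9.@]+)"'
    html = get_html_for_country(country)
    html.scan(/#{q}/m)
  end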

  # Returns [code, name] pairs from the country drop-down on the search page.
  def self.get_countries
    http = Net::HTTP.new("db.debian.org", 443)
    http.use_ssl = true
    http.start { |conn|
      res = conn.get('/')
      res.body.scan(/<option value="([a-z]{2})">(.*$)/)
    }
  end

  private

  @@data = {} # cache of fetched result pages, keyed by country code

  def self.get_html_for_country(country)
    if !@@data.has_key? country
      @@data[country] = get_html_for_query(
        "country=#{country == 'any' ? '' : country}"
      )
    end
    @@data[country]
  end

  # POSTs the search form and returns the raw HTML of the result page.
  def self.get_html_for_query(query)
    http = Net::HTTP.new("db.debian.org", 443)
    http.use_ssl = true
    http.start { |conn|
      res = conn.post(
        '/search.cgi',
        "#{query}&dosearch=Search..."
      )
      res.body
    }
  end
end

total = 0
data = {}
DDDB.get_countries().each do |code, country|
  count = DDDB.get_developer_count(code)
  if count > 0
    data[code] = { 'country' => country, 'count' => count }
    total += count
  end
end

print "Total Debian developers: ", DDDB.get_developer_count(), "\n"
print "Total Debian developers with country specified: ", total, "\n"
data.each_value do |entry|
  print entry['country'].ljust(35), entry['count'].to_s.rjust(5), ' ',
        "%02.2f%%" % (entry['count'].to_f*100/total), "\n"
end
exit 0 # everything below is skipped unless this line is removed

DDDB.get_developers('ch').each { |www, name, mail|
  print name.ljust(35), (" <"+mail+"> ").ljust(35)
  print www if !www.nil?
  print "\n"
}
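
Setting the counts in relation to population could look roughly like the sketch below. The helper name `developers_per_million` is made up for illustration, and the country codes and numbers are placeholders, not real statistics; real population figures would have to come from some other source.

```ruby
# Developers per million inhabitants.
# Both hashes are keyed by country code. Countries without usable
# population data are skipped rather than guessed.
def developers_per_million(counts, populations)
  counts.each_with_object({}) do |(code, count), result|
    population = populations[code]
    next if population.nil? || population.zero?
    result[code] = count * 1_000_000.0 / population
  end
end

# Placeholder input, for illustration only.
counts      = { 'xx' => 20, 'yy' => 150 }
populations = { 'xx' => 8_000_000, 'yy' => 80_000_000 }

# Print the countries ordered by rate, highest first.
developers_per_million(counts, populations)
  .sort_by { |_code, rate| -rate }
  .each { |code, rate| puts format('%-4s %6.2f', code, rate) }
```

In the script above, `counts` could be built from the `data` hash by mapping each country code to its `'count'` entry.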
Oh well, after having written the above script, I found out that I could simply have queried the LDAP directory directly. Sigh.
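
For the record, a direct LDAP query might look something like the following. This is an untested sketch with several assumptions: it uses the third-party net-ldap gem, it assumes the directory answers anonymous queries on db.debian.org port 389 under the base `ou=users,dc=debian,dc=org`, and it assumes the country is stored in the `c` attribute. The helper names are made up for illustration.

```ruby
# Build an LDAP filter string for a two-letter country code.
# The attribute name "c" is an assumption about the directory schema.
def country_filter(code)
  unless code.to_s.match?(/\A[a-z]{2}\z/)
    raise ArgumentError, "bad country code: #{code.inspect}"
  end
  "(c=#{code})"
end

# Count the entries for a country. `ldap` is any object offering a
# net-ldap style #search(base:, filter:, attributes:) that returns an
# array of entries, or nil on failure. The base DN is an assumption.
def developer_count(ldap, code, base = 'ou=users,dc=debian,dc=org')
  entries = ldap.search(base: base, filter: country_filter(code), attributes: ['uid'])
  entries ? entries.size : 0
end

# Anonymous connection to the directory (host and port are assumptions).
# The gem is loaded lazily so the helpers above work without it.
def debian_ldap
  require 'net/ldap'
  Net::LDAP.new(host: 'db.debian.org', port: 389)
end
```

With the gem installed, `puts developer_count(debian_ldap, 'ch')` would then replace the HTML scraping for a single country.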