{"id":3718,"date":"2012-09-30T21:16:18","date_gmt":"2012-10-01T01:16:18","guid":{"rendered":"http:\/\/www.dr-chuck.com\/csev-blog\/?p=3718"},"modified":"2012-09-30T21:16:45","modified_gmt":"2012-10-01T01:16:45","slug":"geographic-distribution-of-my-coursera-course","status":"publish","type":"post","link":"https:\/\/www.dr-chuck.com\/csev-blog\/2012\/09\/geographic-distribution-of-my-coursera-course\/","title":{"rendered":"Visualizing the Geographic Distribution of my Coursera Course"},"content":{"rendered":"<p>As part of my <a href=\"https:\/\/www.coursera.org\/course\/insidetheinternet\" target=\"_new\">Internet History, Technology, and Security<\/a> course on Coursera I did a demographics survey and received 4701 responses from my students.<\/p>\n<p>I will publish all the data in a recorded lecture summarizing the class, but I wanted to give a sneak preview of some of the geographic data results because the Python code to retrieve the data was fun to build.   Click on each image to play with a zoomable map of the visualized data in a new window.  
At the end of the post, I describe how the data was gathered, processed, and visualized.<br \/>\n<center><\/p>\n<p><b>Where are you taking the class from (State\/Country)?<\/b><\/p>\n<p><a href=\"http:\/\/www.dr-chuck.com\/coursera\/insidetheinternet\/2012-001\/maps\/where.html\" target=\"new\"><img decoding=\"async\" src=\"http:\/\/www.dr-chuck.com\/csev-blog\/wp-content\/uploads\/2012\/09\/2012-09-where-are-you.png\" width=\"600\"><\/a><\/p>\n<p><b>If you went to college or are currently going to college, what is the name of your college or university?<\/b><\/p>\n<p><a href=\"http:\/\/www.dr-chuck.com\/coursera\/insidetheinternet\/2012-001\/maps\/school.html\" target=\"new\"><img decoding=\"async\" src=\"http:\/\/www.dr-chuck.com\/csev-blog\/wp-content\/uploads\/2012\/09\/2012-09-what-school.png\" width=\"600\"><\/a><\/p>\n<p><\/center><\/p>\n<p>The second map is naturally more detailed because the first question asked students to reduce their answer to a state or country, while the second asked for a particular university.  The data is noisy because it is all user-entered with no human cleanup.<\/p>\n<h1>Gathering the data<\/h1>\n<p>Both fields were open-ended (i.e., the user was not picking from a drop-down).   I had no idea how I would ever clean up the data, and when I got 4701 responses, I figured I would just look through them and note that my students were from a lot of places.   On a lark Friday morning I started looking for the Yahoo! Geocoding API that I had heard about several years ago at a Yahoo! hackathon on the UM campus, where I met <a href=\"http:\/\/www.youtube.com\/watch?v=t4p2t9JQ7_Y\" target=\"_new\">Rasmus Lerdorf<\/a>, the inventor of PHP.   Because the API sounded cool, I was disappointed to find out that Y! was out of the geocoding business.   
But I was pleased to find that Google&#8217;s <a href=\"https:\/\/developers.google.com\/maps\/documentation\/geocoding\/\" target=\"_new\">Geocoding API<\/a> provided the same functionality and was available and easy to use.<\/p>\n<p>So I set out to write a spider in Python that would go through the user-entered data, submit each entry to the geocoding lookup API, and retrieve the results.   I used a local SQLite3 database to make sure that I looked up each string only once.   I had two data sets with nearly 6000 items total, and the Google API stops you after 2500 queries in a 24-hour period, so it took three days to get all the data geocoded.<\/p>\n<p>I did not clean up the data at all &#8211; I just submitted the user-entered text to Google&#8217;s API and took back what it said.  Then I used <a href=\"https:\/\/developers.google.com\/maps\/documentation\/javascript\/\" target=\"_new\">Google&#8217;s Maps API for JavaScript<\/a> to produce the zoomable maps. <\/p>\n<p>If you are curious about the nature of the spider, I adapted it from the sample code in chapters 12-14 of my <a href=\"http:\/\/www.pythonlearn.com\/\" target=\"_new\">Python for Informatics<\/a> textbook.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As part of my Internet History, Technology, and Security course on Coursera I did a demographics survey and received 4701 responses from my students. 
I will publish all the data in a recorded lecture summarizing the class, but I wanted to give a sneak preview of some of the geographic data results because the Python [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3718","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/posts\/3718","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/comments?post=3718"}],"version-history":[{"count":21,"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/posts\/3718\/revisions"}],"predecessor-version":[{"id":3741,"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/posts\/3718\/revisions\/3741"}],"wp:attachment":[{"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/media?parent=3718"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/categories?post=3718"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dr-chuck.com\/csev-blog\/wp-json\/wp\/v2\/tags?post=3718"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}