#talkpay

So today I read an article about #talkpay on Twitter. Basically, people were tweeting about how much money they actually make. I have been looking for a reason to play with the Twitter API, and this seemed like a fun, quick project. So I grabbed all the #talkpay tweets and applied a very simple method to extract the salaries tweeted: I extracted the digits between $ and k, so $125k becomes 125 (see the code for all the details). My method is definitely far from perfect, so please don’t infer much from the plot, but it is kind of interesting to look at.
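The extraction rule described above (digits between $ and k) can be sketched as a single regular expression. This is a hypothetical reconstruction, not the code from my repository: the pattern and the function name extract_salaries are assumptions, and, like the original method, it misses formats such as $125,000 or $1.5m.

```python
import re

# Assumed rule: one or more digits right after "$" and right before a lowercase "k"
SALARY_PATTERN = re.compile(r"\$(\d+)k")


def extract_salaries(tweet_text):
    """Return every salary (in thousands) written like $125k in a tweet."""
    return [int(digits) for digits in SALARY_PATTERN.findall(tweet_text)]


if __name__ == "__main__":
    print(extract_salaries("#talkpay I make $125k as a data scientist"))  # [125]
```

Because the pattern requires the k immediately after the digits, ranges like $120-130k are skipped entirely rather than half-parsed.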

Here is the code

Introduction to Bayes Theorem with Python

Go straight to the code

So I feel like there is not a lot of good information out there on how to use Bayes Theorem for modeling, especially with Python code. Try figuring out Bayesian linear regression from Google searches alone: it is not super easy. So I thought I would do a series of posts working up to Bayesian linear regression.

The first post in this series is an introduction to Bayes Theorem with Python. I hope this post helps some readers understand what Bayes Theorem is and why it is useful, as well as gives a small insight into how it differs from frequentist methods.
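For readers who want the formula before clicking through, here is Bayes Theorem, P(A|B) = P(B|A) P(A) / P(B), applied to a made-up example: a test for a condition that 1% of people have, that catches 90% of true cases, and that gives a false positive 5% of the time. The numbers and the function name posterior are illustrative assumptions, not taken from the linked introduction.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes Theorem.

    prior: P(condition)
    likelihood: P(positive | condition)
    false_positive_rate: P(positive | no condition)
    """
    # P(positive), the denominator, by the law of total probability
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


if __name__ == "__main__":
    # Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false positives
    print(round(posterior(0.01, 0.90, 0.05), 4))  # 0.1538
```

Even after a positive result, the probability of having the condition is only about 15%, because the condition is rare to begin with. That interplay between the prior and the evidence is what Bayes Theorem makes explicit.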

If anything isn’t clear or you have any comments, please let me know! Also, if you have any great Bayes resources, please share, and I can add them to the site.

View my introduction here.

This article has been featured on Dataconomy here.

Quick Guide to Making an Interactive D3 Map

Preliminaries

First you need to install a few things:

  1. Node
  2. TopoJSON
    1. From the terminal, run: npm install -g topojson

Makefile

The first step is to create your Makefile. Mike Bostock has an excellent explanation of why and how to use Makefiles. I won’t go into much detail, but you can use my Makefile. Save it as a text file called Makefile, and then create a directory called build. Then download the CSV data for the map here (save it in the same directory as your Makefile). Now use your Makefile to:

  1. Download the map data
    1. Run the make build/gz_2010_us_040_00_20m.zip command in the terminal
  2. Extract the shapefile and convert it to TopoJSON with your CSV data merged in
    1. Run the make build/states.json command

You should now have a file called states.json in your build directory.

A few things I want to point out in the Makefile:

--id-property='STATE,id'

This option tells topojson which variables to merge on. Here it merges our CSV data into the shapefile wherever the id variable in the CSV matches the STATE variable in the shapefile. Both variables hold the numeric IDs of the states.

--external-properties=state_data.csv

This tells topojson where to find our CSV data.

--properties='rate=+d.properties["value"]'

This adds a rate property to our states.json file using the value variable from our CSV data. The leading + converts the value from a string to a number. Now every state will have a rate property that we can access.
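To make the merge concrete, the following Python sketch imitates what these three options do together: it matches each shapefile feature's STATE to the CSV id column and copies the CSV value into a numeric rate property. It is only an illustration under stated assumptions (features as plain dicts, a CSV with id and value columns, IDs compared as integers so "01" matches "1", and rate set to None when no row matches); topojson does this for you and may differ in the details.

```python
import csv
import io


def merge_rates(features, csv_text):
    """Attach a numeric 'rate' to each feature whose STATE matches a CSV id.

    Hypothetical stand-in for topojson's --id-property, --external-properties,
    and --properties options; features are plain dicts here, not TopoJSON.
    """
    # Index CSV rows by id; comparing as int is an assumption so "01" == "1"
    rows = {int(row["id"]): row for row in csv.DictReader(io.StringIO(csv_text))}
    for feature in features:
        props = feature["properties"]
        row = rows.get(int(props["STATE"]))
        # float() plays the role of the unary + in +d.properties["value"]
        props["rate"] = float(row["value"]) if row is not None else None
    return features


if __name__ == "__main__":
    states = [{"properties": {"STATE": "01"}}, {"properties": {"STATE": "02"}}]
    merged = merge_rates(states, "id,value\n1,42.5\n")
    print([s["properties"]["rate"] for s in merged])  # [42.5, None]
```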

As you can see, it is easy to change the values above to incorporate your own state data into the states.json file. All you have to do is update the Makefile to reflect your data and properties and then re-run the make build/states.json command (if make reports that the target is up to date, delete build/states.json first).

HTML Code

Create an index.html file in your directory and copy my code from here. Examining the code should give you a decent idea of how it works, but I want to point out a few things:

var num_format = d3.format(",.0f");

This is how you create a number format in D3. The specifier ",.0f" adds a comma as the thousands separator and rounds to zero decimal places, so your numbers look nice.
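Python's format mini-language uses the same ",.0f" specifier, so a quick way to preview what the D3 formatter should print is the sketch below. Note that num_format here is a Python stand-in, not D3 itself, and the two may differ on values that end exactly in .5, because Python rounds exact halves to the nearest even digit.

```python
def num_format(value):
    """Mimic d3.format(",.0f"): comma thousands separators, no decimals."""
    return format(value, ",.0f")


if __name__ == "__main__":
    print(num_format(12345.678))  # 12,346
    print(num_format(987))  # 987
```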

var color = d3.scale.quantize().domain([25, 63]).range(colorbrewer.Greens[7]);

D3 scales are awesome. The quantize scale takes our domain (the range of our median rank data, from 25 to 63) and divides it into seven equal-width buckets, each mapped to one of the seven colors in ColorBrewer’s Greens palette. That way we now have seven colors to use for our map.
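The quantize mapping can be reproduced in a few lines. This Python sketch follows the formula D3 v3's quantize scale uses, index = floor(n * (x - min) / (max - min)) clamped to 0 through n - 1 for n colors. The function name quantize and the placeholder names green-0 to green-6 (standing in for the seven real ColorBrewer hex codes) are assumptions for illustration.

```python
import math


def quantize(domain, colors):
    """Return a function mapping numbers in domain to len(colors) equal buckets."""
    low, high = domain
    buckets_per_unit = len(colors) / (high - low)
    last = len(colors) - 1

    def scale(x):
        # Equal-width buckets; values outside the domain clamp to the end colors
        index = math.floor(buckets_per_unit * (x - low))
        return colors[max(0, min(last, index))]

    return scale


if __name__ == "__main__":
    # Placeholder names instead of the real colorbrewer.Greens[7] hex codes
    palette = ["green-%d" % i for i in range(7)]
    color = quantize([25, 63], palette)
    print(color(25), color(44), color(63))  # green-0 green-3 green-6
```

With this domain each bucket spans (63 - 25) / 7, or about 5.4 rank points.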

function (d) {return d.properties.rate == null ? "#000000" : color(d.properties.rate);}

Here D3 calls the function once for each state, passing that state’s data as d. So d.properties.rate is the rate for that state. The function uses JavaScript’s conditional operator, condition ? x : y, which returns x if the condition is true and y otherwise (and == null is true for both null and undefined). So this function colors the states that have no rate data black and colors the other states according to the color scale defined above. Pretty cool.
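Python's conditional expression reads in the opposite order from JavaScript's (x if condition else y), which makes the logic easy to see in isolation. In this sketch, state_fill is a hypothetical name, properties.get("rate") is None plays the role of the == null check, and a dummy function stands in for the quantize scale.

```python
def state_fill(properties, color):
    """Black for states without rate data, otherwise the scale's color."""
    rate = properties.get("rate")  # None if the key is missing or set to None
    return "#000000" if rate is None else color(rate)


def fake_color(rate):
    """Stand-in for the quantize color scale."""
    return "green"


if __name__ == "__main__":
    print(state_fill({}, fake_color))  # #000000
    print(state_fill({"rate": 40}, fake_color))  # green
```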

function(d) { return d.properties.cost == null ? 'No Ranked Schools' : 'Median In-State Cost: $' + num_format(d.properties.cost) + " <br/>" + "Median Rank: " + num_format(d.properties.rank) + " <br/>" + "Number of Schools: " + d.properties.number;}

This is another example of using a function, this time to create the text you see when the mouse hovers over a state. We use our number format for the numbers and <br/> to create line breaks in the text.
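The tooltip logic ports directly to Python, which can be handy for checking the text for a given state without a browser. As before, this is a hypothetical sketch: tooltip_html is an invented name, the property names cost, rank, and number come from the snippet above, and format(x, ",.0f") stands in for num_format.

```python
def tooltip_html(properties):
    """Build the hover text for one state, mirroring the D3 snippet above."""
    if properties.get("cost") is None:
        return "No Ranked Schools"
    return (
        "Median In-State Cost: $" + format(properties["cost"], ",.0f") + " <br/>"
        + "Median Rank: " + format(properties["rank"], ",.0f") + " <br/>"
        + "Number of Schools: " + str(properties["number"])
    )


if __name__ == "__main__":
    print(tooltip_html({"cost": 8500.4, "rank": 101.6, "number": 3}))
    # Median In-State Cost: $8,500 <br/>Median Rank: 102 <br/>Number of Schools: 3
```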

Legend

At the end of the HTML file we create the legend. Basically, we pass the range of our color scale (its seven colors) as data:

data(color.range())

And then filling in a rectangle with those colors:

style("fill", function(d, i) { return d; })

After that we define some text attributes that show the words “Highest” and “Lowest” next to our scale. I got the x and y values for the text through trial and error. I am sure, though, there is a better way.
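One alternative to trial and error is to derive the legend positions from the number of colors. The sketch below is a hypothetical helper, not part of D3 or of my HTML: it lays the color boxes out left to right at i * box_width (the same pattern you would express in D3 with an x attribute computed from the index i) and places the two labels just outside the ends. The box sizes, the gap, and which label goes on which side are assumptions to adjust for your map.

```python
def legend_layout(colors, box_width=20, box_height=10, gap=5,
                  left_label="Lowest", right_label="Highest"):
    """Compute rectangle and label positions for a horizontal color legend."""
    rects = [
        {"x": i * box_width, "y": 0, "width": box_width,
         "height": box_height, "fill": color}
        for i, color in enumerate(colors)
    ]
    total_width = len(colors) * box_width
    label_y = box_height / 2  # vertically centered on the boxes
    labels = [
        # SVG text-anchor "end": text ends just left of the first box
        {"text": left_label, "x": -gap, "y": label_y, "anchor": "end"},
        # SVG text-anchor "start": text begins just right of the last box
        {"text": right_label, "x": total_width + gap, "y": label_y, "anchor": "start"},
    ]
    return rects, labels


if __name__ == "__main__":
    rects, labels = legend_layout(["c0", "c1", "c2"])
    print([r["x"] for r in rects])  # [0, 20, 40]
    print(labels[1]["x"])  # 65
```

Because every position is computed from box_width and the number of colors, changing the palette size moves the labels automatically.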

Get it running

The easiest way for me to host my D3 map locally is to use Python. If you don’t have Python, check out my introduction to Python; it has installation instructions. In the terminal, from the same directory as your index.html file, type the following (this command only works with Python 2; with Python 3, use python3 -m http.server 8888 instead):

python -m SimpleHTTPServer 8888 &

Then in your web browser navigate to:

http://localhost:8888/
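If you are on Python 3, where SimpleHTTPServer no longer exists, the same server can also be started from a short script using the standard library's http.server module. This is a sketch: make_server is a made-up helper name, the directory argument requires Python 3.7 or later, and binding to 127.0.0.1 keeps the server visible only on your own machine.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(directory=".", port=8888):
    """Create (but do not start) a static file server for the map folder."""
    # SimpleHTTPRequestHandler accepts directory= in Python 3.7+
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    server = make_server()
    print("Serving on http://localhost:%d/" % server.server_address[1])
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass  # Ctrl+C stops the server
    finally:
        server.server_close()
```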

You should now see your map! Also, if you want to put it online so you can share it with people, I would recommend using GitHub Pages. Basically, you just create a new repository and push up your index.html and map data files. Very easy.

Final Notes

I hope this guide was useful. My goal was to provide a very fast way to create your own D3 map and to understand some of the more important parts of the process. This guide leaves out a lot of details, though. If you want a better understanding of D3, check out some of the books and sites that I have found useful.

You will also notice that I didn’t talk at all about the data. This is because I am using these data only to create a map, not for analytical purposes. My hope is that once you get the map working you will modify it to use some new data that you find interesting. If you do, please share your map with me on Twitter @tyler_folkman.

If you want to know a tiny bit more about the data see my original post. This post will also show you the final version of the map.

Also, if you have any problems following this guide, please let me know, either through a comment or by contacting me. I would love to help or fix any bugs in my explanation. Thanks!

Special thanks to these great sites that helped me figure this out:

http://bost.ocks.org/mike/bubble-map/

https://suffenus.wordpress.com/2014/01/07/making-interactive-maps-with-d3-for-total-beginners/

An Exercise in Making an Interactive D3 Map

(Click on the map; it is interactive.)

I have been dying to create an interactive D3 map. I mapped the median ranking of each state’s public schools (schools with cheaper in-state tuition), based on the schools ranked by US News. Please do not take this map as representing which states are best. There are many flaws with median values (and rankings) and many other factors to consider. These data were used just to make a map. I will post soon on how it was created.

See the map here

These great sites helped me figure this out:

http://bost.ocks.org/mike/bubble-map/

Making a Simple Interactive Map Prototype with D3…For Total Beginners Who are Totally Impatient

Data Science to Improve Healthcare

So I am extremely interested in using healthcare data to improve health outcomes, but unfortunately healthcare data can be hard to come by. Lucky for us, a great group of people have put together a demo data set of ICU data that is open to the public! The data set is called the MIMIC II Demo, and the main page for these data is here.

Click here to see my introduction to using these data. Hopefully it can help others better understand healthcare data.