Rank Order Data & Stacked Bar Charts

If you have been looking for ways to visualize rank-ordered data, you might be interested in using stacked bar charts.  Rank-ordered data helps show which choices are most and least important to survey participants, such as which sports they prefer to watch or which vacation spots they are interested in visiting.

I have been using rank order for some survey questions at Webcomm and thought you might be interested in how to visualize that data using R and ggplot2.  For this post I am assuming you have some experience with R.

The example we are working with is based on social media platforms that survey participants regularly use.   Having a rank of 1 means that it is the platform a participant uses most.  In this case we are looking at ranks 1, 2, 3, and 4.  If a participant gives a platform a 4 they are telling us they use the platform but there are three other platforms that the participant is using more.

Here is a view of df, the data frame that holds the summary data we are working with.  If you are unfamiliar with how to get data into a similar type of data frame, you might want to check out a few posts on the melt function.

> df
   labels   rank value
1  Facebook    1    18
2  Twitter     1    18
3  Snapchat    1    19
4  Instagram   1    26
5  Facebook    2    21
6  Twitter     2    13
7  Snapchat    2    23
8  Instagram   2    20
9  Facebook    3    19
10 Twitter     3    14
11 Snapchat    3    22
12 Instagram   3    20
13 Facebook    4    18
14 Twitter     4    15
15 Snapchat    4     9
16 Instagram   4     8

In the data frame the labels column contains the names of the social media platforms we are ranking (Facebook, Twitter, Instagram, and Snapchat).  The rank column holds the rank (1, 2, 3, or 4) that participants gave the platform.  The value column contains the percentage of participants who gave the platform that rank.
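If you want to follow along without survey data of your own, the summary data frame above can be typed in by hand.  This is only a sketch: storing rank as a factor is my assumption (it is what gives ggplot2 a separate color per rank rather than a gradient), and the row order simply mirrors the printout.

```r
# Rebuild the summary data frame shown above (values copied from the printout)
platforms <- c("Facebook", "Twitter", "Snapchat", "Instagram")

df <- data.frame(
  labels = rep(platforms, times = 4),
  # Assumption: rank is a factor so ggplot2 treats it as discrete
  rank   = factor(rep(1:4, each = 4)),
  value  = c(18, 18, 19, 26,   # ranked 1st
             21, 13, 23, 20,   # ranked 2nd
             19, 14, 22, 20,   # ranked 3rd
             18, 15,  9,  8)   # ranked 4th
)

# Total percentage of participants who gave each platform any rank
totals <- tapply(df$value, df$labels, sum)
print(totals)
```

The totals are the full bar heights you should expect to see in the charts that follow.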

At this point it is pretty easy to make a basic stacked bar chart in R or RStudio.

> library(ggplot2)
> df.plot <- ggplot(data = df, aes(x = labels, y = value, fill = rank)) +
+ geom_bar(stat="identity")
> df.plot

[Figure: basic stacked bar chart]

The height of each bar in the chart represents the percentage of participants who ranked a social media platform as their 1st, 2nd, 3rd, or 4th most frequently used platform.  For the sake of this chart we can consider each bar the total percentage of participants who use the social media platform.  The colored sections in each bar represent the percentage of participants who ranked that platform as their 1st (salmon), 2nd (green), 3rd (blue), or 4th (purple) most used platform.

Notice that above the initial salmon-colored section it gets hard to estimate the percentages.  To make the internal percentages easier to read, it makes sense to put each percentage in the middle of its colored section.  We’ll need to figure out where the middle of each colored section falls in each stacked bar to make it look right.  Here is a little code that makes it easy:

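> library(plyr)
> # For each platform, add a label position and a label text column
> df <- ddply(df, .(labels),
+ transform,
+ # Middle of each section: running total minus half the section
+ pos = cumsum(value) - (0.5 * value),
+ # Label text such as "18%"
+ per = paste(as.character(value), "%", sep=""))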

Let’s see what the updated df looks like:

> df
   labels   rank value   pos  per
1  Facebook    1    18   9.0  18%
2  Facebook    2    21  28.5  21%
3  Facebook    3    19  48.5  19%
4  Facebook    4    18  67.0  18%
5  Instagram   1    26  13.0  26%
6  Instagram   2    20  36.0  20%
7  Instagram   3    20  56.0  20%
8  Instagram   4     8  70.0   8%
9  Snapchat    1    19   9.5  19%
10 Snapchat    2    23  30.5  23%
11 Snapchat    3    22  53.0  22%
12 Snapchat    4     9  68.5   9%
13 Twitter     1    18   9.0  18%
14 Twitter     2    13  24.5  13%
15 Twitter     3    14  38.0  14%
16 Twitter     4    15  52.5  15%
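The arithmetic behind pos is worth spelling out: within one bar, the running total of value marks the top edge of each section, so subtracting half of the section's own height lands in its middle.  As a rough sketch of the same calculation without plyr, base R's ave() can apply it per platform.  The data frame is retyped here so the block runs on its own, and rank as a factor is my assumption.

```r
df <- data.frame(
  labels = rep(c("Facebook", "Twitter", "Snapchat", "Instagram"), times = 4),
  rank   = factor(rep(1:4, each = 4)),
  value  = c(18, 18, 19, 26, 21, 13, 23, 20, 19, 14, 22, 20, 18, 15, 9, 8)
)

# Sort so ranks run 1 to 4 within each platform, as in the ddply output
df <- df[order(df$labels, df$rank), ]
rownames(df) <- NULL

# Middle of each section: running total minus half of the section itself
df$pos <- ave(df$value, df$labels, FUN = function(v) cumsum(v) - 0.5 * v)
# Label text such as "18%"
df$per <- paste0(df$value, "%")

print(df[df$labels == "Facebook", ])
```

For Facebook this gives 9.0, 28.5, 48.5, and 67.0, matching the pos column in the printout above.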

Now we can use the extra columns to add labels to our graph:

> df.plot <- ggplot(data = df, aes(x = labels, y = value, fill = rank)) +
+ geom_bar(stat="identity") +
+ geom_text(aes(label=per, y = pos), size = 3)
> df.plot

[Figure: stacked bar chart with percentage labels]
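One caveat if you are on a newer ggplot2: as far as I know, version 2.2.0 and later stack the sections in the reverse of the grouping order, so hand-computed positions like pos can land on the wrong section.  A possible alternative is to let ggplot2 center the labels itself with position_stack(vjust = 0.5).  The sketch below retypes the data so it runs on its own, and it treats rank as a factor, which is my assumption.

```r
library(ggplot2)

df <- data.frame(
  labels = rep(c("Facebook", "Twitter", "Snapchat", "Instagram"), times = 4),
  rank   = factor(rep(1:4, each = 4)),
  value  = c(18, 18, 19, 26, 21, 13, 23, 20, 19, 14, 22, 20, 18, 15, 9, 8)
)
# Label text such as "18%"
df$per <- paste0(df$value, "%")

df.plot <- ggplot(data = df, aes(x = labels, y = value, fill = rank)) +
  geom_bar(stat = "identity") +
  # vjust = 0.5 centers each label inside its own stacked section
  geom_text(aes(label = per), position = position_stack(vjust = 0.5), size = 3)

df.plot
```

With this approach the pos column is not needed at all, since the labels are stacked by the same rule as the bars.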

Sweet, that makes it a lot easier to understand the internal percentages.  Now let’s spiff up the rest of the graph:

> df.plot <- ggplot(data = df, aes(x = labels, y = value, fill = rank)) +
+ geom_bar(stat="identity") +
+ geom_text(aes(label=per, y = pos), size = 3) +
+ # Help the legend make more sense
+ guides(fill=guide_legend(title="Rank Order\n1 = High\n4 = Low")) +
+ # Blank out the x and y axis labels
+ ylab("") +
+ xlab("") +
+ # Main Title
+ ggtitle("Which social media platforms do you regularly use?") +
+ # Reduce the gray background
+ theme_bw() +
+ # Place x labels at 45 degrees
+ theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
+ # Change text sizes
+ theme(plot.title = element_text(size = 10)) +
+ theme(legend.title = element_text(size = 7)) +
+ theme(legend.text = element_text(size = 7)) +
+ # Adjust the scale of the y axis and show units as percentages
+ scale_y_continuous(breaks=c(10,20,30,40,50,60,70,80,90),
+ labels=c("10%","20%","30%","40%","50%","60%","70%","80%","90%"))
> df.plot

[Figure: finished stacked bar chart with title, legend, and percentage axis]
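The breaks and labels in scale_y_continuous() are typed out twice above, which makes them easy to get out of sync.  Since the labels argument also accepts a function, one possible tidy-up is a small helper that builds the labels from the breaks.  The names add_percent and y_scale are ones I made up for this sketch.

```r
library(ggplot2)

# Hypothetical helper: turn numeric breaks into labels such as "10%"
add_percent <- function(x) paste0(x, "%")

# Same breaks as above, generated rather than typed out
y_scale <- scale_y_continuous(breaks = seq(10, 90, by = 10),
                              labels = add_percent)

print(add_percent(seq(10, 90, by = 10)))
```

Adding y_scale to the plot with + in place of the original scale_y_continuous() call should give the same axis.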

And there you have it! Please let me know in the comments if you find this post helpful.

 

 

 
