An Example of Data Analysis Using the R Language

In this blog post I will guide you through a small example of data analysis in R, using a data set on the popularity of cats versus dogs in the US.


Why R?

Because it’s a programmable environment that uses command-line scripting, you can store a series of complex data-analysis steps in R. That lets you re-use your analysis work on similar data more easily than if you were using a point-and-click interface, notes Hadley Wickham, author of several popular R packages and chief scientist with RStudio.

It’s free, open source, powerful and highly extensible. “You have a lot of prepackaged stuff that’s already available, so you’re standing on the shoulders of giants,” Google’s chief economist told The New York Times back in 2009.

Before starting this blog post, I completed Code School’s Try R course, a free online tutorial that guides users through examples and experiments.

[Figure: Completion certificate for the Try R online course]

Cats Vs Dogs Popularity in the US

In this example, I am using a data set from the American Veterinary Medical Association (avma.org) that contains the population and household ownership of dogs and cats, broken down by state. It took me a while to understand R’s syntax, so this example is deliberately simple and well suited to people like me who are just getting started with the tool.

Let’s set up our working environment!

Step 1

So, first things first: you need to install R. Go to r-project.org; R runs on Windows, macOS and several other Unix platforms. Installing R is actually all you need to get started. However, I’d suggest also installing RStudio, a free integrated development environment for R that is more user friendly and that you can use to write your code and view your plots.

Step 2

Once you are in RStudio, set your working directory under Session > Set Working Directory. Because I wanted to draw bar graphs, I also installed the ggplot2 graphics package by running: install.packages("ggplot2") (the bar plots below use base R’s barplot() function, which does not need any extra package).

Step 3

Import the catsvsdogs data set from its CSV file, making sure that the first row is read as the column headers and that commas are used as the value separator, so the data is laid out correctly.
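The same import can be done in code instead of through RStudio’s Import Dataset dialog. This is a minimal sketch: load_pet_data is a hypothetical helper, the file name in the usage line is a placeholder, and the column names it checks for are simply the ones the plotting code in this post refers to.

```r
# Hypothetical helper: read the cats-vs-dogs CSV into a data frame.
# header = TRUE uses the first row as column names; sep = "," splits values on commas.
load_pet_data <- function(path) {
  df <- read.csv(path, header = TRUE, sep = ",", stringsAsFactors = FALSE)

  # Columns referenced by the plots in this post (assumed to exist in the file)
  needed <- c("Location", "PetHousesP", "DogOwnersP", "CatOwners")
  missing <- setdiff(needed, names(df))
  if (length(missing) > 0) {
    stop("Missing column(s): ", paste(missing, collapse = ", "))
  }
  df
}

# Usage (file name is a placeholder):
# catsvdogs <- load_pet_data("catsvsdogs.csv")
# str(catsvdogs)
```

Failing early on a missing column gives a clearer error than a plot call that silently receives NULL.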

Which US States Own More Pets?

The following bar plot makes it easy to see how pet ownership (cats and dogs) is distributed across US states.

[Figure: US Homes with Dogs or Cats by State (bar plot)]

Here is the code I used to create the graph above:

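# Barplot: keep the bar midpoints that barplot() returns, so the labels line up with the bars
bp <- barplot(catsvdogs$PetHousesP, space = 1, main = "US Homes with Dogs or Cats by State", ylab = "Percent of Homes", col = "grey50")

# Label: write each state name just below its bar, rotated 60 degrees (assumes the state names are in the Location column, as in the next plot)
text(bp, par("usr")[3] - 0.25, srt = 60, adj = 1, xpd = TRUE, labels = catsvdogs$Location, cex = 0.7)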
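To answer the section’s question directly, it can help to rank the states rather than compare bar heights by eye. A minimal sketch, assuming catsvdogs has the Location and PetHousesP columns used above; top_pet_states is a hypothetical helper name.

```r
# Hypothetical helper: return the n states with the highest share of pet-owning homes.
top_pet_states <- function(df, n = 5) {
  ranked <- df[order(df$PetHousesP, decreasing = TRUE), c("Location", "PetHousesP")]
  rownames(ranked) <- NULL  # renumber rows 1..n after sorting
  head(ranked, n)
}

# Usage:
# top_pet_states(catsvdogs, n = 10)
```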

Cats Vs Dogs

[Figure: Homes with Cats and Dogs, by state (stacked bar plot)]

This graph shows, for each state, the share of homes with dogs and the share with cats. As you can see, there is only a slight difference between dog and cat ownership.

I used the following code to create the graph above.

# Cats and Dogs Percentage Bar Plot

# Use data.frame() to create a 50 x 2 table of dog and cat percentages (one row per state)
myarray <- data.frame(catsvdogs$DogOwnersP, catsvdogs$CatOwners)

# barplot() needs a vector or a matrix, not a data frame, so t() turns it into a 2 x 50 matrix:
# each column (state) becomes one bar, with the dog and cat values stacked on top of each other
barplot(t(myarray), space = 0.7, names.arg = catsvdogs$Location, las = 2, col = c("blue", "red"), cex.names = 0.75, main = "Homes with Cats and Dogs", cex.main = 0.85)

# las = 2 draws the state labels perpendicular to the axis; cex.names and cex.main set the size of the labels and the title

legend(legend = c("Dogs", "Cats"), fill = c("blue", "red"), cex = 0.65, ncol = 2, x = -5, y = 93, bty = "n")

# bty = "n" removes the box around the legend, x and y position it, and ncol = 2 lays it out in two columns
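The observation that dog and cat ownership differ only slightly can also be checked numerically. A minimal sketch, assuming DogOwnersP and CatOwners are both percentages of homes, as the comment on the data.frame() line describes; ownership_gap is a hypothetical helper. Separately, passing beside = TRUE to barplot() would draw each state’s dog and cat bars next to each other instead of stacked, which can make this comparison easier to read.

```r
# Hypothetical helper: per-state gap between dog and cat ownership, in percentage points.
# Positive values mean more homes own dogs than cats.
ownership_gap <- function(df) {
  gap <- df$DogOwnersP - df$CatOwners
  data.frame(Location = df$Location, Gap = gap, stringsAsFactors = FALSE)
}

# Usage:
# gaps <- ownership_gap(catsvdogs)
# summary(gaps$Gap)                  # spread of the differences across states
# gaps[which.max(abs(gaps$Gap)), ]   # state with the largest gap in either direction
```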

Other ideas

Geographical Map

In R it is possible to create a geographical map and link the data set’s results to it. Such a visualisation would show the physical distribution of pets; for example, I would like to see whether the states with more pet-owning households are close to each other, along the coastlines, or in the interior of the country.
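Dedicated mapping packages (for example maps, or ggplot2’s map_data("state"), which relies on maps) can draw real state outlines. Before reaching for them, a rough version of the idea can be sketched in base R alone. This is a bubble plot, not a true map: it places each state at the approximate center given in R’s built-in state.center data and sizes the point by the chosen column. pet_bubble_map is a hypothetical helper; it assumes Location holds full state names such as "Texas", and rows that do not match state.name (for example District of Columbia) are dropped.

```r
# Hypothetical helper: rough base-R stand-in for a map (not a true choropleth).
# Each state sits at its approximate geographic center (datasets::state.center)
# and its point is scaled by the chosen column. Note that state.center places
# Alaska and Hawaii just off the West Coast, not at their true positions.
pet_bubble_map <- function(df, value_col = "PetHousesP") {
  idx <- match(df$Location, state.name)   # NA for names not in state.name
  keep <- !is.na(idx)
  if (!any(keep)) stop("No Location values match state.name")

  vals <- df[[value_col]][keep]
  x <- state.center$x[idx[keep]]          # longitude (negative = west)
  y <- state.center$y[idx[keep]]          # latitude
  # Scale values linearly to point sizes between 1 and 4 (guarding against a zero range)
  size <- 1 + 3 * (vals - min(vals)) / max(diff(range(vals)), 1e-9)

  plot(x, y, cex = size, pch = 21, bg = "grey70",
       xlab = "Longitude", ylab = "Latitude",
       main = paste("Approximate state centers sized by", value_col))
  text(x, y, labels = state.abb[idx[keep]], cex = 0.6)

  invisible(data.frame(Location = df$Location[keep], x = x, y = y, value = vals,
                       stringsAsFactors = FALSE))
}

# Usage:
# pet_bubble_map(catsvdogs, "PetHousesP")
```

Clusters of large points in one region would hint at the geographic pattern described above; a proper map would then be the natural next step.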

Correlation Map

If we create a correlation map, we will be able to see whether there is any correlation between dog and cat ownership, how many people own both dogs and cats, or whether the owners of a particular pet are less inclined to share their home with other pets.
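A minimal sketch of the first step, assuming the same DogOwnersP and CatOwners percentage columns. Because the data are state-level percentages, this can show whether states with more dog-owning homes also tend to have more cat-owning homes; it cannot show how many individual households own both, which would need household-level data. pet_correlation is a hypothetical helper.

```r
# Hypothetical helper: correlation between dog and cat ownership across states.
pet_correlation <- function(df, method = "pearson") {
  # use = "complete.obs" drops states with a missing value in either column
  cor(df$DogOwnersP, df$CatOwners, method = method, use = "complete.obs")
}

# Usage:
# pet_correlation(catsvdogs)                            # Pearson's r
# pet_correlation(catsvdogs, method = "spearman")       # rank-based alternative
# cor.test(catsvdogs$DogOwnersP, catsvdogs$CatOwners)   # r with a p-value and confidence interval
# plot(catsvdogs$DogOwnersP, catsvdogs$CatOwners)       # scatter plot of the two columns
```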

Last Thoughts

The R language definitely takes time to learn, and its syntax is not as intuitive as you might like, but if you are looking for precise results, this tool will deliver them.

Useful Links

  • http://tryr.codeschool.com/
  • http://www.cookbook-r.com/
  • http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html

 
