R is a great tool for creating data visualizations like the one above. The learning curve is a little bit steeper than a low code alternative such as Microsoft Excel. However, R makes it easy to transform certain types of data such as a csv file with data points represented as column headers rather than row data. For example, the data that is supplied by the Fair Housing Finance Administration is organized by month and regional market, meaning that there is a column for each regional market in the data. The reshape library makes it easy to transform this data into three column csv file where the columns represent the month, index value, and regional market.
require(ggplot2) require(reshape) # Change path to location of your csv # Read data from file and append to class 'd' d <- read.csv(file="~/house-price-index/hpi-monthly.csv,head=TRUE,sep=",") # Format d$month month <- c(as.Date(d$month, format="%m/%d/%Y")) # Must rename headers in House Price Index Data to reflect values shown in the dataframe for d$ below # Create data frame for class 'd' and append to class 'df' df <- data.frame(time=month, encn=d$encn, escn=d$escn, man=d$man, mn=d$mn, nen=d$nen, pn=d$pn, san=d$san, wncn=d$wncn, wscn=d$wscn, usan=d$usan) # Alter structure of data so that all horizontal columns are merged into a single two column layout df <- melt(df ,id.vars = 'time', variable.name = 'series') # Append list of labels to class 'labels' labels <- c("East North Central", "East South Central", "Middle Atlantic", "Mountain", "New England", "Pacfic", "South Atlantic", "West North Central", "West South Central", "USA") # Append list of colors for data labels to class 'colors' # Colors pulled from list of optimally disimilar colors: # http://godsnotwheregodsnot.blogspot.ru/2012/09/color-distribution-methodology.html colors <- c("#00A6AA", "#BA0900", "#6B7900", "#00C2A0", "#FFAA92", "#FF90C9", "#B903AA", "#D16100", "#00A6AA", "#000035") # Append function ggplot to class 'c' c <- ggplot(df, aes(time,value)) c <- c + geom_line(aes(colour = variable, group = variable)) c <- c + ggtitle("FHFA House Price Index (NSA) 1991-2016") c <- c + labs(y="Index Value",x="Year",colour="Market") c <- c + scale_color_manual(labels = labels,values = colors) # A ppend plot theme to class 'c' c + theme(panel.background = element_rect(fill = "transparent"), panel.grid.major = element_line(colour = "black", linetype = "dotted"), panel.grid.minor = element_line(colour = "black", linetype = "dotted"), legend.key = element_rect(fill = "transparent"), legend.key.size = unit(1,"cm"), legend.text = element_text(size = rel(1.5)), legend.title = element_text(size = rel(2)), axis.title.x = element_text(size = rel(1.5)), axis.title.y = element_text(size = rel(1.5)), plot.title = element_text(size = rel(2.5)))
In the example above, we start by calling the ggplot2 and reshape libraries. Then we read the House Price Index csv data and add that data to the class ‘d’. Afterwards we declare a new class called ‘month’ and format the month column from our csv as m/d/y.
Next we declare a class called ‘df’ and append a data frame to df which includes the data from our csv organized by column header. Note that to make things simple, I altered the column headers in the csv to make it easier to add the data to the data frame. After our data frame has been added to df, we use the melt method from the reshape library to transform the grid style data structure into a linear data structure that stores the month as time, market name as series, and index value as value. Imagine taking an 10 column horizontal spreadsheet in Excel and turning it into a 3 column spreadsheet by copying the values from each column header and pasting them as new rows. After pasting each row, you also add the column header to the third row.
After this we create a class that contains labels and a class that contains the colors for each label. Libraries like ggplot will handle labels and colors automatically. However, I wanted to have descriptive labels for the values represented in the plot.
Finally, we plot the data by assigning the ggplot function to ‘c’. We also add in a title, x and y labels, and our label names and colors for the values shown in the plot. In addition, we specify some additional theme parameters for our plot.
The source data can for this plot can be acquired from the FHFA.