How to better analyze Google Analytics data using R

The digital world moves at great speed.
The most powerful tool for keeping up with that endless race is knowledge.

To keep improving a website effectively, we need to know our users: their behavior and their preferences.
For this we have a great ally: Google Analytics, which lets us gain that knowledge.
Its friendly interface gives us a first approximation of how users behave.

But what if we want to go further?
We can also use this powerful tool to extract the data, analyze it ourselves and turn it into information.
We ran a small analysis to get a handle on the data flow we have access to.
We chose R as the tool for statistical analysis and graphs. The same information can be used to build reports and export them in different formats.

All the work was done in RStudio. We used the googleAnalyticsR package to access Google Analytics data and the ggplot2 package to draw the graphs.

First, load these packages.

library(googleAnalyticsR)
library(ggplot2)

Then we need to authorize googleAnalyticsR to access our Google Analytics data.

ga_auth()

Once authorized, we can assign the query results to a data frame called ga_data. All you need to know is the View ID of the Google Analytics view you want to query.

ga_data <- google_analytics_4(viewId = "*********", date_range = c(Sys.Date()-30, Sys.Date()), metrics = c("users", "sessions"), dimensions = c("date","country","city","month","year","devicecategory","continent"), anti_sample = TRUE)

Now ga_data holds the result of the query, and we can use it in several ways.
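Before plotting, it can help to check the shape of the result and get a quick total. The sketch below does not call the API: it builds a small made-up data frame (the name toy_ga_data and all of its values are hypothetical) with a few of the same columns as ga_data, and sums sessions by continent with base R's aggregate().

```r
# Hypothetical stand-in for ga_data: the values are made up for illustration only.
toy_ga_data <- data.frame(
  date = as.Date(c("2017-08-01", "2017-08-01", "2017-08-02", "2017-08-02")),
  continent = c("Americas", "Europe", "Americas", "Europe"),
  devicecategory = c("desktop", "mobile", "mobile", "desktop"),
  users = c(10, 4, 7, 5),
  sessions = c(12, 5, 9, 6),
  stringsAsFactors = FALSE
)

# Inspect the column types, then total the sessions for each continent.
str(toy_ga_data)
sessions_by_continent <- aggregate(sessions ~ continent, data = toy_ga_data, FUN = sum)
print(sessions_by_continent)
```

The same two calls should work on the real ga_data, assuming it contains continent and sessions columns as requested in the query.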

Continents and Devices

ggplot(data=ga_data,aes(x=continent,y=sessions,fill=devicecategory)) + geom_col() + ggtitle("Device use by Continent") + labs(x="Continent", y="Sessions", fill="Devices")

We use several parameters to define the plot:
- X axis = continent, Y axis = sessions and fill = devicecategory
- geom_col is the type of graph: bars whose height is the sessions value (geom_bar on its own counts rows and does not accept a y aesthetic)
- labs sets friendlier labels for the axes and the legend
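As a minimal sketch of how these parameters fit together, the block below draws a stacked bar chart from a tiny made-up data frame (toy_sessions and its values are hypothetical, not real Analytics data). It uses geom_col(), which takes the y values as bar heights and stacks the fill groups within each x value.

```r
library(ggplot2)

# Hypothetical data: one row per continent and device, values made up.
toy_sessions <- data.frame(
  continent = c("Americas", "Americas", "Europe", "Europe"),
  devicecategory = c("desktop", "mobile", "desktop", "mobile"),
  sessions = c(12, 9, 6, 5)
)

# Bars are stacked by device within each continent.
p <- ggplot(toy_sessions, aes(x = continent, y = sessions, fill = devicecategory)) +
  geom_col() +
  ggtitle("Device use by Continent") +
  labs(x = "Continent", y = "Sessions", fill = "Devices")

print(p)
```

With the default stacked position, the total height of each bar is the sum of sessions for that continent.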

Timeline with Continents and Devices

A new query loads the sessions by continent and device as a time series.

ga_data_devices <- google_analytics_4(viewId = "*********",date_range = c(Sys.Date()-500, Sys.Date()), metrics = c("sessions"), dimensions = c("date","devicecategory","continent"), anti_sample = TRUE)
ggplot(data=ga_data_devices,aes(x=date,y=continent,fill=devicecategory)) + geom_line(size=3, aes(colour=devicecategory))  + labs(colour="Devices") + xlab("Date") + ylab("Continent")

Relationships between Continents and Sessions

We use ga_data again to show the relationship between continents and sessions over time; in this case the color represents the continent.

qplot(date,sessions,data=ga_data,color=continent) + theme_bw() + ggtitle("Sessions by Continent") + labs(x="Date", y="Sessions", colour = "Continent")

Sessions and Users

Here are the daily sessions for the last 30 days (August 2017 at the time of writing). A new query loads the data into the ga_data_small variable, and the line color indicates the number of users.

ga_data_small <- google_analytics_4(viewId = "*********", date_range = c(Sys.Date()-30, Sys.Date()), metrics = c("users", "sessions"), dimensions = c("date"), anti_sample = TRUE)
ggplot(ga_data_small, aes(date, sessions, color=users)) + geom_line( size=2) + xlab("Date") + ylab("Daily Sessions") + theme_bw() + ggtitle("Density Labs Sessions and Users") + labs(colour= "Users")  
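The queries here use a rolling window that ends today, so the plot changes depending on when the code is run. As a hedged sketch, a fixed window can be built in the same way with as.Date(); the variable names below are hypothetical, and the assumption is that date_range accepts such a vector of two dates just as it accepts the Sys.Date() values above.

```r
# Rolling window, as in the queries above: from 30 days ago up to today.
rolling_range <- c(Sys.Date() - 30, Sys.Date())

# Fixed window (assumption: we want exactly the month of August 2017).
august_2017 <- c(as.Date("2017-08-01"), as.Date("2017-08-31"))

# Number of calendar days covered by each range, both endpoints included.
rolling_days <- as.integer(diff(rolling_range)) + 1
august_days <- as.integer(diff(august_2017)) + 1
print(c(rolling = rolling_days, august = august_days))
```

Passing august_2017 as date_range in place of the rolling range would pin the query to that month, so the result no longer depends on the day it is run.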

Time series Sessions by Countries

In ga_data_countries we load the results for the last 300 days, and in ga_data_countries_subset we keep only the rows with more than 3 sessions, so that only those cases are graphed.

ga_data_countries <- google_analytics_4(viewId = "*********",date_range = c(Sys.Date()-300, Sys.Date()), metrics = c("sessions"), dimensions = c("date","country"), anti_sample = TRUE)
ga_data_countries_subset <- subset(ga_data_countries, sessions > 3)
ggplot(data=ga_data_countries_subset,aes(x=date,y=sessions, color=country)) + geom_line(size=1) + theme_bw() + theme(legend.position = "bottom", legend.direction="horizontal", legend.title = element_blank()) + ggtitle("Time series Sessions by Country") + labs(x="Date", y="Sessions")  + scale_colour_discrete()
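The filter above works row by row, so a country can drop out of the plot on days when it has 3 sessions or fewer. A possible alternative, sketched below on made-up data (toy_countries, its values and the cut-off of two countries are all hypothetical), is to keep every row of the countries with the highest total sessions.

```r
# Hypothetical stand-in for ga_data_countries: values are made up.
toy_countries <- data.frame(
  date = rep(as.Date(c("2017-08-01", "2017-08-02")), times = 3),
  country = rep(c("Argentina", "Spain", "Chile"), each = 2),
  sessions = c(8, 5, 2, 4, 1, 3),
  stringsAsFactors = FALSE
)

# Row-level filter, as in the article: keep rows with more than 3 sessions.
row_filtered <- subset(toy_countries, sessions > 3)

# Country-level filter: total sessions per country, then keep the top 2.
totals <- aggregate(sessions ~ country, data = toy_countries, FUN = sum)
ranked <- totals$country[order(totals$sessions, decreasing = TRUE)]
top_countries <- as.character(head(ranked, 2))
country_filtered <- subset(toy_countries, country %in% top_countries)

print(row_filtered)
print(country_filtered)
```

The row-level filter leaves gaps in a country's line, while the country-level filter keeps each selected series complete; which one fits better depends on what the chart is meant to show.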

Conclusions

R is a tool with a lot of potential. It is robust and effective for statistical work and can handle large volumes of data. The limit is our imagination.

If you would like to know more about this, we are at your disposal! We can develop your dreams and make them come true together.

If you can dream it, we can code it.

R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
URL https://www.R-project.org/.
