This tutorial explains how to use Google Analytics in R. It walks through several examples that show how to pull and analyze your Google Analytics data with R.
In this article, we will use Google Analytics 4 (GA4), which is the latest version of Google Analytics. It was designed to better track what people do on websites and mobile apps.
First make sure you install the following packages.
- googleAnalyticsR
- gargle
Install these packages using the syntax below.
install.packages("googleAnalyticsR")
install.packages("gargle")
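If you rerun the script often, you may prefer to install only what is missing. The sketch below uses two hypothetical helper names, missing_packages() and install_if_missing(), which are not part of either package; they rely only on base R's requireNamespace() and install.packages().

```r
# Hypothetical helper (not part of googleAnalyticsR or gargle):
# returns the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: installs only the packages that are missing.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Usage (uncomment to run; needs an internet connection):
# install_if_missing(c("googleAnalyticsR", "gargle"))
```

The install call is left commented out so that sourcing the script never triggers a download by accident.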
Steps to Integrate Google Analytics into R
Step 1: Authentication
First, you need to authenticate with Google in your browser. Run the code below. It will open a browser window and ask you to log in with Google. This is required only the first time. In subsequent runs, you will be authenticated automatically.
# Libraries
library(googleAnalyticsR)
library(gargle)

# Access
ga_auth(email = "deepanshuxxxxxx@gmail.com")
ga_account_list("ga4")
Step 2: Fetch Google Analytics Data
The following code pulls Google Analytics data and stores it in an R data frame. You need to specify your GA4 property ID, along with the start and end dates of the data you wish to extract.
my_property_id <- 3819XXXXX  # masked here; use your own GA4 property ID
from_date <- "2023-09-12"
to_date <- "2023-09-19"

overall <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "newUsers", "sessions", "screenPageViews"),
  date_range = c(from_date, to_date)
)
# A tibble: 1 × 4
activeUsers newUsers sessions screenPageViews
1 56399 44636 81366 114172
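Totals like these are often easier to read as ratios. The following is a minimal sketch that rebuilds the one-row result from the output above as a plain data frame and derives three ratios from it. The new column names (newUserShare, viewsPerSession, sessionsPerUser) are our own labels for illustration, not GA4 metrics.

```r
# Totals copied from the report output above.
overall <- data.frame(
  activeUsers = 56399,
  newUsers = 44636,
  sessions = 81366,
  screenPageViews = 114172
)

# Derived ratios (column names are illustrative, not GA4 metric names).
overall$newUserShare <- overall$newUsers / overall$activeUsers
overall$viewsPerSession <- overall$screenPageViews / overall$sessions
overall$sessionsPerUser <- overall$sessions / overall$activeUsers

print(round(overall[, c("newUserShare", "viewsPerSession", "sessionsPerUser")], 2))
```

The same three lines should work unchanged on the tibble returned by ga_data(), since they only use column arithmetic.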
Country-wise Breakdown
To view Google Analytics traffic data across countries, add the dimensions argument to the ga_data() function.
my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

# By Country
country <- ga_data(
  my_property_id,
  dimensions = c("country"),
  metrics = c("activeUsers", "newUsers", "sessions", "screenPageViews"),
  date_range = c(from_date, to_date)
)
# A tibble: 100 × 5
country activeUsers newUsers sessions screenPageViews
1 United States 18102 14378 25746 29730
2 India 9814 7021 17427 30698
3 United Kingdom 3243 2567 4549 5477
4 Canada 2327 1889 3061 3879
5 Australia 1865 1493 2606 2952
6 Germany 1587 1278 2137 2540
7 Singapore 1117 955 1526 1887
8 France 1067 885 1427 1600
9 Netherlands 1017 854 1329 1510
10 Brazil 881 651 1248 1504
# ℹ 90 more rows
# ℹ Use `print(n = ...)` to see more rows
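A breakdown like this can be turned into traffic shares. The sketch below copies the ten rows printed above into a data frame and divides by the overall session total (81366) from the first report. That denominator is an assumption: it presumes the per-country sessions add up to the overall figure for the same property and date range.

```r
# Top 10 rows copied from the country report above.
country <- data.frame(
  country = c("United States", "India", "United Kingdom", "Canada", "Australia",
              "Germany", "Singapore", "France", "Netherlands", "Brazil"),
  sessions = c(25746, 17427, 4549, 3061, 2606, 2137, 1526, 1427, 1329, 1248)
)

# Total sessions from the overall report (same property and date range).
total_sessions <- 81366

# Share of all sessions per country, plus a running total.
country$sessionShare <- country$sessions / total_sessions
country$cumulativeShare <- cumsum(country$sessionShare)

print(transform(country,
                sessionShare = round(sessionShare, 3),
                cumulativeShare = round(cumulativeShare, 3)))
```

With these figures, the ten countries shown account for roughly three quarters of all sessions.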
Day-wise Breakdown
To see Google Analytics data by day, you can add the dimensions date and dayOfWeek. The dimension "dayOfWeek" shows the day of the week, i.e. Sunday when dayOfWeek=0 and Saturday when dayOfWeek=6.
library(dplyr)

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

sample <- ga_data(
  my_property_id,
  dimensions = c("date", "dayOfWeek"),
  metrics = c("activeUsers", "newUsers", "sessions", "screenPageViews"),
  date_range = c(from_date, to_date)
) %>%
  arrange(desc(date))
# A tibble: 8 × 6
date dayOfWeek activeUsers newUsers sessions screenPageViews
1 2023-09-19 2 9954 7053 12343 16108
2 2023-09-18 1 7844 5583 10193 13643
3 2023-09-17 0 3213 2473 4104 5965
4 2023-09-16 6 4019 3022 5236 7041
5 2023-09-15 5 8625 5933 11407 16009
6 2023-09-14 4 9750 6745 12493 19848
7 2023-09-13 3 9910 6938 12973 18394
8 2023-09-12 2 9857 6889 12934 17164
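The numeric dayOfWeek codes are easier to work with once they are mapped to names. The sketch below rebuilds part of the output above as a data frame, labels each day, and compares average sessions on weekends and weekdays. It assumes dayOfWeek comes back as text (hence the as.integer() call); the dayName and isWeekend columns are our own additions.

```r
# Rows copied from the day-wise report above.
daily <- data.frame(
  date = as.Date(c("2023-09-19", "2023-09-18", "2023-09-17", "2023-09-16",
                   "2023-09-15", "2023-09-14", "2023-09-13", "2023-09-12")),
  dayOfWeek = c("2", "1", "0", "6", "5", "4", "3", "2"),
  sessions = c(12343, 10193, 4104, 5236, 11407, 12493, 12973, 12934)
)

# Codes run from 0 (Sunday) to 6 (Saturday); R indexes from 1, hence the + 1.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
daily$dayName <- day_names[as.integer(daily$dayOfWeek) + 1]

# Sanity check: base R's "%w" format uses the same 0 = Sunday convention.
stopifnot(all(format(daily$date, "%w") == daily$dayOfWeek))

# Average sessions on weekends (TRUE) versus weekdays (FALSE).
daily$isWeekend <- daily$dayOfWeek %in% c("0", "6")
avg_sessions <- tapply(daily$sessions, daily$isWeekend, mean)
print(avg_sessions)
```

For this week of data, weekend days average well under half the sessions of weekdays.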
How to add Multiple Dimensions
You can specify multiple dimensions in the dimensions argument of the ga_data() function. The following code returns day-wise web traffic data by city.
library(dplyr)

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

sample <- ga_data(
  my_property_id,
  dimensions = c("date", "city", "dayOfWeek"),
  metrics = c("activeUsers", "newUsers", "sessions", "screenPageViews"),
  date_range = c(from_date, to_date)
) %>%
  arrange(desc(date)) %>%
  filter(city == "New York")
In the code above, we applied a dplyr filter to keep data for "New York" city only.
# A tibble: 6 × 7
date city dayOfWeek activeUsers newUsers sessions screenPageViews
1 2023-09-19 New York 2 189 138 240 271
2 2023-09-18 New York 1 146 100 192 219
3 2023-09-15 New York 5 182 122 231 266
4 2023-09-14 New York 4 180 124 227 251
5 2023-09-13 New York 3 194 142 228 284
6 2023-09-12 New York 2 186 138 228 290
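Note that the output above has no rows for 16 and 17 September. One possible explanation, offered here as an assumption rather than a confirmed cause, is that the report was capped at a limited number of rows (the country report earlier came back with exactly 100) before dplyr filtered it. A hedged alternative is to filter on the API side with the dim_filters argument, covered in the Filters section, and to raise limit. The wrapper name fetch_new_york_daily() is hypothetical.

```r
# Hypothetical wrapper; ga_data(), ga_data_filter() and their arguments
# are the ones already used elsewhere in this tutorial.
fetch_new_york_daily <- function(property_id, from_date, to_date) {
  report <- googleAnalyticsR::ga_data(
    property_id,
    dimensions = c("date", "city", "dayOfWeek"),
    metrics = c("activeUsers", "newUsers", "sessions", "screenPageViews"),
    date_range = c(from_date, to_date),
    # Filter on the API side, so only New York rows count toward the limit.
    dim_filters = googleAnalyticsR::ga_data_filter(city == "New York"),
    limit = 10000
  )
  # Newest day first, as in the example above.
  report[order(report$date, decreasing = TRUE), ]
}

# Usage (requires a prior ga_auth() call and a real property ID):
# ny <- fetch_new_york_daily(my_property_id, "2023-09-12", "2023-09-19")
```

Filtering before download also keeps the transferred result small, which matters once the city dimension multiplies the row count.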
Web Traffic Data by Posts
To see post-level performance on your website, you can specify pagePath in the dimensions argument of the ga_data() function. Make sure to specify a high number in the limit argument of the function. The code below also restricts the report to organic traffic through the dim_filters argument.
my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

basic <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "newUsers", "sessions", "screenPageViews"),
  dimensions = c("pagePath"),
  date_range = c(from_date, to_date),
  limit = 1000,
  dim_filters = ga_data_filter(sessionMedium == "organic")
)
Post-Level Performance by Country
To see the post-level performance of your website by country, you can specify two dimensions, pagePath and country, in the dimensions argument of the ga_data() function. Make sure to specify a high number in the limit argument, as the number of post and country combinations can be very large.
my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

basic <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "newUsers", "sessions", "screenPageViews"),
  dimensions = c("pagePath", "country"),
  date_range = c(from_date, to_date),
  limit = 10000,
  dim_filters = ga_data_filter(sessionMedium == "organic")
)
Filters
You can use the dim_filters argument of the ga_data() function to apply a filter to any dimension. The function ga_data_filter() is used to create the filter expression.
In this example, we select organic traffic by applying a filter on "sessionMedium".
my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

basic <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "newUsers", "sessions", "screenPageViews"),
  date_range = c(from_date, to_date),
  dim_filters = ga_data_filter(sessionMedium == "organic")
)
Multiple Filters
To apply multiple filters to a Google Analytics report, you can use the symbols &, |, and ! for AND, OR and NOT conditions in the ga_data_filter() function.
# OR condition
ga_data_filter(city == "New York" | city == "Los Angeles")

# Select multiple values
ga_data_filter(city == c("New York", "Los Angeles"))

# AND condition
ga_data_filter(city == "Los Angeles" & sessionMedium == "organic")

# NOT condition
ga_data_filter(!(city == "New York" | city == "Los Angeles"))
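The filter expressions above are meant to be passed to ga_data() through dim_filters. The following sketch stores the AND condition in a variable and uses it in a report. The wrapper name fetch_organic_la() and the choice of metrics are assumptions made for illustration.

```r
# Hypothetical wrapper showing a combined filter inside a report call.
fetch_organic_la <- function(property_id, from_date, to_date) {
  # AND condition: organic sessions from Los Angeles only.
  organic_la <- googleAnalyticsR::ga_data_filter(
    city == "Los Angeles" & sessionMedium == "organic"
  )
  googleAnalyticsR::ga_data(
    property_id,
    metrics = c("activeUsers", "sessions"),
    date_range = c(from_date, to_date),
    dim_filters = organic_la
  )
}

# Usage (requires a prior ga_auth() call and a real property ID):
# la <- fetch_organic_la(my_property_id, "2023-09-12", "2023-09-19")
```

Keeping the filter in its own variable makes it easy to swap in the OR or NOT variants without touching the rest of the call.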
Real Time Data
To see real-time reports, set the realtime argument of the ga_data() function to TRUE. In the real-time report, GA4 shows the number of users and views in the past 30 minutes.
my_property_id <- 3819XXXXX

overall <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "screenPageViews"),
  realtime = TRUE
)
Note - The real-time report does NOT include these metrics - "newUsers" and "sessions".
See the output below for real-time reports.
# A tibble: 1 × 2
activeUsers screenPageViews
1 332 475
The real-time report allows only a limited set of dimensions. For example, you can group the report by country or city.
overall <- ga_data(
  my_property_id,
  dimensions = c("country"),
  metrics = c("activeUsers", "screenPageViews"),
  realtime = TRUE
)
See the output below for real-time report by country.
# A tibble: 59 × 3
country activeUsers screenPageViews
1 India 135 169
2 United Kingdom 33 48
3 (other) 23 24
4 United States 19 18
5 Singapore 10 12
6 Denmark 9 9
7 Italy 9 13
8 Netherlands 9 11
9 Spain 9 11
10 Germany 8 16
# ℹ 49 more rows
Fetch Hourly Data
By specifying "hour" in the dimensions argument of the ga_data() function, you can fetch hourly data from Google Analytics using R. To sort data by sessions in descending order, put a minus sign before the metric in the ga_data_order() function. For example, you can use ga_data_order(-sessions).
my_property_id <- 3819XXXXX
start_date <- Sys.Date()
end_date <- Sys.Date()

hourly.df <- ga_data(
  my_property_id,
  metrics = c("sessions"),
  dimensions = c("hour", "country"),
  date_range = c(start_date, end_date),
  orderBys = ga_data_order(-sessions)
)