Yes, you read that title right. As another proof of concept after server-side PDF tracking, in this post I will show you a method for measuring your Gmail activity with Google Universal Analytics.
Not that you *need* it, but it is a good example of the sort of upcoming applications for Universal!
Again, this post is not for the technically faint of heart. Still good to go? Buckle up and see you after the jump 😉
This is probably a little known fact to most of you but you can access your Gmail mailbox folders by using RSS feeds (Atom feeds, actually).
The basic Atom feed URL is:
And you can append the name of a specific Gmail label to access that folder, such as the Sent Messages folder, aka ‘sent’:
Simple and efficient, really. Of course you need to use your Gmail credentials to access the feed in a browser. But we’ll take care of authentication later.
So once we get that Gmail feed, we then need to store basic message metadata (date, e-mail ID, sender, etc.) for a given day. In this example we’ll use a database but you could probably achieve the same results with a flat file and a bit of Perl 😉 It’s a basic database interaction: store results and remove duplicate entries. This process is set to run every 30 minutes or so, depending on how much mail you get per hour, on average.
On top of that, a daily process, which runs right before midnight, counts the number of e-mails for the day and passes the message count information along with a Google Analytics event, supported by the new Universal Analytics Measurement Protocol.
Data flows into your account. You reach Nerdvana. Profit!
Now let’s dive in deeper!
Set-up with Google Universal Analytics, Gmail and PHP/MySQL
Again, this is a proof of concept. Yes, I am sure there are other ways of doing this more efficiently – go ahead and brag in comments 😉
If you do it my way, you’re going to need the following:
- a Gmail account with access credentials
- a Google Analytics account that’s Universal Analytics-enabled
- access to PHP on a local machine or any server, with cURL enabled.
Yes, you can do this with Java, Python, C++, Ruby or Cobol for all I care. I just know PHP so deal with it 😛
- access to a MySQL database; actually, any database will do but in this case, we’ll use MySQL because it plays well with PHP.
If you are afraid of coding, please turn back now and have this post read/explained to you by a tech person.
As hinted before, you’re going to need a basic ‘Gmail’ table to store e-mail metadata such as e-mail date, sender, ID, Gmail folder, etc.
Below is the SQL code you would need to create the table I’m using in this proof of concept:
[code lang=”sql”]CREATE TABLE IF NOT EXISTS `gmail` (
`id` int(6) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`date` date NOT NULL,
`mailID` varchar(20) NOT NULL,
`from` varchar(255) NOT NULL,
`folder` varchar(50) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Nice and easy. Just make sure you have a “folder” column, you’ll use it later to count received vs. sent e-mails.
And that’s it for MySQL!
With PHP we’re going to create two scripts:
- one for the half-hourly retrieval of the Gmail feed and mail metadata storage
- one for retrieving the daily e-mail count and Google Analytics event trigger
Ideally, you also need a settings include file (or 2) that contain your Gmail and/or database access credentials.
Gmail feed retrieval script
Here we’ll use the cURL library, which is mostly used for URL access/scraping:
[code lang=”php”]$feedURL = "https://mail.google.com/mail/feed/atom";
$ch = curl_init($feedURL);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_USERPWD, "$gmail_login:$gmail_passwd");
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$response = curl_exec($ch);
$infos = curl_getinfo($ch);
Once we get that feed, we use simplexml_load_string to break the feed XML into a PHP array for easier access to feed entries and metadata. Then we walk through the array and insert each feed entry in our database via a MySQL query such as:
[code lang=”sql”]INSERT INTO `gmail` (`title`, `date`, `mailID` ,`from` ,`folder`) VALUES ("test","2013-04-06","1431579090336680666","","inbox");[/code]
You’re going to end up with duplicates so the script will run the following query to weed them out.
[code lang=”sql”]DELETE n1 FROM gmail n1, gmail n2 WHERE n1.id > n2.id AND n1.mailID = n2.mailID[/code]
E-mail count retrieval script and Google Universal Analytics trigger
Using PHP with a simple SELECT COUNT(DISTINCT) MySQL query, we get a number of e-mails for the day. Use that value to populate our Google analytics endpoint, like so:
[code lang=”php”]$gaEndpoint = "http://www.google-analytics.com/collect";
"v=1" // site/app version
. "&t=event" // hit type
. "&tid=UA-7634164-5" // profile ID
. "&cid=555" // client ID
. "&dh=juliencoquet.com" //hostname
. "&ec=Gmail" // Event category
. "&ea=Received%20mail" // Event action
. "&el=$date" // Event label
. "&ev=$o->total" // Event value
. "&cm4=$o->total" // Custom metric #4 (e-mails)
Once our endpoint is built, use cURL again to send it in a HTTP POST request. Check your real-time reports. Experience a skipped heart beat when your event pops up on the Google Analytics EKG (as I like to call it).
Pat yourself on the back: it works!
Phewww. Done. Wasn’t that fun in a twisted nerdy way, though? 😉
Let’s automate this!
Using an automated cron job (crontab), let us set the Gmail feed retrieval to run every 30 minutes and the endpoint calls to be sent every day right before midnight, like so:
[code]*/30 * * * * php ~/universal/universal_gmail_retrieve_inbox.php
*/30 * * * * php ~/universal/universal_gmail_retrieve_sent.php
59 23 * * * php ~/universal/universal_gmail_endpoint_inbox.php
59 23 * * * php ~/universal/universal_gmail_endpoint_outbox.php[/code]
If you don’t know what a ‘cron’ is, ask your closest nerdy friend. They won’t bite.
You mileage may vary: each feed contains the latest 20 e-mails only; you may need to set a shorter interval (more often) between feed queries.
Anyhow, this should now work on a regular basis: run it, forget about it and enjoy the information that is now flowing into your Google Analytics account on a daily basis 😀
Great, now how do I see my Gmail activity?
You can look at your basic Events reports in Google Analytics.
Event categories, action, etc. It will all be there, don’t worry 😉
The number of GA events should match the number of times the endpoint was triggered and the event value matches the number of e-mails.
“Ah but what if I set up a custom metric?”, you ask? Well in that case you need to build a custom report, using a flat table format, with event categories and actions as dimensions and the custom dimension you chose for e-mails as a metric. The custom report creation screen should look something like this:
For a final result of:
Following these easy guidelines, you too can have a basic process for measuring Gmail with Google Universal Analytics, which was not obvious to begin with.
To be fair, i’ve explored other solutions, namely IFTTT, which proved to be unreliable with its Gmail API and only looked for new Gmail mail, not the entire mailbox. Using Google’s Gmail API proved to be too cumbersome for the use case I was trying to solve. Ultimately, using the Gmail mailbox feed proved to be the most adequate solution.
I know i’ve said this before but I’ll say it again anyway: it’s a proof of concept above all else but should give you ideas on how to extend this concept to other measurement problems you’re having with web or mobile apps. Or to detect whether your fridge is out of beer – but more on that later.
As always, constructive criticism and remarks are welcome so use the box below to say hi 🙂