There have been many discussions lately around the “breach, not hack” of Facebook user data, specifically in light of how Cambridge Analytica may or may not have used it to influence elections around the world.
Did users naively grant permission for their data to be shared? Was it outright taken without consent? Did a series of unfortunate and unforeseen events and loopholes lead to the situation we see today?
I have a lot of opinions on this, from the perspective of both a user and a technologist, but I don’t want my opinions to be based purely on assumptions and what I’ve heard from other people. I want to do some exploration first and find out what Facebook really has on me, so I downloaded my Facebook data.
This blog post is a bit of an experiment, as I’m going to try and document my reactions to seeing all the data that Facebook has about me. It’s kind of like those reaction / unboxing videos on YouTube, except I don’t have a YouTube channel, so… Weird diary-type entry it is.
I’m going to try and keep this as raw and authentic as possible, so apologies in advance for any spelling and grammar mistakes, missing insights, and things that might just not make sense. Go easy!
Some Assumptions and Hypotheses
I’d like to see if / how my assumptions change as I discover the data that Facebook holds about me, so I’ll start by including a few notes about my current assumptions that I hope to either confirm or have challenged in some way. It might be worth noting that, at this point, I’ve already had a look at Facebook’s information page on what kind of data will be included in the archive I’ve downloaded. However, I haven’t started looking at the download itself yet.
Some things I hope to have confirmed or challenged by this experiment:
- Contact data: I’ve never given Facebook my phone number or permission to view my phone’s contact list / phone book, but I already know that there is a loophole of someone’s contact list being classed as “their data”, so if they agree to share “their data”, this could actually include other people’s data (including mine). This means that my full phone book might not be included in my data download, but cross-referencing means that they will probably still have my phone number, and possibly the contact information of mutual friends, where others have shared the data of people that I also know… I’m not sure if that would be connected to their profiles on my file and then included in my data download, though. In other words, I think there might be data here that I already have, but haven’t given direct permission to share with Facebook, and I think that Facebook might have other information about me that isn’t included in my own data download.
- Marketing data: It’s well known that if you’re not paying to use a product, then you are the product. This is especially obvious when it comes to platforms like Facebook and Twitter, who make the majority of their revenue from ad sales. In order to make more money from those adverts, they offer to make them targeted. In order to make them targeted, I assume that Facebook must have a lot of data about my interests and political views, as a minimum.
- Location data: I’ve done everything that I can (okay, not everything) to avoid sharing my location data with Facebook, but I know they have some anyway. I know this could be aggregated from login IP data, check-ins that other people have tagged me in, and events that I’ve marked myself as “going” to. Since moving from the UK to Germany, I’ve noticed a lot of annoying behaviours regarding language and translations. On every possible settings screen, I’ve specified English as the only language I want to see on Facebook, but the posts from company pages I’ve “liked” have still changed to German. Even if some are automatically translated to English, that wasn’t necessary before. I’m interested to see if there are any signs of what that’s based on, and why my physical location, that I’ve tried not to share, is overriding my account settings.
Photos and Videos
When I extract the .zip file that I downloaded, I see four folders and a file: the html, messages, photos, and videos folders, plus index.htm.
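For anyone who would rather get an overview before clicking through everything, here is a minimal sketch of how the extracted archive could be summarised. It assumes nothing about the archive beyond it being an ordinary folder on disk; the function name `summarise_archive` is my own invention for illustration and is not part of anything Facebook provides.

```python
from collections import Counter
from pathlib import Path


def summarise_archive(root):
    """Count files per top-level folder and per file extension.

    `root` is the folder the .zip was extracted into (hypothetical path).
    """
    root = Path(root)
    per_folder = Counter()
    per_extension = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Files sitting directly in the root (such as index.htm) are grouped under "."
        top_level = relative.parts[0] if len(relative.parts) > 1 else "."
        per_folder[top_level] += 1
        # Lower-case the extension so ".JPG" and ".jpg" are counted together
        per_extension[path.suffix.lower() or "(none)"] += 1
    return per_folder, per_extension
```

Calling `summarise_archive("facebook-archive")`, where the folder name is just a placeholder, returns two counters that can be printed or sorted to see where most of the files live.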
I think I’ll start with the photos and videos folders, since they seem pretty self-explanatory, and I don’t expect to see many surprises here. What I do expect is a lot of photos. One of the main reasons I joined Facebook was to see and share photos from friends and events, and I know that I’ve both uploaded and been tagged in a lot of them.
I wonder if the photos I’m tagged in, but haven’t uploaded myself, still count as my data. I also saw a category for facial recognition data on the info page, and wonder what that will look like. I remember that several years ago, Facebook seemed to suggest to me that automatic tagging in pictures was possible, but that the feature just wasn’t available to me yet. It still doesn’t seem available, so I wonder what they’ve done with all that data and if the feature is still coming.
As I suspected, nothing remarkable. Lots of sub-folders to house the many, many pictures, and corresponding HTML files to view each batch via the browser. It seems that photos I’ve been tagged in but didn’t upload are included, as are comments, but not likes. Camera data is included under some of the photos too. There are only three videos in the video folder. This seems about right.
No facial recognition data here.
Next, I want to see what’s in the messages folder. I read on the Facebook info page that messages I’ve deleted won’t be included, because they’ve been deleted from my account. I have my suspicions about this, because I’ve heard about people pulling messages from their archives… If messages aren’t archived automatically (which mine don’t seem to be), then that would mean that people specifically archived them and then retrieved them from the archives, as opposed to just deleting them. I kind of don’t believe that people would rather archive Facebook messages than just delete or leave them as is. But that’s just me.
I’ve also seen people complain that Facebook holds all the data from their text messages. I find this misleading, because “text message” can surely mean any message that is written in text form – digitally or otherwise. So, do they mean messages from Facebook Messenger, or SMS and WhatsApp messages, for example?
I seem to recall that a feature of Facebook Messenger is that you can also use it as your SMS text client, even if the person you’re messaging doesn’t have a Facebook account. If this is true, then seeing SMS messages in Facebook’s data archive makes sense. I didn’t activate this option though and, as mentioned, haven’t directly given Facebook any of my phone numbers, so, in theory, they should only have data on messages sent directly via Facebook and Facebook Messenger.
Side note: as the mobile apps for these are separate, would it make a difference if I tried to access my Facebook Messenger data? What extra information might be stored there?
There are 40 HTML files and separate folders for audio, gifs, photos, stickers and videos.
I’ll go through the folders first…
Wow, a lot of photos and gifs. Thankfully, there were no surprise photos that I’d forgotten about and didn’t want to see again, so I guess if there were any before, deleting them worked. I had a moment of panic as I recognised a picture that someone sent me on WhatsApp, but then I remembered that I forwarded it to a group chat on Facebook Messenger. Panic over.
With nothing else interesting in the folders, I’ll look at the HTML files next.
Okay, so I’ve spotted the transcript for a group chat from a few years ago that I shouldn’t have been added to, and left before deleting. I did this with several conversations and realised that you can’t delete group conversations after you’ve already left them, so they hang in your messages forever – so annoying! A few months ago, this was still the case, but now I see that all these “left” conversation messages have been automatically archived, and still appear in my data download from there. Fine.
…Unless I got annoyed and manually archived them and forgot that I did it… Anyway!
Hang on, I notice a “Back” link in one of the message HTMLs and it takes me to the index.htm page, which is structured a lot like a Facebook profile and is a lot easier to go through.
Oops, maybe I should have started with that file. I guess I’ll start going through it that way now.
Another moment of panic as I thought I recognised a message sent via WhatsApp. False alarm – it turns out that I often message my partner from another room to ask him to turn down the TV, so these are different messages.
I’ve been through quite a few of the conversations, and as far as I can see, only the messages sent via Facebook (and Messenger) are included, and the messages I deleted do seem to actually have been deleted. This is good!
Profile and Contact Info
Since I’ve finished going through the folders in the .zip file and I’m now working via the index, I’ll look at the Profile and Contact Info data now.
Hmm. “Previous Relationships”. This isn’t something that I’m bothered about being recorded, but I do find it interesting, since I’ve seen people complain of photo memories and friend suggestions coming up for their ex-partners. If Facebook has this data, I guess they’re just not using it the way users want them to.
Absolutely nothing listed under “Contact Info”. Interesting… My connected email accounts were listed in the “Profile” section, but I’m still genuinely surprised that they haven’t scraped a contact number for me – or at least haven’t stored this as part of my data archive. I wonder if that’s hiding somewhere else instead.
The timeline section is fairly uninteresting. Text only; no media, comments or likes. I’m not going to spend much time on this. However, looking at some of the first messages I received on my wall (back when it was a “wall” and not a “timeline”), I remember that there used to be a feature to “view friendship”. Given that Facebook makes videos about “friendiversaries”, I’m guessing that they still keep data on relationships between people, but that specific feature doesn’t seem to be available anymore.
I read that removed friends will still be listed in the “Friends” section, and I remove people from my friends list quite frequently, so I’m interested to see what old gems I’ll find. I also wonder if this is where all that contact information that people complained about being shared (apparently) without their permission will be stored. But we already know that they don’t have my phone number or access to my phone book, so maybe I won’t see any of that? Let’s find out.
I’m not gonna lie, that was a massive disappointment. All that’s there is grouped lists (current and past friends, rejected requests, etc.) of people’s profile names and the date we became friends on Facebook. Nothing else! Part of me wonders if Facebook moved at lightning speed to alter the way they store data, since there must be a massive rise in people requesting their data archives right now, or if people are just really overreacting. There’s nothing suspect here. But perhaps there would be more to show if I had given Facebook my phone number and access to my phone book.
The only weird thing is a “Friend Peer Group” that has me as “Starting Adult Life”.
Not really accurate anymore, but okay.
Is it weird that I’m kind of disappointed not to have discovered some shocking stockpile of data here? Don’t get me wrong, of course I’m glad that Facebook (so far) doesn’t seem to have any data on me or my friends / contacts that I wasn’t expecting, but with all the hype and drama I’m seeing around Facebook and Cambridge Analytica, I expected something more exciting and eye-opening. Like when you realise for the first time that your email address or password has “been pwned” and decide that maybe you do need to start using a password manager after all.
The “Events” section holds basic data (name, date, location) for every Facebook event I’ve ever been invited to, ever. Seems a bit unnecessary, but there you go. Again, nothing interesting.
Although this section contains a lot of data that I don’t really care about anymore, I can completely see how this happens. Events on Facebook don’t disappear after a certain amount of time. They stay there forever, like posts and photos (this is how “memories” and “on this day” features work). Events also include details of people who attended, were invited, or declined the invitation. If event data disappeared from my record after a certain amount of time, then the invite lists within events would also disintegrate into nothing over time. Self-made Facebook content isn’t designed to expire, so this mass collection of event data, although not specifically desired, can be explained by the way that Facebook events – and other content on the site – work.
Here, I’m reminded of that time when I deactivated my account in uni and a couple of my friends (freakishly quickly) texted me to ask if there was something wrong. Because that’s the only time you leave Facebook, apparently.
A very long list of active sessions and session updates (everything for the last four years, in fact) follows the account deactivation history. There’s no specific location data here, but there are IP addresses that the location can be easily derived from. I’ve had a Facebook account for ten years; I wonder why this data is kept for four years. Perhaps it wasn’t stored at all before 2014.
Some recognised machine, login, and cookie data… A list dedicated to all the collected IP addresses… Password change logs… A lot of “Remove Profile Photo” and “checkpoint completed” entries for some reason… What’s a checkpoint in this context? Either Facebook’s help centre doesn’t know, or it has something to do with advertising.
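Since the IP addresses are scattered across long lists, a small script could pull them out and count how often each one appears. The following is only a rough sketch: it assumes the page is plain HTML with the addresses appearing as ordinary text, and the function name `extract_ips` is made up for this example. It does not look up locations, because that needs an external geolocation database that isn’t part of the download.

```python
import ipaddress
import re
from collections import Counter

# Crude tag stripper; good enough for a sketch, not a real HTML parser.
TAG = re.compile(r"<[^>]+>")


def extract_ips(html):
    """Return a Counter of the valid IPv4/IPv6 addresses found in the text."""
    text = TAG.sub(" ", html)
    counts = Counter()
    for token in text.split():
        # Drop punctuation that often clings to an address in running text
        candidate = token.strip(".,;()[]")
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            # Not an IP address (timestamps like 12:30:45 end up here too)
            continue
        counts[str(address)] += 1
    return counts
```

Reading one of the archive’s HTML pages into a string and passing it to `extract_ips` gives a quick picture of how many distinct addresses were logged and which ones come up most often.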
The ads section could be interesting, or boring like the “Friends” section.
The first heading is “Ads Topics”, which is basically an aggregation of all my page “likes” (which isn’t that many) and some keywords that seem to have been implied from other data. For example, “beer”, even though I don’t like beer, but I live in Germany now so, obviously. And “textile”, because I’ve “liked” some companies that sell clothes.
“Ads History” tracks all three of the adverts I’ve clicked on – one of which I’m guessing was me dropping my phone on my face, because the title is in another language and I can’t tell what it is.
Then something interesting! “Advertisers with your contact info”. There are seven companies here, six of which I recognise as ones I’ve willingly given at least some of my information to, but never via Facebook. I’ve only “liked” two of the seven companies on Facebook. How have they all managed to connect my Facebook account to their customer database?
My guess is because of my profile search settings.
Everyone is allowed to look me up with the email address I provided (including advertisers), and I use the same email address both for my Facebook and my accounts with those six companies.
But what about the seventh? What the Hell is “Brave New Look” and why do they have my contact details? Google tells me that they sell clothes online (they’re different to “New Look”), but I’d never even heard of them before discovering that they seemingly have access to my contact data.
Seriously, who are they? I tried to find a parent company or former “trading as” names, but I don’t see anything useful. Detectives out there, let me know if you have any information on this please.
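As for how a customer database could be connected to an account at all: one common approach, and as far as I know the one behind Facebook’s Custom Audiences feature, is for the advertiser to upload a list of hashed email addresses that the platform compares against hashes of its own users’ emails. I can’t confirm that this is what happened with the seven companies on my list, so treat the following as a sketch of the general technique only. Every name in it (`hash_email`, `build_platform_index`, `match_customer_list`, the account IDs) is hypothetical.

```python
import hashlib


def hash_email(email):
    """Normalise an email (trim, lower-case) and return its SHA-256 hex digest."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def build_platform_index(accounts):
    """Map hashed email -> account id for the platform's own users.

    `accounts` is a hypothetical dict of plain email -> account id.
    """
    return {hash_email(email): account_id for email, account_id in accounts.items()}


def match_customer_list(customer_emails, platform_index):
    """Return the account ids whose hashed email appears in the customer list."""
    matches = []
    for email in customer_emails:
        account_id = platform_index.get(hash_email(email))
        if account_id is not None:
            matches.append(account_id)
    return matches
```

The point of the sketch is that using the same email address everywhere is enough for a match: the advertiser never needs me to have “liked” their page, only to hold an address that hashes to the same value as the one on my account.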
I never link games or applications to my Facebook account, so I’m interested to see if there will be much of anything for me here. I think I did once connect a Norton security app, but I’m pretty sure I disconnected that years ago. Then there’s the standard Android and HTC Sense links, back when that was a thing and I used Android…
Sigh, I miss you, Android. Signed, unwilling Apple user.
Yup, sparse and dull. Only two applications: “LG Device” and “Norton Safe Web” – probably historical. And that’s it!
Well, that was weirdly disappointing… There was only one suspect piece of data, around an advertiser, and I’m pretty comfortable guessing that it was only displayed amongst my Facebook data, rather than caused by the unethical sharing of my Facebook data. Apart from that, I’m pleasantly surprised by how little data Facebook has about me.
Let’s look back at the three assumptions and hypotheses I made before exploring my Facebook data:
- Contact data: This is the one I’ve seen most people make the biggest fuss about, but where I was most skeptical about there actually being an unauthorised sharing of data. However, it’s important to note that there is a difference between allowing Facebook to have access to your contact data, and them just sharing it with whoever they want once they have it. Is the latter what’s been happening? I actually don’t think it is, but I can understand why people would be concerned about the possibility. From what I can tell, they don’t just mine data from your phone book without you agreeing to it first.
- Marketing data: I suspected there to be some kind of political profile on me as a minimum, especially given the election-influencing chatter that’s going on at the moment. Although I don’t generally tend to share any political views on Facebook, I do “like” and interact with content from selected news and media outlets, and so I suspected there to be some inferred profiling from that. But there weren’t any political interests associated with me at all.
- Location data: This was probably closest to my expectations, with no data specifically collected about my location, but certainly enough clues to work out where I am most of the time. I didn’t see anything shady or underhand though, as I understand how this data can also be used for good things, like monitoring for any unusual activity. You can’t tell what’s unusual without some collection of data over time.
So what does all this mean?
Are Facebook actually the good guys and being completely open and responsible with our data?
I wouldn’t go that far. This is just one small piece in a big puzzle. How Facebook shares our data with third parties, how third parties obtain our data through Facebook, how dark patterns and general opacity regarding privacy and permissions are exploited are all factors that need to be considered.
Am I just lucky that constantly tweaking my privacy settings has actually paid off, and there would have been much more data otherwise?
Quite possibly. As I’ve said, I actively try to prevent Facebook from accessing my phone number, phone book, and location data. My profile is almost entirely set to private and my profile doesn’t appear in search engine results. I always resist the “connect to Facebook” button, even when I’m out of lives! I’m sure that all these measures have helped to reduce the amount of data that Facebook holds about me.
That’s not to say that more information about me isn’t available in someone else’s data archive, who has granted all these permissions. I can take my own measures, but I can’t trust that everyone who has some information about me has done the same.
Do I just have enough knowledge and savvy to know what to expect?
Probably, yes. To me, it seems that there are two things going on here:
- I know enough about how technology, advertising, and business work to at least want to protect my data in the first place, and I know how to configure the right account settings with that in mind
- I know enough about how technology, advertising, and business work to have a pretty good idea of how my data is used for both good and evil, and that we can’t rely on companies to protect us, or behave completely ethically and responsibly all the time
So yes, I did know what to expect, even with a healthy pinch of suspicion and skepticism.
I’m Not the Average User
Having said all that, I don’t believe that I’m the average Facebook user. I can totally see how lots of users either accidentally share more information than they mean to, or simply don’t understand what can happen once access is granted. Of course, users have to take some responsibility, and there is no amount of transparency or hand-holding that can completely guarantee that all users know exactly what they’re agreeing to and what’s going on behind the scenes, but improvements can be made.
I can also totally see how the developers, testers, product owners, etc. at Facebook might genuinely not have seen this scandal coming either (let’s not get into the fact that they use “dogfooding” and don’t actually have dedicated testers). Or conceive of a situation where they did see it, but felt either unwilling or unable to do anything about it. Or where they did see a problem and spoke up about it, but it didn’t change anything.
I see a lot of people playing the blame game at the moment; people who should have more understanding and empathy for both users and creators. I’m not going to get into all that in this post, but I do think there are lessons to be learned on all sides, and that we should support those who want to make things better, instead of always looking for villains.
But There’s Still So Much Data!
From what I’ve seen, a lot of people are just shocked about the sheer volume of data that Facebook holds. And yes, there is a lot of data, but nothing that I didn’t knowingly share with Facebook, and nothing where I don’t understand how it’s used for the features that Facebook is known for. I’ve been a user for ten years – that’s a long time! Of course that comes with a lot of data.
Would I be surprised to find that my GP practice in the UK has all the data on every appointment I’ve attended and every prescription I’ve had for the last ten years? Of course not! But would I be surprised if Snapchat – who are known for disappearing photos and videos – had kept all the photos and videos I shared since joining? Absolutely. The difference is in why the data is recorded and how it’s supposed to help provide the services that users want and use.
The volume of data that Facebook has doesn’t scare me. However, I am aware of the possibility that they also use my data for something else that I didn’t expect or agree to. To me, the fact that Facebook uses my data to provide services that I’ve agreed to isn’t the issue in itself; it’s what else they’re using it for, that they haven’t told me about…
Do you trust Facebook to tell you everything?