CYBERSECURITY SERIES: DATA ANONYMISATION
With growing concern about the data that organisations hold, and open questions about how the new GDPR will be interpreted and what that could mean for customers and suppliers alike, many organisations are turning to anonymisation to reduce the impact of any breach that could occur.
WHAT IS ANONYMISATION AIMING TO ACHIEVE?
Anonymisation processes separate the identity of the data subject from their personal data. Although ‘anonymous’ is the term a layperson is likely to be familiar with, there are actually two different types of processes used to stop people from identifying individuals who contributed to a dataset:
Anonymisation: the name and contact details of the data subject are irretrievably removed from the dataset, often at the same time as the data is aggregated, so that the dataset no longer contains personal data. The dataset can still have commercial value to the business if it is kept confidential, but the reputational risk of a data breach is removed. Because the data no longer constitutes “personal data”, the GDPR no longer applies to it.
Pseudonymisation: the name and contact details are held separately from the other personal data. The dataset is reversibly anonymised: names are replaced with unique identifiers and no aggregation takes place, so individual entries can still be accessed. Under the GDPR, pseudonymised information still constitutes “personal data”, but pseudonymisation is nevertheless encouraged because it is a good data minimisation strategy.
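To make the distinction concrete, here is a minimal Python sketch of pseudonymisation under some simplifying assumptions: records are dictionaries, the identifying fields are a hypothetical `name` and `email`, and the "unique identifier" is a random token. The helpers `pseudonymise` and `reidentify` are illustrative names, not a standard API. The pseudonymised dataset can be analysed on its own, but linking a row back to a person requires the separate lookup table, which is what makes the process reversible.

```python
import secrets

# Hypothetical identifying fields; every other field stays in the dataset.
IDENTITY_FIELDS = ("name", "email")

def pseudonymise(records):
    """Split records into a pseudonymised dataset and a separate lookup table."""
    dataset, lookup = [], {}
    for record in records:
        token = secrets.token_hex(8)  # random, so it reveals nothing about the person
        while token in lookup:  # guard against the (very unlikely) event of a repeat
            token = secrets.token_hex(8)
        # Identifying fields go into the lookup table, to be stored separately.
        lookup[token] = {f: record[f] for f in IDENTITY_FIELDS}
        # The dataset keeps the token plus the remaining, non-identifying fields.
        rest = {k: v for k, v in record.items() if k not in IDENTITY_FIELDS}
        dataset.append({"id": token, **rest})
    return dataset, lookup

def reidentify(row, lookup):
    """Reverse the process for one row; impossible without the lookup table."""
    details = {k: v for k, v in row.items() if k != "id"}
    return {**lookup[row["id"]], **details}

patients = [
    {"name": "Alice Smith", "email": "alice@example.com", "diagnosis": "asthma"},
    {"name": "Bob Jones", "email": "bob@example.com", "diagnosis": "diabetes"},
]
data, key_table = pseudonymise(patients)
print(data[0])  # a random token and the diagnosis, with no name or email
```

Anyone holding only `data` sees tokens and diagnoses; only someone who also holds `key_table` can call `reidentify` and recover who each row refers to.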
In both cases, the quality of the result depends on how well the original record is obscured. Organisations such as healthcare providers, which need to keep sensitive records and know who they refer to, might choose pseudonymisation as a means of increasing security. Other organisations, such as social media platforms identifying trends, might sell fully anonymised data products, which are not considered personal data because an irreversible process has removed the security risk.
Types of data where pseudonymisation would increase security:
- Medical records
- Research data
- Anything where the value of the data lies in the detail: users need to compare a specific record with the dataset as a whole, or want raw data to analyse.
Types of data where anonymisation would decrease risk without decreasing data value:
- Trends for sales and marketing
- Cyber threat information
- Anything sold as an 'intelligence' product, where value is added through analysis before sale.
Types of data that have to remain personal and identifiable:
- Marketing lists
- Customer profiles used for targeted cross-sales
- Anything whose value lies in the ability to contact, make a judgement about, or take action in relation to a specific individual.
WHAT IS AGGREGATION AND WHY IS IT SO IMPORTANT?
Let’s assume someone removes the name and full address of a data subject from a database so that they can anonymise it. There are no other unique identifiers (national insurance number, passport number, driving licence number etc.) in the dataset.
But the dataset probably still holds personal data. Why? Because the more data points (separate pieces of information) you have about a person, the more likely you are to be able to identify them without their name and address. The volume of information you hold about a person dictates how unique they appear.
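This effect can be measured directly. The sketch below uses six made-up records from which names and addresses have already been removed, and counts what share of them are unique as more data points are taken into account. The field names, values and the `share_unique` helper are hypothetical.

```python
from collections import Counter

def share_unique(records, fields):
    """Fraction of records whose combination of `fields` appears exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

# Hypothetical records with names and addresses already removed.
records = [
    {"age": 27, "sex": "F", "district": "M14", "car": "Fiesta"},
    {"age": 27, "sex": "F", "district": "M14", "car": "Golf"},
    {"age": 27, "sex": "M", "district": "M14", "car": "Golf"},
    {"age": 27, "sex": "M", "district": "M20", "car": "Golf"},
    {"age": 34, "sex": "M", "district": "M20", "car": "Golf"},
    {"age": 34, "sex": "M", "district": "M20", "car": "Golf"},
]

# Add one data point at a time and report how many people become unique.
for fields in (["age"], ["age", "sex"], ["age", "sex", "district"],
               ["age", "sex", "district", "car"]):
    print(len(fields), "data points:", round(share_unique(records, fields), 2))
```

With these records, nobody is unique on age alone or on age and sex, but a third of the records become unique once the district is added, and two thirds once the car model is added.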
Aggregating data (producing statistics about groups of records rather than retaining the original records) makes it harder to trace information back to its source. The data is generalised and still useful, but the information in each individual entry is hidden.
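As a rough sketch of aggregation, the hypothetical `aggregate` helper below assumes records are dictionaries and replaces individual rows with a count and a mean per group, here per postcode district and age band. The data is invented for illustration.

```python
from collections import defaultdict

def aggregate(records, group_by, value):
    """Replace individual rows with per-group statistics (count and mean)."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[f] for f in group_by)].append(r[value])
    return {key: {"count": len(vals), "mean": sum(vals) / len(vals)}
            for key, vals in groups.items()}

# Hypothetical purchase records, one row per customer.
sales = [
    {"district": "M14", "age_band": "25-34", "spend": 40.0},
    {"district": "M14", "age_band": "25-34", "spend": 60.0},
    {"district": "M14", "age_band": "35-44", "spend": 25.0},
    {"district": "M20", "age_band": "25-34", "spend": 30.0},
    {"district": "M20", "age_band": "25-34", "spend": 50.0},
]
summary = aggregate(sales, ["district", "age_band"], "spend")
for key, stats in sorted(summary.items()):
    print(key, stats)
```

Note that the ('M14', '35-44') group contains a single customer, so its "mean" is simply that customer's own spend: aggregation alone has not hidden them.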
Of course, the way that the data is generalised has an impact on whether or not the database still contains personal information.
Consider a retailer whose dataset originally included name, address, age, payment details and purchase information. The retailer wants to sell information to partners, and decides that age plus the first half of the postcode (designating part of a city, but not the specific street) will be general enough. The retailer's customer for this data can see hundreds of 27-year-olds buying alcohol and disposable barbecues, but only one 102-year-old buying tinned soup and personal hygiene products.
What the customer wants to know is that advertising Nurofen and Piriteze in its shop window will have more impact than marketing Tena. However, the way the data has been anonymised means that some individuals can still be identified simply because they are unusual.
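One widely used way to catch this problem, not named in the text above, is a k-anonymity check: every combination of quasi-identifiers (here age and outward postcode) must be shared by at least k records before the data is released. The sketch below assumes k = 3 purely for illustration, uses hypothetical data modelled on the retailer example, and suppresses records that fall into groups that are too small. Generalising the values further (for example into age bands) is the other common option.

```python
from collections import Counter

def small_groups(records, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {key: n for key, n in counts.items() if n < k}

def suppress(records, quasi_identifiers, k):
    """Drop records that fall into a group smaller than k before release."""
    risky = small_groups(records, quasi_identifiers, k)
    return [r for r in records
            if tuple(r[q] for q in quasi_identifiers) not in risky]

# Hypothetical retailer extract: only age and outward postcode are kept.
purchases = [
    {"age": 27, "district": "LS6", "basket": "alcohol, disposable barbecue"},
    {"age": 27, "district": "LS6", "basket": "alcohol"},
    {"age": 27, "district": "LS6", "basket": "disposable barbecue"},
    {"age": 102, "district": "LS6", "basket": "tinned soup, hygiene products"},
]
print(small_groups(purchases, ["age", "district"], k=3))
release = suppress(purchases, ["age", "district"], k=3)
```

Here the only flagged group is the single 102-year-old, whose record is removed from the release; the three 27-year-olds remain.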
One or two data points can be enough to identify someone if the data is not carefully anonymised. Aggregation makes it less likely that individuals can be identified, and can add value by providing an analysis or answering a specific question asked by a customer, but it is an imperfect process.
In summary, aggregation may not amount to true anonymisation.
The problem with considering something anonymised lies in the definition: if you cannot identify an individual (either directly, or indirectly from the relevant information), the data is not personal data, so it does not require any security at all (at least from a GDPR perspective). In many instances, such as data.gov.uk, data is anonymised so that it can be made public. Financially this makes sense: government spends millions per year on research, and without public datasets much of that funding would be spent collecting or duplicating datasets.
The problem is that, while aggregating data before publication can increase the anonymity of a dataset, combining multiple datasets after publication (coupled with high-speed computing power) makes it possible to reintroduce the identity of individuals. For example:
- An insurance company could ask new customers for enough of the information contained in a public, anonymised medical dataset to match their names to a medical history or genetic predisposition.
- A hacker could match records across datasets to gather enough personal details to dupe someone into making a mistake, or to answer every password recovery question a bank might ask.
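Both scenarios rest on the same mechanism: a join on shared quasi-identifiers. The sketch below is a simplified illustration with invented data, in which a hypothetical "anonymised" medical release and a list of named applicants are matched on birth year, sex and postcode district. Any applicant who matches exactly one medical record is treated as re-identified. The `link` helper and all field names are assumptions for illustration.

```python
def link(public_rows, known_people, keys):
    """Join two datasets on shared quasi-identifiers; unique matches re-identify."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = {}
    for person in known_people:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # exactly one match: the record is re-identified
            matches[person["name"]] = candidates[0]
    return matches

# Hypothetical "anonymised" medical release: no names.
medical = [
    {"birth_year": 1961, "sex": "F", "district": "CB4", "condition": "type 2 diabetes"},
    {"birth_year": 1975, "sex": "M", "district": "CB4", "condition": "asthma"},
    {"birth_year": 1975, "sex": "M", "district": "CB4", "condition": "eczema"},
]
# Hypothetical details an insurer collects from applicants.
applicants = [
    {"name": "Jane Doe", "birth_year": 1961, "sex": "F", "district": "CB4"},
    {"name": "John Roe", "birth_year": 1975, "sex": "M", "district": "CB4"},
]
print(link(medical, applicants, ["birth_year", "sex", "district"]))
```

Jane Doe is re-identified because she is the only match. John Roe matches two records, so he is not uniquely identified, although his condition has still been narrowed down to one of two.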
USING ANONYMISATION TO MITIGATE CYBER RISK AND DEMONSTRATE PRIVACY BY DEFAULT
Pseudonymisation is an effective way of reducing the attractiveness of a dataset to an attacker by making it more difficult to link a record with its subject. The more complete and accurate the data, the greater its value to a malicious party, so pseudonymisation can reduce cyber risk. Further, the GDPR requires organisations to take appropriate technical and organisational measures to ensure the security of personal data, and pseudonymisation is explicitly identified as one such measure (provided that the key needed to reverse the process is itself kept separate and secure).
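One common way to implement pseudonymisation with a separately held key is a keyed hash such as HMAC-SHA256, sketched below with Python's standard `hmac`, `hashlib` and `secrets` modules. This is an illustrative choice, not a method the GDPR prescribes, and key handling is simplified: in practice the key would live in a separate, access-controlled store. The contrast with an unkeyed hash shows why the key matters. Anyone with a list of likely identifiers can recompute a plain hash and so reverse it, whereas the keyed pseudonym can only be recomputed, and therefore linked back to a person, by whoever holds the key. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# The key is the "additional information" that must be kept apart from the data,
# e.g. in a separate key store with its own access controls (assumed here).
KEY = secrets.token_bytes(32)

def pseudonym(identifier: str, key: bytes = KEY) -> str:
    """Deterministic keyed pseudonym: same identifier and key give the same token."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def naive_hash(identifier: str) -> str:
    """Unkeyed hash, shown only for contrast: anyone can recompute it."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()

token = pseudonym("alice@example.com")

# An attacker with a list of likely email addresses can reverse a plain hash
# by hashing every guess; without KEY, the same trick fails against `pseudonym`.
guesses = ["bob@example.com", "alice@example.com"]
leaked = naive_hash("alice@example.com")
print([g for g in guesses if naive_hash(g) == leaked])
```

Because the pseudonym is deterministic, the key holder can still join records about the same person across datasets, which is what keeps the data useful while reducing what a thief without the key can learn.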
As discussed above, anonymised data is tricky to manage. Complying with existing anonymisation standards won't eliminate the risk of someone obtaining a dataset and deanonymising it. Compliance would provide evidence of due diligence if a breach caused by deanonymisation had to be reported, but avoiding a fine does not undo the reputational damage that a data breach also causes.
Carefully implemented pseudonymisation is valuable, but assuming that anonymisation is permanent is risky. If in doubt, play it safe: when assessing the cyber security requirement, treat a dataset as pseudonymised rather than anonymised. This is more likely to reduce the risk of reputational damage and loss of customer goodwill.
It is common for businesses to confuse anonymisation and pseudonymisation when they discuss ways to make their data less of a target for hackers.
Removing information that is of limited financial value to the business and could negatively impact the data subjects in the event of a breach is a legitimate security measure. However, before you assume that it’s impossible to identify any individual in the dataset you hold, think about the edge cases. Does your assumption hold true for the unusual people you hold information about? Would it be safer to assume that the dataset has been pseudonymised and still needs some security to comply with GDPR?
ABOUT OUR CYBERSECURITY SERIES
Clayden Law has teamed up with technical expert Emma Osborn, and over the next few months we will provide some back-to-basics analysis of the technical, legal and data protection issues surrounding cybersecurity, aimed at organisations’ non-technical decision-makers. Together, we’ll be highlighting key cybersecurity and data privacy fundamentals and looking at the interplay between law and practice in this area.
Please be aware that these notes have been compiled for general guidance only and should not be considered as specific legal or technical advice.
Piers Clayden
Solicitor & Director, ClaydenLaw
© ClaydenLaw 2018