The Data and Infrastructure Dinner took place at Searcys at The Gherkin on 5 December 2017. It was a chance for members to network and to discuss data infrastructure challenges and solutions.
The topics discussed at the dinner included data and human error, safety by design, mitigating risk by design, and customer service by design.
The dinner was held with our partners Veeam and Hewlett Packard Enterprise (HPE). Veeam are leaders in data recovery. Their services include full data backup and round-the-clock data availability. HPE needs no introduction, and attended in the context of their wide-ranging data services.
Data and Human Error
The 1932 photograph ‘Lunch atop a Skyscraper’ depicts several construction workers sitting on a beam suspended hundreds of metres up, eating sandwiches. Today, the picture raises eyebrows. There are no safety ropes, no harnesses. It’s dangerous, so why do they look so relaxed?
As we build a new kind of infrastructure, our children may look at today’s data practices with similar concern. Organisations treat data in ways which are sloppy and unsafe. There is massive exposure to the risk of human error, and the consequences of a mistake could be grave.
Possible examples abound. An employee downloads data onto a personal laptop for use later, where it remains after the data subject asks to be forgotten. A failure to apply the most recent patch leads to a massive data breach and the company is sued. A company does not adequately prepare for a power outage at its headquarters; the data it needs to deliver its service is unavailable, and it cannot resume until the power is back on in that building.
These are challenges which range across businesses. But the design of infrastructure is the area best situated to provide some solutions.
Safe by Design
The challenge for data architects is building architectural solutions to human problems. Architecture can make policy literal, even physical, and so help prevent human error.
One early and obvious step is not to scatter the data over hundreds of different data centres or lakes, but rather to store it in one or two central reserves and make those reserves impregnable to breaches. This holds even if different departments wish to use different application layers on top.
This gives better control over where any given data is. All departments should be able to access the data, but downloading it onto various pieces of hardware or into spreadsheets outside the main store should be difficult or impossible. Where possible, data should be wrapped in the kind of smart code wrapper which can destroy it when it is taken outside the main store. This is the way to achieve the right to be forgotten. Ways of backing up data which allow a subject’s personal data to be deleted from the backups at the same time must also be found.
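The backup problem above can be illustrated with a minimal sketch. It does not delete from the backups simultaneously, as the paragraph asks; it shows a weaker alternative, in which a log of erased subjects is kept outside the backups and replayed on every restore so that forgotten data does not return. The `MainStore` class and its method names are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MainStore:
    """Hypothetical central store: personal records keyed by subject ID."""
    records: dict = field(default_factory=dict)
    # Log of erased subjects; assumed to live outside the backups themselves.
    erased_subjects: set = field(default_factory=set)

    def snapshot(self) -> dict:
        """Take a backup: a point-in-time copy of every record."""
        return {sid: dict(rec) for sid, rec in self.records.items()}

    def forget(self, subject_id: str) -> None:
        """Delete a subject's data and log the erasure for future restores."""
        self.records.pop(subject_id, None)
        self.erased_subjects.add(subject_id)

    def restore(self, backup: dict) -> None:
        """Restore from a backup, skipping subjects erased since it was taken."""
        self.records = {
            sid: dict(rec)
            for sid, rec in backup.items()
            if sid not in self.erased_subjects
        }


store = MainStore()
store.records["alice"] = {"email": "alice@example.com"}
store.records["bob"] = {"email": "bob@example.com"}
backup = store.snapshot()

store.forget("alice")   # right-to-be-forgotten request arrives after the backup
store.restore(backup)   # a later disaster recovery must not bring alice back
print(sorted(store.records))  # ['bob']
```

Note that the old backup still contains the erased record until it is rotated out, so this approach only works alongside a bounded backup retention period.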
As for good security, guidelines about which data may be stored where must be impressed upon the whole organisation. The person responsible for patches and security updates must understand that they will be held responsible if a data breach or loss occurs due to negligence. Customers should not be given enough space to provide unstructured personal data; avoid text boxes in which people can enter phone numbers and similar information when it has not been explicitly requested.
For shadow IT projects, management must impress upon staff how serious it is to create data stores which sit outside the main store and are unknown to the organisation. But for their part, management must commit to making the main data store, whether that’s a cloud, a data centre, or a data lake, as easy to use as possible. Otherwise, employees and customers will find a way of climbing over the safety fence.
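Where a free-text box cannot be avoided, one option is to check submissions for personal data that was never requested. The sketch below is a rough illustration only: the function name is hypothetical, and the two regular expressions are crude heuristics (a date written with digits, for instance, may be flagged as a phone number), not a complete detector.

```python
import re

# Crude heuristics, assumed for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# 7 to 15 digits, optionally separated by spaces, dots, dashes or brackets.
PHONE = re.compile(r"(?<!\d)\+?\d(?:[\s().-]*\d){6,14}(?!\d)")


def find_unrequested_personal_data(text: str) -> list:
    """Return the kinds of personal data that appear in a free-text field."""
    found = []
    if EMAIL.search(text):
        found.append("email")
    if PHONE.search(text):
        found.append("phone")
    return found


print(find_unrequested_personal_data("Please call me on 020 7946 0000 after 5pm"))
# ['phone']
print(find_unrequested_personal_data("Leave the parcel with a neighbour"))
# []
```

A form could reject or redact such input before it reaches the main store, rather than relying on staff to notice it later.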
Mitigating Risk by Design
One factor intensifying the risk comes into effect in May 2018: GDPR. After the compliance process, many businesses will be better organised and better placed to invest in understanding their data. But GDPR is a demanding piece of legislation. Even businesses with sophisticated IT departments will need to work at the top of their capability to comply. It follows, therefore, that many companies will be left behind.
One very simple way to mitigate risk is to delete irrelevant data. Marketing is a ‘legitimate interest’, but there is considerable consternation over how much data collected in the pre-GDPR era companies should keep. (Technically, all data, including data collected before GDPR, must have been acquired to a GDPR standard; however, marketing is a ‘legitimate interest’. We advise you to consult a lawyer for specific compliance advice.)
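As a rough illustration of deleting irrelevant data, the sketch below keeps a record only if it has a documented purpose and is still within that purpose’s retention period. The purposes, the retention periods, and the `purge_expired` function are all assumptions made for the example; they are not legal guidance.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods per documented purpose (not legal advice).
RETENTION = {
    "marketing": timedelta(days=730),
    "support": timedelta(days=365),
}


def purge_expired(records, now):
    """Keep records with a documented purpose that are within retention."""
    kept = []
    for rec in records:
        limit = RETENTION.get(rec["purpose"])
        # Records with no documented purpose are dropped outright.
        if limit is not None and now - rec["collected"] <= limit:
            kept.append(rec)
    return kept


records = [
    {"id": 1, "purpose": "marketing", "collected": datetime(2017, 12, 5)},
    {"id": 2, "purpose": "marketing", "collected": datetime(2015, 1, 1)},
    {"id": 3, "purpose": "unknown", "collected": datetime(2018, 1, 1)},
]
kept = purge_expired(records, now=datetime(2018, 5, 25))
print([rec["id"] for rec in kept])  # [1]
```

Running a job like this on a schedule makes the retention policy a property of the infrastructure rather than something staff must remember to do.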
A lean data policy mitigates short-term risk; however, this shuts out many of the advantages machine learning could soon provide. Machine learning should spot trends which you wouldn’t think to test for; but it will not be able to do so if there’s no data to test.
A better solution is to make certain the customer has given adequate consent, to invest in making the data extremely secure, and to draft justifications which explain the possible utility of that data in case of a lawsuit.
Many small businesses using a public cloud will simply hope that their provider is compliant with GDPR. (“After all,” they might reason, “Amazon is very big.”) On this reasoning, AWS will not lose the data, and so they themselves will be compliant. Operators of public clouds, as safeguards of data, hold a degree of power analogous to that of banks, which safeguard stored value. There is some truth in this; but where possible, it is important for companies which collect data to take responsibility for it. After all, under GDPR both the processor and the controller carry legal responsibility.
Customer Service by Design
One of the common cries of UX experts is that customer journeys should be simple. The paradox here is that simplicity requires complexity – something straightforward for the customer requires complex machinations and contortions on the vendor side.
When data of this nature became a live issue several years ago, many of the people responsible thought of it as a purely technical issue. There are some technical solutions, such as those our partners provide, but since business has come to rely on a layer of data and analytics, members have learned that this is an organisational and operational challenge.
It is a challenge which concerns process and how a business is run. It is a challenge which concerns communications with customers and how to deliver a fantastic UX.
It is mainly a challenge about infrastructure, whereby the physical limitations on what employees can do should match the process and legal guidelines set out by management and the law. Having a flexible infrastructure, which can cope with the sheer weight of data, and which is agile enough to meet the demands of policy, customers, and the law, is now a requirement. Organisations must do the best they can.