Libraries & Leaky Data, Part 1

The ILA Best Practices Committee has recently been tasked with studying patron privacy issues around the use of printed hold wrappers in public areas. It is good to see a focus on the most obvious aspects of protecting patrons’ privacy, since having a patron’s full name stuck on a book in a public area is an outright problem when you think about it. If we attempt to square this practice with the widely accepted principle that a patron’s reading habits and checkout history must be protected from prying eyes (such as government agencies or various Freedom of Information requests), we see the difficult balance between providing convenience and adhering to privacy. There are, however, other areas within library services where the patron data being “leaked” is not as easy to see as a hold wrapper printed with a patron name. These sources of data leaks can be found within the software ecosystem commonly used throughout libraries.

What are these potential sources of leaked library data? Below are some of the more widely used pieces of library technology that may store your patron data or need to access it at some point in their operation.

Integrated Library System
Discovery OPAC
Self-checks
Computer Reservation
Print Stations
Automated Material Handlers (AMH)

All of these software systems, either by design or through their back-end structure, may collect patron information within their databases or software logs. These systems can run for years, quietly collecting data, as they sit somewhat inconspicuously on the library network.

The worst culprit within the library software ecosystem for leaking patron information into your library network is the Standard Interchange Protocol, otherwise known as SIP2. The widespread use of SIP2 grew out of our need to standardize data exchanges between library software systems. This led, somewhat innocently, to the SIP2 protocol being used far and wide in library technology. Nearly every vendor that wants to sell a software service to a library will use or work with SIP2. Any library software service that queries the ILS can do so through SIP2, so the adoption of SIP2 on library networks is near universal.

How bad is SIP2 in terms of data security? Pretty bad, given how it is typically deployed “out of the box” within library networks. Here are the 10 fields of patron information shared within a single SIP2 patron authentication query.

User’s barcode
User’s PIN/password
User’s full name
Address
Email address
Phone number
Birthdate
Gender
Age category
Fines owed

A single query to see if a patron can gain access to a library computer or a service will send all 10 fields of patron data across the network, even when the service only needs to verify that the patron is in “good standing.” It doesn’t matter if the service only needs to see one of the fields: SIP2 sends all 10 fields of data in response to a query.
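To make the over-sharing concrete, here is a minimal Python sketch that lists which personal fields show up in the variable-length portion of a SIP2 response. Treat it as an illustration under assumptions, not a full SIP2 parser: the sample message is invented, the two-letter field codes (AA, AE, BD, BE, BF, BV, BL) follow common SIP2 usage as I understand it, and fields such as birthdate or gender are often vendor extensions whose codes vary by ILS, so they are left out. The function names are my own.

```python
# Sketch only: field codes reflect common SIP2 usage; verify against your ILS.
# Vendor-specific extensions (birthdate, gender, etc.) are NOT covered here.
PERSONAL_FIELDS = {
    "AA": "patron barcode",
    "AD": "patron PIN/password",  # normally travels in the request, not the response
    "AE": "full name",
    "BD": "home address",
    "BE": "email address",
    "BF": "home phone",
    "BV": "fee amount",
}


def parse_variable_fields(field_section):
    """Split a pipe-delimited SIP2 field section into {code: value}.

    Assumes the fixed-length header has already been stripped off.
    Repeated codes overwrite earlier ones, which is fine for this audit.
    """
    fields = {}
    for chunk in field_section.split("|"):
        if len(chunk) >= 2:
            fields[chunk[:2]] = chunk[2:]
    return fields


def exposed_fields(field_section):
    """Return the human-readable names of personal fields in the message."""
    parsed = parse_variable_fields(field_section)
    return sorted(PERSONAL_FIELDS[code] for code in parsed if code in PERSONAL_FIELDS)


def minimal_view(field_section, needed):
    """Keep only the field codes a service actually needs."""
    parsed = parse_variable_fields(field_section)
    return {code: value for code, value in parsed.items() if code in needed}


# Invented sample data; BL is the "valid patron" flag in common SIP2 usage.
sample = (
    "AOMAIN|AA21234000012345|AEJane Example|BD123 Main St, Springfield|"
    "BEjane@example.org|BF555-0100|BV2.50|BLY|"
)

if __name__ == "__main__":
    print("Personal fields on the wire:", exposed_fields(sample))
    print("What a good-standing check needs:", minimal_view(sample, {"BL"}))
```

A print station that only wants to know whether the patron is valid needs one flag, yet everything in the first list crossed the network to deliver it.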

With the widespread use of the SIP2 protocol within library networks and the number of systems in a typical library, such as multiple self-check stations or print stations, all of which likely use SIP2 to talk to the ILS, you have a lot of patron data being sent around the library network. Making this problem worse, all of these data fields are sent in plain text, including the patron’s PIN/password. Many systems’ logging processes will save every SIP2 transaction into a file that can easily hold hundreds of patrons’ passwords and potentially thousands of transactions showing a patron checking out an item. These computer stations typically use local logging or small-scale databases to provide libraries with usage statistics for the individual stations. Unless active measures are taken to purge logs and remove collected data, libraries have patron data stored on desktops and servers throughout the building, well beyond the typically more secure ILS.
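One of those active measures can be as simple as scrubbing sensitive values before a log line is kept. The following Python sketch shows the idea under some assumptions: the log line format is invented, the list of field codes to redact (AD for PIN, AE for name, BD, BE, BF for contact details) is my own choice based on common SIP2 usage, and `redact_sip2_line` is a hypothetical helper, not part of any vendor product.

```python
# Sketch only: choose the codes to redact based on what your systems log.
REDACT_CODES = ("AD", "AE", "BD", "BE", "BF")


def redact_sip2_line(line, codes=REDACT_CODES):
    """Replace the values of selected SIP2 fields with a placeholder.

    The text before the first "|" carries the fixed-length header (and
    any log prefix), so this sketch leaves that first chunk untouched.
    A field that directly follows the header is therefore not redacted.
    """
    chunks = line.split("|")
    cleaned = [chunks[0]]
    for chunk in chunks[1:]:
        # Only rewrite fields that have one of the listed codes and a value.
        if chunk[:2] in codes and len(chunk) > 2:
            chunk = chunk[:2] + "[REDACTED]"
        cleaned.append(chunk)
    return "|".join(cleaned)


# Invented log line resembling a patron information request.
logged = "2019-06-01 10:15:02 SIP2 63001AOMAIN|AA21234000012345|AC|AD1234|"

if __name__ == "__main__":
    print(redact_sip2_line(logged))
```

Redaction of this kind complements, rather than replaces, a regular purge schedule: the barcode and transaction history still remain in the file unless you add those codes or delete the log.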

It is usually at this stage of describing the problem that people question the severity of the issue. Some folks will minimize the likelihood of this data getting hacked or stolen from the library network. Or they will take solace in the library being a small, unworthy target for anyone with malicious intent. While it is true that we have largely been helped by being a small, perhaps less juicy, target for a data hack, network attacks have now reached a more ruthless level. Ransomware attacks simply do not care who their target is; they run through an automated, scripted series of software exploits to hijack any computer or server and steal or encrypt its local data for ransom. This has occurred at the National Health Service in the United Kingdom, on government networks in dozens of countries, and more recently on the City of Baltimore’s servers. It has even happened to public libraries.

We can no longer sit idly by and wait for the data to be stolen and for the ensuing PR and financial liability nightmare to befall us. If this were to happen to a library, wouldn’t it be better to know that the only source of patron data available was at a single point on your network rather than dozens? Or to know precisely where this patron data resides, so that you can take better measures to protect it on that device? Over a series of blog posts, I will outline the network protocols, ILS configuration strategies, network design, storage, and logging that libraries should consider when undertaking an overall audit of their leaky data problem.
