Privacy-enabled scalable recommender systems
Author: Moreno Barbosa, Andrés Darío
Director(s)/Advisor(s): Castro Barrera, Harold Enrique; Riveill, Michel; Jiménez Guarín, Claudia Lucía; Masseglia, Florent

Publication date: 2015
Content type: doctoralThesis
Keywords:
Abstract:
Electronic content is ubiquitous in our daily lives. Several factors, such as the development of Web 2.0 technologies, increased access to mobile devices and the deployment of mobile networks, have undoubtedly increased the amount of information easily available to users. Given the limited attention span of users and the sheer volume of information streams ready to be consumed, automatic systems are needed to prioritize, suggest or screen content suited to each user's interests and situation. One of the most popular approaches to the information overload problem is Recommender Systems [Adomavicius 2005]. Recommender Systems are information filtering systems that use historical information about the user (what the user has considered relevant or irrelevant in the past, among other information) to build an accurate representation of the user's interests, which is then used to predict the relevance of each item in a large collection for that specific user. Recommender systems are used by many online retailers, content streaming services and social networking sites to improve the user's experience of their services by automatically narrowing their content or offers down to the items most likely to interest the user. Generally speaking, recommender systems can be classified into two categories: Content-Based and Collaborative Filtering. The former relies on the definition of explicit features that describe the item domain and assigns them weights that describe the affinity between each feature and the item. For example, in the movie domain, items can be described by features such as the genre to which they belong and the director, writer and actors who take part in the movie. Collaborative Filtering, on the other hand, is content-agnostic and relies on correlations between users and items derived from the users' historical consumption patterns.
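To make the content-agnostic nature of Collaborative Filtering concrete, the following is a minimal sketch of one possible variant: a user-based neighborhood method with cosine similarity. The abstract does not say which Collaborative Filtering algorithm the thesis uses, so the choice of method, the toy ratings and the names `ratings`, `cosine` and `predict` are all illustrative assumptions. Note that no item features (genre, director, actors) appear anywhere; only past ratings are used.

```python
from math import sqrt

# Hypothetical toy data (user -> {item: rating}); not taken from the thesis.
ratings = {
    "ana":   {"m1": 5, "m2": 4, "m3": 1},
    "bruno": {"m1": 4, "m2": 5, "m3": 2},
    "carla": {"m1": 1, "m2": 2, "m3": 5},
    "diego": {"m1": 5, "m2": 4},  # has not rated m3 yet
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of the other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        w = cosine(ratings[user], their)
        num += w * their[item]
        den += w
    return num / den if den else None

# Predicted rating of item m3 for diego, based only on other users' ratings.
print(predict("diego", "m3"))
```

Because every rating here is positive, raw cosine similarity separates users only weakly; practical systems often mean-center ratings first. The sketch also scans every other user for each prediction, so its cost grows with the number of users and recorded interactions.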
Collaborative Filtering methods have generally been shown to give better results than Content-Based ones [Pilászy 2009]; however, owing to the inherent shortcomings of any single approach, better predictive performance is achieved by a model that integrates different paradigms (Hybrid approaches). To keep their users satisfied, personalization services that operate recommendation methods should present relevant recommendations even as the number of users, items and user-item interactions in the system increases. As will be shown in Chapter 2, the computational complexity of keeping a user profile up to date with Collaborative Filtering methods depends directly on the number of users and items in the system and on the number of recorded user-item interactions. A current large-scale personalization system such as Netflix [Netflix 2013] (a streaming service for movies and series) has an estimated 44 million users, while the number of titles available to watch fluctuates around 13,000. The number of explicit user-item interactions (assigned ratings) is estimated at 5 billion [Schelter 2013]. To cope with these large numbers, adopters of Recommender Systems rely on cloud computing frameworks. Recommender Systems are now highly scalable solutions that are able to: (1) gather and store as much information as possible about users and items, supported by the current availability of cheap storage, (2) apply computationally intensive algorithms to train recommendation models that scale up to the size of the collected data and (3) use the trained models to adequately answer a large number of recommendation requests. However, as will be shown in Chapter 3, the current architecture for data gathering and processing in recommender systems creates a conflict with users.
Following the definition given by [Foner 1999], privacy can be defined as the ability of an individual to protect personal information from disclosure to third parties who are not its intended recipients. While users trust recommender engines to use their information for filtering or personalization purposes, it will be argued that the centralized consolidation of information increases the likelihood that user information is misused, betraying that trust. A question that arises from this claim is: why is the information gathered and processed by a recommendation system privacy-sensitive? After all, given the availability of personal microblogging and social networks, users seem eager to share their information with others. The answer is that the opinions users give on items reveal, to a great extent, their personality: such opinions might reveal a particular user's political inclination, sexual orientation, religious inclination, or the physical or mental treatments the user is undergoing. An iconic case showing how important personal opinions on items are in recommender systems came with the Doe v. Netflix class action lawsuit [Singel 2009], filed after ratings from users were de-anonymized following their public release for the Netflix Prize Competition [Netflix 2009]. The plaintiff claimed that: "...information tending to identify or permit inference of her sexual orientation constitutes sensitive and personal information. She believes that, were her sexual orientation public knowledge, it would negatively affect her ability to pursue her livelihood and support her family and would hinder her and her children's ability to live peaceful lives within ...(her)...community." The aim of privacy-enabled recommendation systems is to give users tools to protect their privacy and to leave to them the choice of whether to reveal their information.