ceClub: The Challenges of Mining Machine-Generated Web Mail

Liane Lewin-Eytan (Yahoo Research)
Wednesday, 10.5.2017, 11:30
Taub 301

In the last decade, Web mail traffic has evolved, much like regular snail mail, to be dominated by machine-generated messages. Recent studies have found that more than 90% of non-spam Web email is generated by automated scripts. Although these messages are generated by machines, a large share of them contain highly personal information, such as bank statements, travel plans, or shipment notifications. In this presentation, we will provide an overview of how machine-generated traffic has changed the nature of Web mail. We will then focus on the critical issue of privacy, which arises in various Web mail debugging and quality assurance activities applied to machine-generated messages.

Specifically, we study the problem of k-anonymizing mail messages in the realistic scenario of auditing mail traffic in a commercial Web mail service. Mail auditing is conducted by trained professionals, often referred to as "auditors", who are shown messages that could expose personally identifiable information. We address the challenge of k-anonymizing such messages, focusing on machine-generated traffic. We describe the model and process we applied to actual Yahoo mail traffic, and demonstrate that our methods are feasible at Web mail scale. Given users' growing concern about their email being scanned by others, we argue that it is critical to devise algorithms that guarantee k-anonymity, and to implement the associated processes, in order to restore the trust of mail users.
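The abstract does not describe the algorithm itself, so the following is only a minimal illustrative sketch of what k-anonymizing machine-generated messages can mean, not the method used at Yahoo. It assumes (hypothetically) that messages have already been grouped by the template that generated them, that tokens are whitespace-separated, and that the hypothetical function `k_anonymize` may both mask tokens and suppress whole messages. A token is shown to an auditor only if it occurs in messages of at least k distinct users; after masking, any message whose masked form is shared by fewer than k distinct users is suppressed, so every released message is indistinguishable from those of at least k-1 other users.

```python
from __future__ import annotations

from collections import defaultdict


def k_anonymize(messages: list[tuple[str, str]], k: int) -> list[str | None]:
    """Hypothetical sketch: mask rare tokens, then suppress messages shared by < k users.

    `messages` is a list of (user_id, text) pairs, assumed to come from the
    same machine-generated template (the grouping step is not shown).
    Returns the masked text per input message, or None if it is suppressed.
    """
    # Step 1: for every token, record the distinct users whose messages contain it.
    users_per_token: dict[str, set[str]] = defaultdict(set)
    for user, text in messages:
        for token in text.split():
            users_per_token[token].add(user)

    # Step 2: replace tokens seen for fewer than k distinct users with "*".
    masked = [
        (user, " ".join(t if len(users_per_token[t]) >= k else "*" for t in text.split()))
        for user, text in messages
    ]

    # Step 3: release a masked message only if at least k distinct users share it.
    users_per_masked: dict[str, set[str]] = defaultdict(set)
    for user, m in masked:
        users_per_masked[m].add(user)
    return [m if len(users_per_masked[m]) >= k else None for _, m in masked]


# Illustrative (made-up) shipment notifications from a single template.
msgs = [
    ("u1", "Your order 1234 from ShopCo has shipped"),
    ("u2", "Your order 5678 from ShopCo has shipped"),
    ("u3", "Your order 9012 from ShopCo has shipped"),
    ("u4", "Your order 3456 from ShopCo has been delayed"),
]
released = k_anonymize(msgs, k=3)
```

With k=3, the order numbers are masked and the first three messages are released as "Your order * from ShopCo has shipped", while the fourth is suppressed (None) because its masked form is unique to one user. Suppression is used here because token masking alone does not guarantee that the remaining combination of tokens is shared by k users.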

Liane Lewin-Eytan is a Director at Yahoo Research, where she has managed the Mail Mining Research team since 2014, with the mission of analyzing and modeling mail data to gain new insights and devise novel mail features and applications. From 2009 to 2013, Liane was a Research Staff Member at IBM Research in Israel, where she worked on smart grid networks, cloud networking, and network virtualization. She received her B.Sc., M.Sc., and Ph.D. degrees from the Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa, in 1999, 2001, and 2008, respectively. Liane has served as a PC member of several leading conferences in recent years, including WSDM and CIKM, and has over 30 publications and US patents. Her recent publications span various areas of the Web mail domain, including mail data clustering and classification, mail data anonymization, mail smart actions, and mail search.

