Tao is a Lead Data Scientist at Agoda, working on customer-facing LLM products and on ranking and recommendations. Previously, he was a machine learning researcher at Twitter on the responsible AI / AI ethics team.
I am a Lead Data Scientist at Agoda, working on customer-facing LLM products and on ranking and recommendations. My work spans analyzing raw data, building data pipelines, developing machine learning models, and deploying models to production with engineers. I also strategize and communicate with managers and product owners to plan and drive the team's quarterly milestones.
My research topics at Twitter included Fairness in Machine Learning and Human-Centered Computing (HCI). The work involved identifying, evaluating, and mitigating ML model bias; developing new, rigorous statistical analyses, including for uncommon situations; providing guidelines for engineers across the company on using fairness metrics; and communicating internally, cross-functionally, and externally through company blogs and academic publications.
I graduated from the Algorithms, Combinatorics, and Optimization (ACO) PhD program at Georgia Institute of Technology, based in the School of Computer Science. My research interests include machine learning algorithms, combinatorial optimization, differential privacy, and fairness in machine learning. I am grateful to have been advised by Mohit Singh. Our research has focused on finding better (randomized and deterministic) polynomial-time approximation algorithms for optimal design problems in statistics. The first algorithmic work is joint with Aleksandar (Sasho) Nikolov and Vivek Madan. My colleagues Vivek Madan, Mohit Singh, Weijun Xie, and I were also the first to prove theoretical guarantees for commonly used heuristics, namely local search (Fedorov exchange) and the greedy algorithm, in this work.
In addition, I work on differential privacy. The first project, on growing databases, is joint with Rachel Cummings and Sara Krehbiel. Part of the work was presented at TPDP2017. Rachel and I were also part of the team that won first prize and the people's choice award ($20,000 total) in NIST's privacy challenge. Our proposed solution generates differentially private synthetic data via GANs and was presented at TPDP2018. More recently, I explored the practicality of employing differential privacy in training large deep learning models (during a summer 2019 internship, hosted by Janardhan Kulkarni and Sergey Yekhanin).
My other direction of work is fairness in machine learning. In particular, my coauthors and I defined a notion of fairness in PCA (see the blog post on this notion) and proposed algorithms for two groups. We later extended the result to fairness over multiple groups and to more general objectives. The algorithms have proven theoretical guarantees and scale to large datasets. The implementation is publicly available on GitHub.
As an undergraduate, I studied mathematics at the University of Richmond. My undergraduate research was in discrete geometry and algebraic combinatorics, mainly on bent functions (coding theory) and partial difference sets.
My undergraduate thesis, under the supervision of James Davis, was in the area of algebraic combinatorics, coding theory, and discrete geometry, specifically on Cameron-Liebler line classes and partial difference sets. It includes a new non-existence result for partial difference sets in a certain class of abelian groups.
I am originally from Bangkok, Thailand. I graduated from high school at Bangkok Christian College. During middle and high school, I took part in, and very much enjoyed, national and international mathematics competitions and the serious training that came with them.