I'm Zhizhen (Cathy) Cai. I graduated from the University of Science and Technology of China.
I'm looking for a PhD position in Computer Science. My research interests mainly cover traditional systems research, including databases, parallel computing, and programming languages. I'm also a fan of combining techniques from different areas to create practical tools, such as an ROBDD constructor and a SAT/unSAT solver.
My past research focused on big data applications and on how their evolving requirements drive the design of systems and infrastructure.
VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity
Q. Zhang, S. Xu, Qi Chen, Guxin Sui, Jiadong Xie, Zhizhen Cai, et al.
17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23)
My work focused on the design and implementation of the planning subsystem for queries over high-dimensional indices: estimating and comparing the cost of different query execution paths and choosing the optimal one to minimize latency. I also helped with miscellaneous tasks, including profiling the code and fixing its hotspots.
My bachelor's thesis, which summarizes and extends my work on VBASE
I joined Big Data Analysis and Application in my senior year and found that LLMs and AI do not interest me much (but they're important, right?). I was discouraged from research for quite a while until I became interested in systems topics.
Later, I interned in the Systems and Networking Research Group at Microsoft Research Asia (MSRA), co-advised by Qi Chen and Qianxi Zhang. It was the best time of my undergraduate years. Their ☕ and 🍪 were really enjoyable. This experience changed my mind, and I decided to pursue further study in the systems area.
My work at MSRA mainly focused on investigating existing high-dimensional indices and building a good runtime query-planning subsystem for our high-dimensional queries. I spent weeks reading the PostgreSQL (PG) source code and was struck by the lack of documentation for recent features (some are explained only in test code and the project's IRC channel). The result was satisfying: the end-to-end query time was almost optimal.
Intentionally blank paragraph for my next research experience :3