Note: I am not the sole author of this post! I have written this article in collaboration with my colleagues at Megagon Labs. See the original post.
The KDD (Knowledge Discovery and Data Mining) conference is the oldest and arguably the top data mining conference in the world. This year’s conference, which was planned to take place in San Diego, became an all-virtual event due to the COVID-19 pandemic. I was certainly looking forward to meeting old friends and making new ones at KDD, and while Zoom socials and messaging apps are helpful, they don’t quite replace in-person interactions. Although we all missed the sun and the beach, we still got a well-organized KDD conference that was bigger than ever. There were roughly 210 accepted research papers, 32 workshops, and more than 40 tutorials, including a tutorial by our own wonderful colleague, Estevam Hruschka, on “Data-Driven Never-Ending Learning Question Answering Systems”, which you can watch here. Given the magnitude of the conference, it’s all but impossible to summarize everything. Nevertheless, I’m going to try my best to share some of the latest trends and mention some of the papers and talks that I really liked along the way.
KDD has always been a popular venue for papers in the areas of graph mining, recommender systems, and text mining. This year’s KDD was by no means an exception: the majority of papers were on graph mining, a paper on evaluating recommender systems (which I highly recommend) won the best paper award, and there were multiple tutorials on text mining. While the practical problems that the KDD community cares about haven’t changed much, there is a significant shift in how these problems are addressed. There is a clear rise in reliance on machine-learning techniques, and specifically deep neural networks, to delve even deeper into many of the field’s fundamental problems. To put things in perspective, 16% of papers in this year’s KDD mention “deep” or “neural” in their title. The same figure for papers in KDD 2016 (just four years earlier) was only 3%.
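For readers who want to reproduce this kind of figure on a list of paper titles, here is a minimal sketch of how such a share could be computed. It is not the script behind the numbers above: the function name `keyword_share` and the case-insensitive substring matching are assumptions made for illustration, and the sample titles are hypothetical placeholders rather than real KDD papers.

```python
def keyword_share(titles, keywords=("deep", "neural")):
    """Return the fraction of titles that contain at least one keyword.

    Matching is a case-insensitive substring test (an assumption), so
    "Deeper" would also count as a hit for "deep".
    """
    if not titles:
        return 0.0
    hits = sum(
        1 for title in titles
        if any(keyword in title.lower() for keyword in keywords)
    )
    return hits / len(titles)


# Hypothetical titles, used only to show the call.
sample_titles = [
    "Deep Graph Models",
    "A Study of Clustering",
    "Neural Ranking",
    "Mining Streams",
]
share = keyword_share(sample_titles)
print(f"{share:.0%} of titles mention a keyword")
```

Switching to whole-word matching would give slightly different counts, so the matching rule matters when comparing shares across years.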
Graph mining, with its numerous applications, has made rapid strides with the emergence of Graph Neural Networks (GNNs). A large number of papers in KDD 2020 explore how graph representation learning (through GNNs) can be improved to tackle classic problems such as node classification, link prediction, and graph classification (see PolicyGNN and TinyGNN). One of my favorites this year was a paper titled GPT-GNN, which shows how GNNs can be pre-trained through generation (of edges and node attributes). To me, the paper is a clear example of how quickly ideas are being exchanged across disciplines (e.g., vision and NLP). Similarly, techniques based on pre-training and transfer learning are becoming more and more common for text mining. This paper from Amazon and CMU is a good example of using transformer-based models for multi-label text classification. Given all this, it’s not surprising that KDD has been organizing a Deep Learning Day since 2018.
As in last year’s conference, KDD 2020 organized an Earth Day as well as a Health Day to bring researchers and practitioners in these disciplines together. I very much liked this talk in the Earth Day event on how the number of unreported cases in a pandemic can be estimated. I have to admit that 2020’s global pandemic and the California fires made me appreciate research in these areas more than ever. Speaking of events focused on applied research, KDD 2020 had a number of invited speakers from industry, including our own head of research, Wang-Chiew Tan, who talked about the power of subjective data. If you are interested, you can read more about subjective data here.
Although KDD’s virtual conference portal is no longer open, you can still gain a lot from this year’s KDD. A great number of talks and tutorials are available on YouTube and Vimeo, and you can browse all the papers and review their abstracts through PaperDigest. KDD 2021 is going to be in Singapore, and I’m looking forward to seeing what’s coming next.