Twitter for Public Health

Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance. Vast amounts of freely available, user-generated online content allow for efficient and potentially automated real-time monitoring of public sentiment and of how well informed the public is. This content also allows for bottom-up discovery of emergent patterns that may not be readily detectable using traditional surveillance methodologies such as pre-formulated surveys. Twitter offers a number of key benefits as a data source for public health surveillance. First, the dataset is large and readily accessible: as of 2012, 340 million tweets were posted daily, and this content is freely available (albeit subject to legal restrictions on redistribution). Second, data may be automatically collected and analyzed in real time. Third, Twitter content is user-centric, thus reflecting trends that surveys may not capture or that users may not discuss in more formal contexts. Finally, Twitter demographics allow for greater representation of underserved and difficult-to-reach groups: African-American, Hispanic, younger, and urban populations are in fact overrepresented on Twitter relative to the general population. Natural language processing (NLP) plays a vital role in unlocking surveillance knowledge from noisy Twitter streams.