Digesting Tweets to Understand Obesity

By Samantha (KNE Intern)

Obesity rates are rising across America and causing massive issues, ranging from health complications for people with obesity to financial strain on the nation as healthcare costs skyrocket. One potential cause of rising obesity in America is the increasing popularity of new technology and social media websites. Ironically, social media is now becoming a tool researchers use to curb the spread of obesity.

I conducted a study during the summer of 2016 to look for a correlation between the contents of people’s Tweets and the prevalence of obesity in New York State counties. This research treated obesity as a communicable disease, that is, a disease that can be transmitted from person to person. In this case, the “disease” spreading was obesogenic behavior, and Twitter was a tool to track that behavior.

I downloaded and sorted Tweets using Python, a popular programming language. The Python program I wrote first separated over 1,000,000 Tweets by location, with each Tweet falling into one of the 62 New York State counties. Then, based on keywords, I separated the Tweets into four categories: “Eating Related Tweets”, “Physical Activity Related Tweets”, “Inactivity Related Tweets”, and “General Health Related Tweets”.

Each category had its own list of keywords, which I generated by isolating the most frequently used obesity-related terms in the downloaded Tweets. A Tweet fell into a category if it contained one of that category’s keywords. For example, some of the keywords for the “Eating Related Tweets” category were “cake”, “hungry”, and “mcdonalds”. I then analyzed the percentage of Tweets in each category for each county, in hopes of finding a statistically significant correlation with obesity and determining the marginal effect of the social media data.
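To make the keyword step more concrete, here is a minimal sketch of how Tweets might be sorted into categories and turned into per-county percentages. It is not the program used in the study. Only the three eating keywords come from the study; the other keywords, the function names, and the sample Tweets are made up for illustration. The sketch also assumes that each Tweet already carries a county label and that a Tweet can count toward more than one category.

```python
import re
from collections import Counter, defaultdict

# Only "cake", "hungry", and "mcdonalds" come from the study.
# Every other keyword below is a hypothetical placeholder.
KEYWORDS = {
    "Eating Related Tweets": {"cake", "hungry", "mcdonalds"},
    "Physical Activity Related Tweets": {"gym", "running"},
    "Inactivity Related Tweets": {"netflix", "couch"},
    "General Health Related Tweets": {"diet", "doctor"},
}


def categorize(text):
    """Return the set of categories whose keywords appear in the Tweet."""
    # Lowercase the text and split it into plain words so that
    # "McDonalds" and "hungry," still match their keywords.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {cat for cat, kws in KEYWORDS.items() if words & kws}


def category_percentages(tweets):
    """Take (county, text) pairs and return {county: {category: percent}}."""
    totals = Counter()            # number of Tweets seen per county
    hits = defaultdict(Counter)   # per county, number of Tweets per category
    for county, text in tweets:
        totals[county] += 1
        for cat in categorize(text):
            hits[county][cat] += 1
    return {
        county: {cat: 100.0 * hits[county][cat] / n for cat in KEYWORDS}
        for county, n in totals.items()
    }


# Made-up sample Tweets, already labeled with a county (an assumption).
sample = [
    ("Albany", "So hungry, getting McDonalds"),
    ("Albany", "Nice weather today"),
    ("Erie", "Off to the gym, then cake"),
]

for county, percents in category_percentages(sample).items():
    print(county, percents)
```

The per-county percentages produced this way are the kind of numbers that could then be compared against each county’s obesity rate.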

The results are currently inconclusive, as the relationship between obesity and Twitter data was statistically significant in some counties and not in others. I will be building upon this research in the summer of 2017 in hopes of improving some of the research methods used, such as the data collection, statistical analysis, and keyword generation. Hopefully, when fully completed, the research will accurately depict the relationship between obesity and Twitter data, which can then be used to create an epidemiological model demonstrating the spread of obesity in New York State at the county level.
