As social networks have grown enormously in popularity over the last decade, companies have started using them as a platform to advertise their products. There are many techniques for analyzing how users consume social media content, but fewer approaches that focus on the user-generated content itself and how it is created.
Social media content carries far more information than it appears to at first glance. Bad experiences with airlines, rants about an ice cream manufacturer, or simply a restaurant recommendation: social media content is highly opinionated. Analyzing these implicit and explicit sentiments can reveal public opinion about specific entities.
Therefore, I took on the challenge of writing my bachelor thesis on “Sentiment Analysis of Social Media Content with SAP BusinessObjects Design Studio”. Together with the team at graphomate, I created a prototype software extension for SAP BusinessObjects Design Studio. It analyzes the sentiment of Twitter content and provides a sentiment score that can be used in other Design Studio components.
Tweets are well suited to this concept because they are often very opinionated and because they can be retrieved through a convenient application programming interface. The implemented sentiment analysis algorithm examines every word of a sentence and looks it up in a dictionary that holds sentiment information for common judgmental words. Each word found has a score ranging from -5 to +5, and these scores are added up to give the sentence's overall sentiment score, for example: excited (+3) + fresh (+1) + interesting (+2) = +6.
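The dictionary lookup described above can be sketched in a few lines of Python. This is only an illustration, not the code from the prototype: the name sentence_score, the tokenization rule and the two negative dictionary entries are assumptions made for this example. Only the scores for "excited", "fresh" and "interesting" come from the example above.

```python
import re

# Tiny illustrative dictionary; word scores range from -5 to +5.
# The first three entries come from the example in the text,
# the negative ones are invented for this sketch.
SENTIMENT_LEXICON = {
    "excited": 3,
    "fresh": 1,
    "interesting": 2,
    "boring": -2,
    "terrible": -3,
}


def sentence_score(sentence, lexicon=SENTIMENT_LEXICON):
    """Add up the scores of all words in the sentence that the dictionary knows."""
    # Lowercase the text and keep runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", sentence.lower())
    # Words that are not in the dictionary contribute 0.
    return sum(lexicon.get(word, 0) for word in words)


print(sentence_score("Excited about this fresh and interesting movie!"))  # 6
```

A purely additive score like this ignores negation ("not interesting") and word order, which is one of the trade-offs of a simple dictionary-based approach.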
I implemented the prototype of this concept using the Design Studio data source extension SDK. The extension serves as a link between Design Studio and the Twitter services. It requests Tweets about a specific entity and condenses their sentiment into a single score that represents the public opinion about that entity. Given an input dataset that contains several entities (e.g. product names), a sentiment score can be calculated for each of them. The input dataset can be selected from any other Design Studio data source. The output is a copy of the input dataset, enriched with the overall score of each entity. It can be used like any other data source in Design Studio.
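The enrichment step can be illustrated with a small, self-contained Python sketch. It does not use the Design Studio SDK or the real Twitter API: fetch_tweets and score_tweet are hypothetical stand-ins, the sample rows and Tweets are invented, and averaging the Tweet scores is an assumption, since the exact way the prototype condenses them is not described here.

```python
from statistics import mean


def enrich(rows, entity_key, fetch_tweets, score_tweet):
    """Return a copy of rows where each row gains a 'sentiment' column."""
    enriched = []
    for row in rows:
        tweets = fetch_tweets(row[entity_key])
        scores = [score_tweet(tweet) for tweet in tweets]
        new_row = dict(row)  # copy, so the input dataset stays untouched
        # Assumption: condense all Tweet scores into one value by averaging.
        new_row["sentiment"] = mean(scores) if scores else None
        enriched.append(new_row)
    return enriched


# Invented sample data standing in for the Twitter request and the dictionary.
FAKE_TWEETS = {
    "Movie A": ["great fun", "good cast"],
    "Movie B": ["bad plot"],
}
LEXICON = {"great": 3, "good": 2, "bad": -3}


def fetch_tweets(entity):
    """Hypothetical stand-in for the request to the Twitter services."""
    return FAKE_TWEETS.get(entity, [])


def score_tweet(text):
    """Hypothetical stand-in for the dictionary-based scoring."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())


movies = [
    {"title": "Movie A", "budget": 100},
    {"title": "Movie B", "budget": 20},
    {"title": "Movie C", "budget": 50},
]
result = enrich(movies, "title", fetch_tweets, score_tweet)
for row in result:
    print(row)
```

Entities without any Tweets get None instead of 0 here, so that "no data" is not mistaken for a neutral opinion.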
The enriched dataset of a sample data source containing movie information looks like the following (negative score is bad, positive is good):
The enrichment process combines already known data about specific entities with information about public opinion. By using this enriched data for visualizations, correlations and outliers can be identified. Our new component, graphomate bubbles, is well suited to this purpose. It displays up to 5 data dimensions in a bubble chart or 2 dimensions in a scatterplot.
The following visualization is such a scatterplot and displays the public opinion in relation to the budget of movies. The features of our graphomate bubbles extension will be revealed in one of the upcoming blog posts.
This visualization suggests a correlation between public opinion and budget: movies with a higher budget also tend to be viewed more favorably. Moreover, the scatterplot makes it easy to identify outliers, which lie in the upper left or bottom right quadrants.
In addition to documenting the prototype implementation, my bachelor thesis also covers the Design Studio SDK, social networks and different sentiment analysis approaches. Furthermore, it discusses the pros and cons of the prototype's concept.
Finally, I would like to thank the graphomate team for letting me work on this interesting topic and for helping me out with tips and tricks. I enjoyed my time in the office and I am very happy with the result.