HiEd-Sent

HiEd-Sent is the first manually annotated Serbian-language corpus in the domain of higher education. The corpus was collected from the Serbian teaching staff review website “Oceni profesora” (www.oceniprofesora.com). In addition to the reviews, structured information from the website was collected (Figure 1).



Figure 1. Illustration of a teacher's page from the website "Oceni profesora"



The framework developed for downloading and preprocessing the reviews is described in detail in the paper. The corpus comprises 3,863 reviews with 6,896 sentences.

The information-rich annotation scheme used for manual annotation of the corpus assigns to each annotated opinion an aspect, a sentiment polarity, a sentiment intensity and, when applicable, a sentiment expression (positive or negative), a negation keyword, and a negation scope. The aspect is the target of the opinion and takes one of the following values: Professor, Lectures, Helpfulness, Course, Materials, Organization of a course, and Other aspect (when the content is not related to the domain of higher education). Sentiment polarity takes one of the following values: positive, negative, or neutral (when sentiment intensity is not assigned). Sentiment intensity can be strong (value 3), medium (value 2), or weak (value 1). A sentiment expression is a positive or negative word or expression that conveys sentiment.
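As an illustration only, one annotated opinion under this scheme could be represented as a small Python record. This is a minimal sketch, not the format in which the corpus is distributed: the class and field names (`Opinion`, `Aspect`, `Polarity`, `negation_scope`, and so on) are hypothetical, the example values are invented rather than taken from the corpus, and the rule that an opinion is neutral exactly when no intensity is assigned is our reading of the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Aspect(Enum):
    """Aspect values listed in the annotation scheme."""
    PROFESSOR = "Professor"
    LECTURES = "Lectures"
    HELPFULNESS = "Helpfulness"
    COURSE = "Course"
    MATERIALS = "Materials"
    ORGANIZATION = "Organization of a course"
    OTHER = "Other aspect"


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass
class Opinion:
    """Hypothetical container for one annotated opinion."""
    aspect: Aspect
    polarity: Polarity
    intensity: Optional[int] = None             # 1 = weak, 2 = medium, 3 = strong
    sentiment_expression: Optional[str] = None  # word or phrase conveying sentiment
    negation_keyword: Optional[str] = None
    negation_scope: Optional[str] = None

    def __post_init__(self) -> None:
        if self.intensity is not None and self.intensity not in (1, 2, 3):
            raise ValueError("intensity must be 1, 2, or 3")
        # Assumption: neutral opinions carry no intensity, and
        # positive/negative opinions always carry one.
        if (self.polarity is Polarity.NEUTRAL) != (self.intensity is None):
            raise ValueError("neutral polarity requires no intensity, and vice versa")


# Invented example values, not drawn from the corpus.
example = Opinion(
    aspect=Aspect.PROFESSOR,
    polarity=Polarity.POSITIVE,
    intensity=3,
    sentiment_expression="odličan",
)
```

Keeping the optional fields as `None` by default mirrors the "when applicable" wording of the scheme: sentiment expression, negation keyword, and negation scope are filled in only when the annotated text contains them.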

Four annotators with appropriate education were engaged in the annotation process. They followed detailed guidelines throughout the annotation, as well as the sequential annotation steps presented in Figure 2. The Serbian version of the annotation guidelines is available for download.



Figure 2. Annotation steps



An example of annotated text is shown in Figure 3. The annotated corpus is available upon request for research purposes only, not for commercial use. Please contact us at one of the following email addresses: oliverag@ef.uns.ac.rs or ogrljevic@gmail.com.



Figure 3. Example of annotated text in Serbian language





To learn more about our research, please refer to the following publications: [PhD thesis] and [introductory paper].