Project details

Islamic scholars commonly rely on a fairly large digital encyclopedia called “Shamela”, which contains most of the Arabic-language books they need for their research.

The encyclopedia is accessed through a desktop app with many features, one of which is the ability to export all of its e-books (~20K) to HTML or Word documents.

An academic research project started in 2017 at the NLP department of my university. I contributed by building a dedicated scraper for the offline pages exported by the desktop app, in order to prepare a clean, well-structured dataset using regular expressions and CFG rules. The resulting CSV dataset can be used by NLP researchers for parsing, tokenizing, stemming, and lemmatizing Arabic words.
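
As a rough illustration of the regex-based cleaning step, a minimal Python sketch might look like the following. The actual export markup, the CSV schema, and the CFG rules are not described here, so the column names (`book_id`, `page`, `text`), the function names `clean_page` and `pages_to_csv`, and the optional diacritics stripping are all assumptions made for the example, not the project's real code.

```python
import csv
import html
import io
import re

# Assumption: tags are stripped generically, since the real export
# markup of the desktop app is not documented in this description.
TAG_RE = re.compile(r"<[^>]+>")
# Arabic diacritics (U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_page(raw_html: str, strip_diacritics: bool = False) -> str:
    """Return the plain text of one exported HTML page."""
    text = TAG_RE.sub(" ", raw_html)       # drop tags, keep a separator
    text = html.unescape(text)             # decode entities such as &amp;
    if strip_diacritics:
        text = DIACRITICS_RE.sub("", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def pages_to_csv(pages, out) -> None:
    """Write (book_id, page_number, raw_html) tuples as CSV rows.

    The three-column layout is hypothetical.
    """
    writer = csv.writer(out)
    writer.writerow(["book_id", "page", "text"])
    for book_id, page_no, raw in pages:
        writer.writerow([book_id, page_no, clean_page(raw)])


if __name__ == "__main__":
    buffer = io.StringIO()
    pages_to_csv([(1, 1, "<p>Hello   <b>world</b></p>")], buffer)
    print(buffer.getvalue())
```

Regular expressions are enough for flat tag stripping like this; for deeply nested or malformed markup, a real HTML parser would likely be more robust.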

Project card

Freelancer name: Abdulkader K.
Likes: 0
Views: 326
Date added
Completion date

Skills used