113444a Data Mining and Pattern Recognition (Data Mining und Mustererkennung)
Last modified: | 07.04.2020 / Maucher |
Course no. (EDV-Nr): | 113444a |
Degree programmes: | |
Lecturer: | |
Language: | German |
Type: | V |
Scope: | 4 SWS |
ECTS credits: | 4 |
Workload: |
Lab sessions: 8 sessions of 4 SWS each = 24 clock hours; preparation and follow-up for the lab sessions: 8 sessions of 8 SWS each = 48 clock hours; introductory sessions: 5 sessions of 4 SWS each = 15 clock hours; preparation and follow-up for the introductory sessions: 5 sessions of 8 SWS each = 30 clock hours; total: 117 clock hours |
Relation to other courses in the module: |
Both data mining and natural language processing (NLP) apply methods from machine learning (a subfield of artificial intelligence) to detect patterns in large volumes of data. These patterns represent new knowledge that is hidden in the data. In natural language processing, the data to which the methods are applied are natural-language texts from documents or from the web. The data mining lab has no such restriction on the type of data. In addition, the data mining lab focuses on students implementing the methods themselves, whereas the Natural Language Processing lecture gives a comprehensive overview of all relevant process steps in NLP. |
Form of examination: | |
Description: |
In this course, all student groups implement 6 different data mining
and pattern recognition applications. A group consists of at most
3 students. Each application should be implemented within one
afternoon (14.15h-17.30h). The applications are:
Data Mining Process: This exercise demonstrates the successive steps
of the general data mining process. The open-source data mining tool
Weka is applied for supervised and unsupervised learning algorithms,
for data preprocessing routines and for performance evaluation.
Recommender Systems:
Recommender systems are applied in e-commerce to generate
customized recommendations. Well known are the Amazon.com
recommendations, which are either distributed by e-mail or presented
on the Amazon web page after login. To generate these
recommendations, the products that the user has already purchased or
reviewed are taken into account. In this exercise the currently most
popular algorithms for generating recommendations (collaborative
filtering) are implemented, tested and analysed.
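As a rough illustration of the collaborative filtering idea, the following minimal sketch scores the items a user has not rated yet by the ratings of similar users. The source does not say which variant or similarity measure the exercise uses; the user-based approach, the cosine similarity, the function names and the toy ratings are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating on a 1-5 scale}).
ratings = {
    "alice": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob": {"book_a": 4, "book_b": 2, "book_c": 5, "book_d": 4},
    "carol": {"book_a": 1, "book_b": 5, "book_d": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity of two rating dicts over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted average rating."""
    weighted, sim_sums = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                weighted[item] = weighted.get(item, 0.0) + sim * rating
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    scores = {item: weighted[item] / sim_sums[item] for item in weighted}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


recommendations = recommend("alice", ratings)
```

Here `recommendations` holds `(item, predicted rating)` pairs for the items "alice" has not rated, best first. An item-based variant would compare items instead of users but follows the same weighting scheme.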
Mining Data from Amazon.com:
Using the Amazon Web Service (AWS) one can access large amounts of
product and review data from Amazon.com. In this exercise we
integrate the data into our programs using a Python wrapper for AWS.
Then we apply various intelligent algorithms for mining interesting
knowledge out of this data. For example, we perform trend analysis or
build price prediction models. Students are free to develop their
own data mining applications.
Spam Filter: A Naive Bayes
classifier is implemented for filtering spam. It is also shown how
to apply this algorithm to document classification in
general.
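To make the Naive Bayes idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier. It is not the course's implementation: the class name, the whitespace tokenisation, the add-one smoothing and the training messages are assumptions chosen for illustration. Because the labels are arbitrary strings, the same class covers general document classification as well as spam filtering.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, text, label):
        """Add one labelled document to the model."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def log_posteriors(self, text):
        """Unnormalised log P(label | text) for every known label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label in self.doc_counts:
            # Log prior: share of training documents with this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Smoothed log likelihood of the word given the label.
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        """Return the label with the highest posterior score."""
        scores = self.log_posteriors(text)
        return max(scores, key=scores.get)


# Hypothetical training messages, for illustration only.
clf = NaiveBayesTextClassifier()
clf.train("win money now", "spam")
clf.train("cheap money offer win prize", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("lecture notes and project meeting", "ham")

label = clf.classify("win cheap prize")
```

Working with log probabilities avoids numerical underflow when many small word probabilities are multiplied, and the smoothing keeps a single unseen word from forcing a probability of zero.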
Document Clustering: In this
exercise a large number of RSS news feeds are collected. All
articles coming from the different feeds are clustered using
non-negative matrix factorisation. Essential features of each
document cluster are extracted.
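The following sketch illustrates how non-negative matrix factorisation can group documents and expose the characteristic terms of each group. The source does not state which NMF algorithm the exercise uses; this example assumes the multiplicative update rules for the squared-error objective and uses only the standard library so that it runs without dependencies (a practical implementation would more likely rely on NumPy). The term counts and all names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factorise non-negative v (docs x terms) into w (docs x k) and h (k x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(k)]
    for _ in range(iterations):
        # Update h elementwise: h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(k)]
        # Update w elementwise: w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_docs)]
    return w, h


# Hypothetical term counts: rows are articles, columns are the terms below.
terms = ["goal", "match", "team", "election", "vote", "party"]
v = [
    [3, 2, 4, 0, 0, 0],
    [2, 3, 3, 0, 0, 1],
    [0, 0, 0, 4, 3, 2],
    [0, 1, 0, 3, 4, 3],
]
w, h = nmf(v, k=2)

# Weight each component by its total term mass so that the arbitrary
# scaling between w and h does not bias the cluster assignment.
strength = [sum(row) for row in h]
clusters = [max(range(2), key=lambda c: row[c] * strength[c]) for row in w]

# The strongest terms of each component serve as its essential features.
top_terms = [[t for _, t in sorted(zip(comp, terms), reverse=True)[:3]]
             for comp in h]
```

Each row of `w` says how strongly an article belongs to each component, and each row of `h` says which terms define that component, which is why the factorisation yields both the clusters and their features.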
Face Detection: The
eigenface approach to face recognition is implemented and tested
in this exercise.
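A minimal sketch of the eigenface idea: images are flattened to vectors, the mean face is subtracted, the principal components (eigenfaces) are computed, and a new face is recognised by its nearest neighbour in the eigenface space. The sketch uses only the standard library and a simple power iteration, which is an assumption made to keep it dependency-free; a practical implementation would more likely use NumPy. The 4-pixel "images", the labels and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_eigenvectors(matrix, k, iterations=200):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration with deflation)."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        # Fixed non-zero start vector; a random one is more robust in general.
        vec = [1.0 / (i + 1) for i in range(n)]
        for _step in range(iterations):
            nxt = [dot(row, vec) for row in m]
            norm = math.sqrt(dot(nxt, nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        value = dot(vec, [dot(row, vec) for row in m])
        pairs.append((value, vec))
        # Deflate: remove the component that was just found.
        m = [[m[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return pairs


def train_eigenfaces(images, k):
    """Return the mean face and k unit-length eigenfaces."""
    n, d = len(images), len(images[0])
    mean = [sum(img[j] for img in images) / n for j in range(d)]
    centred = [[img[j] - mean[j] for j in range(d)] for img in images]
    # Small n x n Gram matrix instead of the d x d covariance matrix.
    gram = [[dot(a, b) for b in centred] for a in centred]
    eigenfaces = []
    for _value, u in top_eigenvectors(gram, k):
        face = [sum(u[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(dot(face, face))
        eigenfaces.append([x / norm for x in face])
    return mean, eigenfaces


def project(image, mean, eigenfaces):
    """Coordinates of an image in the eigenface space."""
    centred = [x - m for x, m in zip(image, mean)]
    return [dot(centred, face) for face in eigenfaces]


def recognise(image, mean, eigenfaces, gallery):
    """Return the label of the gallery entry closest in eigenface space."""
    coords = project(image, mean, eigenfaces)
    return min(gallery, key=lambda entry: math.dist(coords, entry[1]))[0]


# Hypothetical 2x2 "images" flattened to 4 grey values, two per person.
images = [[200, 190, 60, 50], [210, 180, 70, 40],
          [50, 60, 190, 200], [40, 70, 180, 210]]
labels = ["person_a", "person_a", "person_b", "person_b"]

mean, eigenfaces = train_eigenfaces(images, k=2)
gallery = [(label, project(img, mean, eigenfaces))
           for label, img in zip(labels, images)]
match = recognise([205, 185, 65, 45], mean, eigenfaces, gallery)
```

Computing eigenvectors of the small Gram matrix and mapping them back to pixel space keeps the cost tied to the number of training images rather than the number of pixels, which matters for real face images.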
All applications are implemented in Python. In addition, students have to prepare presentations on the 6 different applications. Each student group selects one application and presents its theory and background to all other students. These presentations are scheduled before the start of the first practical exercise. |
Literature: |
Further literature is available in the HdM library. |
Internet: | http://www.hdm-stuttgart.de/~maucher/ |