Commit deae672b authored by Alexander Henkel's avatar Alexander Henkel

johannes feedback

parent ea863c11
@@ -4,7 +4,7 @@ Detecting and monitoring people's activities can be the basis for observing user
One application scenario in healthcare is observing various diseases such as Obsessive-Compulsive Disorder (OCD). For example, detecting hand washing activities can be used to derive the frequency or excessiveness with which people affected by OCD perform this action. Moreover, with automatic detection it is possible to diagnose and even treat such diseases outside a clinical setting~\cite{Ferreri2019Dec, Briffault2018May}. If excessive hand washing is detected, just-in-time interventions can be presented to the user, offering enormous potential for promoting health behavior change~\cite{10.1007/s12160-016-9830-8}.
State-of-the-art Human Activity Recognition methods are often based on supervised deep neural networks. For feature extraction, Convolutional Layers and/or Long Short-Term Memory (LSTM) cells are used. However, training these methods requires large amounts of data to achieve good performance. Since the movement patterns of each human are unique, the performance of activity detection can differ between users. Training data from a wide variety of humans is therefore necessary to generalize to new users. Accordingly, it has been shown that personalized models can achieve better accuracy than user-independent models~\cite{Hossain2019Jul, Lin2020Mar}.
To personalize a model, retraining on new, unseen sensor data is necessary. Obtaining ground truth labels is crucial for most deep learning techniques. However, the annotation process is time- and cost-intensive. Typically, training data is labeled by hand in controlled environments. In a real-world scenario, the user would have to take over the major part of this labeling effort. This would require substantial user interaction and a certain level of expertise, which would harm usability.
@@ -21,9 +21,9 @@ The contributions of my work are as follows:
\begin{itemize}
\item [1.] A personalization approach is implemented, which can be added to an existing HAR application and does not require additional user interaction or changes in the model architecture.
\item [2.] Different indicator-assisted refinement methods, based on Convolutional networks and Fully Connected Autoencoders, are applied to generated labels.
\item [3.] It is examined whether a personalized model resulting from this approach outperforms the general model and can achieve similar performance to a supervised personalization.
%\item [4.] My approach is compared to a common active learning method.
%\item [5.] A real-world experiment is presented, which confirms applicability to a broad user base.
\end{itemize}
\chapter{Related Work}\label{chap:relatedwork}
Human Activity Recognition (HAR) is a broad research field used in various applications like healthcare, fitness tracking, elder care, or behavior analysis. Data acquired by different types of sensors, such as video cameras, range sensors, wearable sensors, or other devices, is used to automatically analyze and detect everyday activities. The field of wearable sensors in particular is growing, as technical progress in smartwatches makes it possible for a wide range of users to integrate these sensors into their daily lives.
In the following, I give a brief overview of the literature on state-of-the-art HAR and how personalization can improve performance. Then, I focus on work that deals with different approaches to generating training data. Finally, I present related work that can deal with faulty labels in the training data.
\section{Activity recognition}\label{sec:relWorkActivityRecognition}
Most Inertial Measurement Units (IMUs) provide a combination of 3-axis acceleration and orientation data in continuous streams. Sliding windows are applied to the streams and assigned to an activity by the underlying classification technique~\cite{s16010115}. This classifier is a prediction function $f(x)$ which returns the predicted activity labels for a given input $x$. Recently, deep neural network techniques have replaced traditional ones such as Support Vector Machines or Random Forests, since no hand-crafted features are required~\cite{ramasamy2018recent}. They use multiple hidden layers of feature encoding and an output layer that provides predicted class distributions~\cite{MONTAVON20181}. Each layer consists of multiple artificial neurons connected to the following layer's neurons. These connections are assigned weights that are learned during the training process. First, in the feed-forward pass, the output values are computed based on a batch sampled from the training data set. In the second stage, called backpropagation, the error between the expected and predicted values is computed by a loss function $J$ and minimized by optimizing the weights. Feed-forward pass and backpropagation are repeated over multiple iterations, called epochs~\cite{Liu2017Apr}.
The combination of Convolutional Neural Networks (CNNs) and Long Short-Term Memory recurrent neural networks (LSTMs) tends to outperform other approaches and is considered the current state of the art for human activity recognition~\cite{9043535}. For classification problems, most works use cross-entropy as the loss function. \extend{???}
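The training procedure outlined above can be illustrated with a minimal sketch. The following toy example (plain NumPy, synthetic data, and a single linear layer standing in for the CNN-LSTM stack, so all sizes and values are illustrative) runs the feed-forward pass, computes the cross-entropy loss $J$, and backpropagates its gradient over multiple epochs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for windowed sensor data: n flattened windows with d
# features each, c activity classes (all sizes are arbitrary).
n, d, c = 200, 6, 3
X = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, c))
y = (X @ true_W).argmax(axis=1)        # synthetic "ground truth" labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((d, c))                   # weights learned during training
lr, epochs = 0.5, 100

for epoch in range(epochs):
    probs = softmax(X @ W)             # feed-forward pass
    loss = -np.log(probs[np.arange(n), y]).mean()   # cross-entropy J
    grad = probs.copy()
    grad[np.arange(n), y] -= 1.0       # gradient of J w.r.t. the logits
    W -= lr * (X.T @ grad) / n         # backpropagation + weight update

accuracy = (softmax(X @ W).argmax(axis=1) == y).mean()
```

A real CNN-LSTM classifier repeats exactly this loop; only the feature-extraction layers between the input windows and the softmax output differ.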
@@ -27,7 +27,7 @@ A typical problem during fine-tuning is catastrophic forgetting~\cite{Lee2017}.
where $\alpha$ is a constant parameter that adjusts the strength of the penalty, and $\left\Vert \cdot \right\Vert_p$ is the p-norm. The starting point (-SP) is the pre-trained model's parameter vector $\omega^0$. The penalty thus adds the distance between the current and the initial parameters to the loss, so the loss grows as the distance between the two models increases, which counteracts large deviations during optimization.
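The penalty term can be sketched in a few lines of NumPy (the function name is illustrative; following the formula above, the plain p-norm of the parameter difference is used, not its square):

```python
import numpy as np

def l2sp_penalty(weights, weights_sp, alpha=0.01, p=2):
    """Penalty term alpha * ||w - w0||_p, where weights_sp holds the
    parameter vector w0 of the pre-trained starting-point (-SP) model."""
    diff = np.concatenate([(w - w0).ravel()
                           for w, w0 in zip(weights, weights_sp)])
    return alpha * np.linalg.norm(diff, ord=p)
```

During fine-tuning, this value would simply be added to the task loss, e.g. `total_loss = cross_entropy + l2sp_penalty(current, pretrained)`.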
While these works show the potential performance gain from personalization, it requires ground truth data, which is acquired through manual labeling processes.
Especially deep-learning algorithms require large amounts of training data to achieve good performance~\cite{Perez2017Dec}. Since wearable sensors became available to the general public, obtaining new sensor data for learning has been easy. Nonetheless, this data must be labeled, which is still labor-intensive.
New data can be used in either \textit{supervised}, \textit{semi-supervised} or \textit{unsupervised} learning. In supervised learning, all training instances must be assigned their correct, mostly handcrafted labels, whereas in unsupervised learning, the learning process can train on unlabeled instances. A combination of these is semi-supervised learning, where a small set of labeled instances is combined with a larger set of unlabeled instances~\cite{Chapelle2009Feb}.
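A common semi-supervised trick, which the pseudo-labeling used later in this work also builds on, is to promote confident model predictions on unlabeled data to training labels. A minimal sketch (function name and threshold value are illustrative assumptions):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Given per-window class probabilities predicted by a model on
    unlabeled data, keep only windows whose top class probability
    exceeds the confidence threshold. Returns (kept indices, labels)."""
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    mask = confidence >= threshold
    return np.where(mask)[0], labels[mask]
```

The selected windows and their predicted labels can then be merged with the small labeled set for retraining.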
@@ -110,7 +110,7 @@ In a cooperation with the University of Basel, I evaluated my personalization approach
\input{figures/experiments/table_real_world_datasets}
\input{figures/experiments/table_real_world_general_evaluation}
For each training recording, the base model generates predictions for the pseudo labels. After that, one of the filter configurations \texttt{all\_null\_convlstm3}, \texttt{all\_cnn\_convlstm2\_hard} and \texttt{all\_cnn\_convlstm3\_hard} is applied. I compared the \texttt{all\_null\_*} configurations with this setup beforehand, and they achieved the same results in most cases; therefore, I only consider one of them in the following. The resulting data set is used for training, based either on the previous model or on the model with the best F1 score. As regularization, freezing layers or the l2-sp penalty is used. Over all personalizations of a participant, the model with the highest F1 score is determined. \tabref{tab:realWorldEvaluation} shows the resulting best personalization of each participant. Additionally, the last three columns contain the evaluation of the base model after adjusting the kernel settings. The difference between the personalization and adjusted base model values gives the true performance increase of the retraining.
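The final selection step, picking the personalization with the highest F1 score per participant, can be sketched as follows (the function names and the binary F1 computation are illustrative assumptions, not the evaluation code used in this work):

```python
import numpy as np

def f1_score_binary(y_true, y_pred):
    """F1 score for binary labels (1 = hand washing, 0 = other)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_personalization(candidates, y_true):
    """candidates: mapping from personalization name to its predicted
    labels on a held-out evaluation recording. Returns the (name, F1)
    pair with the highest F1 score."""
    scored = {name: f1_score_binary(y_true, pred)
              for name, pred in candidates.items()}
    return max(scored.items(), key=lambda kv: kv[1])
```

Comparing the winner's F1 against that of the kernel-adjusted base model then yields the true performance increase of the retraining.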
Entries with zero iterations, as for participants OCDetect\_12 and OCDetect\_13, indicate that no better personalization could be found.