Wearable sensors like smartwatches offer a good opportunity for human activity recognition (HAR). They are available to a wide user base and can be used in everyday life. Due to the variety of users, the recognition model must be able to handle different movement patterns. Recent research has demonstrated that personalized recognition tends to perform better than a general one. However, additional labeled data from the user is required, which is time-consuming and labor-intensive to collect. While common personalization approaches try to reduce the amount of labeled training data required, the labeling process still depends on some user interaction.
In this work, I present a personalization approach in which training data labels are derived from implicit user feedback obtained during the normal use of a HAR application. The general model predicts labels, which are then refined by various denoising filters based on Convolutional Neural Networks and Autoencoders. This process is assisted by the previously obtained user feedback. High-confidence data is then used for fine-tuning the recognition model via transfer learning. No changes to the model architecture are required, so personalization can easily be added to an existing application.
Analyses in the context of hand wash detection demonstrate that a significant performance increase can be achieved. Moreover, I compare my approach with a traditional personalization method to confirm its robustness. Finally, I evaluate the process in a real-world experiment in which participants wore a smartwatch on a daily basis for a month.
In the first step we can use the raw information of the indicators.
\subsubsection{Naive}
In a naive approach, we search for the largest group of neighboring pseudo-labels that have a high value for hand washing. This is done by computing a score over all subsets of adjacent labels. The score of a subset $Sub_k=\{\hat{\bm{y}}_p, \dots, \hat{\bm{y}}_q\}$ is computed by:
\begin{equation*}
score(Sub_k) = \sum_{i=p}^{q} \left( pred_{hw}(\hat{\bm{y}}_i) - pred_{null}(\hat{\bm{y}}_i) - \delta \right)
\end{equation*}
The score only benefits from adding a label to the set if the predicted value for hand washing is greater than the \textit{null} value. Additionally, there is a general penalty $\delta$ for adding a label, so the prediction has to be at least somewhat certain that the sample is hand washing. The subset with the highest score is assumed to be the hand wash action, and all pseudo labels it contains are set to $\hat{\bm{y}}_i =\begin{bmatrix}0&1\end{bmatrix}$. All other labels are set to $\hat{\bm{y}}_j =\begin{bmatrix}1&0\end{bmatrix}$. This approach depends heavily on the performance of the base model. Incorrect predictions can completely shift the assumed subset to wrong samples. It may also happen that \textit{null} predictions between correct \textit{hw} predictions split the possible subset, which results in only a partial coverage of the original activity. \figref{fig:examplePseudoFilterScore} shows two example intervals where the pseudo labels are refined by this approach.
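The search for the best-scoring subset of adjacent labels can be sketched as a maximum-subarray scan over the per-label gains. This is a minimal sketch; the function and variable names are illustrative and not taken from the thesis code:

```python
import numpy as np

def naive_refine(pred_hw, pred_null, delta=0.1):
    """Find the contiguous run of pseudo-labels with the highest
    cumulative (pred_hw - pred_null - delta) score and relabel it
    as hand washing; everything else becomes null."""
    gains = np.asarray(pred_hw) - np.asarray(pred_null) - delta
    # Kadane's algorithm: best contiguous subarray of per-label gains.
    best_score, best_range = 0.0, (0, -1)   # the empty set scores 0
    cur_score, cur_start = 0.0, 0
    for i, g in enumerate(gains):
        if cur_score <= 0:
            cur_score, cur_start = g, i     # start a new candidate run
        else:
            cur_score += g                  # extend the current run
        if cur_score > best_score:
            best_score, best_range = cur_score, (cur_start, i)
    labels = np.tile([1.0, 0.0], (len(gains), 1))   # default: null
    p, q = best_range
    if q >= p:
        labels[p:q + 1] = [0.0, 1.0]                # hand wash subset
    return labels
```

The linear scan is equivalent to scoring all subsets of adjacent labels, since the best subset never contains a negatively scored prefix.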
In the next step, I use convLSTM denoising autoencoders in different configurations. As with the FCN-dAE, I only use the values for $pred_{hw}$ and compute the missing ones afterwards. Since LSTMs are designed for sequential data, the single interval has to be converted into a time-series sequence. Therefore, I apply a sliding window of width 32 and shift 1. This creates 96 consecutive sections of the interval, each with 32 values. The autoencoder performs a sequence-to-sequence prediction on the $96 \times 1 \times 32$ dimensional input, so the output is also a $96 \times 1 \times 32$ dimensional sequence of time-series. To recreate the $128 \times 1$ dimensional interval, I compute the mean of the predictions for each sample over the sequence.
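The window/unwindow round trip can be sketched as follows. This is a simplified sketch with illustrative names; the reconstruction simply averages every windowed prediction that covers a given sample position:

```python
import numpy as np

def to_sequence(interval, width=32, shift=1):
    """Slice an interval into overlapping windows of `width` samples,
    shaped (num_windows, 1, width) for a sequence-to-sequence model."""
    n = len(interval)
    windows = np.stack([interval[i:i + width]
                        for i in range(0, n - width + 1, shift)])
    return windows[:, np.newaxis, :]

def from_sequence(windows, n=128, shift=1):
    """Invert to_sequence: for each sample position, average all
    window predictions that overlap it."""
    width = windows.shape[-1]
    sums = np.zeros(n)
    counts = np.zeros(n)
    for k, w in enumerate(windows[:, 0, :]):
        start = k * shift
        sums[start:start + width] += w
        counts[start:start + width] += 1
    return sums / counts
```

For a clean (noise-free) input the round trip is exact, since every overlapping window agrees on each sample; for autoencoder outputs the averaging smooths disagreements between windows.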
I implemented three different network architectures, which I call convLSTM1-dAE, convLSTM2-dAE and convLSTM3-dAE. The convLSTM layers are bidirectional, so they can be used in both the encoder and the decoder part. All networks use ELU activation functions and a sigmoid for the output, just like the FCN-dAE architecture. Likewise, the mean squared error is used to compute the loss between the predicted sequence and the clean ground truth sequence.
In the next step, I want to observe whether smoothing could have a negative effect when correct labels are smoothed. Therefore, I repeat the previous experiment, but instead of flipping the randomly selected labels, I only apply the smoothing $s$ to them. Again, no major change in performance due to noise in the \textit{hw} labels is expected, which can also be seen in the left graph of \figref{fig:supervisedFalseSoftNoise}. In the case of wrongly smoothed \textit{null} labels, we can see a negative trend in the S score for higher smoothing values, as shown in the right graph. For a greater portion of smoothed labels, the smoothing value has a higher influence on the model's performance. But for noise values $\leq0.2\%$, all personalized models still achieve higher S scores than the general models. Therefore, it seems that the personalization benefits from using soft labels. To make sure that the performance increase from smoothing false labels outweighs the drawbacks of falsely smoothed correct labels, I combined both experiments. First, a certain ratio $n$ of \textit{null} labels is flipped and these labels are smoothed to the value $s$; after that, the same ratio $n$ of other \textit{null} labels is smoothed to the value $s$. The resulting performance of the personalizations can be seen in \figref{arg1}.
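The combined experiment can be sketched as below. This sketch assumes that "smoothed to the value $s$" means the dominant component of a label becomes $1-s$ and the other component $s$; the function name and the fixed random seed are illustrative, not from the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_and_smooth(labels, n_ratio, s):
    """Combined noise experiment: flip a ratio n of null labels to
    hand wash and smooth them, then smooth the same ratio of other,
    still correct null labels. Labels are rows [p_null, p_hw]."""
    labels = labels.copy()
    null_idx = np.where(labels[:, 0] == 1.0)[0]
    k = int(round(n_ratio * len(null_idx)))
    picked = rng.choice(null_idx, size=2 * k, replace=False)
    flipped, smoothed = picked[:k], picked[k:]
    labels[flipped] = [s, 1.0 - s]    # flipped null -> soft hw label
    labels[smoothed] = [1.0 - s, s]   # correct null label, softened
    return labels
```

Drawing both index sets from one `choice` call without replacement guarantees that no label is flipped and smoothed-in-place at the same time.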
\section{Evaluation of Different Pseudo Label Generations}
In this section, I describe the evaluation of different pseudo labeling approaches using the filters introduced in \secref{sec:approachFilterConfigurations}. For each filter configuration, the base model is used to predict the labels of the training sets and create pseudo labels. After that, the filter is applied to the pseudo labels. To determine the quality of the pseudo labels, they are evaluated against the ground truth values using soft versions of the metrics $Sensitivity^{soft}$, $Specificity^{soft}$, $F_1^{soft}$ and $S^{soft}$. The general model is then trained on the refined pseudo labels. All resulting models are evaluated on their test sets and the mean over all of them is computed.
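The per-filter evaluation loop can be summarized in a short sketch. All callables here are stand-ins for the actual base model, filter, fine-tuning routine and metric, which are not part of this snippet:

```python
import numpy as np

def pseudo_label_pipeline(base_predict, apply_filter, fine_tune,
                          train_sets, test_sets, evaluate):
    """Sketch of the evaluation loop: pseudo-label each user's
    training set, refine the labels with the filter, fine-tune a
    personalized model, and average the test scores over all users."""
    scores = []
    for train, test in zip(train_sets, test_sets):
        pseudo = base_predict(train["x"])        # step 1: pseudo labels
        refined = apply_filter(pseudo)           # step 2: refine them
        model = fine_tune(train["x"], refined)   # step 3: personalize
        scores.append(evaluate(model, test))     # step 4: per-user test
    return float(np.mean(scores))                # mean over all users
```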
\caption[Supervised soft noise on hw]{\textbf{Supervised soft noise on hw.} Multiple plots of S score for personalized models trained on different noise values in \textit{hw} labels with increasing smoothing of false labels.}