Commit 33a0120a authored by burcharr

automatic writing commit ...

parent cc90d980
......@@ -9,7 +9,17 @@ $endif$
%------------------------------------- Declaration Page ----------------------------
$if(declaration)$
\IfFileExists{declaration_high.pdf}{
\let\origdoublepage\cleardoublepage
\newcommand{\clearemptydoublepage}{%
\clearpage
{\pagestyle{empty}\origdoublepage}%
}
\cleardoublepage
\includepdf[pages=-]{declaration_high.pdf}
\cleardoublepage
}
{
\chapter*{Eidesstattliche Erklärung}
\thispagestyle{empty}
......@@ -27,7 +37,9 @@ $if(declaration)$
Freiburg, den $date$
\makeatother
}
$endif$
%------------------------------------- Declaration Page ----------------------------
......
......@@ -11,7 +11,7 @@ In this section, the results of our hand washing detection system evaluation wil
The results of the theoretical evaluation show that, for each of the defined problems, the neural network based methods can learn to classify the desired activities with high accuracy. However, the problems differ in difficulty, and the resulting F1 scores and S scores are not yet perfect, which means that there is still room for improvement.
#### Problem 1
For the problem of classifying hand washing and separating it from all other activities, the raw predictions of the networks without smoothing reached an F1 score of $0.853$ (DeepConvLSTM) and an S score of $0.758$ (DeepConvLSTM-A). DeepConvLSTM and DeepConvLSTM-A surpass all other models that we tested, including the baselines RFC, SVM, majority classifier and random classifier, and they surpass these baselines by large margins. This is in line with related work on other human activity recognition tasks, where DeepConvLSTM and DeepConvLSTM with small modifications also achieved the best results. On this specific problem, the CNN model also deserves mention, because its performance was worse than, but not far from, that of the DeepConvLSTM-based models. Apart from the neural networks' general superiority, one reason for the worse performance of the classical baselines RFC and SVM could be the imbalanced data set. For the baselines, we did not include class weighting or other means of coping with the imbalance, which could explain part of the performance gap to the neural network based methods. Nevertheless, it is reasonable to assume that our methods would still have outperformed the baselines by a large margin if we had applied such measures, as the performance difference was substantial.
The application of smoothing improved the performance of the models even further, to an F1 score of $0.892$ (DeepConvLSTM) and an S score of $0.819$ (DeepConvLSTM-A). This performance boost can be explained by the temporal context captured in the data. If many windows in rapid succession are classified as hand washing, a few Null class predictions scattered among them are likely to be errors. The smoothing helps to filter out both false positives and false negatives.
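The effect of smoothing on isolated misclassifications can be illustrated with a short sketch. The smoothing procedure is not restated in this section, so the centered moving average, the width of 5 windows, the decision threshold of $0.5$ and the example probabilities are all assumptions chosen for illustration, not the configuration used in the evaluation.

```python
def smooth_probabilities(probs, width=5):
    """Centered moving average over per-window class probabilities.

    `probs` holds the predicted hand washing probability of consecutive
    windows; `width` is an assumed value, not taken from the thesis.
    """
    half = width // 2
    smoothed = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        neighbourhood = probs[lo:hi]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


def to_labels(probs, threshold=0.5):
    """Map probabilities to labels: 1 = hand washing, 0 = Null class."""
    return [1 if p >= threshold else 0 for p in probs]


# Made-up predictions: one isolated Null prediction inside a hand washing
# run (index 3) and one isolated hand washing prediction inside a Null run
# (index 10).
raw = [0.9, 0.8, 0.9, 0.2, 0.9, 0.8, 0.9, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1]
raw_labels = to_labels(raw)
smoothed_labels = to_labels(smooth_probabilities(raw))
```

In this toy sequence, the isolated false negative at index 3 and the isolated false positive at index 10 are both absorbed by their neighbours, while the boundary between the two runs stays in place.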
......@@ -35,7 +35,7 @@ As there is no published previous work in the area of automatically detecting co
#### Problem 3
The problem of classifying hand washing and compulsive hand washing separately and distinguishing both from other activities at the same time is arguably harder than the other two problems. Problem 3 can be seen as a combination of problem 1 and problem 2, namely classifying whether an activity is hand washing (problem 1) and, if so, whether said washing activity is compulsive hand washing (problem 2). As a 3-class classification problem, problem 3 is thus more difficult and leaves more room for errors than the other two problems. As a consequence, a lower level of performance must be expected.
Out of the models specifically trained on problem 3, DeepConvLSTM-A performed best with a multiclass F1 score of $0.692$, a multiclass S score of $0.769$ and a mean diagonal value of the confusion matrix of $0.712$. DeepConvLSTM achieved a slightly lower, but nearly as good performance. For this problem, the classical machine learning baselines performed much worse, with their multiclass F1 and S scores, as well as their mean diagonal values of the confusion matrix, all around $0.5$.
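The mean diagonal value of the confusion matrix can be sketched as follows. The sketch assumes that the matrix is normalized per true class (row-wise), so that each diagonal entry is the recall of one class; the counts are hypothetical and do not come from the evaluation.

```python
def mean_diagonal(confusion):
    """Mean of the diagonal of the row-normalized confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    Row normalization (an assumption here) turns each diagonal entry into
    the recall of that class.
    """
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return sum(recalls) / len(recalls)


# Hypothetical counts for the classes Null, hand washing and compulsive
# hand washing (rows: true class, columns: predicted class).
example = [
    [80, 15, 5],
    [10, 60, 30],
    [5, 20, 75],
]
value = mean_diagonal(example)
```

Under this assumption the measure weights all three classes equally, regardless of how many samples each class contributes.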
......@@ -74,18 +74,19 @@ The goal of detecting hand washing and separating it from other activities in re
The separation of hand washing from compulsive hand washing worked extremely well in the theoretical evaluation, which is the only evaluation we were able to test it with. A sensitivity of $99.7\,\%$ was reached with smoothing, while maintaining a specificity of $83.9\,\%$. This means that almost all compulsive hand washing in our test data was detected by the system, although the false positive rate is still higher than we want it to be. Nevertheless, the model trained for this problem performed strongly and fully matched our expectations. We think that a system performing at this level in the real world could possibly be applied in the treatment of patients with OCD, which is why we consider this goal reached, too.
#### A real-world evaluation was conducted
The practical evaluation provided us with valuable feedback, showing us strengths and weaknesses of the hand washing detection model. Especially for the previously mentioned washing activities, the evaluation showed the need to include them as negative training examples. Apart from the false positives, the real-world evaluation confirmed some of our hopes that the system is actually able to detect everyday hand washing with high precision. Although the performance in the real-world test was lower than in the theoretical evaluation, the system worked well for some of the subjects. The real-world evaluation still yielded a strong performance, especially in scenario 2. The estimated performance for the intense and long washing task (scenario 2) was much closer to the performance reached on our pre-recorded test set, which indicates that the system detects more intense hand washing better. Overall, the real-world evaluation was successful and gave us valuable information about the weak points and strengths of the system so far. However, the number of subjects and the amount of feedback received were too small to draw fully qualified conclusions, as the variance of the results between the subjects was high.
## Future work
The general performance of our models on problem 2, distinguishing compulsive hand washing from non-compulsive hand washing, was high. The downside is that this model is only applicable if we know when the hand washing takes place. However, our results could be employed together with other tools that tell us when the user is currently washing their hands. Examples of such tools are in development in our group, one of them being a soap dispenser with an integrated proximity sensor. In addition, Bluetooth beacons stationed near sinks can be used to let the smart watch know that the user is near a specific sink. Conductivity sensors on the user's skin could be employed to detect a change of conductivity caused by the contact with tap water. Furthermore, the sound of tap water could possibly also be detected, e.g. by a smart watch with a microphone.
One or more of these methods combined with our model trained for problem 2 could possibly be used to achieve a higher performance for the task of compulsive hand washing detection in the future.
The detection of hand washing could be incorporated into many devices, mainly wrist worn ones, like smart watches. In order to further improve the detection capabilities and accuracy, one would need to invest even more time into carefully designing and training better models. This work's architecture search could be expanded, and more parameter combinations could be tried out. For example, different types of layers that have not yet been included in the architecture could be tried. Instead of normalizing data on the data set level, batch normalization could be used to try to make the training of the networks faster and more stable.
Different attention mechanisms could be tried out on the hand washing data.
On top of that, all other hyperparameters could be optimized more thoroughly. Instead of manual hyperparameter optimization (HPO), more sophisticated HPO methods could be employed, e.g. Bayesian optimization. This could lead to better choices for the batch size, learning rate and other parameters. However, such methods are computationally expensive and may take a long time to run.
The current state of the system, especially for the classification of hand washing versus compulsive hand washing, looks promising for future work in this area. The collection of real obsessive-compulsive hand washing data would likely make it possible to train models capable of reliably classifying compulsive hand washing. Such models could then be tested and evaluated with real-world subjects. If they perform well enough, they could aid psychologists and their patients with the treatment of compulsive hand washing. As explained in the introduction, exposure and response prevention (ERP) is a viable treatment method, and interventions from a smart watch could possibly be used for response prevention. The exact design of the interventions and their actual usability form another exciting problem field that is yet to be researched.
The hand washing detection should also work well on both wrists. Multiple solutions for the differences occurring between the two sides could be tried. One could train two separate models, one for each wrist. The downside is that the system would also need to figure out on which wrist it is worn, either automatically or by user input, which adds some uncertainty. Another idea would be to train a single model on balanced data from both wrists, leading to a model that can possibly learn implicitly which wrist the watch is worn on. No matter how we solve this problem, it seems that the watch position on the body must be accounted for in some way, which may also require more data or specific label information about the location of the sensors for the existing data.
......@@ -105,7 +106,7 @@ All in all, a lot of future work could be done in the area of hand washing detec
In this work, we described the development, training and evaluation of a powerful and accurate compulsive and non-compulsive hand washing detection system. The relevance of such a system was explained with its applications in hygiene compliance enforcement (general hand washing) and in possibly supporting the treatment of obsessive-compulsive disorder with compulsive hand washing.
We theoretically evaluated different designs of neural networks on three related problems of hand washing detection, including the separation of hand washing from other activities, the separation of hand washing from compulsive hand washing, and the separation of hand washing from compulsive hand washing and from other activities at the same time. For this task, we used hand washing data, data of simulated compulsive hand washing, and data of other activities, which was collected from publicly available data sets. After training and evaluation, we selected the best functioning system based on several metrics, including the F1 score and the harmonic mean of sensitivity and specificity, which we called S score. The dominating models, DeepConvLSTM and DeepConvLSTM-A, were both based on a deep convolutional neural network joined with an LSTM layer. For DeepConvLSTM-A, which performed slightly better than DeepConvLSTM, we added an attention mechanism in order to allow the model to flexibly focus on more relevant sections of its input. The addition of the attention mechanism led to a small increase in performance.
The designed models outperformed baselines such as a random forest classifier and a support vector machine, as well as chance level baselines, by a large margin.
In a practical evaluation with 5 subjects, we tested DeepConvLSTM-A on the hand washing detection task in a real-world, everyday environment, as well as in a fixed schedule hand washing test. The system ran on a smart watch, which monitored the user's wrist movements in real-time and tried to correctly detect hand washing. The sensitivity in this test was lower than expected ($28.33\,\%$, or $50\,\%$ if the correct wrist was used). Furthermore, around 4 false positives per day appeared for different activities, many of which were washing related. They included, but were not limited to, doing the dishes, brushing one's teeth and scratching oneself. Such high numbers of false positives could possibly be avoided in the future by adding more everyday activities to the training data.
......
......@@ -19,17 +19,17 @@ While it is usually helpful and a basic part of hygiene, hand washing can also b
One method of treatment for clinical cases of OCD is exposure and response prevention (ERP) therapy @meyer_modification_1966 @whittal_treatment_2005. Using this method, patients who suffer from OCD are exposed to situations in which their obsessions are stimulated, and they are helped to prevent compulsive reactions to the stimulation. The patients can then "get used" to the situation in a sense, and thus the reaction to the stimulation weakens over time. This means that their quality of life improves as the severity of their OCD declines.
A successful, i.e. reliable and accurate, system for compulsive hand washing detection could be used to intervene whenever compulsive hand washing is detected. It could therefore help psychologists and their patients in the treatment of the symptoms. It could encourage the user to stop the compulsive behavior by issuing a warning. Such a warning could be a vibration of the device, or a sound that is played upon the detection of compulsive behavior. However, the hypothesis of usefulness is yet to be tested, as no such system exists as of now. Therefore, we want to develop a system that can not only detect hand washing with low latency and in real-time, but also discriminate between usual hand washing and obsessive-compulsive hand washing at the same time. The system could then, as described, be used in ERP therapy sessions, but also in everyday life, to prevent compulsive hand washing.
The separation of compulsive hand washing from ordinary hand washing could be an even harder problem than hand washing detection itself. It is unclear whether it is possible to predict the type of hand washing with high probability, as there is no previous work in this area. It is reasonable to assume that there are strong similarities between compulsive hand washing and non-compulsive hand washing, as well as subtle differences, e.g. in intensity and duration of the washing.
### Wrist worn sensors
Different types of sensors can be used to detect activities such as hand washing. It is possible to detect hand washing from RGB camera data to some extent. However, for this to work, we would need to place a camera in every place and room where a subject might want to wash their hands. This is infeasible for most applications of hand washing detection and could be very expensive. Furthermore, it might be problematic to place cameras inside wash- or bathrooms for privacy reasons. Thus, a better alternative could be body worn, camera-less devices.
Inertial measurement units (IMUs) can measure different types of time series movement data, e.g. the acceleration or angular velocity of the device they are embedded in. IMUs are embedded in most modern smartphones and smart watches, which makes them easily available. For hand washing detection, the movement of the hands and wrists in particular can contain information that helps us classify the activity. Therefore, we can use a smart watch and its embedded IMU to try to predict whether a user is washing their hands or not. In addition, if the user is washing their hands, we could try to predict whether they are doing so in an obsessive-compulsive way or not. Another advantage of smart watches is that they usually have built-in vibration motors or even speakers. These could be used to intervene whenever compulsive hand washing is detected, as described above. Therefore, wrist worn sensors, especially those embedded in smart watch systems, are used in this work. The wrist worn devices can also be used to execute machine learning models in real-time, using publicly available libraries, e.g. on smart watches running Wear OS.
## Goals
In this work, we want to develop several neural network based machine learning methods for the real-time detection of hand washing and compulsive hand washing on inertial sensor data of wrist worn devices. We also want to test the methods and report meaningful statistics for their performance. Furthermore, we want to test parts of the developed methods in a real-world scenario. We then want to draw conclusions on the applicability of the developed systems in the real world.
### Detection of hand washing in real-time utilizing inertial measurement sensors
We want to show that neural network based classification methods can be applied to the recognition of hand washing. We want to base our method on sensor data from inertial measurement sensors in smart watches or other wrist worn IMU-equipped devices. We want to detect the hand washing in real-time and directly on the mobile device, i.e. on a wrist worn wearable such as a smart watch. This would allow us to give instant feedback to the user of the device.
......
......@@ -35,5 +35,5 @@ declaration: Hiermit erkläre ich, dass ich diese Arbeit selbstständig verfasst
#abstract
abstract-de: Die automatische Erkennung von Händewaschen und zwanghaftem Händewaschen hat mehrere Anwendungsbereiche in Arbeitsumgebungen und im medizinischen Bereich. Die Erkennung kann zur Überprüfung der Einhaltung von Hygieneregeln eingesetzt werden, da das Händewaschen eine der wichtigsten Komponenten der persönlichen Hygiene ist. Allerdings kann das Waschen auch übertrieben werden, was bedeutet, dass es für die Haut und die allgemeine Gesundheit schädlich sein kann. Manche Patienten mit Zwangsstörungen waschen sich zwanghaft zu häufig und intensiv die Hände auf diese schädliche Weise. Die automatische Erkennung von zwanghaftem Händewaschen kann bei der Behandlung dieser Patienten helfen. Ziel dieser Arbeit ist es, auf neuronalen Netzen basierende Methoden zu entwickeln, die in der Lage sind, Händewaschen und zwanghaftes Händewaschen in Echtzeit auf einem am Handgelenk getragenen Gerät zu erkennen, wobei die Daten der Bewegungssensoren des Geräts verwendet werden. Die entwickelte Methode erreicht eine hohe Genauigkeit für beide Aufgaben. Sie erreicht einen F1 score von 89,2 % für die Erkennung von Händewaschen bzw. 96,6 % für die Erkennung von zwanghaftem Händewaschen. Teile der Arbeit wurden mit Probanden in einem realen Experiment evaluiert, um die starke theoretische Leistung zu bestätigen.
abstract-en: The automatic detection of hand washing and compulsive hand washing has multiple areas of application in work and medical environments. The detection can be used in compliance and hygiene scenarios, as hand washing is one of the main components of personal hygiene. However, the washing can also be overdone, which means it can be harmful to the skin and to health in general. Patients with obsessive-compulsive disorder sometimes compulsively wash their hands in such a harmful way. In order to help with their treatment, the automatic detection of compulsive hand washing can possibly be applied. This thesis aims to develop neural network based methods which are able to detect hand washing as well as compulsive hand washing in real time on a wrist-worn device using inertial motion sensor data of the device. We achieve high accuracy for both tasks. We reach an F1 score of 89.2 % for hand washing detection and 96.6 % for compulsive hand washing detection. We evaluate parts of the work on subjects in a real-world experiment, in order to confirm the strong theoretical performance achieved.
---
......@@ -229,7 +229,7 @@ Dropout is a method that helps to prevent neural networks from overfitting to th
We applied dropout to all model classes in preliminary testing. On the validation data, dropout with $p=0.25$ was tested for all models. Only the fully connected network showed an increased validation performance with dropout, while the other models did not. For this reason, dropout was only applied in the fully connected model.
#### Early stopping
We used early stopping based on the split-off validation set. Early stopping is a regularization technique @prechelt_early_1998, which is frequently employed during the training of neural networks. It helps to prevent overfitting to the training set by stopping the training process early. In order to decide at which point in the process, i.e. after which epoch, the training should be stopped, we monitor the loss function on the validation set. The model is trained on the training set, but as soon as the validation loss starts to rise, we can stop the training. This makes sense because we can assume that the increase of the validation loss reflects the unknown trend of the loss on the test set, which we cannot access at training time. An example of this process can be seen in fig. \ref{fig:learning_curves}, which shows the comparison between training and validation losses over the progress of the training.
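The stopping rule described above can be sketched in a few lines. This is a minimal illustration under assumptions: the `patience` parameter and the loss values are made up, and a patience of 1 corresponds to stopping as soon as the validation loss rises for the first time.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return the index of the epoch whose weights would be kept.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs. `patience` is an assumed parameter; 1 matches
    "stop as soon as the validation loss starts to rise".
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch


# Made-up validation losses: improving until epoch 3, rising afterwards.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]
stop_at = early_stopping_epoch(losses)
```

A larger patience tolerates short fluctuations of the validation loss at the cost of a few extra training epochs.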
\begin{figure}[!h]
\centering
......@@ -250,6 +250,8 @@ The application programming was done by Alexander Henkel and is not part of this
In order to run a pre-trained neural network based model on smart watches, we used TensorFlow Lite (tflite). The models were trained with PyTorch as explained above and then converted to tflite using ONNX and the TensorFlow Lite converter included in TensorFlow. However, conversion from ONNX to TensorFlow is not supported for all operations needed in a neural network. Thus, compatibility for ONNX Runtime (.ort) models on the smart watch was also added, and we converted our models into this format.
\newpage %%% TODO
![Flow diagram of the smart watch classification loop. A notification is only sent, if the notification cooldown is 0.](img/wear_data_flow.pdf){#fig:watch_flow width=98%}
The course of action on the smart watch is shown in fig. \ref{fig:watch_flow}. The watch continuously records the data from the integrated IMU to fill a buffer. To filter out the most basic idle case of "no movement", the neural network is only run to classify the current activity if at least one sensor value is higher than a certain threshold $v_{idle}$, which is fixed inside the application. If there is enough movement to reach the threshold, a forward pass of the neural network model is done with the data from the last few seconds. It is possible to set the interval classified in each network pass to a value from $1$ second to $10$ seconds, but the model must be trained for the specific interval length, as mentioned above. Our windows had a length of 3 seconds (150 samples of the 6 sensor axes). The forward pass then outputs class probabilities for each of the windows considered.
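The idle filter described above can be sketched in a few lines. This is only an illustration under stated assumptions and not the code of the smart watch application: the function name `classify_if_moving`, the use of the absolute sensor value, the threshold value and the placeholder `dummy_model` are all hypothetical.

```python
def classify_if_moving(window, v_idle, model):
    """Run `model` on `window` only if there is enough movement.

    `window` is a list of samples, each a list of 6 sensor axis values.
    Comparing the absolute value against `v_idle` is an assumption made
    here; the text only states that a value must exceed the threshold.
    """
    has_movement = any(abs(value) > v_idle for sample in window for value in sample)
    if not has_movement:
        # Idle case: skip the forward pass entirely.
        return None
    return model(window)


def dummy_model(window):
    # Placeholder for the neural network; returns made-up class probabilities.
    return [0.9, 0.1]


WINDOW_LENGTH = 150  # 3 seconds of data, as stated in the text
NUM_AXES = 6

idle_window = [[0.0] * NUM_AXES for _ in range(WINDOW_LENGTH)]
moving_window = [[0.0] * NUM_AXES for _ in range(WINDOW_LENGTH)]
moving_window[10][2] = 1.5  # a single axis value above the threshold

idle_result = classify_if_moving(idle_window, v_idle=0.5, model=dummy_model)
moving_result = classify_if_moving(moving_window, v_idle=0.5, model=dummy_model)
```

Skipping the forward pass for idle windows avoids running the network when the buffer contains no movement at all.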
......@@ -308,7 +310,7 @@ S\ score &= 2 \cdot \frac{Sensitivity \cdot Specificity}{Sensitivity + Specifici
\label{s_score}
The sensitivity is the ratio of positive samples that are correctly recognized. The specificity is the ratio of negative samples that are correctly recognized. If both these measures are close to 1, the model performs well. The precision is the ratio of true positives among all positive predictions. It is similar to the sensitivity but also punishes false positives to some extent. The recall is the same as the sensitivity. The harmonic mean of recall and precision is called the F1 score and is commonly used to evaluate binary prediction tasks. Since we especially need to balance specificity and sensitivity for our task, we also report the S score, which we define as the harmonic mean of specificity and sensitivity. One reason for reporting the S score is that the F1 score does not punish false positives as much as needed for the task of compulsive hand washing detection. While false positives are partly accounted for by the precision, the precision does not weigh them enough if there are many positives in the ground truth. Including the specificity in the measure therefore makes sure we do not lose track of the false positives, which would be annoying to the user, especially if we send out smart watch notifications with vibration or sound alerts.
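The following sketch computes the metrics defined above from the four entries of a binary confusion matrix. The counts are made up purely for illustration and the function names are assumptions; the example is chosen so that the ground truth contains many positives, which is the case in which the F1 score and the S score differ most.

```python
def harmonic_mean(a, b):
    # Harmonic mean of two ratios; defined as 0 if both are 0.
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision, F1 score and S score."""
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0.0  # identical to recall
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0.0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": harmonic_mean(precision, sensitivity),
        "s_score": harmonic_mean(sensitivity, specificity),
    }


# Hypothetical counts: 100 positives and 50 negatives in the ground truth,
# with 40 of the 50 negatives wrongly predicted as positive.
metrics = binary_metrics(tp=90, fp=40, tn=10, fn=10)
```

With these counts, the F1 score is about $0.78$ although $80\,\%$ of the negatives are misclassified, whereas the S score drops to about $0.33$.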
For the multiclass problem of distinguishing compulsive hand washing from normal hand washing and from other activities, the binary metrics are not applicable. Here, we report normalized confusion matrices and their mean diagonal values as one performance measure. The confusion matrix shows how many samples belonging to a certain class (true labels, rows of the matrix) are predicted to belong to each class (predicted labels, columns of the matrix). The normalized version of the confusion matrix replaces the absolute counts by ratios relative to the number of true labels for each class. This means that for each true label row in the matrix, the values sum to 1.
The mean diagonal value of this matrix can be seen as a mean class accuracy score, as the diagonal values of the normalized confusion matrix are the accuracy values for each possible class.
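As a small illustration of the normalized confusion matrix and its mean diagonal value, consider the following sketch. The labels are invented for the example, and the integer class encoding as well as the function names are assumptions.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    # Rows correspond to true labels, columns to predicted labels.
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def normalize_rows(matrix):
    # Divide each row by the number of true samples of that class,
    # so that every non-empty row sums to 1.
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total > 0 else 0.0 for count in row])
    return normalized


def mean_diagonal(matrix):
    # Mean of the per-class ratios of correctly classified samples.
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


# Hypothetical labels for three classes encoded as 0, 1, 2.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]

counts = confusion_matrix(y_true, y_pred, num_classes=3)
normalized = normalize_rows(counts)
score = mean_diagonal(normalized)
```

Because each row is normalized separately, a class with few samples contributes as much to the mean diagonal value as a class with many samples.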
In addition, we report an adapted F1 score, identical to the one used by Zeng et al. @zeng_understanding_2018. The adapted multiclass F1 score is calculated by taking the mean over all classes $\mathbf{C}$ of the F1 scores obtained when we treat the class $\mathbf{C}_i, i \in \{0,1,2\}$ as the positive class and the remaining classes as the negative class:
......@@ -325,7 +327,7 @@ S\ score\ multi = \frac{1}{3}\cdot \sum_{i=0}^2 S\ score(\mathbf{C}_i)
We also report the metrics used for problem 1 on a binarized version of the third problem. To binarize the problem, we define "hand washing" as the positive class and the remainder as the negative class. Note that "hand washing" includes "compulsive hand washing". With this binarization, we can compare the models trained on the multiclass problem to the models trained on the initial binary problem. However, as problem 1 is a special case of problem 3, we expect the performance of the models trained for problem 3 to be lower than that of the models trained for problem 1.
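The one-vs-rest averaging behind the multiclass F1 and S scores, as well as the binarization of the three-class labels, can be sketched as follows. The encoding 0 = Null, 1 = hand washing, 2 = compulsive hand washing, the function names and the example labels are assumptions made for this illustration.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    # Treat `positive` as the positive class and all other classes as negative.
    tp = fp = tn = fn = 0
    for true_label, predicted_label in zip(y_true, y_pred):
        if true_label == positive and predicted_label == positive:
            tp += 1
        elif true_label != positive and predicted_label == positive:
            fp += 1
        elif true_label == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, tn, fn):
    # Equivalent to the harmonic mean of precision and recall.
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator > 0 else 0.0


def s_score(tp, fp, tn, fn):
    # Harmonic mean of sensitivity and specificity.
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0.0
    total = sensitivity + specificity
    return 2 * sensitivity * specificity / total if total > 0 else 0.0


def multiclass_mean(metric, y_true, y_pred, classes=(0, 1, 2)):
    # Mean of the binary metric over all one-vs-rest problems.
    scores = [metric(*one_vs_rest_counts(y_true, y_pred, c)) for c in classes]
    return sum(scores) / len(classes)


def binarize(labels, positive_classes=(1, 2)):
    # Hand washing (1) and compulsive hand washing (2) both count as positive.
    return [1 if label in positive_classes else 0 for label in labels]


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]

f1_multi = multiclass_mean(f1_score, y_true, y_pred)
s_multi = multiclass_mean(s_score, y_true, y_pred)
binary_true = binarize(y_true)
binary_pred = binarize(y_pred)
```

After binarization, the binary metrics of problem 1 can be computed on `binary_true` and `binary_pred` without any further changes.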
\label{chained_model}
In addition, we report the performance of the best model for problem 1 and the best model for problem 2, chained and then tested on problem 3. This means we execute the best model for hand washing detection first, and then, for all sample windows that were detected as hand washing, we run the best model for the classification of compulsive versus non-compulsive hand washing. From this chain, we can derive three-class predictions by counting all samples that were not detected by the first model as negatives (Null), and the ones predicted to be hand washing by the first model but not predicted to be compulsive by the second model as hand washing (HW). The remaining samples are classified as compulsive hand washing (HW-C) by the chained model. This chained model could possibly perform better, as it combines two different models, which together have had more training time and possibly have a higher capacity. However, chaining two models would also take up more memory and more computation time on the device, and would thus be less efficient.
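The chaining logic described above can be sketched as follows. The two models are represented by arbitrary callables that return a boolean decision per window; their names, the integer class encoding and the toy stand-in models are assumptions for illustration only.

```python
NULL, HW, HW_C = 0, 1, 2  # assumed integer encoding of the three classes


def chained_predict(windows, detects_hand_washing, detects_compulsive):
    """Derive three-class predictions from two chained binary models."""
    predictions = []
    for window in windows:
        if not detects_hand_washing(window):
            # The first model rejects the window: Null class.
            predictions.append(NULL)
        elif detects_compulsive(window):
            # Both models fire: compulsive hand washing.
            predictions.append(HW_C)
        else:
            # Hand washing detected, but not classified as compulsive.
            predictions.append(HW)
    return predictions


# Toy stand-ins for the two trained models, operating on single numbers
# instead of sensor windows.
def toy_hand_washing_model(window):
    return window > 0.3


def toy_compulsive_model(window):
    return window > 0.7


toy_windows = [0.1, 0.5, 0.9]
chained = chained_predict(toy_windows, toy_hand_washing_model, toy_compulsive_model)
```

Note that the second model is only evaluated for windows accepted by the first model, so the additional computation time depends on how often hand washing is detected.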
### Practical evaluation
......
......@@ -69,7 +69,7 @@ These gates are fully connected neural network layers (marked in orange and with
###### DeepConvLSTM
is a network proposed by Ordonez et al. @ordonez_deep_2016 and consists of four convolutional layers as well as two LSTM layers. It reaches state-of-the-art performance and is used for general human activity recognition tasks. The combination of convolutional layers and LSTMs works well with time series data, as it can use the advantages of both convolutional layers and the intelligent "memory" provided by the LSTMs.
Bock et al. @bock_improving_2021 employ an altered version of DeepConvLSTM @ordonez_deep_2016. They propose reducing the number of LSTM layers to one, resulting in the architecture shown in @fig:deepConvLSTM. They evaluate their approach on five different publicly available data sets and report increased performance on four of the five. Leaving out one LSTM layer drastically reduces the number of parameters to be learned as well as the time needed to train the network.
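To give a rough idea of how many parameters a single LSTM layer contributes, the following sketch counts the weights of one layer using the textbook formulation with four gates, each having input weights, recurrent weights and one bias vector. The hidden size of 128 is an assumption chosen for illustration and is not taken from the text; some frameworks store two bias vectors per gate, which increases the count slightly.

```python
def lstm_layer_parameters(input_size, hidden_size):
    """Parameter count of one LSTM layer (four gates, one bias vector per gate)."""
    gates = 4
    weights_per_gate = hidden_size * (input_size + hidden_size)
    biases_per_gate = hidden_size
    return gates * (weights_per_gate + biases_per_gate)


hidden_size = 128  # assumed value, for illustration only

# A second LSTM layer receives the hidden state of the first layer as input,
# so its input size equals the hidden size.
saved_parameters = lstm_layer_parameters(hidden_size, hidden_size)
```

Under these assumptions, removing the second LSTM layer saves `saved_parameters` learnable parameters, independently of the size of the convolutional part of the network.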
![DeepConvLSTM and the altered version, by Marius Bock @bock_improving_2021](img/deepConvBock.png){#fig:deepConvLSTM width=98%}
......@@ -103,7 +103,7 @@ To our knowledge, no study has ever tried to separately predict compulsive hand
Most studies that try to automatically detect hand washing are aiming for compliance improvements, i.e. trying to increase or measure the frequency of hand washes or assessing or improving the quality of hand washes.
Hand washing compliance can be measured using different tools. Jain et al. @jain_low-cost_2009 use an RFID-based system to check whether health care workers comply with hand washing frequency requirements. However, the system is merely used to make sure all workers entering an emergency care unit have washed their hands. Bakshi et al. @bakshi_feature_2021 developed a hand washing detection data set with RGB video data and showed a valid way to extract SIFT-descriptors from it for further research. Llorca et al. showed a vision-based system for automatic hand washing quality assessment @llorca_vision-based_2011 based on the detection of skin in RGB images using optical techniques such as optical flow estimation.
Li et al. @li_wristwash_2018 are able to recognize 13 steps of a hand washing procedure on wrist motion data with an accuracy of $85\,\%$. They employ a sliding window feature-based hidden Markov model approach and run a continuous recognition. Wang et al. explore using sensor wristbands to assess the user's compliance with given hand washing hygiene guidelines @wang_accurate_2020. They run a classifier using XGBoost and are mostly able to separate the different steps of the scripted hand washing routine.
In addition, Cao et al. @cao_awash_2021 developed a system that similarly detects different steps of a scripted hand washing routine and prompts the user if they confuse the order of the steps or forget one of them. The technology is aimed at elderly patients with dementia. Their system is able to detect which step of hand washing is currently conducted based on wrist motion data using an LSTM-based neural network. However, none of the three systems mentioned in this paragraph is meant to separate hand washing from other activities. These models are trained to tell apart the different steps of hand washing, as they are defined in their respective studies. The models used in these studies are not tested on a null class, i.e. they are not tested on activities other than hand washing. Thus, they can only be used for the detection of steps of hand washing, but not for the detection of hand washing in real life.
In order to separate hand washing from other activities, Mondol et al. employ a simple feed-forward neural network. Their network consists of a few linear layers and can be used to detect hand washing @sayeed_mondol_hawad_2020. Their method specifically seeks to eliminate false positives by trying to detect out-of-distribution (OOD) samples, i.e. samples that are very different from the ones seen by the model during training. To this end, they apply a conditional Gaussian distribution to the features of the network's penultimate layer, i.e. the last layer before the output layer.
......
......@@ -40,7 +40,7 @@ Like for problem 1, applying normalization to the input data worsens the perform
The results for problem 2 with the application of smoothing are shown in table \ref{tbl:only_conv_hw_rm} and @fig:p2_metrics. Similarly to problem 1, smoothing helps to further increase the performance of all classifiers. All neural network based methods reach F1 scores of over $0.95$. The best F1 score is achieved with DeepConvLSTM-A ($0.966$), the second best with LSTM ($0.965$). The differences remain small for this problem, as DeepConvLSTM ($0.963$) and LSTM-A ($0.961$) also achieve very similar scores. There is a small gap, after which the RFC ($0.922$) and SVM ($0.914$) follow. The traditional methods do not profit as much from the smoothing as the neural network based methods.
The S scores of the neural network based models are also high, with the highest score being $0.911$ (DeepConvLSTM-A), followed by $0.910$ (LSTM), $0.909$ (DeepConvLSTM) and $0.908$ (LSTM-A). The values of CNN ($0.897$) and FC ($0.893$) are not far off either. However, the classical methods RFC ($0.761$) and SVM ($0.724$) do not reach the same level of performance, with the S score gap to the neural network based models even becoming slightly bigger after the application of smoothing.
\input{tables/only_conv_hw_rm.tex}
......
......@@ -8,7 +8,7 @@ References are automatically generated from the BibTex file (references.bib)
# References {.unnumbered}
\markboth{References}{References}
::: {#refs}
:::
......
......@@ -8,6 +8,7 @@
\usepackage{booktabs}
\usepackage{multirow}
\usepackage[section]{placeins}
\usepackage{tikz}
% hyphenation
\tolerance=9999
......