Commit a8327294 authored by burcharr

automatic writing commit ...

parent dcc61f7c
@@ -29,7 +29,7 @@ Different types of sensors can be used to detect activities such as hand washing
Inertial measurement units (IMUs) can measure different types of time series movement data, e.g. the acceleration or angular velocity of the device they are embedded in. IMUs are embedded in most modern smart phones and smart watches, which makes them easily available. For hand washing detection, the movement of the hands and wrists in particular can contain information that helps us classify hand washing. Therefore, we can use a smart watch and its embedded IMU to try to predict whether a user is washing their hands or not. If the user is washing their hands, we could additionally try to predict whether they are washing them in an obsessive-compulsive way. Another advantage of smart watches is that they usually have built-in vibration motors or even speakers. These could be used to intervene whenever compulsive hand washing is detected, as described above. Therefore, wrist worn sensors, especially those embedded in smart watch systems, are used in this work. The wrist worn devices can also execute machine learning models in real time using publicly available libraries, e.g. on smart watches running Wear OS.
## Goals
In this work, we want to develop several neural network based machine learning methods for the real time detection of hand washing and compulsive hand washing on inertial sensor data of wrist worn devices. We also want to test the methods and report meaningful statistics for their performance. Further, we want to test parts of the developed methods in a real world scenario. We then want to draw conclusions on the applicability of the developed systems in the real world.
### Detection of hand washing in real time utilizing inertial measurement sensors
We want to show that neural network based classification methods can be applied to the recognition of hand washing. We base our method on sensor data from inertial measurement sensors in smart watches or other wrist worn IMU-equipped devices. We want to detect hand washing in real time and directly on the mobile device, i.e. a wrist wearable device such as a smart watch. This would enable us to give instant real time feedback to the user of the device.
@@ -38,8 +38,4 @@ We want to show that neural network based classification methods can be applied
On top of the detection of hand washing, the detection of obsessive-compulsive hand washing is part of our goals. We want to be able to separate compulsive hand washing from non compulsive hand washing based on inertial motion data. Especially for the scenario of possible interventions used in the treatment of OCD, this separation is crucial, as OCD patients also wash their hands in non compulsive ways, and we do not want to intervene for these kinds of hand washing procedures.
### Real world evaluation
We want to evaluate the most promising of the developed models in a real world evaluation, in order to obtain a realistic estimate of its applicability to the task of hand washing detection. We want to report results of an evaluation with multiple subjects to obtain a meaningful performance estimate. From this estimate, we want to draw conclusions on the applicability of the developed system in real world therapy scenarios. Added to that, we want to derive future improvements that could be applied to the system.
@@ -17,7 +17,7 @@ studentnumber: "4133000"
date: "08.11.2021"
reviewer1: "Dr. Phillip M. Scholl"
reviewer2: "Prof. Dr. Thomas Brox"
## optional
@@ -35,5 +35,5 @@ declaration: Hiermit erkläre ich, dass ich diese Arbeit selbstständig verfasst
#abstract
abstract-de: Die automatische Erkennung von Händewaschen und zwanghaftem Händewaschen hat mehrere Anwendungsbereiche in Arbeits- und medizinischen Umgebungen. Die Erkennung kann zur Überprüfung der Einhaltung von Hygieneregeln eingesetzt werden, da das Händewaschen eine der wichtigsten Komponenten der persönlichen Hygiene ist. Allerdings kann das Waschen auch übertrieben werden, was bedeutet, dass es für die Haut und die allgemeine Gesundheit schädlich sein kann. Manche Patienten mit Zwangsstörungen waschen sich zwanghaft und zu häufig die Hände auf diese schädliche Weise. Die automatische Erkennung von zwanghaftem Händewaschen kann bei der Behandlung dieser Patienten helfen. Ziel dieser Arbeit ist es, auf neuronalen Netzen basierende Methoden zu entwickeln, die in der Lage sind, Händewaschen und zwanghaftes Händewaschen in Echtzeit auf einem am Handgelenk getragenen Gerät zu erkennen, wobei die Daten der Bewegungssensoren des am Handgelenk getragenen Geräts verwendet werden. Die entwickelte Methode erreicht eine hohe Genauigkeit für beide Aufgaben und Teile der Arbeit wurden mit Probanden in einem realen Experiment evaluiert, um die starke theoretische Leistung (F1 score von 89,2 % bzw. 96,6 %) zu bestätigen.
abstract-en: The automatic detection of hand washing and compulsive hand washing has multiple areas of application in work and medical environments. The detection can be used in compliance and hygiene scenarios, as hand washing is one of the main components of personal hygiene. However, the washing can also be overdone, which means it can be unhealthy for the skin and general health. Patients with obsessive-compulsive disorder sometimes compulsively wash their hands in such a harmful way. In order to help with their treatment, the automatic detection of compulsive hand washing can possibly be applied. This thesis aims to develop neural network based methods which are able to detect hand washing as well as compulsive hand washing in real time on a wrist worn device using inertial motion sensor data of said wrist worn device. We achieve high accuracy for both tasks and evaluate parts of the work on subjects in a real world experiment, in order to confirm the strong theoretical performance (F1 score of 89.2 % and 96.6 %) achieved.
---
@@ -9,7 +9,7 @@ Added to that, we further explain the development and testing of different neura
Then we explain meaningful methods of evaluating the developed models and methods on both unseen pre-recorded data and with real world subjects.
## Data set
In order to train any machine learning algorithm, we need enough data to correctly train the chosen model. In our case of wrist motion data, we used acceleration and gyroscope time series data from multiple sources, which will be explained below. The inertial data of each sensor is given as $\mathbf{s}_i \in \mathbb{R}^{d_i \times t}$, where $d_i$ is the dimensionality of the sensor (e.g. $d_{accelerometer} = 3$) and $t$ is the number of samples in a time series. We use accelerometer and gyroscope data, which both have 3 dimensions. We would have liked to use more of the available sensors, like the magnetometer included in many modern IMUs, but most external data sets only include accelerometer and gyroscope data. We combine the two sensors we use into one data series of dimensionality $\mathbb{R}^{6 \times t}$. An example of the sensor data used in our experiments is shown in fig. \ref{fig:sensor_data}.
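The combination of the two aligned sensor streams into one $\mathbb{R}^{6 \times t}$ series can be sketched as follows (a minimal NumPy example; the function name and the zero-filled demo arrays are illustrative, not our actual preprocessing code):

```python
import numpy as np

def combine_sensors(acc: np.ndarray, gyro: np.ndarray) -> np.ndarray:
    """Stack an accelerometer and a gyroscope series (each of shape (3, t))
    into one combined series of shape (6, t)."""
    assert acc.shape[0] == 3 and gyro.shape[0] == 3
    assert acc.shape[1] == gyro.shape[1], "streams must be aligned in time"
    return np.vstack([acc, gyro])

t = 100
acc = np.zeros((3, t))    # placeholder accelerometer data
gyro = np.zeros((3, t))   # placeholder gyroscope data
s = combine_sensors(acc, gyro)
print(s.shape)  # (6, 100)
```

In practice the two streams must also be resampled to a common rate before stacking, since watch sensors are not always sampled synchronously.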
\begin{figure}[hp]
\centering
@@ -132,7 +132,7 @@ For each of the problems, we train the baselines with the same windows. For SVM
The implementations of SVM and RFC in scikit-learn @pedregosa_scikit-learn_nodate are used; both are trained with scikit-learn's default parameters.
To incorporate the "chance level" we use uniform random prediction and majority prediction. The majority prediction is able to achieve high levels of accuracy for heavily imbalanced data sets, as it always predicts the most frequent class. The uniform random prediction represents the performance level of completely uninformed guessing. We hope to outperform all these baselines with our methods.
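These two chance-level baselines can be written down in a few lines (a NumPy sketch; scikit-learn's `DummyClassifier` with the strategies `most_frequent` and `uniform` should behave equivalently):

```python
import numpy as np

def majority_prediction(y_train, n):
    """Always predict the most frequent training class; on heavily
    imbalanced data this reaches high accuracy without using the input."""
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(n, values[np.argmax(counts)])

def uniform_random_prediction(classes, n, seed=0):
    """Predict a uniformly random class: completely uninformed guessing."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.asarray(classes), size=n)

y_train = np.array([0, 0, 0, 0, 1])      # imbalanced toy labels
print(majority_prediction(y_train, 3))    # [0 0 0]
print(uniform_random_prediction([0, 1], 5))
```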
## Neural network based detection of hand washing
As explained in Section \ref{section:har}, neural networks are the state-of-the-art when it comes to human activity recognition. For hand washing detection, this can also be applied and thus, our classification algorithms are all entirely based on neural networks.
@@ -241,7 +241,7 @@ We used early stopping, based on the split off validation set. Early stopping is
\label{fig:learning_curves}
\end{figure}
We can see that the training loss still decreases while the validation loss is already rising again. Therefore, we employ early stopping to select the model parameters that lead to the empirically minimal validation loss. The losses achieved by parameter updates using mini batches do not decrease in every single step. Due to the visible "zig-zagging" of the losses, it makes sense to continue running the training for a fixed number of epochs even if the validation loss is already rising, since the validation loss could drop below the current minimum again in a future epoch. As training a model can take a lot of time, this patience value is a trade-off between continuing the training to catch a potential further decrease of the validation loss, and stopping early so as not to waste training time. We fixed the patience to 50 epochs and the maximum number of epochs per training to 300. As can be partially seen in fig. \ref{fig:learning_curves}, the training was usually stopped much earlier than at the 300 epoch mark. The stopping positions heavily depend on the model classes and how fast each class can be trained; they ranged from around 20 epochs to more than 100 epochs.
In fig. \ref{fig:learning_curves}, the stop position of the early stopping is marked with a bold \textbf{x}; it marks the epoch with the lowest validation loss. After the training is stopped by early stopping, we reset the model's parameters to those with the lowest validation loss. As can be seen in the figure, the training loss still decreases after this point, but the validation loss does not.
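The early stopping logic described above can be sketched as a small helper class (illustrative only; checkpoint saving and the actual training loop are omitted, and our real implementation may differ in detail):

```python
class EarlyStopping:
    """Track the best validation loss; signal a stop once `patience`
    epochs pass without improvement or the epoch budget is used up."""

    def __init__(self, patience=50, max_epochs=300):
        self.patience = patience
        self.max_epochs = max_epochs
        self.best_loss = float("inf")
        self.best_epoch = -1

    def should_stop(self, epoch, val_loss):
        if val_loss < self.best_loss:
            # New minimum: remember it (a checkpoint would be saved here,
            # so the parameters can be restored after stopping).
            self.best_loss = val_loss
            self.best_epoch = epoch
        no_improvement = epoch - self.best_epoch >= self.patience
        return no_improvement or (epoch + 1 >= self.max_epochs)

# Zig-zagging validation losses: the minimum is at epoch 1.
losses = [1.0, 0.8, 0.85, 0.83, 0.84, 0.86]
es = EarlyStopping(patience=3, max_epochs=300)
for epoch, vl in enumerate(losses):
    if es.should_stop(epoch, vl):
        break
print(es.best_epoch, es.best_loss)  # 1 0.8
```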
@@ -251,7 +251,7 @@ Like for recording our data set, we use Android based smart watches running a cu
The application programming was done by Alexander Henkel and is not part of this work. We only designed the outline for the deep learning model deployment part of the app, by providing the needed pre-trained models in the appropriate formats, so that they can be executed on mobile devices.
In order to run a pre-trained neural network based model on smart watches, we used TensorFlow Lite (tflite). The models were trained with PyTorch as explained above and then converted to tflite using ONNX and the TensorFlow Lite converter included in TensorFlow. However, the conversion from ONNX to TensorFlow is not supported for all operations needed in a neural network. Thus, compatibility for ONNX Runtime (.ort) models on the smart watch was also added, and we converted our models into this format.
![Flow diagram of the smart watch classification loop. A notification is only sent, if the notification cooldown is 0.](img/wear_data_flow.pdf){#fig:watch_flow width=98%}
@@ -3,9 +3,6 @@
\label{sec:results}
This chapter will report the evaluation results from both the theoretical evaluation and the practical evaluation.
## Theoretical Evaluation
For the theoretical evaluation, we report the results separately, split by the tasks 1.-3. described in Section \ref{sec:classification_problems}
@@ -14,21 +11,18 @@ In all tables of this chapter, the best values for a specific metric will be hig
The values for the metrics specificity and sensitivity will be reported in the tables, but not discussed separately, because they are included in the more meaningful metrics F1 score and S score. The results generally show that achieving a high value in only one of specificity and sensitivity, at the cost of a low value in the other, leads to worse F1 and S scores.
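For a binary problem, these quantities follow directly from the confusion matrix counts, as the following sketch shows (standard definitions; the S score used in this work is defined separately and not reproduced here). Note that the F1 score combines sensitivity with precision, so a high sensitivity bought with many false positives still lowers it:

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), specificity and F1 score from binary counts."""
    sensitivity = tp / (tp + fn)       # recognized fraction of positives
    specificity = tn / (tn + fp)       # recognized fraction of negatives
    precision = tp / (tp + fp)         # fraction of predicted positives that are real
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

sens, spec, f1 = binary_metrics(tp=8, fp=2, tn=90, fn=2)
print(sens, spec, f1)  # sensitivity and F1 are both approximately 0.8
```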
### Distinguishing hand washing from all other activities
For the first task of classifying hand washing in contrast to non hand washing activities, we report the results with and without the application of label smoothing. The results without label smoothing are shown in table \ref{tbl:washing}. In @fig:p1_metrics, the resulting scores for problem 1 with and without smoothing are shown.
\input{tables/washing.tex}
![F1 score and S score for problem 1](img/washing_all.pdf){#fig:p1_metrics width=105%}
As we can see, without label smoothing, the neural networks outperformed the conventional machine learning methods by a large margin. The best neural network method outperforms the best traditional method by a difference of nearly $0.2$ in the F1 score and around $0.1$ in the S score. Among the neural network methods themselves, the differences are very small, especially between the top performing DeepConvLSTM and DeepConvLSTM-A. While DeepConvLSTM reaches a slightly better F1 score of $0.853$, DeepConvLSTM-A reaches $0.847$. However, if we take the S score into consideration, DeepConvLSTM-A ($0.758$) is ahead of DeepConvLSTM ($0.756$). The convolutional neural network (CNN, $0.750$) and the LSTM with attention mechanism (LSTM-A, $0.708$) also reach similar levels of performance on both metrics, with the CNN outperforming the LSTM-A only in the S score. We can see that, as in the preliminary validation, normalization did not lead to the desired performance advantage. For the neural network methods, activating the normalization leads to a decrease of $0.01$ to $0.1$ in the F1 score and of $0.07$ to $0.15$ in the S score.
\input{tables/washing_rm.tex}
With label smoothing, we reach an increased performance with all of the model classes, including the traditional machine learning methods RFC and SVM. The results with a 20 prediction wide average filter smoothing can be seen in table \ref{tbl:washing_rm} and @fig:p1_metrics. The top performing neural network architectures do not change with the smoothing, but the performance measures increase. DeepConvLSTM has the best F1 score ($0.892$), followed by LSTM-A ($0.891$), DeepConvLSTM-A ($0.890$) and CNN ($0.888$). These results are higher by about $0.03$ to $0.05$ compared to using the raw predictions without smoothing. In the S score metric, DeepConvLSTM-A performs best ($0.819$), followed by DeepConvLSTM ($0.814$) and CNN ($0.808$). For the S score, the advantage of the label smoothing is larger in general, between $0.05$ and $0.06$ for all model classes except the LSTM, which only improves by $0.015$. RFC and SVM do not improve with the label smoothing; their scores decrease by about $0.04$ in both metrics.
The models running on normalized data also profit from the label smoothing; however, they still cannot reach the performance of the non normalized models.
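The smoothing applied here is a running mean over the raw per-window prediction scores, which suppresses isolated misclassifications before the class decision is made. A minimal sketch (the 20-wide filter matches the setting above; the function name and demo signal are illustrative):

```python
import numpy as np

def smooth_predictions(scores, width=20):
    """Average-filter per-window prediction scores; the class decision is
    then made on the smoothed score instead of the raw one."""
    kernel = np.ones(width) / width
    return np.convolve(scores, kernel, mode="same")

# A single spurious positive among negatives is strongly damped:
raw = np.zeros(60)
raw[30] = 1.0
smoothed = smooth_predictions(raw)
print(smoothed.max())  # 0.05, well below a 0.5 decision threshold
```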
For the special case of the models initially trained on problem 3 which were then binarized and run on problem 1 (without smoothing), we only report some results in this section. The full results can be found in the appendix. Surprisingly, the models trained on problem 3 reach similar F1 scores on the test data of problem 1 as the models trained on problem 1. DeepConvLSTM achieves an F1 score of $0.857$, DeepConvLSTM-A achieves $0.847$. The F1 score of DeepConvLSTM is even higher than the highest F1 score of the models trained for problem 1 by $0.004$. However, for the S score metric, the models trained for problem 3 can only reach up to $0.704$ (CNN) or $0.671$ (DeepConvLSTM-A), which is lower by $0.052$ than the best performing model trained for problem 1.
@@ -42,7 +36,7 @@ The results without smoothing of predictions for the second task, distinguishing
\input{tables/only_conv_hw.tex}
Like for problem 1, applying normalization to the input data worsens the performance of almost all classifiers. The performance loss in the F1 score ranges from $0.024$ (LSTM) to $0.11$ (CNN). For the FC network, the normalization leads to a slight performance increase of $0.01$. The S score decrease when we apply normalization is between $0.27$ (CNN) and $0.128$ (DeepConvLSTM-A). As with the F1 scores, the FC network profits from the normalization, here by a difference in \mbox{S score} of $0.035$. SVM and RFC also do not perform better with normalization.
The results for task 2 with the application of smoothing are shown in table \ref{tbl:only_conv_hw_rm} and @fig:p2_metrics. Similarly to problem 1, smoothing helps to further increase the performance of all classifiers. All neural network based methods reach F1 scores of over $0.95$. The best F1 score is achieved with DeepConvLSTM-A ($0.966$), the second best with LSTM ($0.965$). The differences remain small for this problem, as DeepConvLSTM ($0.963$) and LSTM-A ($0.961$) also achieve very similar scores. There is a small gap, after which the RFC ($0.922$) and SVM ($0.914$) follow. The traditional methods do not profit as much from the smoothing as the neural network based methods.
@@ -62,7 +56,7 @@ The three class problem of classifying hand washing, compulsive hand washing and
![Confusion matrices for all neural network based classifiers with and without normalization of the sensor data](img/confusion.pdf){#fig:confusion width=98%}
The confusion matrices of the non normalized models in the right column do not directly allow us to decide on one "best" model, but we can see that the diagonal values of all the LSTM-based models seem to be higher than those of FC and CNN. The "pure" LSTM model performs best on the compulsive hand washing class (HW-C, $0.88$) and close to best on the hand washing class (HW, $0.78$), being only closely beaten by DeepConvLSTM ($0.79$). However, LSTM only reaches an accuracy of $0.33$ on the Null class. The best performing model on the Null class is the CNN model ($0.64$), which in turn only reaches $0.51$ on HW and $0.72$ on HW-C. While DeepConvLSTM-A never reaches the highest value in any specific class, its overall performance in the confusion matrix is good. It reaches higher values in the Null class than the other LSTM based models ($0.53$ vs $0.47$ (LSTM-A), $0.46$ (DeepConvLSTM), $0.33$ (LSTM)). At the same time, its performance on the HW and HW-C classes is similar to that of DeepConvLSTM, albeit a little lower ($0.78$ vs $0.79$ on HW and $0.82$ vs $0.85$ on HW-C).
As for problems 1 and 2, we find that normalization decreases the performance of all the neural network based classifiers. For this problem, the FC network also performs worse when normalized input data is used.
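The per-class accuracies discussed above are the diagonal of the row-normalized confusion matrix. As a sketch (the diagonal matches the values reported for the LSTM; the off-diagonal counts are purely illustrative):

```python
import numpy as np

def per_class_accuracy(cm):
    """Row-normalize a confusion matrix (rows = true classes) and
    return its diagonal, i.e. the per-class accuracies."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm / cm.sum(axis=1, keepdims=True))

# Rows: true Null / HW / HW-C; diagonals 0.33, 0.78, 0.88 as for the LSTM.
cm_lstm = [[33, 40, 27],
           [ 5, 78, 17],
           [ 2, 10, 88]]
acc = per_class_accuracy(cm_lstm)
print(acc, acc.mean())  # mean diagonal summarizes the whole matrix
```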
@@ -98,8 +92,8 @@ The mean diagonal value of the confusion matrix upholds almost the same ordering
## Practical Evaluation
### Scenario 1: One day of evaluation
In the first scenario, the 5 subjects reported an average of $4.75$ hand washing procedures on the day on which they evaluated the system.
Per subject, there were $4.75$ ($\pm\,3.3$) hand washing procedures. Out of those, $1.75$ ($\pm\,2.06$) were correctly identified. The accuracy per subject was $28.33\,\%$ ($\pm\,37.9\,\%$). The highest accuracy for a subject was $80\,\%$ out of 5 hand washes, the lowest was $0\,\%$ out of 4 hand washes. Of all hand washing procedures conducted over the day by the subjects, $35.8\,\%$ were detected correctly.
Some subjects wore the smart watch on the right wrist instead of the left wrist, and reported worse results for that. Leaving out hand washes conducted with the smart watch worn on the right wrist, the detection sensitivity rises to $50\,\%$.
@@ -108,7 +102,7 @@ The correlation of duration of the hand washing with the detection rate is $-0.0
Added to that, the correlation of the intensity of washing with the detection rate is $0.267$.
For the reported false positives, the subjects' experiences varied. The subjects reported $4$ ($\pm\,5.19$) false hand washing detections on this day. The minimum was 0 false positives, and the highest was 13 false positives.
The activities leading to false positives include:
@@ -123,7 +117,7 @@ The full list of reported activities for which false positives occurred can be f
Some subjects also reported difficulties with the smart watch application (not part of this work), which sometimes led to the model not being run at all; this might also have influenced the results. It is possible that for some hand washing procedures the smart watch application was not executed, which would lead the user to note down a false negative, also decreasing the sensitivity in the results.
### Scenario 2: Controlled intensive hand washing
In scenario 2, the subjects each washed their hands at least 3 times. Some subjects voluntarily agreed to perform more repetitions, which leads to more than 3 washing detection results per subject. The detection accuracy per subject was $76\,\%$ ($\pm\,25\,\%$), with the highest being $100\,\%$ and the lowest being $50\,\%$.
The mean accuracy over all repetitions, not split by subjects, was $73.7\,\%$. For scenario 2, one user moved the smart watch from the right wrist to the left wrist after two repetitions. The first two repetitions were not detected, while the two repetitions with the smart watch worn on the left wrist were detected correctly. Leaving out hand washes conducted with the smart watch worn on the right wrist, the detection sensitivity rises to $78.6\,\%$, and the detection accuracy per subject is $82.5\,\%$ ($\pm\,23.6\,\%$).
\begin{table}
\centering
\caption{Problem 2: metrics of the different classes without smoothing}
\label{tbl:only_conv_hw}
\begin{tabular}{|l|l|c|c|c|c|}
\toprule
\begin{table}
\centering
\caption{Problem 1: metrics of the different classes without smoothing}
\label{tbl:washing}
\begin{tabular}{|l|l|c|c|c|c|}
\toprule
%------------------------------------- Includes / usepackage -------------------------
\usepackage{caption}
\usepackage{subfig}
\usepackage{float}
@@ -9,9 +10,9 @@
\usepackage[section]{placeins}
% Prevent widows and orphans ("Hurenkinder und Schusterjungen")
\clubpenalties=4 10000 10000 10000 0
\widowpenalties=4 10000 10000 10000 0
\displaywidowpenalty=9999
%------------------------------------- Custom Title Page ---------------------------
\renewcommand{\maketitle}{