Commit ec3e7fdc authored by burcharr

automatic writing commit ...

parent 7dd3bcb2
@@ -15,7 +15,7 @@ For the problem of classifying hand washing and separating it from all other act
Applying smoothing improved the performance of the models even further, to an F1 score of $0.892$ (DeepConvLSTM) and an S score of $0.819$ (DeepConvLSTM-A). This performance boost can be explained by the temporal context captured in the data. If many windows in rapid succession are classified as hand washing, a small number of Null predictions interspersed among them are likely to be wrong. Smoothing therefore helps to filter out both false positives and false negatives.
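The effect of smoothing on isolated misclassifications can be illustrated with a minimal sketch. The text above does not specify the smoothing method, so a centered moving average over per-window hand washing probabilities is assumed here; the function name, the window length of 5 and the threshold of 0.5 are hypothetical choices for illustration only.

```python
def smooth_predictions(probabilities, window=5, threshold=0.5):
    """Smooth per-window hand washing probabilities with a centered moving average.

    `window` and `threshold` are illustrative assumptions, not values from the source.
    Returns one binary label per input window (1 = hand washing, 0 = Null).
    """
    half = window // 2
    smoothed = []
    for i in range(len(probabilities)):
        # Clip the averaging window at the sequence boundaries.
        lo = max(0, i - half)
        hi = min(len(probabilities), i + half + 1)
        neighbourhood = probabilities[lo:hi]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return [1 if p >= threshold else 0 for p in smoothed]


# Toy sequence: a hand washing episode with one dropout (index 3),
# followed by Null activity with one spurious spike (index 10).
raw = [0.9, 0.8, 0.9, 0.2, 0.9, 0.8, 0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1]
unsmoothed_labels = [1 if p >= 0.5 else 0 for p in raw]
smoothed_labels = smooth_predictions(raw)
```

In this toy sequence, the dropout inside the hand washing episode (a false negative) and the isolated spike during Null activity (a false positive) are both corrected by the averaging.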
Normalization proved ineffective for our approach, worsening the performance of almost all models. This could be due to a difference in distribution between the train and test sets. The normalization parameters were estimated from the train set and applied to the test set, which is only accurate under the assumption that both sets follow the same distribution. This was not the case here, which is probably why the normalized data was harder to learn from and test on than the non-normalized data.
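The following sketch shows why normalization parameters estimated on the train set can misfit the test set. It assumes z-score standardization (the text above does not name the normalization scheme), and the data values are made up purely for illustration.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Estimate mean and standard deviation from the train set only."""
    return mean(train_values), pstdev(train_values)


def standardize(values, mu, sigma):
    """Apply previously estimated parameters; assumes sigma is non-zero."""
    return [(v - mu) / sigma for v in values]


# Hypothetical one-dimensional sensor readings; the test set is shifted
# relative to the train set to mimic a distribution mismatch.
train = [0.0, 1.0, 2.0, 3.0, 4.0]
test = [4.0, 5.0, 6.0, 7.0, 8.0]

mu, sigma = fit_standardizer(train)
train_z = standardize(train, mu, sigma)
test_z = standardize(test, mu, sigma)

# The train data is centered at zero, but the test data is not.
train_center = mean(train_z)
test_center = mean(test_z)
```

With these toy values, the standardized train data has a mean of zero, whereas the standardized test data is centered far from zero, so the model sees inputs in a range it was not trained on.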
For the reasons explained in section \ref{s_score}, we weight the results of the S score higher than those of the F1 score. Thus, the best network for problem 1 is DeepConvLSTM-A, although only by a slight margin. The achieved S score of $0.819$ results from a specificity of $0.751$ and a sensitivity of $0.90$, which means that $90\,\%$ of windows containing hand washing were correctly classified as hand washing. However, only $75.1\,\%$ of windows that contained no hand washing were classified as Null. This leaves some room for improvement, because the model still has a false positive rate of $24.9\,\%$, which is higher than desired.
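The arithmetic behind these figures can be checked with a short sketch. The definition of the S score lives in the referenced section and is not reproduced here; the reported values are consistent with the harmonic mean of sensitivity and specificity, so that form is assumed below. The function name is hypothetical.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive rates (assumed form of the S score)."""
    return 2 * a * b / (a + b)


sensitivity = 0.90   # share of hand washing windows detected
specificity = 0.751  # share of Null windows classified as Null

s_score = harmonic_mean(sensitivity, specificity)
false_positive_rate = 1 - specificity

print(f"S score: {s_score:.3f}")
print(f"False positive rate: {false_positive_rate:.1%}")
```

Under this assumption the computation reproduces the reported S score of $0.819$ and the false positive rate of $24.9\,\%$.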
@@ -68,7 +68,7 @@ In total, the practical evaluation showed some weaknesses and some strengths of
## Comparison of goals to results
#### The detection of hand washing in real time from inertial motion sensors is feasible
The goal of detecting hand washing and separating it from other activities in real time was reached by employing the trained DeepConvLSTM-A network, which achieved good performance in our theoretical evaluation. The detection is not perfect yet: the separation from other activities in particular still has some weaknesses, e.g. when washing activities other than hand washing are included. The system also missed too many of the hand washing procedures executed in the real-world evaluation. However, the system was able to detect and correctly identify hand washing very well in the theoretical evaluation, and in many of the cases in the practical evaluation, which is why we consider our goal mostly reached.
#### Hand washing and compulsive hand washing can be separated
The separation of hand washing from compulsive hand washing worked extremely well in the theoretical evaluation, which is the only evaluation we were able to test it with. A sensitivity of $99.7\,\%$ was reached with smoothing, while maintaining a specificity of $83.9\,\%$. This means that almost all compulsive hand washing in our test data was detected by the system, although the false positive rate is still a bit higher than we want it to be. Nevertheless, the performance of the model trained for this problem was strong and fully matched our expectations. We think that performance at this level in the real world could be applied in the treatment of patients with OCD, which is why we consider this goal reached, too.
@@ -81,10 +81,10 @@ The practical evaluation provided us with valuable feedback, showing us strength
The general performance of our models on problem 2, distinguishing compulsive hand washing from non-compulsive hand washing, was high. The downside is that this model is only applicable if we know when the hand washing takes place. However, our results could be employed together with other tools that tell us when the user is currently washing their hands. Examples of such tools are in development in our group, one of them being a soap dispenser with an integrated proximity sensor. In addition, Bluetooth beacons stationed near sinks can be used to let the smart watch know that the user is near a specific sink. Conductivity sensors on the user's skin could be employed to detect a change of conductivity caused by contact with tap water. Furthermore, the sound of tap water could possibly also be detected, e.g. by a smart watch with a microphone.
One or more of these methods combined with our model trained for problem 2 could possibly be used to achieve a higher performance for the task of compulsive hand washing detection in the future.
The detection of hand washing could be incorporated into many devices, mainly wrist worn ones, like smart watches. In order to further improve the detection capabilities and accuracy, one would need to invest even more time into carefully designing and training better models. This work's architecture search could be expanded, and more parameter combinations could be tried out. For example, different types of layers that have not been included in the architecture yet could be tried. Instead of normalizing data on the data set level, batch normalization could be used to try to make the networks more stable.
Different attention mechanisms could be tried out on the hand washing data.
On top of that, all the other hyperparameters could be optimized better. Instead of manual hyperparameter optimization (HPO), more sophisticated versions of HPO could be employed, e.g. Bayesian optimization. This could lead to better choices for the batch size, learning rate and other parameters. However, this may take a lot of time to run, as it is computationally expensive.
The current state of the system, especially for the classification of hand washing versus compulsive hand washing, looks promising for future work in this area. The collection of real obsessive-compulsive hand washing data would likely enable the training of models capable of reliably classifying compulsive hand washing. Such models could then be tested on real-world subjects, and evaluated with them. If they perform well enough, they could aid psychologists and their patients with the treatment of compulsive hand washing. As explained in the introduction, exposure and response prevention (ERP) is a viable treatment method, and interventions from a smart watch could possibly be used for response prevention. The exact design of the interventions and their actual usability form another exciting problem area and are yet to be researched.
@@ -110,8 +110,10 @@ The designed models were able to perform better than baselines such as a random
In a practical evaluation using 5 subjects, we tested DeepConvLSTM-A on the hand washing detection task in a real-world and everyday environment, as well as in a fixed schedule hand washing test. The system ran on a smart watch, which was used to monitor the user's wrist movements in real time and tried to correctly detect hand washing. The sensitivity in this test was lower than expected ($28.33\,\%$, or $50\,\%$ if the correct wrist was used). Furthermore, around 4 false positives per day appeared for different activities, many of which were washing related. They included but were not limited to doing the dishes, brushing one's teeth and scratching oneself. High numbers of false positives could be reduced in the future by adding more everyday activities to the training data.
In the second test of the practical evaluation, subjects performed intensive and long hand washing repetitions, which were closer to our lab recorded washing data (including the simulated compulsive data) and thus easier to detect. The system's sensitivity here was much closer to the result of the theoretical evaluation of our model ($76\,\%$, or $82.5\,\%$ if the left wrist was used, versus $90\,\%$).
Hence, the evaluation results suggest that the developed system is able to properly detect hand washing in many cases. The theoretical specificity ($75.1\,\%$) and sensitivity ($90\,\%$) of the system are high, but the practical application shows some room for improvement.
Simulated obsessive-compulsive hand washing could be separated from ordinary hand washing with a sensitivity of $99.7\,\%$ and a specificity of $83.9\,\%$. Therefore, the application of our system to real patients is an exciting topic for future work.
In conclusion, the application of wrist worn sensor data to the detection of hand washing and compulsive hand washing remains an interesting and open field of research, with many possible areas of application. In particular, the detection of compulsive hand washing in real time would be a world first and seems promising for future usage in the treatment of OCD patients. Due to the possibility of directly running neural network models on wrist worn smart watches, interventions could be generated in real time and with a latency below 15 seconds.
@@ -204,7 +204,7 @@ In order to find the best batch size, sizes between 32 samples per batch and 102
There is a connection between the batch size and the learning rate. Increasing the batch size can have a similar effect to reducing the learning rate over time (learning rate decay) @smith_dont_2018. Since we use a comparatively big batch size for our model training, we experimented with smaller learning rate values. During preliminary testing on the validation set, different initial values from 0.01 to 0.00001 were tested. We fixed the initial learning rate to 0.0001, as this provided the best performance. We had also implemented starting with a higher learning rate and then using learning rate decay, but found during preliminary testing on the validation set that this approach did not improve the performance in our case. We also found that starting with higher learning rates ($lr > 0.01$) led to numerical instability in the recurrent networks, producing NaN values for gradients and thus parameters.
This means the training became unstable for the networks containing LSTM layers, hence the learning rate had to be reduced for these networks anyway.
##### Loss function
As the loss function, we use the cross-entropy loss, weighted by the classes' frequencies ($\mathcal{L}_{weighted}$). This means that the loss function corrects for imbalanced classes, and we do not have to rely on subsampling or repetition in order to balance the class frequencies in the data set. The weighted cross-entropy loss is defined as shown in equation \ref{eqn:cross_entropy_loss}. The true labels $\mathbf{y}$ and the models' output $\mathbf{x}$ are one-hot encoded vectors. We first apply the "softmax" function to the models' output $\mathbf{x}$ (see equations \ref{eqn:softmax} and \ref{eqn:apply_softmax}). Then, the loss is calculated by applying the weighted cross-entropy loss function, with the weight of each class being the inverse of its relative frequency in the training set data. This way, the predictions for all classes have the same potential influence on the parameter updates, despite the classes not being perfectly balanced.
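The weighting scheme described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, labels are given as class indices rather than one-hot vectors, every class is assumed to occur at least once in the training labels, and only the per-sample loss is shown (how losses are averaged over a batch is not specified here).

```python
import math


def class_weights(labels, num_classes):
    """Weight of each class = inverse of its relative frequency in `labels`.

    Assumes every class occurs at least once (otherwise division by zero).
    """
    counts = [0] * num_classes
    for label in labels:
        counts[label] += 1
    return [len(labels) / count for count in counts]


def softmax(logits):
    """Numerically stable softmax over a list of raw model outputs."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits, target, weights):
    """Per-sample weighted cross-entropy for a single class index `target`."""
    probs = softmax(logits)
    return -weights[target] * math.log(probs[target])


# Toy training labels: class 0 (e.g. Null) is four times as frequent as class 1.
train_labels = [0] * 8 + [1] * 2
weights = class_weights(train_labels, num_classes=2)

# The same uncertain prediction costs more for the rare class.
loss_frequent = weighted_cross_entropy([0.0, 0.0], target=0, weights=weights)
loss_rare = weighted_cross_entropy([0.0, 0.0], target=1, weights=weights)
```

In this toy example the rare class receives four times the weight of the frequent class, so an equally uncertain prediction contributes four times as much to the loss, which offsets the class being seen four times less often.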
\begin{figure}
\begin{align}
......