Commit e5914548 authored by burcharr

automatic writing commit ...

parent c1793f2e
......@@ -50,21 +50,31 @@ To conclude the results of problem 3, the overall performance of this more diffi
### Practical applicability
The data from the real world evaluation with our test subjects shows, that most real world hand washing procedures are detected by our smart watch system. Overall, the system's sensitivity was ... in the evaluation of a "normal day", which is ... compared to the theoretical results. However, this was to be expected, since real hand washing knows many forms and patterns, that are unlikely to all be captured during the explicit recording of training data. Our theoretical results could therefore not be reached in the real life scenario. Because of the smoothing that was applied to the data, at least some consecutive windows must be classified into the positive class, which means that a real hand washing procedure needs to be longer than or around $10\,s$. In practice, it can happen that washing ones hands does take a shorter amount of time, which the system will then not detect properly. All in all, the system was able to correctly detect most hand washing procedures, and is therefore somewhat effective at this task.
The data from the real world evaluation with our test subjects shows that not all real world hand washing procedures are detected by our smart watch system. Overall, the system's sensitivity was only $28.33\,\%$ in the evaluation of a "normal day", which is much lower than in the theoretical results. However, this was to be expected to some degree, since real hand washing takes many forms and patterns that are unlikely to all be captured during the explicit recording of training data. In addition, the hand washing detection depended on the side of the body on which the watch was worn, at least for some subjects.
The data also showed, that higher intensity or a longer duration of the hand washing have a positive influence on the detection probability by the model on the smart watch. This seems logical for the longer duration due to the smoothing, but also for the intensity, it can be assumed, that the system can reach higher certainties with high intensity compared to low intensity washing. TODO (check if this holds for final results)
For some subjects, the smart watch application did not work properly, i.e. it did not start to run in the background as desired, which is why their results could not be included in the reported results. However, it is possible that other users' smart watch applications were also inactive for some of the time, possibly missing some hand washing procedures during this time.
Added to that, the system did detect an average of xy TODO false positives per subject per hour. These false positives could lead to annoyances and ultimately to the users loosing trust in the detection capabilities of the system
Because of the smoothing that was applied to the data, at least some consecutive windows must be classified into the positive class, which means that a real hand washing procedure needs to last around $10\,s$ or longer. In practice, washing one's hands can take less time, in which case the system will not detect it properly. A missed detection can even occur if, for some period in the middle of a washing procedure, the washing intensity is low enough for the model to misclassify it as noise.
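The effect of the smoothing on short or interrupted washes can be illustrated with a minimal sketch. It assumes a running mean over per-window probabilities followed by a threshold; the kernel size of 5 windows, the threshold of 0.8 and the function names are hypothetical and not taken from the actual implementation.

```python
def smooth_predictions(window_probs, kernel=5):
    """Running mean over the most recent `kernel` per-window probabilities."""
    smoothed = []
    for i in range(len(window_probs)):
        start = max(0, i - kernel + 1)
        recent = window_probs[start:i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def detects_hand_washing(window_probs, kernel=5, threshold=0.8):
    """Return True if any full-kernel running mean reaches the threshold."""
    smoothed = smooth_predictions(window_probs, kernel)
    # Skip the warm-up phase so that a single early window cannot trigger.
    return any(value >= threshold for value in smoothed[kernel - 1:])


# Hypothetical per-window probabilities for the positive class.
long_wash = [0.1] * 3 + [0.95] * 6 + [0.1] * 3
short_wash = [0.1] * 3 + [0.95] * 2 + [0.1] * 3
interrupted_wash = [0.95] * 3 + [0.2] * 2 + [0.95] * 3

for name, probs in [("long", long_wash),
                    ("short", short_wash),
                    ("interrupted", interrupted_wash)]:
    print(name, detects_hand_washing(probs))
```

Under these assumptions, only the long, uninterrupted wash keeps the running mean above the threshold; the short wash and the wash with a low-intensity gap in the middle are both missed.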
Our theoretical results could therefore not be reached in the real life scenario. All in all, the system was able to correctly detect most hand washing procedures, and is therefore somewhat effective at this task.
We also expected that a higher intensity or a longer duration of the hand washing would have a positive influence on the detection probability of the model on the smart watch. For the longer duration, this seems logical due to the smoothing. For the intensity, it can be assumed that the system reaches higher certainties with high intensity than with low intensity washing, as intense washing is likely more separable from less intense activities. However, the results showed a positive correlation value only for intensity and detection rate, whereas the detection rate and hand washing duration seemed to be mostly uncorrelated. This may again be due to the relatively small sample size. Especially for the longer washing tasks of 30s and 35s, there were only 2 examples, out of which one was not detected. This may have had a big influence on the absence of a positive correlation value in the evaluation results.
In addition, the system detected an average of 4 false positives per subject per day. These false positives could lead to annoyance and ultimately to the users losing trust in the detection capabilities of the system. However, the number found in the everyday task also varied a lot from subject to subject. Mainly, washing activities led to false positives, which was to be expected, because they involve movements similar to those of hand washing. Other activities also led to false positives, which is consistent with the high, but not very high, specificity found in the theoretical results.
The test of scenario 2, the task of intensively washing for at least 30 seconds, yielded a much higher accuracy. Per subject, the washing was detected in $76\,\%$ of washing repetitions on average. Compared to the sensitivity of $90\,\%$ reached for problem 1 with smoothing, this is lower by only $14$ percentage points. The discrepancy here is much smaller than in the everyday scenario. This could be due to the fact that the training data for hand washing procedures was also collected in a more controlled environment, so that the recorded patterns were more similar. The results of the evaluation for scenario 2 are thus better than the results for scenario 1.
In total, the practical evaluation showed some weaknesses and some strengths of the system. As the sample size is small and system instabilities occurred, the results have to be interpreted carefully. The evaluation is valid, especially for the false positives and the activities provoking them. However, the low sensitivity found in the everyday task does not match the much higher sensitivity found in the intensive hand washing task, and the differences between subjects were huge for scenario 1.
## Comparison of goals to results
#### Detection of hand washing in real time from inertial motion sensors
The goal of detecting hand washing and separating it from other activities in real time was reached by employing the trained DeepConvLSTM-A network which achieved good performance in our evaluation. The detection is not perfect yet, especially the separation from other activities seems to still have some weaknesses, especially when washing activities other than hand washing are included. However, the system was able to detect and correctly identify hand washing in most of the cases, which is why we consider our goal reached.
The goal of detecting hand washing and separating it from other activities in real time was reached by employing the trained DeepConvLSTM-A network, which achieved good performance in our theoretical evaluation. The detection is not perfect yet. In particular, the separation from other activities still seems to have some weaknesses, especially when washing activities other than hand washing are included. The system also missed too many of the hand washing procedures executed in the real world evaluation. However, the system was able to detect and correctly identify hand washing very well in the theoretical evaluation, and in many of the cases in the practical evaluation, which is why we consider our goal mostly reached.
#### Separation of hand washing and compulsive hand washing
The separation of hand washing from compulsive hand washing worked extremely well in the theoretical evaluation, which is the only evaluation we were able to test it with. A sensitivity of $99.7\,\%$ was reached with smoothing, while maintaining a specificity of $83.9\,\%$. This means that almost all compulsive hand washing in our test data was detected by the system, although the false positive rate is still a bit higher than we want it to be. Nevertheless, the performance of the model trained for this problem was really strong and fully matched our expectations. We think that a performance on this level in the real world could possibly be applied in the treatment of patients with OCD, which is why we consider this goal as reached, too.
#### Real world evaluation
The real world evaluation provided us with valuable feedback, showing us strengths and weaknesses of the hand washing detection model. Especially for the previously mentioned washing activities, the evaluation showed us the need of their inclusion as negative training examples. Apart from the false positives, the real world evaluation confirmed our hopes of the system actually being able to detect hand washing with high precision. Although the performance in the real world test was lower than the one of the theoretical evaluation, the real world evaluation still yielded a strong performance (TODO: check if true). The estimation of performance for the intense and long washing task (scenario 2) was much closer to the performance reached on our pre-recorded test set, which showed that the system is able to detect stronger hand washing even better. Overall, the real world evaluation was really successful, returning us valuable information about the weak points and strengths of the system so far.
The real world evaluation provided us with valuable feedback, showing us strengths and weaknesses of the hand washing detection model. Especially for the previously mentioned washing activities, the evaluation showed us the need for their inclusion as negative training examples. Apart from the false positives, the real world evaluation confirmed some of our hopes of the system actually being able to detect hand washing with high precision. Although the performance in the real world test was lower than that of the theoretical evaluation, it worked well for some of the subjects and showed us the main weaknesses of the system. The real world evaluation still yielded a strong performance, especially in the task of scenario 2. The estimation of performance for the intense and long washing task (scenario 2) was much closer to the performance reached on our pre-recorded test set, which showed that the system is able to detect stronger hand washing even better. Overall, the real world evaluation was successful, giving us valuable information about the weak points and strengths of the system so far. In total, the number of subjects and the amount of feedback received were a little too small to draw fully qualified conclusions, as the variance of results between the subjects was high.
## Future work
......@@ -77,26 +87,27 @@ Added to that, all the other hyperparameters could be optimized better. Instead
The current state of the system, especially for the classification of hand washing versus compulsive hand washing class looks promising for future work in this area. The collection of real obsessive-compulsive hand washing data would likely lead to the possible training of models capable of reliably classifying compulsive hand washing. Such models could then be tested on real world subjects, and also evaluated with them. If they perform well enough, they could aid psychologists and their patients with the treatment of compulsive hand washing. Like explained in the introduction, exposure and response prevention (ERP) is a viable treatment method, and interventions from a smart watch could possibly be used for response prevention. The exact design of the interventions and their actual usability forms another exciting problem field and is yet to be researched.
More data could also be incorporated for the negative class, because more different activities should be included in the data. While the standard movement activities of walking, jogging, sitting, walking up and down stairs and some fitness activities were already included for this work, more special activities have not yet been included, possibly leading to the increased false positive rate in the real world scenario.
The hand washing detection should also work well on both wrists. Multiple solutions for the differences occurring between the two sides could be tried. One could separately train two models, one for each wrist. The downside of this is that the system would also need to figure out on which wrist it is worn, either automatically or by user input. This leads to some extra uncertainty. Another idea would be to just train a model on balanced data from both wrists, leading to a model that can possibly learn implicitly which wrist the watch is worn on. No matter how we solve this problem, it seems like the watch position on the body must be accounted for in some way, possibly also needing more data, or specific position labels for the existing data.
To avoid false positives, one could also try to detect out of distribution movements, similar to the HAWAD approach that we discussed. This method must be applied carefully, as we do not know for certain that all out of distribution samples are not hand washing. The applicability of this method needs to be tested thoroughly.
More data could also be incorporated for the negative class, because a wider range of activities should be included in the data. While the standard movement activities of walking, jogging, sitting, walking up and down stairs and some fitness activities were already included for this work, more special activities have not yet been included, possibly leading to the increased false positive rate in the real world scenario. Although we already include day long recordings of long term, i.e. everyday activity data, the data set would quickly become huge if we included more of these. As a result, it is likely necessary to manually record and include everyday activities, like washing plates or pans, cleaning, brushing teeth and more. It would be even more desirable to have access to a whole database of human activities and their recordings from body worn sensors, to be used as negative examples in the training of a hand washing detection model.
The most important part of the future work in this area, especially for the detection of compulsive hand washing, will be the application to the real world with actual patients suffering from OCD with compulsive hand washing. Only on their data we will be able to properly train models, and only with them we will be able to properly evaluate the developed models, in order to gain a certain estimate of our performance. With real patients, it could also be a good idea to try and fit the model to each patient dynamically. The idea would be to start with a pre-trained model, which was trained on available data of many subjects. Afterwards, for each patient, data could be collected, and used to re-train the model. This approach of re-training pre-trained neural networks is often applied in computer vision, and has shown promising results there.
To avoid false positives, one could also try to detect out of distribution movements, similar to the HAWAD approach that we discussed. This method must be applied carefully, as we do not know for certain that all out of distribution samples are not hand washing. The applicability of this method needs to be tested thoroughly.
The real world evaluation results show that more data of other activities has to be included, e.g. of people doing the dishes, as there were many false positives in this area.
TODO
The most important part of the future work in this area, especially for the detection of compulsive hand washing, will be the application to the real world with actual patients suffering from OCD with compulsive hand washing. Only on their authentic data will we be able to properly train models, and only with them will we be able to properly evaluate the developed models, in order to gain a reliable estimate of our performance.
With real patients, it could also be a good idea to try and fit the model to each patient dynamically. The idea would be to start with a pre-trained model, which was trained on available data of many subjects. Afterwards, for each patient, data could be collected and used to re-train the model. The collected data could be sensor recordings and user feedback, which would be subject specific and fit the subject's personal style and patterns of hand washing. The training would then also stay subject specific, and the model could be updated dynamically to get more and more precise with more data from that specific subject. The approach of re-training pre-trained neural networks is often applied in computer vision, where it has shown promising results when transferring from one problem to another, and it could possibly also work in our scenario.
All in all, a lot of future work could be done in the area of hand washing detection, especially for the detection of obsessive-compulsive hand washing.
# Conclusion
In this work, we described the development, training and evaluation of a powerful and accurate compulsive and non-compulsive hand washing detection system. The relevance of such a system was explained with its applications in the field of hygiene compliance enforcement (general hand washing), as well as in the field of possibly helping in the treatment of obsessive compulsive disorder with compulsive hand washing.
We theoretically evaluated different designs of neural networks on three related problems of hand washing detection, including the separation of hand washing from other activities, the separation of hand washing from compulsive hand washing and the separation of hand washing from compulsive hand washing and from other activities at the same time. For this task, we used hand washing data, data of simulated compulsive hand washing, and data of other activities which was collected from publicly available data sets. After training and evaluation, we selected the best functioning system based on several metrics, including the F1 score and the harmonic mean of sensitivity and specificity, which we called S score. The dominating models, DeepConvLSTM and DeepConvLSTM-A were both based on a deep convolutional neural network joined with an LSTM layer. For DeepConvLSTM-A, which performed slightly better than DeepConvLSTM, we added an attention mechanism, in order to allow the model to flexibly focus on more relevant sections of its input. The designed models were able to beat baselines such as a random forest classifier and a support vector machine, as well as chance level baselines by a large margin.
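The two selection metrics named above can be written down as a minimal sketch. The S score follows the definition given in the text (harmonic mean of sensitivity and specificity); the function names and the zero-division handling are assumptions made for illustration, not the evaluation code used in this work.

```python
def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall (sensitivity)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def s_score(sensitivity, specificity):
    """S score: harmonic mean of sensitivity and specificity."""
    total = sensitivity + specificity
    if total == 0:
        return 0.0
    return 2 * sensitivity * specificity / total


# Values reported for hand washing vs. compulsive hand washing with smoothing.
print(round(s_score(0.997, 0.839), 3))
```

Like any harmonic mean, the S score is pulled towards the lower of its two inputs, so a model cannot reach a high value by maximizing sensitivity at the expense of specificity or vice versa.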
In a practical evaluation using x subjects (TODO), we tested DeepConvLSTM-A on the hand washing detection task in a real world and every day environment, as well as in a fixed schedule hand washing test. The system ran on a smart watch, which was used to monitor the users wrist movements in real-time and was able to correctly detect hand washing ... . Some false positives appeared for different activities, many of which were washing related.
In a practical evaluation using 5 subjects, we tested DeepConvLSTM-A on the hand washing detection task in a real world and everyday environment, as well as in a fixed schedule hand washing test. The system ran on a smart watch, which was used to monitor the user's wrist movements in real time and tried to correctly detect hand washing. The accuracy of this test was lower than expected ($28.33\,\%$). Some false positives appeared for different activities, many of which were washing related; these must be ruled out in the future.
In the second test of the practical evaluation, subjects performed intensive and long hand washing repetitions, which were more easy to detect. The systems performance here was ... (near theory?? TODO).
In the second test of the practical evaluation, subjects performed intensive and long hand washing repetitions, which were easier to detect. The system's performance here was much closer to the results of the theoretical evaluation of our models (sensitivity $76\,\%$ vs $90\,\%$).
Hence, the evaluation results suggest that the developed system is able to properly detect hand washing in many cases. The specificity and sensitivity of the system is high, but leaves some room for improvement.
In conclusion, the application of wrist worn sensor data to the detection of hand washing and compulsive hand washing remains an interesting and open field of research, with many possible areas of application. Especially the detection of obsessive hand washing would be a world's first, and seems promising for future usage in the treatment of OCD patients
In conclusion, the application of wrist worn sensor data to the detection of hand washing and compulsive hand washing remains an interesting and open field of research, with many possible areas of application. The detection of obsessive hand washing in particular would be a world first, and seems promising for future usage in the treatment of OCD patients. Because neural network models can run directly on wrist worn smart watches, interventions could be generated in real time and with low latency.
......@@ -99,30 +99,31 @@ The mean diagonal value of the confusion matrix upholds almost the same ordering
## Practical Evaluation
### Scenario 1: One day of evaluation
In the first scenario, the X (TODO) subjects reported an average of .. hand washing procedures on the day on which they evaluated the system.
Per subject, there were (+- ) hand washing procedures. Out of those, .. (+-) were correctly identified ($xy\,\%$).
In the first scenario, the 5 (TODO) subjects reported an average of $4.75$ hand washing procedures on the day on which they evaluated the system.
Per subject, there were $4.75$ ($\pm 3.3$) hand washing procedures. Out of those, $1.75$ ($\pm 2.06$) were correctly identified. The accuracy per subject was $28.33\,\%$ ($\pm 37.9\,\%$). The highest accuracy for a subject was $80\,\%$ out of 5 hand washes, the lowest was $0\,\%$ out of 4 hand washes.
The duration and intensity of the hand washing process also played a role, as can be seen in todo fig.
The highest accuracy was achieved for ....
The lowest accuracy was achieved for ....
Some subjects wore the smart watch on the left wrist instead of the right wrist, and reported worse results for that.
The correlation of duration of the hand washing with the detection rate is ...
Likewise, the correlation of the intensity of washing with the detection rate is ... TODO
The duration and intensity of the hand washing process also played a role.
The correlation of the duration of the hand washing with the detection rate is $-0.039$. However, the raw data contains only 2 "longer" hand washes of 30 seconds or more, the rest being in the range of 10 to 25 seconds.
In addition, the correlation of the intensity of washing with the detection rate is $0.267$.
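How such a correlation value can be obtained is sketched below, assuming the Pearson correlation coefficient between a per-wash property and a binary detection outcome (1 = detected, 0 = missed). The text does not state which coefficient was used, and the data in the example is purely hypothetical, not the recorded evaluation data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical self-reported intensities (1-5) and detection outcomes.
intensity = [1, 2, 2, 3, 4, 5]
detected = [0, 0, 1, 0, 1, 1]
print(pearson(intensity, detected))
```

With only a handful of washes per subject, a single missed long wash can move the coefficient substantially, which is why values close to zero should be read with caution.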
For the reported false positives, the subjects experiences varied. The subjects reported xy (+-) false hand washing detections on this day. Assuming a 12h recording period, that means there are xy / 12 false detections per hour.
For the reported false positives, the subjects' experiences varied. The subjects reported $4$ ($\pm 5.19$) false hand washing detections on this day. The minimum was 0 false positives, and the maximum was 13 false positives.
The activities leading to false positives include:
- Changing clothes (or helping others to do so)
- Washing pans / doing the dishes
- Scratching oneself
-
- TODO
- Brushing teeth
- Cleaning
The full list of reported activities can be found in the appendix. TODO
The full list of reported activities can be found in the appendix.
Some subjects also reported difficulties with the smart watch application (not part of this work), which led to the model sometimes not being run at all and might also have influenced the results. It is possible that, for some hand washing procedures, the smart watch application was not running, which would lead the user to note down a false negative and thereby decrease the sensitivity in the results.
### Scenario 2: Controlled intensive hand washing
In scenario 2, the subject each washed their hands at least 3 times. Some subjects voluntarily agreed to performing more repetitions, which leads to more than 3 washing detection results per subject. The detection accuracy per subject was xy (+-) $\,%$, with the highest being, xy and the lowest being zy.TODO.
The total mean accuracy over all repetitions was xy %.
In scenario 2, the subjects each washed their hands at least 3 times. Some subjects voluntarily agreed to perform more repetitions, which leads to more than 3 washing detection results per subject. The detection accuracy per subject was $76\,\%$ ($\pm 25\,\%$), with the highest being $100\,\%$ and the lowest being $50\,\%$.
The mean accuracy over all repetitions, not split by subject, was $73.7\,\%$. For scenario 2, one user moved the smart watch from the left wrist to the right wrist after two repetitions. The first two repetitions were not detected, while the two repetitions with the smart watch worn on the right wrist were detected correctly.
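The difference between the per-subject mean and the mean over all repetitions arises because subjects contributed different numbers of repetitions. A minimal sketch with hypothetical detection outcomes (1 = detected, 0 = missed) illustrates the two ways of averaging; the subject labels, values and function names are made up for illustration.

```python
from statistics import mean, stdev


def per_subject_accuracy(results):
    """Detection rate of each subject, given lists of 0/1 outcomes."""
    return {subject: sum(outcomes) / len(outcomes)
            for subject, outcomes in results.items()}


def mean_subject_accuracy(results):
    """Average of the per-subject rates: every subject counts equally."""
    return mean(per_subject_accuracy(results).values())


def pooled_accuracy(results):
    """Rate over all repetitions: every repetition counts equally."""
    all_outcomes = [o for outcomes in results.values() for o in outcomes]
    return sum(all_outcomes) / len(all_outcomes)


# Hypothetical outcomes; subject "B" performed one extra repetition.
hypothetical = {"A": [1, 1, 1], "B": [1, 0, 1, 0], "C": [1, 1, 0]}
rates = per_subject_accuracy(hypothetical)
print(mean_subject_accuracy(hypothetical), stdev(rates.values()))
print(pooled_accuracy(hypothetical))
```

In this example the subject with the most repetitions also has the lowest detection rate, so the pooled value ends up below the per-subject mean, the same direction as in the reported results.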
......@@ -3,6 +3,7 @@
{} & \textbf{{frequency}} \\
\textbf{activity } & \\
\midrule
\textbf{Doing the dishes / cooking } & 1 \\
\textbf{Got out of the car and walked approx. 5 m} & 1 \\
\textbf{Collecting things and putting them away } & 1 \\
\textbf{Placed Fleischkäse (meatloaf) on a board for cutting } & 1 \\
......@@ -10,10 +11,13 @@
\textbf{Putting a jacket on a child } & 1 \\
\textbf{Changing a child's clothes } & 2 \\
\textbf{Cleaning the kitchen } & 1 \\
\textbf{Walking } & 1 \\
\textbf{Washing a pan } & 1 \\
\textbf{Writing } & 1 \\
\textbf{Putting on shoes } & 1 \\
\textbf{Wiping the table } & 1 \\
\textbf{Changing a diaper } & 2 \\
\textbf{Brushing teeth } & 1 \\
\textbf{doing the dishes } & 1 \\
\textbf{scratched } & 1 \\
\textbf{cooking } & 1 \\
......