The situation is this: the software product sends an email to someone when an error occurs. Some errors can be reported once a day, because the software is working fine for most tasks and only briefly lacked something it needed. Others are serious and need to be reported immediately.
Now, what happens if a serious error recurs at a high rate? Is it appropriate to send an email every few seconds to report it?
The proposed solution is to keep a record of all recent errors, perhaps in a reasonably sized list, and not report an error again if it was reported recently.
Now for the downside of that solution: if the problem is fixed and then happens again, no email is sent for the second failure until the time period has expired. A real-world example is a file that the software needs but that is missing. The software continuously checks for the file so that it can run properly as soon as the file is available. Suppose the missing file is reported and then provided, without the software ever being reset in any way. When the file disappears again a few moments later, no error is sent.
The solution is to track program state rather than errors. In other words, an error condition is a program state, and when a repeating error stops happening, the list of errors needs to be updated to exclude that error.
I have not written code to do this. My project timeline only allowed enough time to stop repeating error messages, so a short suppression period was picked, which ensures that a new occurrence of an error is still detected within a reasonable amount of time. Fifteen minutes seemed appropriate for our software. This keeps the system from sending out thousands of emails while still expiring already-sent errors quickly enough that a new problem is not hidden from the user.
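As a rough illustration of the time-window approach (a sketch, not the author's actual code), something like the following Python would do. The class name `ErrorThrottle`, the `send_email` callback, error number 42 and the file name `config.dat` are all hypothetical; only the 15-minute window comes from the text. The clock is injectable so the example can simulate time passing.

```python
import time
from typing import Callable, Dict


class ErrorThrottle:
    """Hypothetical helper: email each error number at most once per time window."""

    def __init__(self, send_email: Callable[[int, str], None],
                 window_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send_email = send_email      # hypothetical callback that sends the email
        self._window = window_seconds      # 15 minutes by default
        self._clock = clock                # injectable so tests can fake time
        self._last_sent: Dict[int, float] = {}

    def report(self, error_number: int, message: str) -> bool:
        """Send an email unless this error was emailed within the window."""
        now = self._clock()
        # Forget errors whose window has expired, keeping the record small.
        self._last_sent = {n: t for n, t in self._last_sent.items()
                           if now - t < self._window}
        if error_number in self._last_sent:
            return False                   # reported recently: suppress
        self._last_sent[error_number] = now
        self._send_email(error_number, message)
        return True


# Simulated run with a fake clock (error 42 and config.dat are hypothetical).
sent = []
fake_now = [0.0]
throttle = ErrorThrottle(lambda n, m: sent.append((n, m)),
                         clock=lambda: fake_now[0])

throttle.report(42, "config.dat is missing")  # emailed
fake_now[0] = 5.0
throttle.report(42, "config.dat is missing")  # suppressed: repeat within window
fake_now[0] = 60.0  # suppose the file was restored and then vanished again
throttle.report(42, "config.dat is missing")  # still suppressed: the downside
fake_now[0] = 15 * 60 + 1.0
throttle.report(42, "config.dat is missing")  # window expired: emailed again
print(len(sent))  # 2
```

The third call shows the downside described earlier: even if the file came back and then went missing again, that new failure stays silent until the 15-minute window runs out.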
Implementation Idea
My idea is to create a class of objects to maintain program states. This would be like a Boolean variable, but with much more functionality built in and without requiring a separate “global” variable for each state. A list specific to the state-handling system would hold the states, though from the outside it might not look like a list. A piece of code could then access a state by number, typically the error number used to report the error, and call a function to set the state to a good or bad condition. The state list code would decide whether the change in condition should cause an error to be reported by email.
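The state-list idea could be sketched in Python roughly as follows. This is only one possible shape, assuming one numbered state per error and a `send_email` callback; the names `StateList`, `set_bad`, `set_good` and `is_bad`, along with error number 42 and `config.dat`, are hypothetical. Internally a set holds the numbers of states that are currently bad, so callers never see a list.

```python
from typing import Callable, Set


class StateList:
    """Hypothetical state list: each numbered state is good until set bad."""

    def __init__(self, send_email: Callable[[int, str], None]) -> None:
        self._send_email = send_email  # hypothetical callback that sends the email
        self._bad: Set[int] = set()    # numbers of states currently in a bad condition

    def set_bad(self, number: int, message: str) -> None:
        # Email only on the good -> bad transition, not on every repeat.
        if number not in self._bad:
            self._bad.add(number)
            self._send_email(number, message)

    def set_good(self, number: int) -> None:
        # Clearing the state re-arms reporting for the next failure.
        self._bad.discard(number)

    def is_bad(self, number: int) -> bool:
        return number in self._bad


# Simulated polling of a needed file (error 42 and config.dat are hypothetical).
sent = []
states = StateList(lambda n, m: sent.append((n, m)))
MISSING_FILE = 42

for file_present in [False, False, True, False, False]:
    if file_present:
        states.set_good(MISSING_FILE)
    else:
        states.set_bad(MISSING_FILE, "config.dat is missing")

print(len(sent))  # 2
```

Because an email goes out only on the transition from good to bad, a repeating error produces a single message, and a failure that returns after being cleared is reported right away instead of waiting for a time window to expire.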
I may expand on this idea later.