JeremyNicoll

A weird GUI problem


Using v11.5.0.6191

A few days ago (while conducting experiments with v11.5.0.6191 and rule deletion, to see
if I could recreate the problem of ping/tracert suddenly not working) I had something odd
happen in the GUI.   I've read some other reports of odd behaviour and wonder if this
is related.

I'd shut it, then (by right-click on the systray icon & selection of 'Overview') reopened
it.  It came up with the lefthand grey box saying "Initializing..." & a grey/white moving
band moving from left to right on the bottom border of the grey box. What? There was also
a rotating progress arrow at the top right of the GUI, just under the X that one would
click to shut the window.

The Initializing grey box also said 'Last Update  1 Hr ago'. That made me wonder if there
was a stalled newer update.

See screenshot.

I clicked the X in the grey box, which changed to say "Cancelling".  Cancelling what?  The
cancel didn't seem to make any difference.

Meantime the bit of the taskbar that represents the EIS GUI had a green stripe that passed
across it every few seconds...

I then tried systray icon - shutdown all protection - which seemed to stop the GUI ok, then
double-clicked the EIS desktop shortcut which seemed to restart it.  But it still said it
was Initializing...

I then rebooted.  After that I opened the EIS GUI, which no longer said Initializing.  The
last update was a while back so I clicked 'update now'.  It then said Initializing... again,
but only for a couple of seconds before going ahead and doing an update without any trouble.

post-25439-0-79479600-1457869545_thumb.jpg


The box in question displays update status and the update controls. You are correct that it was starting an update (it is possible it had stalled for some reason), and when you clicked the 'X' button that canceled the update process. The green you saw on the taskbar icon was showing you that it was attempting to start the update process, and it wasn't solid yet because it hadn't actually started the download yet.

If I remember right, you don't have any other security software installed, correct?


No other security software, correct.  Clicking the X didn't cancel the stalled(?) update process.  I can't remember how long I waited to see if it took effect, but it will have been at least a minute, and the restarted GUI didn't seem to have noticed the cancel either.


That's usually related to issues accessing/writing to %TEMP%\a2temp, and if it happens again you may want to start the computer in Safe Mode and try deleting that folder.


These ... issues ... would be on behalf of EIS?  What could suddenly prevent it from reading
or writing to a folder within the current-user's %TEMP% directory?   Is there an underlying
Windows problem with access to that?    And, why Safe Mode?  I'm able to rename/delete
%TEMP%\a2temp  normally, if I try to.


> These ... issues ... would be on behalf of EIS?

EIS needs to be able to read from and write to that folder in order to temporarily store files when downloading updates. If something prevents it from reading from or writing to that folder, then the update process will usually hang/freeze.

> What could suddenly prevent it from reading or writing to a folder within the current-user's %TEMP% directory?  Is there an underlying Windows problem with access to that?

Filesystem issues, other security software, disk controller driver issues, hardware problems, etc. Anything that can interfere with read/write operations can cause the problem. As long as deleting the folder fixes the issue, then there shouldn't be anything to worry about. If it keeps randomly reoccurring, then it might warrant further investigation.

> And, why Safe Mode?  I'm able to rename/delete %TEMP%\a2temp normally, if I try to.

It is possible that if EIS is trying to download updates while you attempt to delete the folder, that it will fail to delete. Starting in Safe Mode prevents anything from blocking deletion of files/folders like that, since usually only a few essential system processes are running in Safe Mode.


OK, all those answers are fair enough from a theoretical point of view.  But I have no
reason to suppose that there's any reason why EIS couldn't read/write its own folder in
%TEMP%, when it seemed to have a stalled update.

If I had disk controller / other hardware issues, why would they only affect EIS's temp
folder?  And I don't have other security software.

I've seen suggestions to other users that the a2temp folder should be deleted, when they
appear to have had update problems.  To me that suggests there's an underlying problem
with that folder.  Maybe EIS should create a new one, eg a2temp-yyyymmddhhmmss, each time
it needs to create a set of temporary files?  And (usually) having created its latest one
try to delete the older such folders?
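
To make the suggestion above concrete, here is a minimal sketch in Java of a per-update temporary folder with clean-up of older ones. It is purely illustrative and says nothing about how EIS is actually written: the class `UpdateTempDirs`, its method names, and the choice to return the folders that could not be deleted are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not EIS code. */
class UpdateTempDirs {
    static final String PREFIX = "a2temp-";
    static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Creates base/a2temp-yyyymmddhhmmss for a single update run. */
    static Path createFresh(Path base, LocalDateTime now) throws IOException {
        return Files.createDirectory(base.resolve(PREFIX + STAMP.format(now)));
    }

    /** Tries to delete every other a2temp-* folder; returns the ones it could not remove. */
    static List<Path> deleteOlder(Path base, Path current) throws IOException {
        List<Path> failed = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(base, PREFIX + "*")) {
            for (Path dir : dirs) {
                if (dir.equals(current)) {
                    continue; // never touch the folder in use
                }
                if (!deleteTree(dir)) {
                    failed.add(dir); // report it, but do not block the update
                }
            }
        }
        return failed;
    }

    /** Deletes a folder and its contents, children first; false if anything could not be removed. */
    private static boolean deleteTree(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> deepestFirst = walk.sorted(Comparator.reverseOrder())
                                          .collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
            return true;
        } catch (IOException | UncheckedIOException e) {
            return false;
        }
    }
}
```

A caller would create the fresh folder first, use it for the update, and then report (rather than fail on) anything returned by `deleteOlder`. Two runs within the same second would collide on the folder name, which a real implementation would need to handle.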


> ... But I have no reason to suppose that there's any reason why EIS couldn't read/write its own folder in %TEMP%, when it seemed to have a stalled update.
>
> If I had disk controller / other hardware issues, why would they only affect EIS's temp folder?  And I don't have other security software.

You're correct, there's no way to know exactly what happened without being able to reproduce it and collect debug information. These are all just possibilities.

> I've seen suggestions to other users that the a2temp folder should be deleted, when they appear to have had update problems.  To me that suggests there's an underlying problem with that folder.  Maybe EIS should create a new one, eg a2temp-yyyymmddhhmmss, each time it needs to create a set of temporary files?  And (usually) having created its latest one try to delete the older such folders?

If there's a reason why EIS can't read from or write to the a2temp folder, then there's no reason to believe it would be able to delete the folder. I'm not actually certain what kind of error handling we have in place for issues with the a2temp folder. The subject of error handling to prevent the update process from freezing has been brought up in the past, and I know that a number of changes related to update stability have been made since EIS 11 was released; however, only the developer who wrote the code knows for certain how it works. Clearly there's still some room for improvement, so if I get an answer to the last question I asked them (from your other topic), I'll ask about this as well. I don't like to ask too many questions at once, since it gets a bit confusing.

As for creating a new folder each time, that is possible, however it would eat up TEMP space rather quickly. It also might not prevent these issues, depending on what is causing them. After all, it's just a folder, and if EIS can't read from or write to one folder in TEMP then there's no guarantee that it can create a new folder in TEMP and be able to read from or write to that new folder.


Sorry for delay in replying; hit a bad patch in my chronic illness.

> If there's a reason why EIS can't read from or write to the a2temp folder, then there's
> no reason to believe it would be able to delete the folder.

Indeed, but assuming that the update process & use of a2temp has worked for a particular
machine previously, it seems to me that the only likely reason for a one-off problem is
going to be either a change of ownership or that some other application program or bit of
the OS has the folder in use.  Creating a new folder for a specific set of updates would
at least prevent that issue.  The failure to delete old folder(s) would still need to be
reported to the user but would be less likely to prevent the app being kept up to date.

> ... then I'll ask about this as well, ...

Thank-you.

> As for creating a new folder each time, that is possible, however it would eat up TEMP
> space rather quickly.  It also might not prevent these issues, depending on what is
> causing them. After all, it's just a folder, and if EIS can't read from or write to one
> folder in TEMP then there's no guarantee that it can create a new folder in TEMP and be
> able to read from or write to that new folder.

Absolutely, but it might reduce the incidence of failed updates.  If EIS were suddenly
unable to create a subfolder in TEMP, it's not unreasonable to think that other programs
would have the same problem... that is, it would be a global problem with TEMP rather
than a problem with EIS.      


> ... If EIS were suddenly unable to create a subfolder in TEMP, it's not unreasonable to think that other programs would have the same problem... that is, it would be a global problem with TEMP rather than a problem with EIS.

That is quite true, and we have seen that in the past as well. In this case I think whatever happened was temporary, and could have even just been a case of bad timing (something else was happening that just so happened to prevent overwriting files in %TEMP% as the files were being accessed).

The official answer I got is that, in cases where EIS can't write to the a2temp folder, it will simply display an error message stating that it can't connect to the update server (a little cryptic, but perhaps less scary to the average user). I was also told that this particular case may not be related to writing to the a2temp folder, but may be a different issue. We'd need debug logs to know more, but if it doesn't happen again then that won't be possible.
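
As an illustration only, a write check on the temporary folder could look something like the Java sketch below. `TempProbe` and `probe` are hypothetical names, and nothing here is taken from the EIS code; it simply shows one way a program could tell "cannot write to the folder" apart from a network failure before starting a download.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical write check for illustration; not EIS code. */
class TempProbe {

    /** Returns null if a file can be created, written and deleted in dir; otherwise a short reason. */
    static String probe(Path dir) {
        try {
            Path probeFile = Files.createTempFile(dir, "probe", ".tmp");
            Files.write(probeFile, new byte[] {0});
            Files.delete(probeFile);
            return null;
        } catch (IOException e) {
            // Keep the underlying cause so it can be logged or shown to the user
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // java.io.tmpdir is the JVM's view of the temporary folder
        Path temp = Paths.get(System.getProperty("java.io.tmpdir"));
        String problem = probe(temp);
        System.out.println(problem == null ? "Temp folder is writable" : "Cannot use temp folder: " + problem);
    }
}
```

A check like this only covers the moment it runs; a folder that is writable now can still become unavailable partway through an update.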


OK... leaving aside the possibility that I/O to TEMP was somehow not possible, has anyone
thought about why clicking the X (which did cause the GUI to change to say 'Cancelling')
didn't cancel anything?


And another thing... debug logs.  It's great that EIS allows one to turn debug logging on
and off dynamically - no more flipping a bit in the registry and then rebooting - but (as
you must be aware) so often a problem happens and then one can't reproduce it, so can't
collect useful logs.

I've used systems in the past that cache (I guess in a queue or maybe deque structure) a
certain amount of log/trace activity all the time, in RAM.  When a problem happens, that
cached data gets written to disk; if the problem is very serious at least that trace data
is visible in the core dumps.  I think it would be useful if EIS had some similar facility,
perhaps configurable (maybe in terms of the size of the in-store cache), and there was a
"dump it now" button in the GUI.  I'd expect the app to dump it itself if it knew that
something wasn't right, but otherwise users who'd just experienced something odd could
click the button and at least there'd be a chance that something useful would have been
written out.  Some people at least (certainly me, with a powerful machine) would be very
happy to accept the (I expect) tiny CPU & RAM overhead of such a thing, if it led to
easier resolution of bugs.  (I've often run other apps which create logging /files/ with
detailed logging on all the time and archived weeks or months' worth of those logs, just
for this sort of reason.)
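
The kind of always-on, in-memory trace described above could be sketched as a fixed-size ring buffer. This Java sketch is an illustration built on assumptions, not an EIS feature: `TraceRing`, `trace`, `snapshot`, and `dump` are hypothetical names, and the capacity is counted in lines rather than bytes for simplicity.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bounded trace buffer for illustration; not an EIS feature. */
class TraceRing {
    private final ArrayDeque<String> lines = new ArrayDeque<>();
    private final int capacity;

    TraceRing(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one trace line, discarding the oldest line once the buffer is full. */
    synchronized void trace(String line) {
        if (lines.size() == capacity) {
            lines.removeFirst();
        }
        lines.addLast(line);
    }

    /** Returns a copy of the buffered lines, oldest first. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(lines);
    }

    /** The "dump it now" action: writes whatever is currently buffered to a file. */
    void dump(Path target) throws IOException {
        Files.write(target, snapshot(), StandardCharsets.UTF_8);
    }
}
```

Memory use is bounded by the capacity times the typical line length, and nothing touches the disk until `dump` is called. What the sketch cannot show is the cost of producing each trace line in the first place, which is the part that would need measuring.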


> OK... leaving aside the possibility that I/O to TEMP was somehow not possible, has anyone thought about why clicking the X (which did cause the GUI to change to say 'Cancelling') didn't cancel anything?

The thread that was processing the update check probably got stuck trying to do something, and thus wasn't responding to the request to stop.

As for caching debug information in RAM, it would cause performance issues, which is one of the reasons why the logging is disabled by default. We do have certain automated debugging mechanisms, however they are only for when one of our programs crashes, and even that much causes a decrease in performance. Before version 11.6.0.6267 you may have noticed that the EIS window wouldn't appear right away when you tried to open it, and this was actually caused by the system that collects crash debug information and sends it to us. Without that crash debug system, the EIS window would actually open almost instantly. Now we cache a2start.exe in RAM on startup to make it appear to open instantly, however that wouldn't work with regular debug logs, as they include so much debug information that they can quickly grow into the gigabytes.


> The thread that was processing the update check probably got stuck

Yes of course, but what I was asking is "has anyone thought about which thread and
how it got stuck and why it couldn't be interrupted/terminated/whatever by your
'cancel' process".  I mean, when I terminate a process in Process Hacker, it's very
rare for that not to work.


> caching debug info in RAM

When I have had debugging turned on, frankly I have not noticed a performance problem,
though I expect that depends a lot on the level of other activity on my machine, & so
far when I have had debugging on I've been concentrating on trying to reproduce an EIS
issue and not had much else happening at the same time.  But my machine is almost
always lightly loaded.  I have the feeling that the cpu overhead of running with debug
on all the time would either never bother me, or only rarely.  I'm going to turn on
that logging and keep it on for a few days and see if my view on this changes...
                                                                                    


> Yes of course, but what I was asking is "has anyone thought about which thread and how it got stuck and why it couldn't be interrupted/terminated/whatever by your 'cancel' process".  I mean, when I terminate a process in Process Hacker, it's very rare for that not to work.

Which thread would be the one a2service.exe creates when it starts checking for updates. Let's take this overly simplified example of a threaded program in Java:

class ThreadExample extends Thread {

	@Override
	public void run() {

		/* Execute thread code (placeholder for the real work) */
		System.out.println("Worker thread running");

	}
}

public class MainProgram {

	public static void main(String[] args) throws InterruptedException {

		/* Create and launch thread */
		ThreadExample threadToRun = new ThreadExample();
		threadToRun.start();

		/* Continue executing program code while the thread runs */
		System.out.println("Main thread continuing");

		/* Wait for the worker thread to finish before exiting */
		threadToRun.join();
	}
}

It defines a class called "ThreadExample", then it defines what code to run in the "run" method. The code in the "run" method doesn't change, and neither does the name of the class. This is just how threads work in Java, and it's only one way to do it, but hopefully you can see that a programmer will know what thread is executing what instructions in their code. What thread is freezing/stuck isn't really the issue, it's more a matter of reproducing what's causing it so that our developers can work on fixing it.

As for the cause, there's plenty of possibilities. If you can figure out how to reproduce it, then we can get some debug information.

> When I have had debugging turned on, frankly I have not noticed a performance problem, though I expect that depends a lot on the level of other activity on my machine, & so far when I have had debugging on I've been concentrating on trying to reproduce an EIS issue and not had much else happening at the same time.  But my machine is almost always lightly loaded.  I have the feeling that the cpu overhead of running with debug on all the time would either never bother me, or only rarely.  I'm going to turn on that logging and keep it on for a few days and see if my view on this changes...

Logging would be more likely to cause I/O slowdowns, however whether the performance decrease is noticeable at all depends on the computer. Some computers are capable of handling it without any noticeable slowdowns.


As far as 'cancel' goes, my question then is: why didn't the hypothetical:

threadToRun.terminate()

work?  And yes, I do realise that in reality your 'cancel' is possibly more
sophisticated than just a terminate call.  But surely it's easy to see in your
code what action clicking on the X button does, apart from announcing that the
cancel is taking effect?  So it should be comparitively easy to look for causes
of the cancel not happening.

"If you can figure out how to reproduce it...".   Well, I can't because I'm just
a poor user who clicked on the cancel button your code provided and it didn't
happen.  I'd have thought the fact that the GUI /did/ acknowledge the cancel, but
then didn't actually manage to cancel anything is (compared with many much more
vague bug reports) quite a good place to start looking for the cause.


> As far as 'cancel' goes, my question then is: why didn't the hypothetical:
>
> threadToRun.terminate()
>
> work?  And yes, I do realise that in reality your 'cancel' is possibly more sophisticated than just a terminate call.  But surely it's easy to see in your code what action clicking on the X button does, apart from announcing that the cancel is taking effect?  So it should be comparatively easy to look for causes of the cancel not happening.

Since the thread has to update files, termination probably takes the form of changing a variable or something to that effect so that the thread can gracefully terminate without being forcibly ended. That way, if it's in the middle of an I/O operation, it doesn't damage a file or the filesystem by being terminated in the middle of it.
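
A minimal Java sketch of that kind of graceful, flag-based cancellation follows. It is not the EIS updater: `CancellableUpdate`, `cancel`, and the file list are hypothetical, and `process` merely counts files in place of downloading and writing them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical update worker for illustration; not EIS code. */
class CancellableUpdate implements Runnable {
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);
    private final AtomicInteger processed = new AtomicInteger();
    private final List<String> files;

    CancellableUpdate(List<String> files) {
        this.files = files;
    }

    /** Called from another thread, for example when the user clicks the X. */
    void cancel() {
        cancelRequested.set(true);
    }

    int processedCount() {
        return processed.get();
    }

    @Override
    public void run() {
        for (String file : files) {
            // The flag is read only here, between files, so each file is either
            // fully processed or not started
            if (cancelRequested.get()) {
                return;
            }
            process(file);
        }
    }

    /** Stand-in for downloading and writing one update file. */
    private void process(String file) {
        processed.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        CancellableUpdate update = new CancellableUpdate(Arrays.asList("a.dat", "b.dat", "c.dat"));
        Thread worker = new Thread(update);
        worker.start();
        update.cancel(); // takes effect at the next flag check, if any remain
        worker.join();
        System.out.println("Files processed before stopping: " + update.processedCount());
    }
}
```

Because the flag is only read between files, a `process` call that never returned would leave the thread running even after `cancel` had been called. That would be consistent with a UI that says 'Cancelling' while nothing actually stops.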

It's also more than likely that the language our developers use handles threads differently. I just used an example in Java because I am familiar with the syntax.

> ... I'd have thought the fact that the GUI /did/ acknowledge the cancel, but then didn't actually manage to cancel anything is (compared with many much more vague bug reports) quite a good place to start looking for the cause.

The UI is being processed and displayed by a separate program (a2start.exe shows the UI, and a2service.exe processes the updates). It's possible for something the service is doing to freeze, but the UI still be responsive and appear to work normally.


> termination probably takes the form of changing a variable ...

You're probably right.  Maybe that isn't tested often enough in the updater code?
Of course if the updater is stalled for some reason, it's never going to get to
the next such test.


We do test updating before every build is released, as do our volunteer testers. We actually test all normal functions, however it isn't possible to take every possible scenario into account, so testing can only reveal so many issues in our own test environments.


> We do test...

I know you do.  I didn't mean the level of product testing you do; I meant, supposing
that the 'cancel' code sets a flag that the updating thread then examines, that the
number of places in the latter thread where the flag is examined might not be as big
as it could be.  For example, it might examine the flag before and after it acquires
the whole set of update files, so it could terminate right at the start, after all
the files were got, or then run through the actual updating process.  Or it could
also examine the flag in between acquisition of each individual update file.  Or
whatever.  But an inexplicable stall in the process will still stop the thread from
running to the next point where the flag is tested, and terminating the whole
update thread at extra points might be complicated, for all I know.
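
One common way to cope with a step that stalls between flag checks is to run each step with a time limit, so that the caller can at least notice and report the stall. The Java sketch below uses the standard `ExecutorService` and `Future` classes; `BoundedStep` and `runWithTimeout` are hypothetical names, and this is a different technique from simply adding more flag checks, not a description of how EIS works.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper for illustration; not EIS code. */
class BoundedStep {

    /** Runs one step on a worker thread; returns false if it did not finish within the limit. */
    static boolean runWithTimeout(Callable<Void> step, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<Void> result = worker.submit(step);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException stalled) {
            // Interrupts the worker thread; this only stops steps that respond to interruption
            result.cancel(true);
            return false;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean finished = runWithTimeout(() -> {
            Thread.sleep(5000); // simulated stall
            return null;
        }, 200);
        System.out.println(finished ? "Step completed" : "Step stalled and was abandoned");
    }
}
```

The caveat in the code comment matters: a step blocked in an operation that ignores interruption keeps running in the background even though the caller has given up on it, so a time limit makes a stall visible without guaranteeing that it can be cleaned up.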


Since I'm not familiar with any of the code our developers have written for the update process, I can't say for certain how they handle it.


> Since I'm not familiar with any of the code our developers have written for the update process, I can't say for certain how they handle it.

Now you say that... ;-)  The thing is, these last few interactions between us have followed your statement about why the cancel probably didn't work, your inclusion of pseudo-code etc...  So what were you trying to do?  If you don't /know/ how the code works, why try to put me off with descriptions of what it is doing?

All this followed my question to you of: "has anyone thought about which thread and how it got stuck and why it couldn't be interrupted/terminated/whatever by your 'cancel' process"...  which seems to me to be a perfectly reasonable question from a customer who was unable to get an update process to cancel.  It's not as if I invented the cancel button.  Your code provided it, and it was reasonable for me to expect it to work.

So, have any of the developers given any thought to why this did not work?


> Now you say that... ;-)  The thing is, these last few interactions between us have followed your statement about why the cancel probably didn't work, your inclusion of pseudo-code etc...  So what were you trying to do?  If you don't /know/ how the code works, why try to put me off with descriptions of what it is doing?

I was just trying to give an example of how threaded programming works. I wasn't trying to explain exactly how our developers have coded the update system.

> All this followed my question to you of: "has anyone thought about which thread and how it got stuck and why it couldn't be interrupted/terminated/whatever by your 'cancel' process"...  which seems to me to be a perfectly reasonable question from a customer who was unable to get an update process to cancel.  It's not as if I invented the cancel button.  Your code provided it, and it was reasonable for me to expect it to work.

That's the question I was trying to answer. Our developers know what thread handles the update process. As you can see in my code example, it's very easy to find a specific thread in code (even if the example I used is in a programming language our developers don't use).

> So, have any of the developers given any thought to why this did not work?

No, none of them have replied to my question.


OK, just summarising where I think this discussion has got to.  I think I'm waiting for
the developers to answer two questions you've passed to them, namely:

a) (as described in your post, which I see datestamped as 20160318 1128):

   - developer comments on using separate folders for each set of update files acquired
     to reduce chance that a problem with the current single specific 'a2temp' folder,
     maybe from a previously stalled update might somehow interfere with a newer update
     attempt


b) (as described in your post, which I see datestamped as 20160421 0555):

    - developer comments on why cancel didn't work


> developer comments on using separate folders for each set of update files acquired to reduce chance that a problem with the current single specific 'a2temp' folder, maybe from a previously stalled update might somehow interfere with a newer update attempt

We already have error handling for being unable to write to the TEMP folder. Adding more folders just adds complexity, more things that can go wrong, and more bugs.

> developer comments on why cancel didn't work

The comment I received was something to the effect that the issue has been seen once before in testing, but is extremely rare and could not be reproduced.


> We already have error handling ... Adding more folders just adds complexity ...

Hmmm.  Even YOU agreed in your 20160318 1128 post that this area needs more work.  Do the
developers agree that the present situation is not good enough?


> Hmmm.  Even YOU agreed in your 20160318 1128 post that this area needs more work.  Do the developers agree that the present situation is not good enough?

I wouldn't expect them to think anything about it. There's no debug information for them to look at.
