JeremyNicoll

Posts posted by JeremyNicoll


  1. > If you could start an update while a previous update appeared to be frozen/stalled, then
    > the previous update wasn't actually stalled and it may have just been a UI glitch.

    A three-day failure to update its signatures, AND no decent alert to the user saying that
    there was a problem is FAR MORE than a 'UI glitch'.  It's a serious fault.

    I note that in another thread someone else recently complained that the tiny systray shield's
    colour-change is just not adequate, especially for older eyes on high-res screens - as a way of
    notifying the user of a serious issue.
    [http://support.emsisoft.com/topic/20090-eam-shows-no-protection/]


    > It would also double the number of rows in the update log table of the SQLite database, which
    > would cut the number of updates that get logged in half.  This is the sort of thing that we
    > generally keep in debug logs rather than in regular logs.

    ... which would be fine if anyone ever had debug logging on when this occurs.  Debug logs
    are no good for things that users can't reproduce.  You must know that.


    > [killing stalled updates] That wouldn't solve the problem, and the user would be in the same
    > bad situation. No updates, and no clue why.

    Not correct.  I see you snipped my suggestion that when such a 'kill' is done, it would be
    logged.  So yes, it wouldn't fix the problem - but then no-one seems to understand
    how to fix the problem - yet it would not leave the user without any clue.  There would be
    a log entry saying that the stalled update had been detected.  At least we'd then all know
    that these presumed stalled updates were really happening.


    > Debug logs are the way that developers find out where a problem in the software is. Without them,
    > they have no way to know why something isn't working as expected. They can't just go through
    > millions of lines of code and hope they stumble upon the part of the code that is causing the
    > problem, and just so happen to notice an issue with that code that they overlooked when they
    > wrote it/updated it. The only time bugs get fixed without debug logs is when our QA team can
    > reproduce the issue in their own testing.

    Then you have a fundamental design problem, don't you?  Debug logs are not "the way", they're
    just one way of finding out why things don't work.  There needs to be a middle ground: since
    users almost never have debug logging on when a problem occurs, developers have to find another
    way to collect basic info - maybe not enough to pin down exactly why something is going wrong,
    but enough to /prove/ that it is.

    I led a programming team before illness forced me to stop work.  We occasionally had this sort
    of problem, and fixed it with a basic trace that ran all the time in the affected part of
    our product until the reported problem happened again.  We were lucky (compared with you) that our
    customers were on a corporate network and so we were able to have our product send those small
    log files to us automatically and we received and filtered them automatically until one showed
    what we needed to know.  Obviously you can't, in these days of angst about apps that 'phone home',
    have EIS send logs to you without the customers knowing, but it should be possible to create (if
    you like) an extra log that records some small details about the start and stop of the update
    processes.  If you're so concerned about its impact on the SQLite databases put it somewhere
    else.  It has to be active all the time so that it's active when people hit the problem.  But for
    goodness sake, do something about this problem.
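
    A minimal sketch of the kind of always-on trace suggested above: a small, size-capped
    log kept outside the SQLite database that records the start and end of every update
    attempt.  The names used (make_update_trace, run_update, the file name, the size limits)
    are hypothetical and chosen only for illustration; nothing here reflects how EIS is
    actually written.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_update_trace(path, max_bytes=64 * 1024, backups=2):
    """Return a logger that writes a small, size-capped trace file (hypothetical helper)."""
    logger = logging.getLogger("update_trace:" + path)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        # The file rolls over at max_bytes, so the trace can stay on permanently.
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def run_update(trace, trigger, do_update):
    """Record the start and the outcome of one update attempt."""
    trace.info("update start trigger=%s", trigger)
    try:
        do_update()
    except Exception as exc:
        trace.info("update failed trigger=%s error=%r", trigger, exc)
        raise
    trace.info("update end trigger=%s", trigger)


# Demonstration with a do-nothing update (assumption: the real work is a callable).
trace_path = os.path.join(tempfile.mkdtemp(), "update_trace.log")
trace = make_update_trace(trace_path)
run_update(trace, "scheduled", lambda: None)
for handler in trace.handlers:
    handler.flush()
with open(trace_path) as fh:
    trace_lines = fh.read().splitlines()
```

    With a trace like this, a "start" line with no matching "end" line points at a stalled
    attempt, while no "start" lines at all for several days points at the scheduler.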


    > Obviously, if our management thinks that adding some of your suggestions is a good idea, then
    > they will have our developers work on it.

    Do your management know about any of my suggestions?


  2. Periodically in the last couple of weeks, I've cross-checked Process Hacker's figures with those of Task Manager (in W8.1), and here, they've always agreed with each other.

    It's interesting that Yilee sees the problem on W7 Ultimate (on a desktop?) and on a laptop (precise OS unstated above, maybe W7?), and I on W8.1, both of us with 64 bit systems.

    Also, my machine has "activate memory usage optimization" turned on, and has done ever since I installed EIS on 28th Feb 2016 (I looked back to my install notes where I'd noted the settings I chose to start with).  So if that setting is relevant, it was harmless until the reported start of this problem.

    I did a full shutdown & reboot again over the weekend, but this time turned debug logging on just before I shut the system down.  Possibly later today, depending on how high the memory use has climbed (it's doing it steadily but I've not been awake so much in the last few days), I'll shutdown & reboot and then make the debug logs for the whole time, boot through to shutdown, available to you.


  3. OK, just summarising where I think this discussion has got to.  I think I'm waiting for
    the developers to answer two questions you've passed to them, namely:

    a) (as described in your post, which I see datestamped as 20160318 1128):

       - developer comments on using separate folders for each set of update files acquired
         to reduce chance that a problem with the current single specific 'a2temp' folder,
         maybe from a previously stalled update might somehow interfere with a newer update
         attempt


    b) (as described in your post, which I see datestamped as 20160421 0555):

        - developer comments on why cancel didn't work


  4. OK... your first & second answers make it clear that you think there was a stalled update
    preventing any other automatic updates from starting.  But you then say that a manual
    (check for) updates would be impossible - it would just show the current one's progress.

    But right at the start of this discussion I told you that I triggered a manual update,
    which worked immediately.  To me, that suggests that there wasn't a stalled update, but
    instead a scheduling problem.

    This is why I keep suggesting that a log entry saying that a scheduled update is about to
    start, would be a good thing.

    If stalled updates really do block other attempted updates from starting maybe your suite
    needs deliberately to look for update processes that started - say - more than 3 hours ago,
    and kill them.  And, if it did do that, LOG that it found such a problem and tried to fix
    it.  Unless the developers start trying to find where the problems lie, I don't think this
    will ever get fixed.
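
    The watchdog idea above could be sketched roughly as follows.  This is only an
    illustration under stated assumptions: UpdateAttempt, sweep_stalled and the in-memory
    log list are hypothetical names, and kill() is a placeholder for whatever a real
    product would do to terminate a stuck thread or process.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STALL_LIMIT = timedelta(hours=3)  # the "say, more than 3 hours" figure from the post


@dataclass
class UpdateAttempt:
    attempt_id: int
    started: datetime
    killed: bool = False

    def kill(self):
        # Placeholder: a real product would terminate a thread or process here.
        self.killed = True


def sweep_stalled(attempts, now, log):
    """Kill every attempt older than STALL_LIMIT and log that this was done."""
    survivors = []
    for attempt in attempts:
        age = now - attempt.started
        if age > STALL_LIMIT:
            attempt.kill()
            log.append(
                f"{now.isoformat()} killed stalled update {attempt.attempt_id} (age {age})"
            )
        else:
            survivors.append(attempt)
    return survivors


# Demonstration: one attempt has been "running" for 50 hours, one for 5 minutes.
now = datetime(2016, 4, 9, 12, 0)
running = [
    UpdateAttempt(1, now - timedelta(hours=50)),
    UpdateAttempt(2, now - timedelta(minutes=5)),
]
log_lines = []
running = sweep_stalled(running, now, log_lines)
```

    The important part is the log entry: even if killing the attempt does not cure the
    underlying fault, the record shows that a stalled update really did exist.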

    I've been an EMSI customer for quite a while, and I've read loads of threads which boil
    down to updates not happening when people expected them to.
    Constantly being told that 'debug logs' are needed to solve the problem just isn't good
    enough. We all know that nearly no-one has debug logs active when problems occur.  I'm
    trying to suggest things that might help pin down where the problems are.


  5. > You have a lot of non-standard settings

    I'm not sure that I do, as not everything we have talked about has yet been translated into actual
    settings changes.  I tend not to change stuff until/if I know I can devote some time in the
    following days to watching the effects of those changes and/or finding ways to test their impact.

    Since the problem seems to happen after reboots (from full 'cold' shutdowns), and as I haven't
    touched any setting (apart from turning off debug logging, and there have been several reboots
    since then), I've copied all the .ini and ini.backup files out of my EIS install folder, used 7zip to
    pack those up, and PMed that 7z file to you.


  6. Yilee:

    > I also could use Sysinternals process Explorer if you believe it would be much better for
    > this situation.

    Task Manager should be fine.


    > My physical memory usage is at 53 to 54%. A2servise uses the most memory out of the running
    > processes at an average of 5,800,200 K for working set, peak working set and private working
    > set with commit at around 6,375,656 K.   I have been booted for about 2 whole days.

    Without knowing what's normal on your machine it's hard to be sure, but with no VMs and (as you
    said lower down) 16 GB of RAM, on a system that's only been up two days, having commit charge
    already at 6 GB seems wildly high to me.

    Also are those working set figures for the whole OS?  I just rebooted (because I've shifted from
    one house to another and chose to turn the laptop off completely while doing that) and although
    - again - figures are climbing - working set for a2service here is currently only about 3 MB, not
    5.8 GB.  I just happened to be watching as a set of updates arrived and working set went up to
    about 180 MB while a2service dealt with that, but fell a few minutes later to about 3 MB again.


    > On the other hand, it makes sense that an EAM (anti-malware) type program would be using the
    > most memory.

    It's sensible that it should use RAM if it needs it, but remember that most of Emsi's customers
    won't have systems with 16 GB RAM in them, so it ought to be able to run in much less.   And in
    that blog post of theirs, they said they can cram all those signatures into about 200 MB of RAM,
    so that alone shouldn't be wasting GB of virtual memory.


    > I mention VMware because I believe you mentioned that you were running a VM.

    I'm not.  I used 'vm' and 'vs' in some posts, as abbreviations for 'virtual memory' and 'virtual
    storage'.


  7. Stapp, a 'restart' does a full shutdown followed by a reboot.  If you don't want the reboot to happen, ie you really do want your machine shut down, then it's not a good choice.  /IF/ you have your machine configured to ask for a power-on password then you could turn it off at that point, but if you don't then you've still got the same problem: how to turn the rebooted machine off completely.

    A shutdown is NOT a pretend one if you invoke it with the right options from the command-line's shutdown.exe command, OR if on the GUI you shift-click the menu entry rather than plain click it.


  8. > in the case of an update that is frozen you're still going to see evidence of that
    > in the update logs since there will be gaps in the update log entries

    Yes, if that's the cause.  But how do you know that your internal scheduler properly
    scheduled successive attempts?

    Does your automatic scheduler only schedule later attempts if the current one
    completes ok?

    I mean in my situation, was my system likely to have one stalled update that prevented
    the next 30+ being attempted, or did 30+ attempts start and get nowhere?  (This is why
    I think it would be useful to see an attempt starting.)

    And, if automatic attempts are stalled, why would a manual attempt succeed?  Does a
    manual attempt clear any flags or anything before it starts that don't get cleared
    when an automatic attempt starts?


  9. I should have added...  I'd been noting memory use figures every few hours since I first noticed a problem.  As I've said above, I use 'Process Hacker' to monitor all sorts of things in my system, and I have it place a small icon on the taskbar showing physical memory use.  It was the increasing (well past normal) display there that first led me to look closely at both virtual & real memory use, and then to see a2service's growing virtual memory problem.  The displays in Process Hacker of memory use (and many other values) are similar to those one can get in Task Manager (if you right-click its column titles and turn on lots of columns that are normally hidden).  I'm not sure if task manager can also put tiny graphs onto the systray so you can see a sort of summary of eg cpu use, or whatever.

    Task manager will however show you a summary of memory use.  The only advice I can really give is that you look to see what values are being shown, and see if they are growing.  If they are, you'll have to make your own mind up about how nervous you get as they reach maybe 80%, 90%, 95%... of their maximum possible values.  It's hard for you, if you don't know what your system's normal values are.


  10. Yilee, thanks for your thanks...

    When I first reported this I was using EIS v11.6.1.6315, and - actually - I'd not noticed that a newer version had come along.  Maybe that's what took me by surprise, as described in post #8 above.

    I'm not sure that I'd want to abandon signature updates...  I don't know if it's possible to keep getting signature updates but not get executable code updates at the same time.  Do signature updates sometimes need revised executables to work?

    To be honest I'm surprised that no-one else has reported a problem.  It makes me (and maybe Emsisoft) wonder why it is that my system has the problem.  Then again, maybe very very few users monitor resource use on their machines, and even if they do, might not see EIS as part of the cause, if it is.  There's a possibility that EIS is a victim of a problem in the system, not a cause.  Who can tell?

    However, for me, when memory use climbs high enough that either task manager or Process Hacker (or the broadly similar Process Explorer tool) show that the OS is gobbling up real RAM and at the same time EIS is apparently using a vast amount of virtual memory (and that's running out too) it's reasonable to expect a problem sooner or later.  I don't know enough about W8.1 to know how close to total memory exhaustion one can safely let a system get.  If one lets memory use get too high, then there may not be enough to allow the system to do a controlled shutdown (I mean if I tell it to close), and if it gets even worse, you would eventually expect a BSOD... and possible file corruption (as with any other BSOD).

    It may be that when I eventually rebooted last night I could have left the machine for another few hours.  Maybe another whole day.  But if the system was going to fall over in a day or so, there was no point in pushing my luck.  I'm sorry I can't provide a more concrete answer.


  11. What OS are they all running?  Are they using any screen automation / macro software (the sort of thing that sends pretend mouse-clicks etc to the screen to automate a manual process) - software like AutoHotKey, or autoitscript, or mjtnet's macro scheduler etc?  Incautiously written scripts might trigger something like this.  But then again, a reboot would be unlikely to fix it.


  12. > It's not abnormal for there to be more memory reserved for an application than it would
    > normally use

    Indeed, that's true.  But that's the difference between 'reserved' and 'committed' pages,
    and the task manager 'Memory' (as opposed to 'details') screenshot shows the commit charge
    was very high.


    > I think it has to do with how Windows managed virtual memory, and wanting to make sure
    > that if memory usage suddenly spikes for a process there is memory reserved for it to
    > ensure that it doesn't try to use more memory than is available.

    I've not read anything (and I've read a lot on VM management in the last two days; see
    below for URLs for a series of illuminating articles, if you've time to look at them)
    that suggests that the OS would reserve vm pages for a process off its own bat.  The
    process has to ask for the pages.



    I hadn't originally understood every aspect of what I saw in the 'Memory' screenshot;
    it showed the non-paged pool using 4.7 GB.  This is vm that's NEVER paged out, ie always
    located in physical RAM, and explains why (as well as seeing EIS's Private Bytes going up
    & up) I was seeing real memory use going up & up.  The NP pool contains the kernel,
    OS data structures that must be in RAM so that eg interrupts can be handled, those for
    mutexes/semaphores, paging control tables, etc, and storage acquired by drivers (or I
    suppose any other kernel-state program that asks for NP storage).

    By late last night the size of the NP pool had grown so big (along with EIS's Private Bytes
    which was by then 2.35 GB) that I rebooted.  I was scared that the system would crash in a
    catastrophic fashion if left running.  I took screenshots (attached) of the memory summary
    (from Process Hacker) immediately before and after the reboot (a full 'cold' reboot) and
    it is interesting to see that the NP pool on the rebooted system was using only 119 MB, a
    huge amount less than 4.7 GB!  Something else I noticed when watching these displays
    is that the number of NP allocations each second is usually far greater than the number
    of frees.  I don't know if that's normal (though I guess it might be typical of a system with an
    out-of-control growing NP pool).

    EIS's Private Bytes after the reboot was back to the figure of 486 MB, but since then it has
    climbed to 553 MB.  I don't know if it will again climb & climb.

    Something odd happened just before I rebooted.  I'd signed out of my day-to-day userid & in
    as my admin one.  I did this because there was a minor backup I wanted to do from the admin
    id (and as I've once a long time ago experienced a windows hang after a sign-out & sign-in
    I try not to do that when I'm not prepared to reboot if I have to).  Also I wondered if the
    other user would also show high memory use - it did.  Anyway, as soon as that user's desktop
    came up (some apps, eg Dropbox, which run on my daily id don't start there, so it's quicker
    to start) I got an alert from EIS saying it had just done a software update and needed to
    restart its application.  After it had done so I looked at the EIS update logs, because I
    was puzzled.  I saw no signs of a software update having just been issued.  It seemed as if
    it was a pending action from the software update of a few days ago.  Is it right that a user
    session would need an app restart when it has just been logged-in, to activate a software
    change that was issued several days ago, which in any case had already been implemented via
    my day-to-day userid?  The admin userid had not been 'disconnected' - I never do that - and
    in any case the whole system had been rebooted a few days earlier (which is when I started
    recording memory use).  Very odd.


    > If you make a backup of your settings, and then reset everything back to factory defaults,
    > does this issue still happen?

    I'm not going to try that yet.  I'd like to see if we can find out what's grabbing the
    storage.  As it is, the reboot I felt forced to do last night in the interest of system
    (especially FS) integrity, might have lost us the chance to find out, but fortunately
    (or not, depending on your point of view) that looks not to be the case.

    Also... I've not changed any settings in EIS in the last few days, apart from - now before
    two boots ago - having had debug logging on for a while.


    Useful URLs:

    https://msdn.microsoft.com/en-us/library/windows/hardware/hh439648%28v=vs.85%29.aspx
    - a good clear overview of paging, user & system space, paged & non-paged parts of the latter

    https://blogs.technet.microsoft.com/markrussinovich/2008/07/21/pushing-the-limits-of-windows-physical-memory/
    - there's a 'meminfo' tool mentioned in this that digs details out of the PFN database
      which records what's in the pages etc, but unfortunately it doesn't work on my system;
      it was after I read about this that I found the SysInternals RAMmap and VMmap utilities.

    https://blogs.technet.microsoft.com/markrussinovich/2008/11/17/pushing-the-limits-of-windows-virtual-memory/
    - describes the commit limit - that all /committed/ vs must be backed either by ram or paging file.
    - describes "Private Bytes" fairly accurately (discussion in the comments shows that even Mark R's
      initial description wasn't quite the whole story)

    https://blogs.technet.microsoft.com/markrussinovich/2009/03/10/pushing-the-limits-of-windows-paged-and-nonpaged-pool/
    - describes eg the non-paged pool - areas where the OS and device drivers store their data;
      essentially everything that must never be paged out (so will be in real RAM)

    https://msdn.microsoft.com/en-us/library/aa366778.aspx
    - Memory limits for each version of Windows, eg how big can a user address space be?
     

    post-25439-0-48147000-1461140870_thumb.jpg

    post-25439-0-22309100-1461140884_thumb.jpg


  13. > Since I'm not familiar with the any the code our developers have written for the update process, I can't say for certain how they handle it.

    Now you say that... ;-)  The thing is, these last few interactions between us have followed your statement about why the cancel probably didn't work, your inclusion of pseudo-code etc...  So what were you trying to do?  If you don't /know/ how the code works, why try to put me off with descriptions of what it is doing?

    All this followed my question to you of: "has anyone thought about which thread and how it got stuck and why it couldn't be interrupted/terminated/whatever by your 'cancel' process"...  which seems to me to be a perfectly reasonable question from a customer who was unable to get an update process to cancel.  It's not as if I invented the cancel button.  Your code provided it, and it was reasonable for me to expect it to work.

    So, have any of the developers given any thought to why this did not work?


  14. > changing notification timeout

    I don't think those options (Control Panel - Ease of Access - Using computer without a
    display - Adjust time limits - How long should Windows notifications stay open) will be
    relevant (a) because I'm not using the machine without a display, but also (b) because
    they were set to the default of 5 seconds and the pop-ups already stay visible for much
    longer than that (in the absence of mouse activity).


    > We no longer have notifications ...

    People must be mad, then.  It's not your fault as a software vendor if the OS does not
    allow you to change certain elements of the OS without a reboot.  (It's something I'd
    have thought MS would pay more attention to, since it's impossible for businesses to
    run continuously available systems if they keep needing reboots.)  That aside, users
    who complain about alerts ought to be able to discriminate between annoying things
    that maybe can wait, and those that any sane user would want to know immediately.


    > Our update logs already show the start/end time of each update. If an update fails
    > for any reason, then it should be reflected in the logs.

    I think that's not the case.  I think that probably your log records showing each
    update are only written to the log when an update completes successfully.  Maybe
    there are some entries for partial failures.  But either they don't have entries describing
    the start of an attempted update, OR your internal scheduler is not starting them
    when it should.  Do you think if my EIS had shown umpteen started updates for the
    "No updates in three days" period I'd have raised this ticket?

    Nevertheless, looking back at what I wrote above I see that I described the lack of
    update activity in the log, but didn't show you a picture, so here it is.  Note no
    log entries at all in the period 7-9 April.
     

    post-25439-0-58268200-1461137411_thumb.jpg


  15. What I would expect is that when a program requests allocation of some (virtual) memory
    the OS would when it satisfies that make sure that it was capable of 'backing' it with
    either real RAM or a slot in the paging file.  After all, when your program places data
    in that memory it has to be stored somewhere.  I would not expect the OS to allocate
    more pages than it could back, so on my machine with 8 GB RAM and a 4.47 GB paging file,
    I expect (apart from a small amount of memory used by one of the graphics cards) there
    to be a maximum of about 7.9 + 4.47 = 12.37 GB of memory available.

    If an app gets virtual storage from the OS and never writes to it then that page will
    never actually get physically swapped to the pagefile, because that would be a waste of
    time - it's got nothing in it - but the OS still has to expect that one day it may need
    to be saved and there has to be somewhere to put it - either real RAM, or a slot in the
    paging file.

    I think that the 2 GB 'Private Bytes' size represents all the pages that have been
    allocated (or just maybe the sum of all the allocated areas, so the sum of the pages
    which contain them would be greater), and that Working Set is the subset of the total
    number of allocated pages which are actually in RAM at the moment.

    For the OS to have allocated 2 GB of virtual storage to EIS, the paging tables must have
    at least 524,288 (ie 512k) entries describing that large number of pages.  That alone is
    wasting a certain amount of (I suspect non-paged) pages of RAM, though it will presumably
    be attributed to the system rather than EIS.
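
    The arithmetic in this post can be checked with a few lines.  This is only a rough
    sketch: the 4 KB page size is the normal small-page size on x86/x64 Windows, while the
    8-byte page-table-entry size and the decision to count only the lowest-level page
    tables are assumptions made here for illustration, not figures from the post.

```python
GIB = 1024 ** 3
PAGE_SIZE = 4 * 1024   # 4 KB small pages
PTE_SIZE = 8           # assumption: 8-byte page table entries on a 64-bit system

# Commit limit estimate: usable RAM plus paging file, as in the post.
usable_ram_gb = 7.9
paging_file_gb = 4.47
commit_limit_gb = round(usable_ram_gb + paging_file_gb, 2)

# Number of pages needed to describe 2 GB of allocated virtual storage.
private_bytes = 2 * GIB
pages_needed = private_bytes // PAGE_SIZE

# Rough size of the lowest-level page table entries for those pages.
pte_bytes = pages_needed * PTE_SIZE
pte_mib = pte_bytes / (1024 ** 2)

print(commit_limit_gb, pages_needed, pte_mib)
```

    Under those assumptions the 524,288 entries themselves occupy about 4 MB, which is
    small next to the 2 GB they describe.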


  16. Trying firewall first: I unplugged the LAN cable, went to GUI's Protection ->
    Firewall tab, and unticked 'Activate Firewall'; an Action Centre notification
    did pop-up immediately, but it fades from view about 45 seconds later.  After
    that to realise there's an issue one would need to spot the red cross symbol
    next to the white notify flag systray icon.  The red cross & (if you go into
    Action Centre) detailed message do both clear themselves as soon as the fw is
    switched back on.

    Trying guards: if I turn Surf Protection alone off/on, Action Centre doesn't
    notice (at least not immediately).  If I turn File Guard alone off/on, Action
    Centre doesn't immediately notice.  If I turn BB alone off/on, Action Centre
    doesn't immediately notice.

    If I turn pairs of guards off, Action Centre doesn't notice.

    If I turn all three guards off, Action Centre does notice, immediately.  The
    notification seems to stay visible for quite a while - more than a minute when
    I wasn't using the machine, but it faded soon after once I started typing in
    my notes.  I didn't click on it.  I tried that again; with no keyboard or mouse
    activity the Action Centre notification stayed visible for > 2 minutes, but as
    soon as I moved the mouse (over the EIS GUI where it had been since I unticked
    the third guard) the AC notification faded away.  It did of course leave the
    red cross/white flag icon visible.  And within AC, there were messages about
    both anti-virus & anti-spyware apps.

    After turning the three guards back on I tried again stopping each in turn; &
    again no alert (or red-cross etc) from any single one of them.  If I stopped
    all three, turning any single one back on banishes the AC notification & both
    the a/v and a/s messages within it.

    My first test of the Firewall being turned off suggested that its notification
    faded after 45 seconds, but I repeated this and - like the Guard ones - it fades
    if you let it stay visible for a while, then move the mouse.

    So if one is busy doing something else when the notification pops up, will one's
    mouse movement banish it without one seeing it?  More or less, yes.  Turning off
    the firewall and moving the mouse immediately fades the pop-up away within ten
    seconds.

    I think if one was engrossed in something else, one could miss that.

    But apart from all this, even if the AC alerts are working ok, I don't really see
    why EIS could not generate an alert that needs a positive action taken (ie a click
    on a button) to dismiss it.

    If a guard or the firewall is turned off by the user clicking something in the GUI,
    maybe some users would be slightly annoyed at being told immediately that they'd
    just turned something off.  Personally I'd rather see that, as a sort of confirmation
    that the alerting mechanism was working.  I certainly would want to see it if some
    feature turns itself off.  Updates-wise, I would like to see an alert as soon as
    the period since the last successful update exceeds some threshold.

    Better than that would be a recurring reminder that something is off or has not worked
    recently.  Perhaps within that alert there could be a configurable value for how soon
    the next alert might be generated - that might satisfy anyone who did not want to see
    recurring alerts, and might be a good reason to display such an alert even after a
    user manually turns something off.  I presume EIS has its own internal scheduler, since
    you don't seem to use Windows' task scheduler, so it should be easy to set up recurring
    checks.

    Of course, if the problem with no updates for days is actually because your scheduler
    hasn't triggered anything for days, that won't help much.

    Would you at least consider adding a log entry when you start an update?  Then at
    least it would be easier to see if no updates happening was a scheduler issue.
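
    The threshold-plus-recurring-reminder logic described above might look something like
    this.  It is a sketch only: alert_due and its parameters are hypothetical, the 6-hour
    threshold and 1-hour repeat interval are made-up values, and it assumes some periodic
    check is actually running (which, as noted, is the very thing in doubt).

```python
from datetime import datetime, timedelta


def alert_due(now, last_success, last_alert, threshold, repeat_every):
    """Return True when a stale-update alert should be shown at `now`."""
    if now - last_success <= threshold:
        return False            # updates are recent enough, nothing to report
    if last_alert is None:
        return True             # first alert for this outage
    return now - last_alert >= repeat_every   # recurring reminder


threshold = timedelta(hours=6)       # assumed value, would be configurable
repeat_every = timedelta(hours=1)    # assumed value, would be configurable
last_success = datetime(2016, 4, 6, 23, 0)

# 3 hours after the last good update: no alert yet.
first = alert_due(datetime(2016, 4, 7, 2, 0), last_success, None, threshold, repeat_every)
# 7 hours after: first alert.
second = alert_due(datetime(2016, 4, 7, 6, 0), last_success, None, threshold, repeat_every)
# 30 minutes after that alert: too soon to nag again.
third = alert_due(datetime(2016, 4, 7, 6, 30), last_success,
                  datetime(2016, 4, 7, 6, 0), threshold, repeat_every)
# A full hour after that alert: remind again.
fourth = alert_due(datetime(2016, 4, 7, 7, 0), last_success,
                   datetime(2016, 4, 7, 6, 0), threshold, repeat_every)
```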
     


  17. > We do test...

    I know you do.  I didn't mean the level of product testing you do; I meant, supposing
    that the 'cancel' code sets a flag that the updating thread then examines, that the
    number of places in the latter thread where the flag is examined might not be as big
    as it could be.  For example, it might examine the flag before and after it acquires
    the whole set of update files, so it could terminate right at the start, after all
    the files were got, or then run through the actual updating process.  Or it could
    also examine the flag in between acquisition of each individual update file.  Or
    whatever.  But an inexplicable stall in the process will still stop the thread from
    running to the next flag-test point, and terminating the whole update thread at
    extra points might be complex, for all I know.
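
    To make the "cancel flag" supposition concrete, here is a small sketch in which the
    updating thread examines the flag between individual files and again before applying
    the update.  It is purely illustrative: update_worker, fake_fetch and the file names
    are hypothetical, and nothing is known here about how the real update thread works.

```python
import threading


def update_worker(files, cancel, fetch, done):
    """Fetch each file in turn, checking the cancel flag between files."""
    for name in files:
        if cancel.is_set():
            done.append("cancelled before " + name)
            return
        fetch(name)   # if this call stalls, the next flag check is never reached
    if cancel.is_set():
        done.append("cancelled before apply")
        return
    done.append("applied")


cancel = threading.Event()
fetched = []
outcome = []


def fake_fetch(name):
    fetched.append(name)
    if name == "b.dat":
        cancel.set()  # simulate the user pressing Cancel during the second file


worker = threading.Thread(
    target=update_worker,
    args=(["a.dat", "b.dat", "c.dat"], cancel, fake_fetch, outcome),
)
worker.start()
worker.join(timeout=5)
```

    More check points shorten the delay before a cancel takes effect, but they do nothing
    if the thread is stuck inside one fetch; that case would need a timeout on the blocking
    operation itself.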