On Sunday morning I was able to connect, and to my surprise, SQL Monitor had captured a slew of error messages. 10 or 12 of them in a row, all occurring since midnight. My heart skipped a beat. What was going on? Some digging ensued.
I feel so lucky to have SQL Monitor watching my systems, and emailing me, along with 2 others in the IT department. If something awry occurred, it would let us know, and if I didn't jump on it, at least the other 2 humans would know via email that something was up. So, something was up. I found it, and was investigating. Thanks SQL Monitor for watching my DBs while I climbed to Angels Landing in Zion National Park.
What was the error that SQL Monitor was complaining to me about? A nice generic one. 'SQL Server error log entry'. This means that something happened, was logged into the error log, and, well, that's it. Just that that thing happened. If you are like me, your heart starts skipping a few more beats. As I looked at each alert down the chain, they were all the same. What I didn't notice at this juncture was that each alert was coming to me via email at roughly an hour interval. This is valuable to know, as it wasn't 12 alerts about 12 error log entries. It was an alert about a single error log entry, and it was being repeated every hour. Much different story. So I dig deeper. To do this I need to drill into the alert itself and look at more details. Now I learn the following:
The Database ID 6, Page (1:118112), slot 0 for LOB data type node does not exist.
The suggestion was to run DBCC CheckTable on the offending object. I need to know what database that is. Database ID 6. Is it one of my important databases? Or a lesser important one? Or a supplemental one? Which one is it... aaarrrggghhhh, I NEED TO KNOW!!! Since I had been out in the mountains, it seemed like my TSQL-fu was lacking, and it did take me a minute to remember I could query sys.databases to see which one was Database ID 6. This was probably a few seconds, but in the panic of digging, it always seems longer. So, Database ID 6 turns out to be the database RedGateMonitor. Whew. It's only that DB that is having an issue. A couple of quick DBCC CheckDB commands later I realize that all other DBs are in proper order. At least at the instance that I ran the CheckDB command. (hehe). So, I run on RedGateMonitor and it encountered an object that has some issues. It is the object 'data.Cluster_SqlServer_SqlProcess_UnstableSamples'. I have no idea what this is, or if it's needed, or what I should do with it. But, since it's RedGateMonitor itself that has an issue, I figure I have some time.
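If you ever find yourself in the same panicked spot, the lookup and the initial checks go something like this (a sketch of what I ran, from memory):

```sql
-- Which database is Database ID 6?
SELECT name
FROM sys.databases
WHERE database_id = 6;
-- In my case this returned: RedGateMonitor

-- Then check just the offending object before anything drastic
USE RedGateMonitor;
DBCC CHECKTABLE ('data.Cluster_SqlServer_SqlProcess_UnstableSamples');

-- And give the other databases a once-over for peace of mind
DBCC CHECKDB ('SomeImportantDB');  -- repeat for each DB you care about
```

`SomeImportantDB` is a stand-in, obviously; substitute your own database names.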
Yay for SQL Monitor for finding an issue on my DBs. Boo for SQL Monitor for its own object becoming funked up.
So, as you picture this, picture me on the top bunk of our camper, in my jammies (the desired PG rating refused to allow me to let you picture me in my undies), with a laptop on my lap, hooked into the RV campground WiFi, hunting down this issue. I am much more calm now that I know what DB and what object is causing the issue. I quickly pen an email to my crew letting them know that I have discovered the issue, and a fix will come at some point, but until such time, we will continue to receive emails about every hour reminding us. Luckily it's the weekend, and they can be ignored. My next step was to tweet my question about this object to #sqlmonitor and #kickasssupport, knowing that someone will see it, and we'll get going on a solution soon. Maybe on Monday, maybe before.
Sure enough, I come in to work on Monday and see a tweet from @JowleyMonster with suggestions on how to remediate my issue. After a few attempts at recovering the data in the table via backups, I resorted to simply running a DBCC CheckDB with repair_allow_data_loss. I had to put the DB into SINGLE_USER mode first. Then run the DBCC CheckDB command. Then continue to verify via DBCC CheckTable on the particular object in question, followed by a larger DBCC CheckDB on the entire DB. When it all seemed OK, I returned the DB to MULTI_USER.
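For the record, that sequence looks roughly like this (a sketch, not a recommendation — repair_allow_data_loss is a last resort, and you should have backups before you run it):

```sql
USE master;

-- Kick everyone out so repair can run
ALTER DATABASE RedGateMonitor SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- The repair itself; this can discard data, hence the scary name
DBCC CHECKDB ('RedGateMonitor', REPAIR_ALLOW_DATA_LOSS);

-- Verify the specific object that was complaining...
USE RedGateMonitor;
DBCC CHECKTABLE ('data.Cluster_SqlServer_SqlProcess_UnstableSamples');

-- ...then the entire database
DBCC CHECKDB ('RedGateMonitor');

-- All clear? Let everyone back in
USE master;
ALTER DATABASE RedGateMonitor SET MULTI_USER;
```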
Now, you may be screaming at the screen that I simply horked the connections to SQL Monitor, and did so in a less than graceful manner. I thought the same thing, and honestly wanted to try it out. I had been so careful with it in other operations, I wanted to see what happened. In other instances I will actually stop monitoring everything, then have someone actually shut down the service, and then verify that all connections from SQL Monitor had terminated. But this time, I was curious.
It handled itself perfectly. Obviously the webpage that I had up throughout this ordeal was in a funked state. After a refresh, and a few moments of uncertainty and stress on my part, it came back just fine. All was back to normal.
I am now free to pursue other tasks, as I leave the little gremlins that are SQL Monitor to do their job watching my DBs and letting me know when something is awry. It will happen. They will let me know. And I will dig in and fix it. I love having them around watching, especially when I am not.