2

Snippet of the script that I am executing:

 $reader = $managementgroupobj.GetMonitoringPerformanceDataReader() 
 while ($reader.Read())    # << Error on this line.
 { 
      $perfData = $reader.GetMonitoringPerformanceData() 
      $valueReader = $perfData.GetValueReader($starttime,$endtime) 
      while ($valueReader.Read()) 
      { 
           $perfValue = $valueReader.GetMonitoringPerformanceDataValue()
      } 
 }

Here, $managementgroupobj is an instance of class ManagementGroup.

The difference between $starttime and $endtime varies from 15 minutes to 1 hour, depending on when the same script last ran.

The snippet collects the performance data successfully for a long time, but then, out of nowhere, it throws the following error:

"The requested reader was not valid. The reader either does not exist or has expired"

[ log_level=WARN pid=2716 ] Execute command 'get-scomallperfdata' failed. The requested reader was not valid. The reader either does not exist or has expired.
at GetSCOMPerformanceData, E:\perf\scom_command_loader.ps1: line 628
at run, E:\perf\scom_command_loader.ps1: line 591
at <ScriptBlock>, E:\perf\scom_command_loader.ps1: line 815
at <ScriptBlock>, <No file>: line 1
at <ScriptBlock>, <No file>: line 46
   at Microsoft.EnterpriseManagement.Common.Internal.ServiceProxy.HandleFault(String methodName, Message message)
   at Microsoft.EnterpriseManagement.Common.Internal.EntityObjectsServiceProxy.GetObjectsFromReader(Guid readerId, Int32 count)
   at Microsoft.EnterpriseManagement.Common.DataReader.Read()
   at CallSite.Target(Closure , CallSite , Object )
  • What is the cause of this error?
  • It would be great to understand how the PerformanceDataReader works.

Note:

  • It had fetched 100k+ data points before the error occurred, and fetching that data took almost an hour.
  • I think the issue might be the amount of data it has to fetch; it might be a kind of TimeoutException.
  • It would be great to get at least some insight into both questions above.

Thanks.

Jay Joshi
  • Tell us: (1) The exact exception being thrown, not the description. (2) What line is throwing the exception. (3) With regard to the "Read()" question, which call to ```Read()``` are you talking about? There are two. – Adam May 07 '18 at 00:21
  • Why are you trying to read ALL performance data of ALL monitored objects? I would rather suggest to get performance reader for a specific object, say a server, or logical disk, or a network port for instance. – Max May 07 '18 at 04:19
  • @Adam, I have mentioned more details that you asked in question itself. Check the snippet and the log. Also, I understood the significance of the reader object. Thanks. – Jay Joshi May 07 '18 at 10:42
  • @Max, I have come up with an idea which might solve the issue, but I am not sure about it. It goes like this: I should first fetch the list of objects in the management group, and then, for each object, create a reader which fetches the data of that object only. With this approach, the issue of fetching a large amount of data from a single reader might be solved. Does this approach look OK? – Jay Joshi May 07 '18 at 10:47
  • @Jay, yes, if you read only performance data related to a particular source class instance, it will significantly reduce number of associated counters and data returned. But remember, it's recursive -- i.e. if you request reader for a computer object, you'll be given data for all hosted objects, i.e. disks. However, I had to ask first: What are you trying to achieve? Also, have you considered direct SQL query? – Max May 07 '18 at 21:14
  • @Max, Thanks for the enlightenment. However, I ran the script with both approaches and both give the same amount of results. I am using Get-ScomClassInstance to get the objects of the management group. – Jay Joshi May 08 '18 at 11:40
  • @Max, "What are you trying to achieve?" : I am trying to redirect the performance data to Splunk (a monitoring tool) , "have you considered direct SQL query?" : No, I don't have enough knowledge and understanding about SQL query in SCOM. Would be great if you could share some knowledge on how to collect performance data as answer. – Jay Joshi May 08 '18 at 12:56
  • Check : https://stackoverflow.com/questions/50324414/get-average-performance-data-from-scom-using-powershell – Jay Joshi May 14 '18 at 06:52

2 Answers

2

Since the end goal is to offload ALL performance data to another tool, the SCOM API will not perform well enough, so direct SQL queries are recommended.

A bit of background:

  1. SCOM has two DBs. The Operational DB holds all current status, including almost "real time" performance data. The Data Warehouse DB holds historical data, including aggregated (hourly and daily) performance data. All the queries below are for the Operational DB.
  2. SCOM as a platform can monitor absolutely anything -- monitoring is implemented in Management Packs, so each MP can introduce new classes (types) of monitored entities, and/or new performance counters for existing classes. Say, you can create an MP for a SAN appliance and start collecting its perf data. Or you can create another MP, which will add a "Number of Files" counter to the "Windows Logical Disk" class.

Keeping the above in mind, the queries below are for the "Windows Computer" class and all its associated objects (so they won't work if you monitor Unix servers; you'll need to change the class).

Step 1: Find all available counters for a Windows Computer by its name.

NB!: results may differ depending on the OS version and the MPs installed in your SCOM.

declare @ServerName as nvarchar(200) = 'server1.domain.local'

select pc.*
  from PerformanceCounterView pc
  join TypedManagedEntity tme on tme.TypedManagedEntityId = pc.ManagedEntityId
  join BaseManagedEntity bme on tme.BaseManagedEntityId = bme.BaseManagedEntityId
  where (bme.TopLevelHostEntityId = (select BaseManagedEntityId from BaseManagedEntity where FullName = 'Microsoft.Windows.Computer:'+@ServerName))
order by ObjectName, CounterName, InstanceName

Step 2: Retrieve the actual performance data for each counter found in Step 1.

The @SrcID parameter is the PerformanceSourceInternalId column from the previous query.

NB!: all timestamps in SCOM are in UTC. The query below accepts input in local time and produces output in local time as well.

declare @SrcID as int = XXXX
declare @End as datetime =  GETDATE()
declare @Start as datetime = DATEADD(HOUR, -4, @End)

declare @TZOffset as int = DATEDIFF(MINUTE,GETUTCDATE(),GETDATE())

SELECT SampleValue, DATEADD(MINUTE, @TZOffset, TimeSampled) as TS
  FROM PerformanceDataAllView
  where (PerformanceSourceInternalId = @SrcID)
        and (TimeSampled > DATEADD(MINUTE, -@TZOffset, @Start))
        and (TimeSampled < DATEADD(MINUTE, -@TZOffset, @End))
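
If you'd rather not run Step 2 once per counter, the two steps can probably be combined into a single query. This is only a sketch: it assumes the join key between the two views is PerformanceSourceInternalId (as described above), it filters and returns timestamps directly in UTC instead of converting to local time, and it has not been tested against a live Operational DB.

```sql
-- Sketch only: Step 1 and Step 2 combined, using the same views and tables as above.
-- Assumption: PerformanceCounterView and PerformanceDataAllView join on PerformanceSourceInternalId.
declare @ServerName as nvarchar(200) = 'server1.domain.local'
declare @EndUtc as datetime = GETUTCDATE()
declare @StartUtc as datetime = DATEADD(HOUR, -1, @EndUtc)

select pc.ObjectName, pc.CounterName, pc.InstanceName, pd.TimeSampled, pd.SampleValue
  from PerformanceCounterView pc
  join TypedManagedEntity tme on tme.TypedManagedEntityId = pc.ManagedEntityId
  join BaseManagedEntity bme on tme.BaseManagedEntityId = bme.BaseManagedEntityId
  join PerformanceDataAllView pd on pd.PerformanceSourceInternalId = pc.PerformanceSourceInternalId
  where (bme.TopLevelHostEntityId = (select BaseManagedEntityId from BaseManagedEntity where FullName = 'Microsoft.Windows.Computer:'+@ServerName))
        and (pd.TimeSampled >= @StartUtc)   -- half-open window [start, end) so consecutive runs do not overlap
        and (pd.TimeSampled < @EndUtc)
order by pc.ObjectName, pc.CounterName, pc.InstanceName, pd.TimeSampled
```

Using a half-open window means a script that runs every 15 to 60 minutes can pass the previous @EndUtc as the next @StartUtc without reading the same sample twice.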

By default SCOM keeps only the last 7 days of "real time" performance data in the Operational DB; older history is available only in aggregated form in the Data Warehouse.

Don't call these queries too often, or use the NOLOCK table hint, to avoid blocking SCOM itself.
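
For illustration, this is roughly what the Step 2 query looks like with the hint applied. It is a sketch, not a tested query; the @SrcID value is a hypothetical placeholder. Keep in mind that NOLOCK means READ UNCOMMITTED, so the query can return rows that are still being written.

```sql
-- Sketch only: the Step 2 query with the NOLOCK table hint on the view.
declare @SrcID as int = 0   -- hypothetical placeholder; use a real PerformanceSourceInternalId from Step 1
declare @End as datetime = GETDATE()
declare @Start as datetime = DATEADD(HOUR, -4, @End)

declare @TZOffset as int = DATEDIFF(MINUTE,GETUTCDATE(),GETDATE())

SELECT SampleValue, DATEADD(MINUTE, @TZOffset, TimeSampled) as TS
  FROM PerformanceDataAllView WITH (NOLOCK)   -- read without taking shared locks (dirty reads possible)
  where (PerformanceSourceInternalId = @SrcID)
        and (TimeSampled > DATEADD(MINUTE, -@TZOffset, @Start))
        and (TimeSampled < DATEADD(MINUTE, -@TZOffset, @End))
```

An alternative with the same effect for the whole session is to run SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before the query.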

Hope that helps.

Cheers Max

Max
  • Thank you so much @Max for the detailed answer! I wish I could tick this as the answer. I am sorry that I can't, because it is not the exact answer to the question I asked in the post, but it is a great help, new knowledge, and a new approach for the whole code. – Jay Joshi May 10 '18 at 06:20
  • No worries, @Jay. Thank you for the great feedback. – Max May 10 '18 at 21:33
0

According to the method's documentation, the Read() call returns true if the reader moved to the next result and false if it did not. If you are getting an exception, it couldn't do either of those. I'd assume something broke the connection between you and the SCOM instance.

If it's a timeout issue, I'm not sure it's a SCOM timeout. The error doesn't say anything about a timeout. As far as I know, this is an RPC call under the hood, and RPC doesn't have a timeout:

There are two ways your client can hang: network connectivity can cause server requests to become lost, or the server itself can crash. With default options, RPC will never time out a call, and your client thread will wait forever for a response.

Maybe a firewall is closing your connection after a certain period of time?

If you want to dial in your performance, consider caching. It looks like you've got a much larger script than the snippet we see. Tossing this out just to make you aware it's an option.

Adam