Monday, November 26, 2012

3par V-class arrays, code 37 reset and data corruption

Last week one of the nodes in our 3par V800 reset with a Code 37, and a few seconds before the node reset Oracle started complaining about corrupt blocks. While digging into the issue, we learned there is a known hardware problem on the V-Class arrays: it stems from the PCI-E interface chipset on the system board and fibre channel cards.

We were told that we were the only customer to see data corruption with a Code 37 reset, but your mileage may vary. If you've had similar problems, I'd love to hear about it.

The following output from showeeprom shows a bad and good board:

Node: 5
--------
      Board revision: 0920-200009.A3
            Assembly: SAN 2012/03 Serial 3978
       System serial: 1405629
        BIOS version: 2.9.8
          OS version: 3.1.1.342
        Reset reason: PCI_RESET
           Last boot: 2012-11-17 20:51:43 EST
   Last cluster join: 2012-11-17 20:52:25 EST
          Last panic: 2012-03-23 08:56:46 EDT
  Last panic request: Never
   Error ignore code: 00
         SMI context: 00
       Last HBA mode: 2a100700
          BIOS state: 80 ff 24 27 28 29 2a 2c
           TPD state: 34 40 ff 2a 2c 2e 30 32
Code 27 (Temp/Voltage Failure) - Subcode 0x3 (1)        2012-11-17 20:47:34 EST
Code 31 (GPIO Failure) - Subcode 0x3 (1)                2012-11-17 20:43:45 EST
Code 37 (GEvent Triggered) - Subcode 0x80002001 (0)     2012-11-17 20:41:43 EST
Code 27 (Temp/Voltage Failure) - Subcode 0x3 (1)        2012-04-05 15:33:01 EDT
Code 27 (Temp/Voltage Failure) - Subcode 0x3 (1)        2012-03-26 17:59:21 EDT
Code 38 (Power Supply Failure) - Subcode 0x13 (0)       2012-03-26 17:06:41 EDT

I'm told that boards with revision D2 contain the fixes for the issue:

Node: 0
--------
      Board revision: 0920-200009.D2
            Assembly: SAN 2012/38 Serial 6349
       System serial: 1405629
        BIOS version: 2.9.8
          OS version: 3.1.1.342
        Reset reason: ALIVE_L
           Last boot: 2012-11-23 16:44:12 EST
   Last cluster join: 2012-11-23 16:44:47 EST
          Last panic: 2012-10-23 21:30:25 EDT
  Last panic request: Never
   Error ignore code: 00
         SMI context: 00
       Last HBA mode: 2a100700
          BIOS state: 80 ff 24 27 28 29 2a 2c
           TPD state: 34 40 ff 2a 2c 2e 30 32
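
If you need to check which board revision every node in an array is running, grepping the showeeprom output is the quickest way I've found. A rough sketch only (the array name and login are made up, and I'm assuming you can ssh straight to the InForm CLI; you could just as easily run showeeprom in an interactive session and eyeball it):

 ssh 3paradm@v800 showeeprom | grep -E 'Node:|Board revision'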

Linux multipath (device-mapper) optimizations with 3par storage

Earlier this year we bought a 6-node 3par V800 array, which we deployed with Oracle RAC clusters running OEL 6.x.

We discovered that 3par's default multipath.conf configuration yielded a 30-second I/O stall whenever one of the paths failed.

Eventually we were able to get support to recommend adding "dev_loss_tmo 1" to multipath.conf, like so:

 defaults {
     user_friendly_names yes
     polling_interval    5
     dev_loss_tmo        1
 }

With that in place, we observed only a 1-second I/O stall during a path failure.
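
If you want to confirm the setting actually took effect, you can dump multipathd's running configuration and look at the fibre channel remote ports in sysfs. This is just a sketch (the -k syntax matches the multipath-tools version shipped with OEL 6-era distributions, and your rport paths will differ):

 # show the in-memory multipathd configuration
 multipathd -k"show config" | grep dev_loss_tmo

 # show dev_loss_tmo as applied to each FC remote port
 grep . /sys/class/fc_remote_ports/rport-*/dev_loss_tmo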

Wednesday, November 2, 2011

Monitoring disk space usage on Sun Fishworks (7000) ZFS Storage Appliances

We needed a way to have our existing monitoring system alert us if a project was running out of space. There isn't a single CLI command that will show space usage across all projects, but this bit of ECMAScript 3 will output an easily parsed table:


 script  
 //  
 // jwasilko@gmail.com  
 // fishworks' cli user interface doesn't provide a good way to monitor  
 // disk space of all projects. This is an attempt to make up for that.   
 //  
 run('shares');  
 projects = list();  
 printf('%-40s %-10s %-10s %-10s\n', 'SHARE', 'AVAIL', 'USED', 'SNAPUSED');  
 for (i = 0; i < projects.length; i++) {
     run('select ' + projects[i]);
     shares = list();
     for (j = 0; j < shares.length; j++) {
         run('select ' + shares[j]);
         share = projects[i] + '/' + shares[j];
         used = run('get space_data').split(/\s+/)[3];
         avail = run('get space_available').split(/\s+/)[3];
         snap = run('get space_snapshots').split(/\s+/)[3];
         printf('%-40s %-10s %-10s %-10s\n', share, avail, used, snap);
         run('cd ..');
     }
     run('cd ..');
 }
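
To run it, ssh to the appliance and redirect stdin from the script, the same trick shown in the replication-monitoring post below (the filename here is just an example):

 ssh sun7310-1 < project_space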

Tuesday, March 22, 2011

Celerra datamover group file doc bug

We're testing NFSv4, which requires a user/group database (either local files or LDAP/NIS) on the datamover.

Username/UID mapping was working properly, but group/GID mapping was not.

The Celerra Naming Services (6.0) doc on page 21 lists the format of the group file as:

groupname:gid:user_list

But the proper format includes a field for the group password:

groupname:password:gid:user_list

The password field is typically unused and just holds a placeholder (x), as in a standard Unix group file.
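
As a concrete example, an entry for a hypothetical dba group (GID 500) containing the oracle and grid users would look like this, exactly as it would in /etc/group:

 dba:x:500:oracle,grid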

Hope this helps someone else avoid the hassle we ran into.

Monday, March 7, 2011

Celerra top talkers & suspicious ops defined

The EMC Celerra datamovers have the ability to log statistics about top talkers, which can be useful for tracking down problems. We run server_stats with these options (a 5-second interval, 60 samples) to get top talker stats:

/nas/bin/server_stats server_2 -top nfs -i 5 -c 60

One thing worth noting: there's a column labeled "NFS Suspicious Ops". There's no documentation on this column, and it took EMC some time to dig up the answer. Here it is:

SUSPICIOUS EVENTS:
One of the TopTalker output columns lists Suspicious Ops/second.
"Suspicious" events are any of the following, which are typical of the patterns seen when viruses or other badly behaved software/users are attacking a system:

CIFS events:
  • ACCESS_DENIED returned for FindFirst
  • ACCESS_DENIED returned for Open/CreateFile
  • ACCESS_DENIED returned for DeleteFile
  • SUCCESS returned for DeleteFile
  • SUCCESS returned for TruncateFile (size=0)

NFSv2/v3/v4 events:
  • NFSERR_ACCES returned for NFS OPEN/LOOKUP/CREATE/DELETE
  • NFSERR_ACCES returned for READDIR/READDIRPLUS
  • NFS_OK for NFS REMOVE
  • NFS_OK for NFS SETATTR (size=0)

Saturday, January 1, 2011

Monitoring share-based replication on Sun Fishworks (7000) Appliances

We use Sun/Oracle's Fishworks (7000) ZFS Storage Appliances to store our Oracle archive logs and to replicate them to our DR datacenter.

We generate more than 2TB of archive logs per day, and ZFS compression helps knock that down to a somewhat more manageable 500GB a day. Initially we were using project-based replication, which was easy to configure, but unfortunately there was not enough parallelism to keep up with our change rate.

Sun suggested setting up a separate replication action for each share (we have 16 shares per database cluster) to improve throughput. That has worked well, but the user interface doesn't provide an overview of replication status.

Fortunately, the CLI can be scripted using JavaScript, so it was easy to loop over the projects and shares and extract the replication status.

To run the script, just ssh to the appliance and redirect stdin from the script:

 ldap1{jwasilko}64: ssh sun7310-1 < replication_status  
 Pseudo-terminal will not be allocated because stdin is not a terminal.  
 Password:   
 Current time: Sun Jan 02 2011 02:35:06 GMT+0000 (UTC)  
 Share             LastSync                 LastTry                 NextTry                   
 db/archivelogs_rman10     Sun Jan 02 2011 02:25:13 GMT+0000 (UTC) Sun Jan 02 2011 02:25:13 GMT+0000 (UTC) Sun Jan 02 2011 02:55:00 GMT+0000 (UTC)  
 db/archivelogs_rman12     Sun Jan 02 2011 02:26:13 GMT+0000 (UTC) Sun Jan 02 2011 02:26:13 GMT+0000 (UTC) Sun Jan 02 2011 02:56:00 GMT+0000 (UTC)  
 db/archivelogs_rman14     Sun Jan 02 2011 02:27:13 GMT+0000 (UTC) Sun Jan 02 2011 02:27:13 GMT+0000 (UTC) Sun Jan 02 2011 02:57:00 GMT+0000 (UTC)  
 db/archivelogs_rman16     Sun Jan 02 2011 02:28:13 GMT+0000 (UTC) Sun Jan 02 2011 02:28:13 GMT+0000 (UTC) Sun Jan 02 2011 02:58:00 GMT+0000 (UTC)  
 db/archivelogs_rman2      Sun Jan 02 2011 02:21:21 GMT+0000 (UTC) Sun Jan 02 2011 02:21:21 GMT+0000 (UTC) Sun Jan 02 2011 02:51:00 GMT+0000 (UTC)  
 db/archivelogs_rman4      Sun Jan 02 2011 02:22:13 GMT+0000 (UTC) Sun Jan 02 2011 02:22:13 GMT+0000 (UTC) Sun Jan 02 2011 02:52:00 GMT+0000 (UTC)  
 db/archivelogs_rman6      Sun Jan 02 2011 02:33:18 GMT+0000 (UTC) Sun Jan 02 2011 02:33:18 GMT+0000 (UTC) Sun Jan 02 2011 03:03:00 GMT+0000 (UTC)  
 db/archivelogs_rman8      Sun Jan 02 2011 02:24:13 GMT+0000 (UTC) Sun Jan 02 2011 02:24:13 GMT+0000 (UTC) Sun Jan 02 2011 02:54:00 GMT+0000 (UTC)  
   
   

The script is below. I hope it might be useful for you.

 script  
   
 //  
 // jwasilko@gmail.com  
 // fishworks' user interface doesn't provide a good way to monitor  
 // the health of share-based replication. this is an attempt to make  
 // up for that.  
 //  
   
   
 print("Current time: " + new Date());  
 printf('%-30s %-40s %-40s %-40s\n', "Share", "LastSync", "LastTry", "NextTry");  
   
   
 // Get the list of projects, to iterate over later  
 run('shares');  
 projects = list();  
   
 // For each project, list the shares
 for (projectNum = 0; projectNum < projects.length; projectNum++) {
     run('select ' + projects[projectNum]);
     shares = list();

     // Walk into each share and select replication, then actions
     for (sharesNum = 0; sharesNum < shares.length; sharesNum++) {
         try { run('select ' + shares[sharesNum]); } catch (err) { dump(err); }
         share = projects[projectNum] + '/' + shares[sharesNum];
         run('replication');
         actions = list();

         // Some shares may not have share-specific replication actions,
         // so skip if needed. Otherwise, get the replication status
         if (actions.length > 0) {
             for (actionsNum = 0; actionsNum < actions.length; actionsNum++) {
                 try { run('select ' + actions[actionsNum]); } catch (err) { dump(err); }
                 lastsync = run('get last_sync').split(/=/)[1];
                 lastsync = lastsync.replace(/\n/, "");
                 lasttry = run('get last_try').split(/=/)[1];
                 lasttry = lasttry.replace(/\n/, "");
                 nextupdate = run('get next_update').split(/=/)[1];
                 nextupdate = nextupdate.replace(/\n/, "");
                 printf('%-30s %-40s %-40s %-40s\n', share, lastsync, lasttry, nextupdate);
             }
             // Back out of the action and replication contexts to the share
             run('cd ../..');
         }
         else {
             // No replication action here; just back out of the replication context
             run('cd ..');
         }
         // Back out of the share to the project
         run('cd ..');
     }
     // Back out of the project to the top-level shares context
     run('cd ..');
 }