What version of ONTAP?
Have a look at bug 780660:
mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=780660
Hello All-
We have been using Symantec NetBackup with our NetApp for NDMP backups to tape for quite some time. Recently, we migrated from our old 7-mode filer to a new 8040 cluster running 8.3. I have been trying to get our tape backups running again and was able to set things up in NetBackup, but now whenever I run a full backup on any of the volumes, the backups only write at 150-200 KB a second and the jobs eventually fail. I am using the same FC switch as before and updated the configuration on the switch and library to allow communication with the new FC adapters on the NetApp. I have two FC adapters connected from node 1 on the NetApp to the FC switch (although I recently disabled one of them to see if that was causing the problem), and I am using the cluster management LIF as my NDMP host in NetBackup, following this guide:
https://www.veritas.com/support/en_US/article.000025335.
Does anyone have any ideas? I've found some documentation from NetApp discussing how to troubleshoot poor backup performance, but I can't seem to set up a working "dump to null" command, which seems pretty integral to most of their troubleshooting steps. Still, I don't think the performance issue is caused by an overloaded controller.
Here's some of the output I am seeing from a given backup job in the NetBackup admin console:
10/19/2016 10:14:55 - Info nbjm (pid=6560) starting backup job (jobid=53570) for client CLUSTERMGMT, policy POLICY1, schedule SCHEDULE1
10/19/2016 10:14:55 - Info nbjm (pid=6560) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=53570, request id:{3A3732BC-A6DD-4FB4-8C35-2C57512B093A})
10/19/2016 10:14:55 - requesting resource backup_svm-SCHEDULE1
10/19/2016 10:14:55 - requesting resource NB-HOST.NBU_CLIENT.MAXJOBS.CLUSTERMGMT
10/19/2016 10:14:55 - requesting resource NB-HOST.NBU_POLICY.MAXJOBS.POLICY1
10/19/2016 10:14:55 - granted resource NB-HOST.NBU_CLIENT.MAXJOBS.CLUSTERMGMT
10/19/2016 10:14:55 - granted resource NB-HOST.NBU_POLICY.MAXJOBS.POLICY1
10/19/2016 10:14:55 - granted resource 101323
10/19/2016 10:14:55 - granted resource IBM.ULTRIUM-TD3.004
10/19/2016 10:14:55 - granted resource NB-HOST-hcart3-robot-tld-3-CLUSTERMGMT
10/19/2016 10:14:56 - estimated 0 kbytes needed
10/19/2016 10:14:56 - Info nbjm (pid=6560) started backup (backupid=CLUSTERMGMT_1476886495) job for client CLUSTERMGMT, policy POLICY1, schedule SCHEDULE1 on storage unit NB-HOST-hcart3-robot-tld-3-CLUSTERMGMT
10/19/2016 10:14:56 - started process bpbrm (pid=11208)
10/19/2016 10:14:57 - Info bpbrm (pid=11208) CLUSTERMGMT is the host to backup data from
10/19/2016 10:14:57 - Info bpbrm (pid=11208) reading file list for client
10/19/2016 10:14:57 - connecting
10/19/2016 10:14:57 - Info bpbrm (pid=11208) starting ndmpagent on client
10/19/2016 10:14:57 - Info ndmpagent (pid=12152) Backup started
10/19/2016 10:14:57 - Info ndmpagent (pid=12152) PATH(s) found in file list = 1
10/19/2016 10:14:57 - Info ndmpagent (pid=12152) PATH[1 of 1]: /backup_svm/volume_to_backup
10/19/2016 10:14:57 - Info bptm (pid=10756) start
10/19/2016 10:14:57 - Info bptm (pid=10756) using 30 data buffers
10/19/2016 10:14:57 - Info bptm (pid=10756) using 65536 data buffer size
10/19/2016 10:14:57 - connected; connect time: 0:00:00
10/19/2016 10:14:58 - Info bptm (pid=10756) start backup
10/19/2016 10:14:58 - Info bptm (pid=10756) Waiting for mount of media id 101323 (copy 1) on server NB-HOST.
10/19/2016 10:14:58 - mounting 101323
10/19/2016 10:14:59 - Info ndmpagent (pid=12152) CLUSTERMGMT: Session identifier: 28042
10/19/2016 10:15:44 - Info bptm (pid=10756) media id 101323 mounted on drive index 8, drivepath /NODE1/nrst3a, drivename IBM.ULTRIUM-TD3.004, copy 1
10/19/2016 10:15:44 - Info ndmpagent (pid=12152) CLUSTERMGMT: SCSI: TAPE READ: short read for nrst3a
10/19/2016 10:15:44 - mounted 101323; mount time: 0:00:46
10/19/2016 10:15:44 - positioning 101323 to file 2
10/19/2016 10:15:47 - Info ndmpagent (pid=12152) NDMP 3Way - Data Affinity 13102a5c-7740-11e5-8b3a-f34bfadd9084 is not equal to Tape Affinity d1c11ad3-7740-11e5-b678-5fd0506b00a8
10/19/2016 10:15:47 - positioned 101323; position time: 0:00:03
10/19/2016 10:15:47 - begin writing
10/19/2016 10:15:49 - Info ndmpagent (pid=12152) CLUSTERMGMT: Session identifier for Mover : 28042
10/19/2016 10:15:49 - Info ndmpagent (pid=12152) CLUSTERMGMT: Session identifier for Backup : 30232
10/19/2016 10:15:49 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Using "/backup_svm/volume_to_backup/../4hours.2016-10-19_0800" snapshot.
10/19/2016 10:15:49 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Using Full Volume Dump
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Using 4hours.2016-10-19_0800 snapshot
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Date of this level 0 dump snapshot: Wed Oct 19 08:00:00 2016.
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Date of last level 0 dump: the epoch.
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Dumping /backup_svm/volume_to_backup to NDMP connection
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: mapping (Pass I)[regular files]
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Reference time for next incremental dump is : Wed Feb 3 09:15:02 2016.
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: mapping (Pass II)[directories]
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: estimated 84603127 KB.
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: dumping (Pass III) [directories]
10/19/2016 10:19:18 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: dumping (Pass IV) [regular files]
10/19/2016 10:20:52 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Wed Oct 19 10:20:52 2016 : We have written 173211 KB.
...lines repeating as dump progresses
10/19/2016 16:33:13 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Wed Oct 19 16:33:13 2016 : We have written 11166693 KB.
10/19/2016 16:36:04 - Error nbjm (pid=6560) nbrb status: LTID reset media server resources
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) terminated by parent process
10/19/2016 16:36:14 - Info ndmpagent (pid=0) done
10/19/2016 16:36:14 - Info ndmpagent (pid=12152) Received ABORT request from bptm
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) NDMP backup failed, path = /backup_svm/volume_to_backup
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Write to socket failed
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) CLUSTERMGMT: DUMP: DUMP IS ABORTED
10/19/2016 16:36:14 - Warning ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Total Dir to FH time spent is greater than 15 percent of phase 3 total time. Please verify the settings of backup application and the network connectivity.
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) CLUSTERMGMT: DATA: Operation terminated (for /backup_svm/volume_to_backup).
10/19/2016 16:36:15 - Error ndmpagent (pid=12152) CLUSTERMGMT: BACKUP: job aborted
10/19/2016 16:36:15 - Error ndmpagent (pid=12152) CLUSTERMGMT: BACKUP: BACKUP_NET IS ABORTED
10/19/2016 16:36:15 - Info ndmpagent (pid=12152) CLUSTERMGMT: MOVER: Tape writing operation terminated
10/19/2016 16:37:45 - Info ndmpagent (pid=0) done. status: 150: termination requested by administrator
10/19/2016 16:37:45 - end writing; write time: 6:21:58
client process aborted (50)
Never mind, looks like I figured it out. The new cluster has 16/8/4 Gb capable FC adapters and the library was using 4/2/1 Gb. The tape library was connected at 4 Gb and the NetApp FC adapters were connecting at 8 Gb. I couldn't find a way to force the speed to 4 Gb on the NetApp side (the speed options were only available for FC adapters in target mode, and these were in initiator mode so I could connect to the library), but I was able to set the port speed on the switch to 4 Gb. After doing that, NDMP jobs now write at ~78,119 KB/s (~268 GB per hour). The job is able to run to completion now and I no longer receive any of the NDMP error messages.
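For anyone who wants to put numbers on a slow job before and after a change like this, here is a minimal Python sketch that reads the "We have written N KB" progress lines from the job details and works out the average rate. The function names and the regular expression are my own, not anything from NetBackup, and it assumes the job-detail line format shown in the log above.

```python
import re
from datetime import datetime

# Matches job-detail progress lines in the format shown in the log above.
PROGRESS = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - .*We have written (\d+) KB\."
)

# First and last progress lines copied from the failed job.
SAMPLE = [
    "10/19/2016 10:20:52 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: "
    "Wed Oct 19 10:20:52 2016 : We have written 173211 KB.",
    "10/19/2016 16:33:13 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: "
    "Wed Oct 19 16:33:13 2016 : We have written 11166693 KB.",
]


def average_kb_per_sec(lines):
    """Average KB/s between the first and last progress line found."""
    samples = []
    for line in lines:
        match = PROGRESS.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
            samples.append((when, int(match.group(2))))
    if len(samples) < 2:
        raise ValueError("need at least two progress lines")
    (t0, kb0), (t1, kb1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("progress lines are not in time order")
    return (kb1 - kb0) / elapsed


def kb_per_sec_to_gb_per_hour(rate_kb_s):
    """Convert KB/s to GB per hour, taking 1 GB as 1024 * 1024 KB."""
    return rate_kb_s * 3600 / (1024 * 1024)


if __name__ == "__main__":
    rate = average_kb_per_sec(SAMPLE)
    print(f"failed job: {rate:.0f} KB/s, {kb_per_sec_to_gb_per_hour(rate):.2f} GB/h")
    print(f"after fix:  {kb_per_sec_to_gb_per_hour(78119):.1f} GB/h")
```

Between the two sample lines the failed job averaged a bit under 500 KB/s over roughly six hours, while 78,119 KB/s works out to roughly 268 GB per hour.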
OK, we've found some changed behaviour between 7-mode and C-mode.
In 7-mode filers the following is implemented:
rpcmod:svc_idle_timeout
Description: Controls the duration of time on the server that a connection between the client and server is allowed to remain idle before being closed.
Data Type: Long integer (32 bits on 32-bit platforms and 64 bits on 64-bit platforms)
Default: 360,000 milliseconds (6 minutes)
This means that NFS sessions over TCP are closed after 6 minutes if no traffic is seen. When this happens, the session is shut down gracefully with a FIN handshake. The client automatically starts a new session once the NFS mount is used again.
In C-mode this doesn't happen! In C-mode there is a keepalive mechanism that polls the client after 2 hours of idle time, and if it doesn't receive an answer it removes the session from its session table.
How can we get the old behaviour back? We need it when there are firewalls between the clients and the NetApp, because a firewall may drop an idle session from its state table long before the 2-hour keepalive fires.
Thanks in advance!
Frank
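To illustrate the keepalive mechanism described above (this is not an ONTAP setting and it does not configure the kernel NFS client), here is a small Python sketch of how a generic TCP endpoint can ask for keepalive probes well before a firewall's idle timeout. The 300 second idle value is an assumption, picked only because it sits below the 6-minute figure quoted above. The per-socket timer options are platform specific, so the sketch sets them only where Python's socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=30, probes=5):
    """Turn on TCP keepalive and tune the timers where the OS exposes them.

    idle_s, interval_s and probes are illustrative defaults, not values
    taken from ONTAP or from any firewall.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tunables = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tunables:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keepalive on:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

The point is only that probes sent more often than the firewall's idle timeout keep the state entry alive; a probe after 2 hours arrives long after a typical firewall has forgotten the session.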
Hi guys,
just curious, what is needed to change volumes used for VMware datastores with NFS3 to run with NFS 4.1?
Simple unmount on both ends, enable 4.1 support on NetApp and mount again?
Thanks!
Hello,
I have the following problem with ONTAP 9 on a FAS2552:
cl1::vserver cifs> dns
cl1::vserver services name-service dns> show
Name
Vserver State Domains Servers
--------------- --------- ----------------------------------- ----------------
cl1 enabled gym-hksb.local 10.30.253.1,
10.30.253.3
nas enabled gym-hksb.local 10.30.253.1,
10.30.253.3
2 entries were displayed.
cl1::vserver services name-service dns> cifs
cl1::vserver cifs> create -cifs-server file02 -domain gym-hksb.local -ou CN=Computers
In order to create an Active Directory machine account for the CIFS server, you must supply the name and password of a Windows account with sufficient privileges to add computers to the
"CN=Computers" container within the "GYM-HKSB.LOCAL" domain.
Enter the user name: administrator
Enter the password:
Error: Machine account creation procedure failed
[ 1002] Loaded the preliminary configuration.
[ 1730] Created a machine account in the domain
[ 1732] Successfully connected to ip 10.30.253.1, port 445 using
TCP
[ 1833] Unable to connect to LSA service on dc01.gym-hksb.local
(Error: RESULT_ERROR_SPINCLIENT_SOCKET_RECEIVE_ERROR)
[ 1835] Successfully connected to ip 10.30.253.3, port 445 using
TCP
[ 1937] Unable to connect to LSA service on dc02.gym-hksb.local
(Error: RESULT_ERROR_SPINCLIENT_SOCKET_RECEIVE_ERROR)
[ 1937] No servers available for MS_LSA, vserver: 4, domain:
gym-hksb.local.
**[ 1937] FAILURE: Unable to make a connection
** (LSA:GYM-HKSB.LOCAL), result: 6940
[ 1937] Could not find Windows SID
'S-1-5-21-1131981276-2882716370-3949356162-512'
[ 1944] Deleted existing account
'CN=FILE02,CN=Computers,DC=gym-hksb,DC=local'
Error: command failed: Failed to create the Active Directory machine account "FILE02". Reason: SecD Error: no server available.
Ping to the domain is successful.
The time zone on the domain and on the NetApp is correct.
Any idea how to solve this?
Thanks,
Jürgen
You need to mount a new datastore on NFSv4.1 and migrate the VMs using storage vmotion.
Hi
It might be an issue with the login account you are using. Does the user account have admin privileges in Active Directory? You need admin privileges to add a NetApp vserver to an Active Directory domain.
Hi,
Sure, I am using the domain administrator account.
KR
Please let me know the result of this ... :-)
Naveenkumar Pusuluru
Storage lead | C3i Healthcare connections
DC is reachable
DNS is configured
time zone is correct
cl1::vserver cifs> create -cifs-server file02 -domain gym-hksb.local -ou CN=Computers
In order to create an Active Directory machine account for the CIFS server, you must supply the name and password of a Windows account with sufficient privileges to add computers to the "CN=Computers"
container within the "GYM-HKSB.LOCAL" domain.
Enter the user name: administrator
Enter the password:
Error: Machine account creation procedure failed
[ 86] Loaded the preliminary configuration.
[ 121] Created a machine account in the domain
[ 122] Successfully connected to ip 10.30.253.1, port 445 using
TCP
[ 123] Unable to connect to LSA service on dc01.gym-hksb.local
(Error: RESULT_ERROR_SPINCLIENT_SOCKET_RECEIVE_ERROR)
[ 123] Successfully connected to ip 10.30.253.3, port 445 using
TCP
[ 124] Unable to connect to LSA service on dc02.gym-hksb.local
(Error: RESULT_ERROR_SPINCLIENT_SOCKET_RECEIVE_ERROR)
[ 124] No servers available for MS_LSA, vserver: 4, domain:
gym-hksb.local.
**[ 124] FAILURE: Unable to make a connection
** (LSA:GYM-HKSB.LOCAL), result: 6940
[ 124] Could not find Windows SID
'S-1-5-21-1131981276-2882716370-3949356162-512'
[ 131] Deleted existing account
'CN=FILE02,CN=Computers,DC=gym-hksb,DC=local'
Error: command failed: Failed to create the Active Directory machine account "FILE02". Reason: SecD Error: no server available.
cl1::vserver cifs> ping -node cl1-0
cl1-01 cl1-02
cl1::vserver cifs> ping -node cl1-01 -destination
Destination
cl1::vserver cifs> ping -node cl1-01 -destination GYM-HKSB.LOCAL
GYM-HKSB.LOCAL is alive
cl1::vserver cifs> dns show
Name
Vserver State Domains Servers
--------------- --------- ----------------------------------- ----------------
cl1 enabled gym-hksb.local 10.30.253.1,
10.30.253.3
nas enabled gym-hksb.local 10.30.253.1,
10.30.253.3
2 entries were displayed.
cl1::vserver cifs> network interface show
Logical Status Network Current Current Is
Vserver Interface Admin/Oper Address/Mask Node Port Home
----------- ---------- ---------- ------------------ ------------- ------- ----
Cluster
cl1-01_clus1 up/up 169.254.141.0/16 cl1-01 e0e true
cl1-01_clus2 up/up 169.254.239.201/16 cl1-01 e0f true
cl1-02_clus1 up/up 169.254.175.70/16 cl1-02 e0e true
cl1-02_clus2 up/up 169.254.53.54/16 cl1-02 e0f true
cl1
cl1-01_mgmt1 up/up 10.30.253.51/16 cl1-01 e0M true
cl1-02_mgmt1 up/up 10.30.253.52/16 cl1-02 e0M true
cluster_mgmt up/up 10.30.253.50/16 cl1-01 e0M true
nas
nas_lif up/up 10.30.253.55/16 cl1-01 a0a true
8 entries were displayed.
cl1::vserver cifs> system date show
Node Date Time zone
--------- ------------------------- -------------------------
cl1-01 10/24/2016 18:20:11 Europe/Berlin
+02:00
cl1-02 10/24/2016 18:20:11 Europe/Berlin
+02:00
2 entries were displayed.
cl1::vserver cifs>
same time on AD
C:\Users\Administrator.GYM-HKSB>net time \\dc01
Current time at \\dc01 is 24.10.2016 18:20:37.
event log show
cl1::vserver cifs> event log show -time >4m
Time Node Severity Event
------------------- ---------------- ------------- ---------------------------
10/24/2016 18:26:40 cl1-01 ERROR secd.conn.auth.failure: Vserver (nas) could not make a connection over the network to server (10.30.253.3) via interface 10.30.253.55. Error: Connection reset by peer.
10/24/2016 18:26:40 cl1-01 ERROR secd.conn.auth.failure: Vserver (nas) could not make a connection over the network to server (10.30.253.1) via interface 10.30.253.55. Error: Connection reset by peer.
10/24/2016 18:25:38 cl1-01 ERROR secd.dns.srv.lookup.failed: DNS server failed to look up service (_kerberos._tcp.10.30.253.1) for vserver (nas) with error (No server(s) found).
10/24/2016 18:25:37 cl1-01 ERROR secd.dns.srv.lookup.failed: DNS server failed to look up service (_ldap._tcp.dc._msdcs.10.30.253.1) for vserver (nas) with error (No server(s) found).
10/24/2016 18:25:37 cl1-01 ERROR secd.dns.srv.lookup.failed: DNS server failed to look up service (_ldap._tcp.10.30.253.1) for vserver (nas) with error (No server(s) found).
10/24/2016 18:25:35 cl1-01 ERROR secd.dns.srv.lookup.failed: DNS server failed to look up service (_ldap._tcp.Default-First-Site-Name._sites.10.30.253.1) for vserver (nas) with error (No server(s) found).
10/24/2016 18:25:35 cl1-01 ERROR secd.dns.srv.lookup.failed: DNS server failed to look up service (_kerberos._tcp.dc._msdcs.10.30.253.1) for vserver (nas) with error (No server(s) found).
cl1::vserver cifs> ping -lif nas_lif -vserver nas -destination
<Remote InetAddress> Destination
cl1::vserver cifs> ping -lif nas_lif -vserver nas -destination 10.30.253.1
10.30.253.1 is alive
cl1::vserver cifs> ping -lif nas_lif -vserver nas -destination 10.30.253.3
10.30.253.3 is alive
cl1::vserver cifs>
cl1::vserver cifs>
cl1::vserver cifs> dns show
Name
Vserver State Domains Servers
--------------- --------- ----------------------------------- ----------------
cl1 enabled gym-hksb.local 10.30.253.1,
10.30.253.3
nas enabled gym-hksb.local 10.30.253.1,
10.30.253.3
2 entries were displayed.
cl1::vserver cifs>
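One thing that stands out in the event log above: the failed SRV lookups end in an IP address (for example _kerberos._tcp.10.30.253.1) where a DNS domain name would normally appear. I can't say from the output alone whether that is related to the LSA failure. As a rough aid, here is a minimal Python sketch that builds the SRV names following the patterns visible in that log and flags names whose suffix is an IPv4 address. The function names are my own and nothing here queries DNS or talks to ONTAP.

```python
import ipaddress


def ad_srv_names(dns_domain, site=None):
    """Build SRV record names following the patterns seen in the event log."""
    domain = dns_domain.strip(".").lower()
    names = [
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
    ]
    if site:
        # Site-specific lookup, as in "_ldap._tcp.<site>._sites.<domain>".
        names.insert(0, f"_ldap._tcp.{site}._sites.{domain}")
    return names


def srv_suffix_is_ipv4(srv_name):
    """True if the last four labels of the name form an IPv4 address."""
    tail = ".".join(srv_name.rstrip(".").split(".")[-4:])
    try:
        ipaddress.IPv4Address(tail)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for name in ad_srv_names("gym-hksb.local", site="Default-First-Site-Name"):
        print(name)
    print(srv_suffix_is_ipv4("_kerberos._tcp.10.30.253.1"))
```

Comparing the generated names with what the event log reports may help narrow down where the IP address is being picked up as a domain.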
Hi,
Have you tried setting your timezone to the closest city to you listed in the link below:
https://library.netapp.com/ecmdocs/ECMP1368852/html/GUID-48AD434D-433B-4208-8D9E-C3696707E20C.html
Before you can join the vserver to the domain, you first need to set the date, time and timezone so that the system's time is within 5 minutes of your domain controller.
To check the time on your DC you can use the net time command:
C:\>net time \\testdc01
Current time at \\testdc01 is 23/07/2015 6:26:37 PM
The command completed successfully.
Then set the date on your cluster:
cluster1> system date modify -dateandtime 201507231826.48
cluster1> system date show
Node Date Time zone
--------- ------------------------- -------------------------
node1
7/23/2015 18:26:53 +10:00 Australia/Sydney
Then set your timezone
cluster1> timezone America/Vancouver
1 entry modified
cluster1> system date show
Node Date Time zone
--------- ------------------------- -------------------------
node1
7/23/2015 01:27:12 -07:00 America/Vancouver
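If you want to double-check the 5-minute rule without doing the arithmetic by eye, here is a minimal Python sketch. The two timestamps are the ones posted earlier in this thread ("system date show" on the cluster and "net time" on the DC); the format strings are assumptions based on how those two outputs look, and the comparison is naive, so both readings need to be in the same time zone and taken at about the same moment.

```python
from datetime import datetime, timedelta

# 5-minute tolerance mentioned above.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(cluster_text, dc_text,
               cluster_fmt="%m/%d/%Y %H:%M:%S",
               dc_fmt="%d.%m.%Y %H:%M:%S"):
    """Absolute difference between two timestamp strings."""
    cluster_time = datetime.strptime(cluster_text, cluster_fmt)
    dc_time = datetime.strptime(dc_text, dc_fmt)
    return abs(cluster_time - dc_time)


def within_skew(cluster_text, dc_text, limit=MAX_SKEW):
    """True if the two clocks differ by no more than the limit."""
    return clock_skew(cluster_text, dc_text) <= limit


if __name__ == "__main__":
    skew = clock_skew("10/24/2016 18:20:11", "24.10.2016 18:20:37")
    print("skew:", skew, "ok:", skew <= MAX_SKEW)
```

With the values from this thread the difference is 26 seconds, well inside the limit, which fits the poster's statement that time is not the problem here.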
Also it's worth mentioning that you will need to enter credentials of an Active Directory user account during the cifs setup process that has permissions in Active Directory to create the computer object and join the vserver to the domain.
The minimum required Active Directory permissions for computer objects in your organizational unit are:
http://support.microsoft.com/kb/932455
Create Computer Objects
Reset Password
Read and write Account Restrictions
Validated write to DNS host name
Validated write to service principal name
hope this helps
Yes, timezone and date are configured without any issue.
The NetApp can reach both domain controllers (TCP ping), but the cDOT event log complains that no DC server is reachable :-/
Thanks for pointing this out. This bug seems to nail it. I didn't have access to the bug database before. Also my IT department confirmed the bug. Unfortunately updating seems to be a more complex task.
Hi,
Hope this KB article helps https://kb.netapp.com/support/s/article/troubleshooting-workflow-specified-network-resource-or-device-is-no-longer-available
Thanks
Hi Artik,
Any luck? I've got the same problem.
We found an authentication error with a packet trace.
The filer is reporting "KRB5KRB_AP_ERR_MODIFIED" and "STATUS_MORE_PROCESSING_REQUIRED" in the same packet.
Unresolved.
Hi, how did you get Kerberos working? I have been going crazy over this.
What was the solution?
Currently I have a similar issue:
Error: Machine account creation procedure failed
[ 82] Loaded the preliminary configuration.
[ 111] Created a machine account in the domain
[ 112] Successfully connected to ip 10.30.253.1, port 445 using
TCP
[ 113] Unable to connect to LSA service on dc01.gym-hksb.local
(Error: RESULT_ERROR_SPINCLIENT_SOCKET_RECEIVE_ERROR)
[ 114] Successfully connected to ip 10.30.253.3, port 445 using
TCP
[ 114] Unable to connect to LSA service on dc02.gym-hksb.local
(Error: RESULT_ERROR_SPINCLIENT_SOCKET_RECEIVE_ERROR)
[ 115] No servers available for MS_LSA, vserver: 4, domain:
gym-hksb.local.
**[ 115] FAILURE: Unable to make a connection
** (LSA:GYM-HKSB.LOCAL), result: 6940
[ 115] Could not find Windows SID
'S-1-5-21-1131981276-2882716370-3949356162-512'
[ 119] Deleted existing account
'CN=FILE02,CN=Computers,DC=gym-hksb,DC=local'