<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>WPKG Blog &#187; iSCSI</title> <atom:link href="http://blog.wpkg.org/category/iscsi/feed/" rel="self" type="application/rss+xml" /><link>http://blog.wpkg.org</link> <description>a technical IT blog</description> <lastBuildDate>Tue, 06 Dec 2011 13:25:58 +0000</lastBuildDate> <generator>http://wordpress.org/?v=2.9.2</generator> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Solving reliability and scalability problems with iSCSI, part 2</title><link>http://blog.wpkg.org/2007/09/27/solving-reliability-and-scalability-problems-with-iscsi-part-2/</link> <comments>http://blog.wpkg.org/2007/09/27/solving-reliability-and-scalability-problems-with-iscsi-part-2/#comments</comments> <pubDate>Thu, 27 Sep 2007 21:32:05 +0000</pubDate> <dc:creator>admin</dc:creator> <category><![CDATA[All articles]]></category> <category><![CDATA[Linux]]></category> <category><![CDATA[iSCSI]]></category><guid isPermaLink="false">http://blog.wpkg.org/2007/09/27/solving-reliability-and-scalability-problems-with-iscsi-part-2/</guid> <description><![CDATA[See the Solving reliability and scalability problems with iSCSI, part 1 article. The latest stable IET release, 0.4.15, suffers from yet another misfeature: it will likely break all initiators when the ietd process is restarted (ietd restart, machine restart, etc.). 
This is because on ietd shutdown, the user-space daemon is simply killed, but it has no handler for [...]]]></description> <content:encoded><![CDATA[<p>See the <a href="http://blog.wpkg.org/2007/09/09/solving-reliability-and-scalability-problems-with-iscsi">Solving reliability and scalability problems with iSCSI, part 1</a> article.</p><p>The latest stable IET release, 0.4.15, suffers from yet another misfeature: it will likely break all initiators when the ietd process is restarted (ietd restart, machine restart, etc.).</p><p>This is because on ietd shutdown, the user-space daemon is simply killed, but it has no handler for that signal, and the kernel-space module doesn&#8217;t perform any cleanup on its behalf.</p><p>From the initiator&#8217;s perspective, several things may happen:</p><ul><li>the change happens so fast that the initiator doesn&#8217;t notice anything</li><li>the initiator drops the connection, and after reconnecting, you see your iSCSI drives remounted read-only, weird hangs, etc.</li><li>the initiator is unable to connect again</li></ul><p>Of course, no one wants a filesystem remounted read-only, or a hard drive ripped out of a working system (which is how losing an iSCSI connection looks from the kernel&#8217;s perspective).</p><p>There is a simple solution, although some may call it an ugly hack.</p><p><span id="more-10"></span></p><p>Using <code>iptables</code> to block the traffic to and from the IET target just before it shuts down does the job. The change is trivial: in <code>/etc/init.d/iscsi-target</code>, add lines similar to those marked in <strong>bold</strong>:</p><p><code>ietd_start()<br /> {<br /> echo -n "Starting iSCSI enterprise target service: "<br /> configure_memsize<br /> modprobe -q crc32c<br /> modprobe iscsi_trgt<br /> start-stop-daemon --start --exec $DAEMON --quiet<br /> RETVAL=$?<br /> <strong># give ietd a moment to start, then let initiators back in<br /> sleep 3s<br /> iptables -D OUTPUT -p tcp --dport 3260 -j DROP<br /> iptables -D INPUT -p tcp --dport 3260 -j DROP</strong><br /> if [ "$RETVAL" = "0" ]; then<br /> echo "succeeded."<br /> else<br /> echo "failed."<br /> fi<br /> }</code></p><p><code>ietd_stop()<br /> {<br /> <strong># block iSCSI traffic before the daemon goes away<br /> iptables -A OUTPUT -p tcp --dport 3260 -j DROP<br /> iptables -A INPUT -p tcp --dport 3260 -j DROP</strong><br /> echo -n "Removing iSCSI enterprise target devices: "<br /> # ugly, but ietadm does not always provide correct exit values<br /> RETURN=`ietadm --op delete 2&gt;&amp;1`</code></p><p>With this change, and with the changes described in <a href="http://blog.wpkg.org/2007/09/09/solving-reliability-and-scalability-problems-with-iscsi">part 1</a>, your IET iSCSI target installation should be truly bullet-proof.</p><p>Or perhaps, instead of applying these hacks over and over again, it&#8217;s time to switch to the more mature <a href="http://scst.sourceforge.net/">iSCSI-SCST</a>?</p> ]]></content:encoded> <wfw:commentRss>http://blog.wpkg.org/2007/09/27/solving-reliability-and-scalability-problems-with-iscsi-part-2/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Solving reliability and scalability problems with iSCSI</title><link>http://blog.wpkg.org/2007/09/09/solving-reliability-and-scalability-problems-with-iscsi/</link> <comments>http://blog.wpkg.org/2007/09/09/solving-reliability-and-scalability-problems-with-iscsi/#comments</comments> <pubDate>Sat, 08 Sep 2007 23:48:33 +0000</pubDate> <dc:creator>admin</dc:creator> <category><![CDATA[All articles]]></category> <category><![CDATA[iSCSI]]></category><guid isPermaLink="false">http://blog.wpkg.org/?p=3</guid> <description><![CDATA[ Because datacenters depend heavily on iSCSI, with an increasing number of diskless servers booted directly off iSCSI NAS devices, rock-solid iSCSI operation is mandatory. 
The system should not fail even if the connection between the iSCSI target and the initiator is broken. See the Solving reliability and scalability problems with iSCSI, part 2 article. IET, or [...]]]></description> <content:encoded><![CDATA[<p style="margin-left: 1.99cm; margin-bottom: 0cm; line-height: 150%"> <font size="2">Because datacenters depend heavily on iSCSI, with an increasing number of diskless servers booted directly off iSCSI NAS devices, rock-solid iSCSI operation is mandatory. The system should not fail even if the connection between the iSCSI target and the initiator is broken.</font></p><p style="margin-bottom: 0cm; line-height: 150%"><span id="more-3"></span></p><p style="margin-bottom: 0cm; line-height: 150%">See the <a href="http://blog.wpkg.org/2007/09/27/solving-reliability-and-scalability-problems-with-iscsi-part-2/">Solving reliability and scalability problems with iSCSI, part 2</a> article.</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">IET, or iSCSI Enterprise Target, is an “open source iSCSI target with professional features, that works well in enterprise environment under real workload, and is scalable and versatile enough to meet the challenge of future storage needs and developments”<a href="#sdfootnote1sym" title="sdfootnote1anc" class="sdfootnoteanc" name="sdfootnote1anc"><sup>1</sup></a>.</font></p><p style="margin-bottom: 0cm; line-height: 100%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">By default, Open-iSCSI initiators<a href="#sdfootnote2sym" title="sdfootnote2anc" class="sdfootnoteanc" name="sdfootnote2anc"><sup>2</sup></a> can cope only with very brief disconnections.</font></p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">They determine connectivity by sending iSCSI NOP-outs as pings every 10 seconds. If no response is received within 120 seconds, the connection is considered failed, and an error is returned to the SCSI layer. 
Once the error reaches it, the SCSI layer offlines the device. This is comparable to pulling the hard disk out of a regular server during normal operation: in-memory data is never written to the disk and filesystems are not cleanly unmounted, while the server keeps running in a failing state.</font></p><p style="margin-bottom: 0cm; line-height: 100%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">120 seconds is definitely too short. Recabling, a failing switch, a SAN upgrade, or a simple human error should not bring the datacenter to an unpleasant halt just because it interrupts the connection for longer than 2 minutes.</font></p><p style="margin-bottom: 0cm; line-height: 100%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">After a period of testing, I determined that a properly configured initiator can handle disconnections lasting several minutes, hours, or even days. Where failing hardware or an operator mistake previously caused corrupted filesystems, possibly damaged data, and the need to restart several servers manually, the iSCSI initiator now handles such situations gracefully. Processes simply wait in an uninterruptible sleep state for their I/O operations to complete, and once the connection is re-established, they continue to work correctly.</font></p><p style="margin-bottom: 0cm; line-height: 100%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">However, moving that knowledge into production turned out to be harder than expected, and the resulting problems were extremely hard to debug.</font></p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">Even a short disconnection of a few seconds caused a lot of trouble: devices offlined immediately, corrupted filesystems, and painful server restarts. 
Worse, there was no way to reproduce it reliably, and neither <em>tcpdump</em> nor running the IET daemon with different options, on different architectures, or under a debugger gave any obvious hints.</font></p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">In the end, after a lot of testing, the diagnosis was clear:</font></p><ul><li><font size="2">it&#8217;s not the initiator&#8217;s problem</font></li><li><font size="2">it does not depend on the architecture (x86, x86_64, ARM)</font></li><li><font size="2">hundreds of connected initiators are needed to reproduce it</font></li><li><font size="2">all these hundreds of initiators need to be connected, then disconnected (because of a SAN restart, an IET daemon process restart, cabling, etc.), and then connected again</font></li></ul><p style="margin-bottom: 0cm; line-height: 150%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">Why does this phenomenon occur? </font></p><ol><li><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">We can call it a DDoS on the IET daemon:</font></p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">when hundreds of initiators are disconnected, they keep sending iSCSI NOP-out pings to the target every 10 seconds and eventually determine that the connection is broken. When the connection becomes available again, all these hundreds of initiators try to re-establish it at the same time.</font></p></li><li><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">A mathematical approach:</font></p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">with ~200 initiators and NOP-out pings sent every 10 seconds, on average ~20 of them (200 / 10) try to reconnect every second. Such an even distribution is unrealistic, though. 
In fact, <em>tcpdump</em> tests show 50-100 initiators trying to reconnect to the target at almost the same time.</font></p></li></ol><p style="margin-bottom: 0cm; line-height: 150%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">The following patch solved the problem:</font></p><p style="margin-bottom: 0cm; line-height: 150%"><font face="Courier, monospace"><font size="2">--- ietd.c.orig 2006-12-18 06:54:01.000000000 +0100</font></font></p><p style="margin-bottom: 0cm; line-height: 150%"><font face="Courier, monospace"><font size="2">+++ ietd.c      2007-06-17 18:20:49.000000000 +0200</font></font></p><p style="margin-bottom: 0cm; line-height: 150%"><font face="Courier, monospace"><font size="2">@@ -29,7 +29,7 @@</font></font></p><p style="margin-bottom: 0cm; line-height: 150%"> <font face="Courier, monospace"><font size="2">#include "ietadm.h"</font></font></p><p style="margin-bottom: 0cm; line-height: 150%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"> <font face="Courier, monospace"><font size="2">#define LISTEN_MAX             8</font></font></p><p style="margin-bottom: 0cm; line-height: 150%"><font face="Courier, monospace"><font size="2">-#define INCOMING_MAX           32</font></font></p><p style="margin-bottom: 0cm; line-height: 150%"><font face="Courier, monospace"><font size="2">+#define INCOMING_MAX           256</font></font></p><p style="margin-bottom: 0cm; line-height: 150%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"> <font face="Courier, monospace"><font size="2">enum {</font></font></p><p style="margin-bottom: 0cm; line-height: 150%"> <font face="Courier, monospace"><font size="2">POLL_LISTEN,</font></font></p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">In <em>ietd.c</em>, the size of the <font face="Courier, monospace"><span style="font-style: normal">incoming[]</span></font> array is determined by the hard-coded, compile-time constant <font face="Courier, monospace">INCOMING_MAX</font>. This is the maximum number of incoming connections that <em>ietd</em> can handle at the same time while they are still in the login phase. Once an iSCSI connection is fully established and reaches its full feature phase, it is passed to the kernel and its slot becomes available again. So this number only needs to be increased for “connection storm” cases, when suddenly every initiator tries to connect at once.</font></p><p style="margin-bottom: 0cm; line-height: 150%">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2">Lesson learnt: always abuse your test system in every way you can imagine.</font></p><p id="sdfootnote1">&nbsp;</p><p id="sdfootnote2">&nbsp;</p><p style="margin-bottom: 0cm; line-height: 150%"><font size="2"><a href="#sdfootnote1anc" title="sdfootnote1sym" class="sdfootnotesym" name="sdfootnote1sym">1</a> http://iscsitarget.sf.net<br /> <a href="#sdfootnote2anc" title="sdfootnote2sym" class="sdfootnotesym" name="sdfootnote2sym">2</a> http://www.open-iscsi.org</font></p> ]]></content:encoded> <wfw:commentRss>http://blog.wpkg.org/2007/09/09/solving-reliability-and-scalability-problems-with-iscsi/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
