List of Rune - text.Baldanders.info
tag:text.Baldanders.info,2015-09-19:/tags
2015-09-19T23:45:56+09:00
帰ってきた「しっぽのさきっちょ」
https://text.baldanders.info/images/avatar.jpg
String and Rune
tag:text.Baldanders.info,2015-09-19:/golang/string-and-rune/
2015-09-19T14:45:56+00:00
2021-08-12T21:22:05+00:00
This time the topic is strings. Short and quick.
Spiegel
https://baldanders.info/profile/
<p>(First published as: <a href="http://qiita.com/spiegel-im-spiegel/items/556166b6631c0369754f">はじめての Go 言語 (on Windows) その4 - Qiita</a>)</p>
<p>A <a href="http://golang.org/ref/spec#String_types">string</a>, the type that represents text, is an immutable object, but its contents are simply a byte array.
So consider the following code:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="s">"fmt"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">nihongo</span> <span class="o">:=</span> <span class="s">"日本語"</span>
</span></span><span class="line"><span class="cl"> <span class="nx">size</span> <span class="o">:=</span> <span class="nb">len</span><span class="p">(</span><span class="nx">nihongo</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"nihongo = %d bytes :"</span><span class="p">,</span> <span class="nx">size</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="p"><</span> <span class="nx">size</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">" %02x"</span><span class="p">,</span> <span class="nx">nihongo</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Dumping a <a href="http://golang.org/ref/spec#String_types">string</a> this way gives the following <a href="https://play.golang.org/p/acsS35_ZtHl">result</a><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">nihongo = 9 bytes : e6 97 a5 e6 9c ac e8 aa 9e
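</span></span></code></pre></div><p>As this shows, a <a href="http://golang.org/ref/spec#String_types">string</a> (despite the name) does not store its data character by character. So suppose you want the first two characters and carelessly write something like this:</p>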
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="s">"fmt"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">nihongo</span> <span class="o">:=</span> <span class="s">"日本語"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"nihongo = %s\n"</span><span class="p">,</span> <span class="nx">nihongo</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"nippon = %s\n"</span><span class="p">,</span> <span class="nx">nihongo</span><span class="p">[:</span><span class="mi">2</span><span class="p">])</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
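</span></span></code></pre></div><p>Code like this produces the following <a href="https://play.golang.org/p/s6FEJg9bCh_D">result</a>, because the slice cuts the first character in the middle of its three bytes.</p>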
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">nihongo = 日本語
</span></span><span class="line"><span class="cl">nippon = ��
</span></span></code></pre></div><p>To handle a string character by character, use the <a href="http://blog.golang.org/strings" title="Strings, bytes, runes and characters in Go - The Go Blog">rune</a> type<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>.
Runes, though. How fantasy-novel can you get... ahem.</p>
<p>Anyway, here is an example.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="s">"fmt"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">nihongo</span> <span class="o">:=</span> <span class="s">"日本語"</span>
</span></span><span class="line"><span class="cl"> <span class="nx">nihongoRune</span> <span class="o">:=</span> <span class="p">[]</span><span class="nb">rune</span><span class="p">(</span><span class="nx">nihongo</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="nx">size</span> <span class="o">:=</span> <span class="nb">len</span><span class="p">(</span><span class="nx">nihongoRune</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"nihongo = %d characters : "</span><span class="p">,</span> <span class="nx">size</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="p"><</span> <span class="nx">size</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"%#U "</span><span class="p">,</span> <span class="nx">nihongoRune</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"\n"</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>The <a href="https://play.golang.org/p/gJqGozG4rOs">output</a> looks like this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">nihongo = 3 characters : U+65E5 '日' U+672C '本' U+8A9E '語'
</span></span></code></pre></div><p>Alternatively, looping over a <a href="http://golang.org/ref/spec#String_types">string</a> with the <a href="http://golang.org/ref/spec#For_statements">for range statement</a> yields one <a href="http://blog.golang.org/strings" title="Strings, bytes, runes and characters in Go - The Go Blog">rune</a> per iteration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="s">"fmt"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">nihongo</span> <span class="o">:=</span> <span class="s">"日本語"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">pos</span><span class="p">,</span> <span class="nx">runeValue</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">nihongo</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"%#U starts at byte position %d\n"</span><span class="p">,</span> <span class="nx">runeValue</span><span class="p">,</span> <span class="nx">pos</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>The <a href="https://play.golang.org/p/tEtE-FEmoRx">output</a> of this code looks like this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">U+65E5 '日' starts at byte position 0
</span></span><span class="line"><span class="cl">U+672C '本' starts at byte position 3
</span></span><span class="line"><span class="cl">U+8A9E '語' starts at byte position 6
</span></span></code></pre></div><p>The <a href="http://blog.golang.org/strings" title="Strings, bytes, runes and characters in Go - The Go Blog">rune</a> type is an alias for int32, and its value is a Unicode code point.
A <a href="http://golang.org/ref/spec#String_types">string</a> and a <a href="http://blog.golang.org/strings" title="Strings, bytes, runes and characters in Go - The Go Blog">rune</a> slice can be converted to each other, so to cut out part of a string you can convert it as follows:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="s">"fmt"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">nihongo</span> <span class="o">:=</span> <span class="s">"日本語"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"nihongo = %s\n"</span><span class="p">,</span> <span class="nx">nihongo</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"nippon = %s\n"</span><span class="p">,</span> <span class="nb">string</span><span class="p">([]</span><span class="nb">rune</span><span class="p">(</span><span class="nx">nihongo</span><span class="p">)[:</span><span class="mi">2</span><span class="p">]))</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Converting <a href="http://golang.org/ref/spec#String_types"><code>string</code></a> → <code>[]</code><a href="http://blog.golang.org/strings" title="Strings, bytes, runes and characters in Go - The Go Blog"><code>rune</code></a> → <a href="http://golang.org/ref/spec#String_types"><code>string</code></a> in this way handles the text safely.
The <a href="https://play.golang.org/p/EQEUIgkriHr">output</a> of the code above looks like this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">nihongo = 日本語
</span></span><span class="line"><span class="cl">nippon = 日本
</span></span></code></pre></div><p>If you need finer-grained processing, the <a href="http://golang.org/pkg/unicode/utf8/"><code>unicode/utf8</code></a> package is another option<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup>.</p>
<h2>Bookmarks</h2>
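<p>As a rough sketch of what that might look like, the code below takes the first two characters without building an intermediate <code>[]rune</code>. It walks the string with <code>utf8.DecodeRuneInString</code> and slices at a rune boundary. The helper name <code>firstRunes</code> is made up for this illustration and is not part of any package.</p>

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes returns the prefix of s that holds at most n runes.
// It is a hypothetical helper written for this example only.
// Each step decodes one rune and advances by its encoded size, so the
// final slice always ends on a rune boundary.
func firstRunes(s string, n int) string {
	end := 0
	for i := 0; i < n && end < len(s); i++ {
		// For an invalid byte, DecodeRuneInString reports a size of 1,
		// so the loop still makes progress.
		_, size := utf8.DecodeRuneInString(s[end:])
		end += size
	}
	return s[:end]
}

func main() {
	nihongo := "日本語"

	// len counts bytes, RuneCountInString counts code points.
	fmt.Printf("bytes = %d, runes = %d\n", len(nihongo), utf8.RuneCountInString(nihongo))
	fmt.Printf("nippon = %s\n", firstRunes(nihongo, 2))
}
```

<p>This should print <code>bytes = 9, runes = 3</code> followed by <code>nippon = 日本</code>. Whether it is worth preferring over the simpler <code>[]rune</code> conversion depends on how long the strings are and how often the operation runs.</p>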
<ul>
<li><a href="http://blog.golang.org/strings">Strings, bytes, runes and characters in Go - The Go Blog</a></li>
<li><a href="http://knightso.hateblo.jp/entry/2014/06/24/090719">Go言語のstring, runeの正体とは? - golang - The Round</a></li>
<li><a href="http://qiita.com/hokaccha/items/3d3f45b5927b4584dbac">Goでマルチバイトが混在した文字列をtruncateする - Qiita</a></li>
<li><a href="http://qiita.com/masakielastic/items/01a4fb691c572dd71a19">Go で UTF-8 の文字列を扱う - Qiita</a></li>
</ul>
<p><a href="https://text.baldanders.info/golang/bookmark/">My collection of Go bookmarks is here</a>.</p>
<h2>Reference Books</h2>
<div class="hreview">
<div class="photo"><a href="https://www.amazon.co.jp/dp/B099928SJD?tag=baldandersinf-22&linkCode=ogi&th=1&psc=1"><img src="https://m.media-amazon.com/images/I/416Stewy0NS._SL160_.jpg" width="123" alt="photo"></a></div>
<dl>
<dt class="item"><a class="fn url" href="https://www.amazon.co.jp/dp/B099928SJD?tag=baldandersinf-22&linkCode=ogi&th=1&psc=1">プログラミング言語Go</a></dt>
  <dd>Alan Donovan (author), Brian Kernighan (author), Yoshiki Shibata (author)</dd>
  <dd>Maruzen Publishing 2016-06-20 (Release 2021-07-13)</dd>
  <dd>Kindle edition</dd>
<dd>B099928SJD (ASIN)</dd>
  <dd>Rating<abbr class="rating fa-sm" title="5">&nbsp;<i class="fas fa-star"></i>&nbsp;<i class="fas fa-star"></i>&nbsp;<i class="fas fa-star"></i>&nbsp;<i class="fas fa-star"></i>&nbsp;<i class="fas fa-star"></i></abbr></dd>
</dl>
  <p class="description">The Kindle edition is out! Some of the content has aged, but this book can fairly be called the textbook for Go. My review is <a href="https://text.baldanders.info/remark/2016/07/go-programming-language/" >here</a>.</p>
<p class="powered-by">reviewed by <a href='#maker' class='reviewer'>Spiegel</a> on <abbr class="dtreviewed" title="2021-05-22">2021-05-22</abbr> (powered by <a href="https://affiliate.amazon.co.jp/assoc_credentials/home">PA-APIv5</a>)</p>
</div> <!-- プログラミング言語Go -->
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Incidentally, the default character encoding for strings in <a href="https://golang.org/" title="The Go Programming Language">Go</a> is UTF-8. Source code itself is required to be UTF-8 (so a string literal, barring byte escapes such as <code>\xff</code>, is always UTF-8), and conversions between <a href="http://golang.org/ref/spec#String_types">string</a> and <a href="http://blog.golang.org/strings" title="Strings, bytes, runes and characters in Go - The Go Blog">rune</a> also assume that the string is UTF-8. In reality, however, a <a href="http://golang.org/ref/spec#String_types">string</a> is just a byte array, and nothing guarantees that its contents are valid UTF-8. For text of unknown origin, the safe approach is usually to hold it in a byte array first, convert it to UTF-8 by some means, and only then cast it to <a href="http://golang.org/ref/spec#String_types">string</a> (or else define a dedicated type for each character encoding). Character encoding conversion is covered in <a href="https://text.baldanders.info/golang/transform-character-encoding/" title="文字エンコーディング変換">a separate article</a>.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
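<p>Strictly speaking, it may be more accurate to say "per Unicode code point" than "per character". Unicode also assigns code points to combining characters (such as dakuten marks and variation selectors), so care is needed, especially for character-aware string manipulation in Japanese. Frankly, it is a pain.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>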
</li>
<li id="fn:3">
<p>Incidentally, the <a href="http://golang.org/pkg/strings/"><code>strings</code></a> package appears to use the <a href="http://golang.org/pkg/unicode/utf8/"><code>unicode/utf8</code></a> package internally.&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
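<p>The two caveats in the footnotes can be sketched roughly as follows. The first part checks raw bytes with <code>utf8.Valid</code> before treating them as a string. The second shows that a character written with a combining mark occupies more than one rune. The helper name <code>toStringIfUTF8</code> is an assumption made for this example, and the check only detects invalid UTF-8; it does not convert from other encodings.</p>

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toStringIfUTF8 converts raw bytes to a string only when they form
// valid UTF-8. It is a hypothetical helper for this example; real code
// would transcode from the actual source encoding instead of giving up.
func toStringIfUTF8(b []byte) (string, bool) {
	if !utf8.Valid(b) {
		return "", false
	}
	return string(b), true
}

func main() {
	// The first two bytes of "日本語": a truncated sequence, not valid UTF-8.
	_, ok := toStringIfUTF8([]byte{0xe6, 0x97})
	fmt.Println("truncated bytes valid:", ok)

	s, ok := toStringIfUTF8([]byte("日本語"))
	fmt.Println("full bytes valid:", ok, s)

	// "が" written as か (U+304B) followed by a combining dakuten (U+3099):
	// one character to a reader, two code points to Go.
	ga := "\u304b\u3099"
	fmt.Println("runes in decomposed ga:", utf8.RuneCountInString(ga))
	for _, r := range ga {
		fmt.Printf("%U\n", r)
	}
}
```

<p>Here the decomposed form counts as 2 runes, so slicing a <code>[]rune</code> by index can still separate a base character from its mark. Handling that properly needs normalization or grapheme segmentation, which this sketch does not attempt.</p>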