2-3-22 鳩の巣原理で重複を調べる

SQLパズル

hatoT
ID  Seq  Val
--  ---  ---
 1    1    2
 1    2    2
 1    3    3
 1    4    4
 2    1    3
 2    2    4
 2    3    5
 2    4    6
 3    1    5
 3    2    6
 4    1    7

IDごとに、
Valが等しいレコードが2レコード以上ある
レコードを出力する。

IDとSeqがプライマリキーとし、Valは、not nullとする。

出力結果
ID  Seq  Val
--  ---  ---
 1    1    2
 1    2    2
 1    3    3
 1    4    4

データ作成スクリプト

create table hatoT(ID,Seq,Val) as
select 1,1,2 from dual union
select 1,2,2 from dual union
select 1,3,3 from dual union
select 1,4,4 from dual union
select 2,1,3 from dual union
select 2,2,4 from dual union
select 2,3,5 from dual union
select 2,4,6 from dual union
select 3,1,5 from dual union
select 3,2,6 from dual union
select 4,1,7 from dual;

SQL

--■■■分析関数を使わない方法■■■
select ID,Seq,Val
  from hatoT a
 where exists(select 1 from hatoT b
               where b.ID = a.ID
              having count(b.Val) > count(distinct b.Val));

--■■■分析関数を使う方法■■■
select ID,Seq,Val
  from (select ID,Seq,Val,
         count(Val)          over(partition by ID) as Valcount,
         count(distinct Val) over(partition by ID) as Valdistinctcount
          from hatoT)
 where Valcount > Valdistinctcount;

解説

count(Val)とcount(distinct Val)
を比較してます。

count(Val)を、鳩の数、
count(distinct Val)を、鳩の巣の数、
と考えて、数学の鳩の巣原理を使ってます。

遊園地の入園者の
延べ人数が、count(Val)
実人数が、count(distinct Val)
とすると、リピータ(重複)が存在する必要十分条件は、
count(Val) > count(distinct Val)
だと考えると分かりやすいと思います。

■■■■■■■■■■■■■■■■■■■■■■■■■■■
別の考え方としては、
count(distinct Val) + リピート数 = count(Val)
を同値変形(移項)して、
count(distinct Val) + リピート数 = count(Val) ⇔
リピート数 = count(Val) - count(distinct Val)

よって
リピート数 > 0 ⇔
count(Val) - count(distinct Val) > 0 ⇔
count(Val) > count(distinct Val)

鳩の巣原理