# 3１〜4０本ノック

Article Outline

## || データサイエンス100本ノック（構造化データ加工編） SQL編

### | S-031 ★

レシート明細テーブル(receipt)に対し、店舗コード(store_cd)ごとに売上金額 (amount)の標本標準偏差を計算し、降順でTOP5を表示せよ。

``````select
store_cd
, stddev(amount) as std
from
`100knocks.receipt`
group by
store_cd
order by
std desc
limit 5
;``````

### | S-032 ★

レシート明細テーブル(receipt)の売上金額(amount)について、25%刻みでパーセンタイル値を求めよ。

``````# BigQuery
select
store_cd
, max(min) as min
, max(q1) as q1
, max(q2) as q2
, max(q3) as q3
, max(max) as max
from (
select
store_cd
, percentile_cont(amount, 0.0)  over w as min
, percentile_cont(amount, 0.25) over w as q1
, percentile_cont(amount, 0.5)  over w as q2
, percentile_cont(amount, 0.75) over w as q3
, percentile_cont(amount, 1.0)  over w as max
from
`100knocks.receipt`
window
w as (partition by store_cd)
) as statistic
group by store_cd
;``````
``````# PostgreSQL
SELECT
PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY amount) as amount_25per,
PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY amount) as amount_50per,
PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY amount) as amount_75per,
PERCENTILE_CONT(1.0) WITHIN GROUP(ORDER BY amount) as amount_100per
FROM receipt``````

### | S-033 ★

レシート明細テーブル(receipt)に対し、店舗コード(store_cd)ごとに売上金額 (amount)の平均を計算し、330以上のものを抽出せよ。

``````select
store_cd
, avg(amount) as mean
from
`100knocks.receipt`
group by
store_cd
having
mean > 330
;``````

### | S-034 ★

レシート明細テーブル(receipt)に対し、顧客ID(customer_id)ごとに売上金額 (amount)を合計して全顧客の平均を求めよ。ただし、顧客IDが"Z"から始まるのものは 非会員を表すため、除外して計算すること。

``````-- select
--     customer_id
--     , sum(amount) as ttl_amount
-- from `100knocks.receipt`
-- where not regexp_contains(customer_id, r'^Z')
-- group by customer_id
-- ;

with
customer_tb as(
select
customer_id
, sum(amount) as ttl_amount
from
`100knocks.receipt`
where
not regexp_contains(customer_id, r'^Z')
group by
customer_id
)
select
avg(ttl_amount)
from
customer_tb
;``````
``````# PostgreSQL
WITH customer_amount AS (
SELECT customer_id, SUM(amount) AS sum_amount
FROM receipt
WHERE customer_id not like 'Z%'
GROUP BY customer_id
)
SELECT AVG(sum_amount) from customer_amount``````

### | S-035 ★★

レシート明細テーブル(receipt)に対し、顧客ID(customer_id)ごとに売上金額 (amount)を合計して全顧客の平均を求め、平均以上に買い物をしている顧客を抽出せよ。ただし、顧客IDが"Z"から始まるのものは非会員を表すため、除外して計算すること。なお、データは10件だけ表示させれば良い。

``````with
customer_tb as(
select
customer_id
, sum(amount) as ttl_amount
from
`100knocks.receipt`
where
not regexp_contains(customer_id, r'^Z')
group by
customer_id
)
select
customer_id
, ttl_amount
from
customer_tb
where
ttl_amount > (select avg(ttl_amount) from customer_tb)
limit 10
;``````

### | S-036 ★

レシート明細テーブル(receipt)と店舗テーブル(store)を内部結合し、レシート明細 テーブルの全項目と店舗テーブルの店舗名(store_name)を10件表示させよ。

``````select
receipt.*
, store.store_name
from
`100knocks.receipt` receipt
left join
(select distinct store_cd, store_name from `100knocks.store`) as store using(store_cd)
limit 10
;``````

### | S-037 ★

``````select
*
from
`100knocks.product` p
left join
(select category_major_cd, category_medium_cd, category_small_cd, category_small_name from `100knocks.category`) as c
using(category_major_cd, category_medium_cd, category_small_cd)

-- on
--         p.category_major_cd =c.category_major_cd
--     and p.category_medium_cd=c.category_medium_cd
--     and p.category_small_cd =c.category_small_cd

limit 10
;``````
``````select
p.*
, c.category_small_name
from
`100knoks.product` p
join
`100knoks.category` c using(category_small_cd)
limit 10
;``````

### | S-038 ★

``````-- with customer_amount as(
--     select
--         customer_id
--         , sum(amount) as ttl_amount
--     from `prj-test3.100knocks.receipt`
--     group by customer_id
-- )
-- select
--     customer_id
--     , coalesce(ttl_amount, 0) as ttl_amount
-- from `prj-test3.100knocks.customer`
-- join customer_amount using(customer_id)
-- -- where
-- order by ttl_amount
-- ;

select
customer_id
, customer_name
, sum(amount) as ttl_amount
from
`100knocks.customer` c
join
`100knocks.receipt` r using(customer_id)
where
customer_id not like'Z%'
and gender_cd = 1
group by
customer_id
, customer_name
limit 10
;``````

Cf. SQL関数coalesceの使い方と読み方 - Sppirits

### | S-039 ★

レシート明細テーブル(receipt)から売上日数の多い顧客の上位20件と、売上金額合計の多い顧客の上位20件を抽出し、完全外部結合せよ。ただし、非会員(顧客IDが"Z"から始まるもの)は除外すること。

``````select
customer_id
, count(distinct sales_ymd) as days
, sum(amount) as amount
from
`100knocks.receipt`
where
customer_id not like 'Z%'
group by
customer_id
order by
days desc
, amount desc
limit 20
;``````

### | S-040 ★

``````-- select
--     (select
--         count(distinct store_cd)
--     from `100knocks.store`)
--     union all
--     (select
--         count(distinct product_cd)
--     from `100knocks.product`) as ttl_record

select
count(*)
from
`100knocks.store`
cross join
`100knocks.product`
;``````

cross join を知ると join が書きやすくなるよ、という話 - Feedforce Developer Blog