Article Outline
BQML ロジステック回帰
TOC
Collection Outline
[BIGQUERY] 分析入門
[BIGQUERY] SQL
■ 句(節)
■ 演算子
■ BigQuery特有
■ 関数
- approx_count_distinct() - 集計近似
- coalesce()
- countif()
- extract()
- greatest()
- ifnull()
- last_day()
- lead() / lag()
- left() / right()
- lower()
- nullif()
- parse_date()
- row_number() - 番号付
- safe_cast()
- split()
- st_distance()
- nth_value()
- first_value() / last_value()
- percentile_()
データ型
■ エラー
- error
- Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex
[BIGQUERY] ADVANCE
[BIGQUERY] ML
[BIGQUERY] DS
■ DS100ノック
■
[BIGQUERY] GA
|| ロジステック回帰
GoogleCLoud内部に、BigQueryMLのチュートリアルが用意されている。(なんて親切!)
クイックスタート>BigQuery ML で機械学習モデルを作成する
#standardSQL
-- ロジスティック回帰(分類モデル)
create model `bqml_tutorial.sample_model`
options(model_type='logistic_reg') as
select
if(totals.transactions is null, 0, 1) as LABEL
, ifnull(device.operatingSystem, "") as OS
, device.isMobile as IS_MOBILE
, ifnull(geoNetwork.country, "") as COUNTRY
, ifnull(totals.pageviews, 0) as PV
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
_TABLE_SUFFIX between '20160801' and '20170630'
#standardSQL
-- k-means
create or replace model
bqml_tutorial.london_station_clusters
options(model_type='kmeans', num_clusters=4) as
with
hs as (
select
h.start_station_name as station_name
, if(extract(DAYOFWEEK from h.start_date) = 1 or extract(DAYOFWEEK from h.start_date) = 7
, "weekend"
, "weekday"
) as is_weekday
, h.duration
, ST_DISTANCE(ST_GEOGPOINT(s.longitude, s.latitude), ST_GEOGPOINT(-0.1, 51.5))/1000 AS distance_from_city_center
from
`bigquery-public-data.london_bicycles.cycle_hire` h
join
`bigquery-public-data.london_bicycles.cycle_stations` s
on
h.start_station_id = s.id
where
h.start_date between cast('2015-01-01 00:00:00' as timestamp) and cast('2016-01-01 00:00:00' as timestamp)
)
, stationstats as (
select
station_name
, isweekday
, avg(duration) as duration
, count(duration) as num_trips
, max(distance_from_city_center) as distance_from_city_center
from
hs
group by
station_name, isweekday
)
select * except(station_name, isweekday) from stationstats;
cf.ロンドンのレンタル自転車のデータセットをクラスタ化するための K 平均法モデルの作成 - GoogleCloud
|| REFERENCE
- 分類モデルの作成 - GoogleCloud
- BigQuery ML でロジスティック回帰してみる - Qiita
- 「BigQuery ML」:SQLで機械学習ってどういうこと?試しにSQLでロジスティック回帰を書いてみた。 - Wantedly,inc.