
MSCK repair table

This command scans a file system such as Amazon S3 for Hive-compatible partitions and updates the table's partition metadata.

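It compares the partitions recorded in the Athena table metadata with the partitions that exist in S3. If S3 contains partitions that the metadata does not know about, the command adds them to the Athena metadata.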

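Key point: Athena stores information about the S3 partitions in its own metadata, that is, which partition folders exist under the table location (for example dt=2012-01 and dt=2012-02). The folders must use Hive-style key=value names for MSCK REPAIR TABLE to recognize them.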
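The comparison described above can be sketched in a few lines of Python. This is only a conceptual illustration, not how Athena implements the command: the folder listing, the already registered partitions, and the helper names parse_partition and partitions_to_add are all made up for the example.

```python
# Conceptual sketch only: this is NOT Athena's implementation.
# All names and sample data below are hypothetical.

def parse_partition(prefix):
    """Extract Hive-style key=value pairs from an S3 prefix."""
    pairs = []
    for segment in prefix.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            pairs.append((key, value))
    return tuple(pairs)


def partitions_to_add(s3_prefixes, catalog_partitions):
    """Return partitions found in S3 that the metadata does not list yet."""
    known = set(catalog_partitions)
    missing = []
    for prefix in s3_prefixes:
        partition = parse_partition(prefix)
        # Skip prefixes without key=value segments and avoid duplicates.
        if partition and partition not in known and partition not in missing:
            missing.append(partition)
    return missing


# Hypothetical folders under the table location in S3.
s3_prefixes = [
    "parquet/product_category=Books/",
    "parquet/product_category=Music/",
    "parquet/product_category=Toys/",
]
# Hypothetical partitions already registered in the metadata.
catalog_partitions = [(("product_category", "Books"),)]

new_partitions = partitions_to_add(s3_prefixes, catalog_partitions)
print(new_partitions)
```

In this sketch only the Music and Toys partitions are reported as missing, because Books is already registered.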

Example

Suppose we create the following table in Athena:

CREATE EXTERNAL TABLE amazon_reviews_parquet(
  marketplace string, customer_id string, review_id string,
  product_id string, product_parent string, product_title string,
  star_rating int, helpful_votes int, total_votes int,
  vine string, verified_purchase string,
  review_headline string, review_body string,
  review_date bigint, year int)
PARTITIONED BY (product_category string)
ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://amazon-reviews-pds/parquet/';

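The data under this S3 location is partitioned by product_category, with one Hive-style folder per category (product_category=<value>/).

At this point, however, the Athena metadata knows nothing about these partitions, so a query against the table returns no rows. To load the partition information into the Athena metadata, run: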

MSCK REPAIR TABLE amazon_reviews_parquet;

Then run SHOW PARTITIONS amazon_reviews_parquet; to list the partitions that were loaded.

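Note

MSCK REPAIR TABLE only adds partitions to the Athena metadata; it never removes them. If a partition has been deleted from S3 and you also want to remove it from Athena, you must drop it manually, for example:

ALTER TABLE orders DROP PARTITION (dt = '2014-05-14', country = 'IN');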
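When several partitions have disappeared from S3, the DROP PARTITION statements can be generated rather than typed by hand. The following is a minimal Python sketch under stated assumptions: the two partition lists are hypothetical sample data, the helper names are invented for illustration, and all partition values are assumed to be strings that contain no quote characters. It only builds the SQL text; running the statements in Athena is left to you.

```python
# Minimal sketch: build ALTER TABLE ... DROP PARTITION statements for
# partitions that are still in the metadata but no longer exist in S3.
# Helper names and sample data are hypothetical.

def drop_partition_statement(table, partition):
    """Build one DROP PARTITION statement.

    `partition` maps partition column names to string values.
    Values are assumed to contain no quote characters.
    """
    spec = ", ".join(
        "{} = '{}'".format(key, value) for key, value in partition.items()
    )
    return "ALTER TABLE {} DROP PARTITION ({});".format(table, spec)


def stale_partitions(catalog_partitions, s3_partitions):
    """Return partitions listed in the metadata but missing from S3."""
    return [p for p in catalog_partitions if p not in s3_partitions]


# Hypothetical partitions registered in the Athena metadata.
catalog = [{"product_category": "Books"}, {"product_category": "Music"}]
# Hypothetical partitions that still exist in S3.
in_s3 = [{"product_category": "Books"}]

statements = [
    drop_partition_statement("amazon_reviews_parquet", p)
    for p in stale_partitions(catalog, in_s3)
]
for statement in statements:
    print(statement)
```

Review the generated statements before running them, since dropping a partition hides its data from queries until the partition is added again.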