
Apache Pig Tutorial: An Overview of WordCount Analysis

Date: 2015-04-16 Source: compiled by this site

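  Apache Pig is a procedural language that runs on the Hadoop and MapReduce platform to query large semi-structured datasets quickly. This article starts with the implementation of a WordCount analysis: the listing shows the input file, the script, and the output of each intermediate relation.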

  grunt> cat /opt/dataset/input.txt
  keyword1 keyword2
  keyword2 keyword4
  keyword3 keyword1
  keyword4 keyword4

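  -- Load each line of the input file as a single chararray field.
  A = LOAD '/opt/dataset/input.txt' USING PigStorage('\n') AS (line:chararray);
  -- Split each line into a bag of words.
  B = FOREACH A GENERATE TOKENIZE(line);
  -- Flatten each bag so that every word becomes its own tuple.
  C = FOREACH B GENERATE FLATTEN($0) AS word;
  -- Group identical words together.
  D = GROUP C BY word;
  -- Count the tuples in each group.
  E = FOREACH D GENERATE COUNT(C), group;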

  dump B;
  ({(keyword1),(keyword2)})
  ({(keyword2),(keyword4)})
  ({(keyword3),(keyword1)})
  ({(keyword4),(keyword4)})

  dump C;
  (keyword1)
  (keyword2)
  (keyword2)
  (keyword4)
  (keyword3)
  (keyword1)
  (keyword4)
  (keyword4)

  dump D;
  (keyword1,{(keyword1),(keyword1)})
  (keyword2,{(keyword2),(keyword2)})
  (keyword3,{(keyword3)})
  (keyword4,{(keyword4),(keyword4),(keyword4)})

  dump E;
  (2,keyword1)
  (2,keyword2)
  (1,keyword3)
  (3,keyword4)

  store E into './wordcount';
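  The dataflow above can be followed step by step outside of Pig. The following is a minimal sketch in plain Python (not Pig Latin) that mirrors relations B through E on the same four input lines. The names `LINES`, `tokenize`, and `word_count` are illustrative assumptions and not part of Pig, and `tokenize` here splits on whitespace only, which is enough for this input.

```python
from collections import Counter

# The same four lines as /opt/dataset/input.txt in the article.
LINES = [
    "keyword1 keyword2",
    "keyword2 keyword4",
    "keyword3 keyword1",
    "keyword4 keyword4",
]


def tokenize(line):
    """Simplified stand-in for TOKENIZE: split on whitespace only."""
    return line.split()


def word_count(lines):
    # B: one bag (here a list) of words per input line.
    bags = [tokenize(line) for line in lines]
    # C: flatten the bags into a single stream of words.
    words = [word for bag in bags for word in bag]
    # D and E: group identical words and count the members of each group.
    counts = Counter(words)
    # Return (count, word) pairs, sorted by word for a stable order.
    return sorted(((n, w) for w, n in counts.items()), key=lambda pair: pair[1])


for count, word in word_count(LINES):
    print((count, word))
```

  The resulting pairs correspond to the tuples printed by `dump E;`. Pig itself does not guarantee an output order here; the sort is only added so that the sketch is deterministic.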

  TOKENIZE

  Splits a string and outputs a bag of words.

  Syntax

  TOKENIZE(expression)

  Terms

  expression

  An expression with data type chararray.

  Usage

  Use the TOKENIZE function to split a string of words (all words in a single tuple) into a bag of words (each word in a single tuple). The following characters are considered to be word separators: space, double quote ("), comma (,), parentheses (()), and star (*).

  Example

  In this example the strings in each row are split.

  A = LOAD 'data' AS (f1:chararray);
  DUMP A;
  (Here is the first string.)
  (Here is the second string.)
  (Here is the third string.)

  X = FOREACH A GENERATE TOKENIZE(f1);
  DUMP X;
  ({(Here),(is),(the),(first),(string.)})
  ({(Here),(is),(the),(second),(string.)})
  ({(Here),(is),(the),(third),(string.)})
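  The separator rule in the usage note can be modeled in a few lines. This is a rough Python sketch, not Pig's actual implementation: it assumes that only the characters listed in the note act as separators, and the names `SEPARATORS` and `tokenize_like_pig` are hypothetical.

```python
import re

# Separators named in the usage note: space, double quote, comma,
# parentheses, and star. One or more of them in a row end a word.
SEPARATORS = re.compile(r'[ ",()*]+')


def tokenize_like_pig(text):
    """Split text on the listed separators and drop empty pieces."""
    return [token for token in SEPARATORS.split(text) if token]


print(tokenize_like_pig("Here is the first string."))
```

  Because the period is not in the separator list, the last token stays `string.`, which is consistent with the DUMP X output in the example.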

 
