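• Byzer ships with commonly used built-in UDFs
  • Byzer supports dynamically extending UDFs in other languages
    • Supported languages: custom UDFs written in Scala, Python, or Java
    • Dynamic extension: no packaging or application restart is needed; register the UDF in the current context with Byzer syntax and it is ready to use

Custom UDFs can also be registered into Byzer at startup.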

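Common built-in UDFs

Byzer ships with many powerful, ready-to-use UDFs, such as HTTP request UDFs and data type conversion UDFs.

HTTP requests

HTTP requests make Byzer scripts more powerful, because a script can combine any internal or external API to get a job done.

Byzer provides a fairly complete set of HTTP request functions.

crawler_request

crawler_request(url) - Requests url with the GET method and returns the HTML page that was fetched

Example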

> SELECT crawler_request("https://www.csdn.com") AS h AS html;
<!doctype html> <html lang="zh" data-server-rendered="true"> <head> <title>CSDN - 专业开发者社区</title> ...

crawler_http

crawler_http(url, method, map("k1","v1","k2","v2")) - Requests url and returns the HTML page that was fetched

Parameters

  • method - POST or GET
  • map - parameters as key/value pairs. When the method is POST, the parameters are URL-encoded
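
The map parameter is described as URL-encoded for POST requests. As a rough illustration of what URL encoding does to key/value pairs, the sketch below uses Python's standard urllib rather than Byzer's own implementation; the helper name encode_params and the sample keys and values are made up for the illustration.

```python
from urllib.parse import urlencode


def encode_params(params):
    """Encode a dict of request parameters in application/x-www-form-urlencoded form."""
    return urlencode(params)


# Plain ASCII pairs are joined with '=' and '&'.
print(encode_params({"k1": "v1", "k2": "v2"}))
# Reserved characters such as spaces and '&' are escaped.
print(encode_params({"q": "a b&c"}))
```

Whether Byzer escapes a space as '+' or as '%20' is not stated in this document, so the exact escaping shown by the sketch should not be read as Byzer's behavior.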

Example

> SELECT crawler_http("https://www.csdn.com", "get", map()) AS h AS html;
<!doctype html> <html lang="zh" data-server-rendered="true"> <head> <title>CSDN - 专业开发者社区</title> ...

crawler_auto_extract_body

crawler_auto_extract_body(html) - Extracts the body content from an HTML page

Example

> SELECT crawler_request("https://www.csdn.com") AS h AS html;
> SELECT crawler_auto_extract_body(h) AS b FROM html AS body;
专家推荐疯狂试探mysql单表insert极限:已实现每秒插入8.5w条数据 一个demo让你将多线程运用到实际项目 ...

crawler_auto_extract_title

crawler_auto_extract_title(html) - Extracts the title from an HTML page

Example

> SELECT crawler_request("https://www.csdn.com") AS h AS html;
> SELECT crawler_auto_extract_title(h) AS t FROM html AS title;
CSDN - 专业开发者社区

crawler_request_image

crawler_request_image(url) - Requests url with the GET method and returns the image as a Base64-encoded string

Example

> SELECT crawler_request_image("https://pic4.zhimg.com/v2-1d0e51461a3eb098ac84ab0f6d3ce99c_xl.jpg") AS html AS content;
/9j/4AAQSkZJRgABAQAASABIAAD/7QA4UGhvdG9zaG9wIDMuMAA4QklNBAQAAAAAAAA4QklNBCUAAAAAABDUHYzZjwCyBOmACZjs+EJ+/+EATEV4aWYAAE1NACoAAAAIAAGHaQAEAAAAAQAAABoAAAAAAAOgAQADAAAAAQABAACgAgAEAAAAAQAAA3igAwAEAAAAAQAAA3gAAAAA ...
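
The result above starts with /9j/ because that is how the first bytes of a JPEG file look once Base64-encoded. A minimal sketch of that relationship, using only Python's standard base64 module; the helper names are illustrative and not part of Byzer:

```python
import base64

# The first four bytes of a JPEG/JFIF file: the SOI marker followed by an APP0 marker.
JPEG_MAGIC = bytes([0xFF, 0xD8, 0xFF, 0xE0])


def to_base64(data):
    """Return the Base64 text form of raw bytes."""
    return base64.b64encode(data).decode("ascii")


def looks_like_jpeg(b64_text):
    """Heuristic: decode a Base64 string and compare its first two bytes with the JPEG SOI marker."""
    raw = base64.b64decode(b64_text)
    return raw[:2] == b"\xff\xd8"


print(to_base64(JPEG_MAGIC))
print(looks_like_jpeg("/9j/4AAQSkZJRgABAQAASABIAAD/"))
```

Decoding the full string returned by crawler_request_image in the same way would give back the original image bytes.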

crawler_extract_xpath

crawler_extract_xpath(html, xpath) - Returns the HTML node information matched by the xpath path expression

Example

> SELECT crawler_request("https://www.csdn.com") AS h AS html; 
> SELECT crawler_extract_xpath(h, "/html/head/title") AS t FROM html as x_title;
<title>CSDN - 专业开发者社区</title>

crawler_md5

crawler_md5(str) - Returns the MD5 message digest of str

Example

> SELECT crawler_request("https://www.csdn.com") AS h AS html;
> SELECT crawler_md5(h) AS m FROM html AS md5;
6dd43840dc3389a9639e7e6449a80f4f
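
For comparison outside Byzer, an MD5 digest in the same 32-character hexadecimal form can be computed with Python's standard hashlib. This is a sketch, not Byzer's implementation; UTF-8 encoding of the input string is an assumption, and md5_hex is an illustrative name.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a string as 32 lowercase hex characters (UTF-8 assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(md5_hex("hello"))
```

Because a live page changes between requests, the digest of a fetched page will generally differ from the value shown in the example.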

 

Common functions

array_concat

array_concat(array(a1, a2, ..., an)) - Concatenates multiple string arrays into a single flattened array

Example

> SELECT array_concat(array(array("a","b"), array("c","d"))) AS arr;
[ "a", "b", "c", "d" ]

array_intersect

array_intersect(a1, a2) - Returns the intersection of two arrays

Example

> SELECT array_intersect(array("a","b","c"),array("a","d","e")) AS ai;
[ "a" ]

array_index

array_index(array, element) - Returns the index of element in array

Example

> SELECT array_index(array("a","b","c","d","e"),"b") AS index;
1

array_number_concat

array_number_concat(array(a1, a2, ..., an)) - Concatenates multiple numeric arrays into a single flattened array

Example

> SELECT array_number_concat(array(array(1,2), array(3,4))) AS arr;
[ 1, 2, 3, 4 ]

array_number_to_string

array_number_to_string(array) - Converts the elements of the array to string

Example

> SELECT array_number_to_string(array(1,2,3,4)) AS arr;
[ "1", "2", "3", "4" ]

array_onehot

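array_onehot(array, colNums) - Returns the one-hot encoding as a matrix structure; the values are stored column by column

Parameters

  • array: the categorical values to one-hot encode; they must be of type int. The number of values is the number of rows of the matrix
  • colNums: the maximum number of categories, which is the number of columns of the matrix

Example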

> SELECT array_onehot(array(1,2,3),4) AS ma1;
{ "type": 1, "numRows": 3, "numCols": 4, "values": [ 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1 ], "isTransposed": false }
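
The values list is easier to read once the column-by-column layout is spelled out: in a matrix with numRows rows, the element at (row, col) sits at position col * numRows + row. A minimal sketch under that reading, written in plain Python rather than taken from Byzer's source; the function name is illustrative:

```python
def onehot_column_major(categories, num_cols):
    """Build a one-hot matrix with one row per category value and num_cols columns,
    returned as a flat list in column-major order."""
    num_rows = len(categories)
    values = [0] * (num_rows * num_cols)
    for row, cat in enumerate(categories):
        # Element (row, col) of a column-major matrix lives at col * num_rows + row.
        values[cat * num_rows + row] = 1
    return values


print(onehot_column_major([1, 2, 3], 4))
```

For the input array(1,2,3) with 4 columns, this puts the ones at positions 3, 7, and 11, which matches the values in the example above.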

> SELECT array_onehot(array(1,4),12) AS ma2;
{ "type": 1, "numRows": 2, "numCols": 12, "values": [ 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], "isTransposed": false }

array_slice

array_slice(array, from, to) - Returns the sub-array from index from to index to. When to = -1, the slice runs to the end of the array

Example

> SELECT array_slice(array("a","b","c","d","e"),3,-1) AS sub;
[ "d", "e" ]

array_string_to_double

array_string_to_double(array) - Converts the elements of the array to double

Example

> SELECT array_string_to_double(array("1.1","2.2","3.3","4.4")) AS arr;
[ 1.1, 2.2, 3.3, 4.4 ]

array_string_to_float

array_string_to_float(array) - Converts the elements of the array to float

Example

> SELECT array_string_to_float(array("1.1","2.2","3.3","4.4")) AS arr;
[ 1.1, 2.2, 3.3, 4.4 ]

array_string_to_int

array_string_to_int(array) - Converts the elements of the array to int

Example

> SELECT array_string_to_int(array("1","2","3","4")) AS arr;
[ 1, 2, 3, 4 ]

matrix_array

matrix_array(matrix) - Converts a matrix into a two-dimensional array

Example

> SELECT matrix_array(array_onehot(array(1,2),4)) AS ma;
[ [ 0, 1, 0, 0 ], [ 0, 0, 1, 0 ] ]

matrix_dense

matrix_dense(array(a1, a2, ..., an)) - Creates a dense matrix

Example

> SELECT matrix_dense(array(array(1.0, 2.0, 3.0), array(2.0, 3.0, 4.0))) AS md;
{ "type": 1, "numRows": 2, "numCols": 3, "values": [ 1, 2, 2, 3, 3, 4 ], "isTransposed": false }

matrix_sum

matrix_sum(matrix) - Sums the column values of a matrix. Note that the example below passes 0 as a second argument

Example

> SELECT matrix_sum(matrix_dense(array(array(1.0, 2.0, 3.0), array(2.0, 3.0, 4.0))), 0) AS ms;
{ "type": 1, "values": [ 3, 5, 7 ] }
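
The two matrix examples above fit together as follows: matrix_dense lists its values column by column, and the sum in the matrix_sum example is taken down each column. A plain-Python sketch of both steps, offered as an illustration rather than Byzer's implementation; the function names are made up:

```python
def to_column_major(rows):
    """Flatten a list of equal-length rows into column-major order."""
    num_rows, num_cols = len(rows), len(rows[0])
    return [rows[r][c] for c in range(num_cols) for r in range(num_rows)]


def column_sums(rows):
    """Sum each column of a row-oriented matrix."""
    return [sum(col) for col in zip(*rows)]


matrix = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(to_column_major(matrix))
print(column_sums(matrix))
```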

vec_argmax

vec_argmax(vector) - Finds the position of the largest value in a vector (indices start at 0)

Example

> SELECT vec_argmax(vec_dense(array(1.0,2.0,7.0))) AS index;
2

vec_dense

vec_dense(array) - Creates a dense vector

Example

> SELECT vec_dense(array(1.0,2.0,7.0)) as vec;
{ "type": 1, "values": [ 1, 2, 7 ] }

vec_sparse

vec_sparse(size, map(k1,v1,k2,v2)) - Creates a sparse vector

Parameters

  • size: the length of the vector
  • map: the indices and values of the sparse vector; note that indices start at 0

Example

> SELECT vec_sparse(3, map(1,2,2,4)) AS vs;
{ "type": 0, "size": 3, "indices": [ 1, 2 ], "values": [ 2, 4 ] }
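
A sparse vector stores only the non-zero positions, so the result above stands for the dense vector [0, 2, 4]. A small sketch of that expansion in plain Python; sparse_to_dense is an illustrative helper, not a Byzer function:

```python
def sparse_to_dense(size, entries):
    """Expand {index: value} pairs into a dense list of the given size."""
    dense = [0.0] * size
    for index, value in entries.items():
        if not 0 <= index < size:
            raise IndexError("index %d is outside a vector of size %d" % (index, size))
        dense[index] = float(value)
    return dense


print(sparse_to_dense(3, {1: 2, 2: 4}))
```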

vec_concat

vec_concat(array(v1,v2, ..., vn)) - Concatenates multiple vectors into a single vector

Example

> SELECT vec_concat(array(vec_dense(array(1.0,2.0)),vec_dense(array(3.0,4.0)))) AS vc;
{ "type": 1, "values": [ 1, 2, 3, 4 ] }

vec_cosine

vec_cosine(v1, v2) - Computes the cosine of the angle between two vectors

Example

> SELECT vec_cosine(vec_dense(array(1.0,2.0)),vec_dense(array(1.0,1.0))) AS vc;
0.9486832980505138
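
The value above is the dot product of the two vectors divided by the product of their lengths. A minimal plain-Python sketch of that formula, for checking results by hand; it is not Byzer's implementation and the function name is illustrative:

```python
import math


def cosine(v1, v2):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


print(cosine([1.0, 2.0], [1.0, 1.0]))
```

For the example vectors this is 3 / (sqrt(5) * sqrt(2)), which is approximately 0.9487.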

vec_slice

vec_slice(vector, indices) - Returns the sub-vector made up of the elements at the given indices

Example

> SELECT vec_slice(vec_dense(array(1.0,2.0,3.0,4.0)),array(0,1,2)) AS vs;
{ "type": 1, "values": [ 1, 2, 3 ] }

vec_array

vec_array(vector) - Converts a vector into an array

Example

> SELECT vec_array(vec_dense(array(1.0,2.0))) as va;
[ 1, 2 ]

vec_mk_string

vec_mk_string(splitter, vector) - Joins the elements of vector with splitter and returns the resulting string

Example

> SELECT vec_mk_string("*",vec_dense(array(1.0,2.0))) AS vms;
1.0*2.0

vec_wise_mul

vec_wise_mul(v1, v2) - Computes the element-wise product of vectors v1 and v2 and returns the resulting vector

Example

> SELECT vec_dense(array(2.5,2.0,1.0)) AS v1, vec_dense(array(3.0,2.0,1.0)) AS v2 AS data1;
> SELECT vec_wise_mul(v1, v2) AS vwm FROM data1 AS data2;
{ "type": 1, "values": [ 7.5, 4, 1 ] }

vec_wise_add

vec_wise_add(v1, v2) - Computes the element-wise sum of vectors v1 and v2 and returns the resulting vector

Example

> SELECT vec_dense(array(2.5,2.0,1.0)) AS v1, vec_dense(array(3.0,2.0,1.0)) AS v2 AS data1;
> SELECT vec_wise_add(v1, v2) AS vwm FROM data1 AS data2;
{ "type": 1, "values": [ 5.5, 4, 2 ] }

vec_wise_dif

vec_wise_dif(v1, v2) - Computes the element-wise difference of vectors v1 and v2 and returns the resulting vector

Example

> SELECT vec_dense(array(2.5,3.0,1.0)) AS v1, vec_dense(array(3.0,2.0,1.0)) AS v2 AS data1;
> SELECT vec_wise_dif(v1, v2) AS vwm FROM data1 AS data2;
{ "type": 1, "values": [ -0.5, 1, 0 ] }

vec_wise_mod

vec_wise_mod(v1, v2) - Takes each element of v1 modulo the corresponding element of v2

Example

> SELECT vec_dense(array(11,7,3)) AS v1, vec_dense(array(2,3,4)) AS v2 AS data1;
> SELECT vec_wise_mod(v1, v2) AS vwm FROM data1 AS data2;
{ "type": 1, "values": [ 1, 1, 3 ] }
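
The four vec_wise_* functions share one pattern: pair up the elements of v1 and v2 by position and apply an operation to each pair. A plain-Python sketch of that pattern, reproducing the numbers from the examples above; elementwise is an illustrative helper, not a Byzer function:

```python
import operator


def elementwise(op, v1, v2):
    """Apply a binary operation to matching elements of two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return [op(a, b) for a, b in zip(v1, v2)]


print(elementwise(operator.mul, [2.5, 2.0, 1.0], [3.0, 2.0, 1.0]))
print(elementwise(operator.add, [2.5, 2.0, 1.0], [3.0, 2.0, 1.0]))
print(elementwise(operator.sub, [2.5, 3.0, 1.0], [3.0, 2.0, 1.0]))
print(elementwise(operator.mod, [11, 7, 3], [2, 3, 4]))
```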

vec_inplace_add

vec_inplace_add(vector, addend) - Adds addend to every element of vector and returns the resulting vector

Example

> SELECT vec_dense(array(2.5, 2.0, 1.0)) AS vd AS data1;
> SELECT vec_inplace_add(vd, 4.4) AS via FROM data1 AS data2;
{ "type": 1, "values": [ 6.9, 6.4, 5.4 ] }

vec_inplace_ew_mul

vec_inplace_ew_mul(vector, multiplier) - Multiplies every element of vector by multiplier and returns the resulting vector

Example

> SELECT vec_dense(array(2.5, 2.0, 1.0)) AS vd AS data1;
> SELECT vec_inplace_ew_mul(vd, 4.4) AS niem FROM data1 AS data2;
{ "type": 1, "values": [ 11, 8.8, 4.4 ] }

vec_ceil

vec_ceil(vector) - Rounds every element of vector up to the nearest integer

Example

> SELECT vec_dense(array(2.5, 2.4, 1.6)) AS vd AS data1;
> SELECT vec_ceil(vd) AS vc FROM data1 AS data2;
{ "type": 1, "values": [ 3, 3, 2 ] }

vec_floor

vec_floor(vector) - Rounds every element of vector down to the nearest integer

Example

> SELECT vec_dense(array(2.5, 2.4, 1.6)) AS vd AS data1;
> SELECT vec_floor(vd) AS vc FROM data1 AS data2;
{ "type": 1, "values": [ 2, 2, 1 ] }

vec_mean

vec_mean(vector) - Returns the mean of the elements of a vector

Example

> SELECT vec_mean(vec_dense(array(1.0,2.0,7.0,2.0))) AS vm;
3

vec_stddev

vec_stddev(vector) - Returns the standard deviation of a vector

Example

> SELECT vec_stddev(vec_dense(array(3.0, 4.0, 5.0))) AS vs;
1
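
The result of 1 for the vector [3, 4, 5] is what the sample standard deviation (dividing by n - 1) gives; the population form (dividing by n) would give about 0.816. Assuming that reading of the example, the two statistics above can be checked with Python's standard statistics module:

```python
import statistics


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return statistics.mean(values)


def sample_stddev(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    return statistics.stdev(values)


def population_stddev(values):
    """Population standard deviation (n in the denominator), shown for contrast."""
    return statistics.pstdev(values)


print(mean([1.0, 2.0, 7.0, 2.0]))
print(sample_stddev([3.0, 4.0, 5.0]))
print(population_stddev([3.0, 4.0, 5.0]))
```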

ngram

ngram(array, size) - Returns the sequence of sliding windows of width size over array

Example

> SELECT ngram(array("a","b","c","d","e"),3) AS ngr;
[ "a b c", "b c d", "c d e" ]
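
The sliding window can be pictured as a frame of width size that moves one element at a time, with each frame joined by spaces as in the result above. A minimal plain-Python sketch of that idea (the local function is an illustration, not Byzer's code):

```python
def ngram(tokens, size):
    """Return every run of `size` consecutive tokens, each joined with a single space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


print(ngram(["a", "b", "c", "d", "e"], 3))
```

An array of n elements yields n - size + 1 windows, so five elements with a window of 3 give three entries.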

keepChinese

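keepChinese(str, keepPunctuation, include) - Processes a text field and keeps only Chinese characters

Parameters

  • str: the string to process
  • keepPunctuation: whether to keep punctuation, true/false
  • include: characters to keep explicitly; they appear in the result

Example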

> SET query = "你◣◢︼【】┅┇☽☾✚〓▂▃▄▅▆▇█▉▊▋▌▍▎▏↔↕☽☾の·▸◂▴▾┈┊好◣◢︼【】┅┇☽☾✚〓▂▃▄▅▆▇█▉▊▋▌▍▎▏↔↕☽☾の·▸◂▴▾┈┊啊,..。,!?katty";
> SELECT keepChinese("${query}",false,array()) AS ch;
Result: 你好啊
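
A rough way to reproduce this filtering outside Byzer is a regular expression that drops everything outside the basic CJK ideograph range. The sketch below covers only the case shown above (keepPunctuation false, empty include); the character range U+4E00 to U+9FA5 is an assumption about what Byzer treats as Chinese, and keep_chinese is an illustrative name.

```python
import re

# Assumption: "Chinese characters" means the basic CJK Unified Ideographs range.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fa5]")


def keep_chinese(text):
    """Remove every character that is not a basic CJK ideograph."""
    return _NON_CHINESE.sub("", text)


print(keep_chinese("你◣◢【】好の·啊,..。,!?katty"))
```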

sleep

sleep(ms) - Sleep function; the duration is given in milliseconds. It returns nothing

Example

> SELECT sleep(1000) AS s1;

uuid

uuid() - Returns a unique string with the "-" characters removed

Example

> SELECT uuid() AS u1;
b4fd697ce5dc4ff48694a5e2e1804d81
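
The same shape of value, 32 hexadecimal characters with no dashes, can be produced in Python for comparison. This sketch assumes a random (version 4) UUID; the document does not say which UUID version Byzer uses.

```python
import uuid


def uuid_without_dashes():
    """Return a random UUID as 32 hex characters, with the dashes removed."""
    return uuid.uuid4().hex


print(uuid_without_dashes())
```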

 

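Creating UDFs/UDAFs dynamically

Byzer supports writing UDFs/UDAFs in Python, Java, and Scala. No packaging or restart is needed: run the Byzer code that registers the UDF and it takes effect immediately, which makes it very convenient to extend Byzer.

Registering a UDF

Byzer provides the register syntax for registering UDFs. It can be used in either of the following two ways.

Method 1

First register the script as a virtual table, then register the table as a UDF.

Below is an example of writing and registering a UDF in Scala: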

-- script
> SET plusFun='''
def apply(a:Double,b:Double)={
   a + b
}
''';

-- register as a table
> LOAD script.`plusFun` AS scriptTable;

-- register as UDF
> REGISTER ScriptUDF.`scriptTable` AS plusFun OPTIONS lang = "scala";

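Method 2

Byzer also supports completing every step of UDF registration in a single statement.

With this approach, the script language and the UDF type must be specified manually. The supported languages and UDF types are listed later, under "Supported languages and UDF types".

Below is an example written and registered in Scala.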

> REGISTER ScriptUDF.`` AS plusFun WHERE
and lang="scala"
and udfType="udf"
and code='''
def apply(a:Double,b:Double)={
   a + b
}
''';

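Summary

When to use each method

Method 1 makes it easy to split code: UDF declarations can live in one file and the registration in another, with include bringing them together.

Method 2 is more concise than Method 1 and suits cases where only a few UDFs are registered.

Parameters

Method 1 attaches parameters with the OPTIONS keyword; Method 2 attaches them with the WHERE keyword.

The currently supported parameters are:

  • lang: Scala/Java/Python
  • udfType: UDF/UDAF
  • code: the UDF code
  • className: the custom class name in code (Java only)
  • methodName: the custom function name in code

Using a UDF

Whichever registration method is used, the registered UDF is ready to use right away. Below is an example that uses the UDF registered above.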

> SELECT plusFun(1,2) AS sum;
3

Supported languages and UDF types

  • Scala: UDF/UDAF
  • Java: UDF
  • Python: UDF

 

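Python UDF

When developing a UDF in Python, specify the following in the register statement:

  • Set lang to Python
  • Set udfType to UDF

A few points are specific to Python UDFs:

  1. Byzer supports Python version 2.7.1
  2. Python UDFs cannot use any native library, such as numpy.
  3. A Python UDF must declare its return type with the dataType parameter (see the example below). Currently the return type of a Python UDF must be one of the following types or a combination of them
    • string
    • float
    • double
    • integer
    • short
    • date
    • binary
    • map
    • array
  4. To make up for these limitations of Python UDFs, Byzer provides a dedicated interactive Python syntax and a Python syntax for large-scale data processing. The dedicated Python chapter describes them in more detail.

We therefore recommend using Python UDFs only for simple text parsing, and sticking to the libraries that ship with Python.

Example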

> REGISTER ScriptUDF.`` AS echoFun WHERE
and lang="python"
and dataType="map(string,string)"
and code='''
def apply(self,m):
    return m
 ''';

Usage

> SELECT echoFun(map("a","b")) AS res;
{ "a": "b" }
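
Because a Python UDF is passed to Byzer as a plain code string, it can help to exercise the function locally before pasting it into the code option. The sketch below assumes a hypothetical text-parsing UDF that turns "k=v;k=v" text into a map, for which a matching dataType would be map(string,string). The local call that passes None for self is only a test harness and not part of Byzer; the body avoids syntax that Python 2.7 would reject.

```python
def apply(self, s):
    """Parse 'k1=v1;k2=v2' text into a dict of strings."""
    result = {}
    for pair in s.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            result[key.strip()] = value.strip()
    return result


# Local check only: the documented example declares a `self` parameter,
# so None stands in for it here.
print(apply(None, "a=1; b=2"))
```

The function uses only built-in string methods, in line with the recommendation to keep Python UDFs to simple text parsing.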

 

Scala UDF

When developing a UDF in Scala, specify the following in the register statement:

  • Set lang to Scala
  • Set udfType to UDF

Example

> REGISTER ScriptUDF.`` AS plusFun WHERE
and lang="scala"
and udfType="udf"
and code='''
def apply(a:Double,b:Double)={
a + b
}
''';

Usage

> SELECT plusFun(1,2) AS sum;
3

 

Scala UDAF

When developing a UDAF in Scala, specify the following in the register statement:

  • Set lang to Scala
  • Set udfType to UDAF

Example

> SET plusFun='''
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
class SumAggregation extends UserDefinedAggregateFunction with Serializable{
    def inputSchema: StructType = new StructType().add("a", LongType)
    def bufferSchema: StructType =  new StructType().add("total", LongType)
    def dataType: DataType = LongType
    def deterministic: Boolean = true
    def initialize(buffer: MutableAggregationBuffer): Unit = {
      buffer.update(0, 0l)
    }
    def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
      val sum   = buffer.getLong(0)
      val newitem = input.getLong(0)
      buffer.update(0, sum + newitem)
    }
    def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
      buffer1.update(0, buffer1.getLong(0) + buffer2.getLong(0))
    }
    def evaluate(buffer: Row): Any = {
      buffer.getLong(0)
    }
}
''';

> LOAD script.`plusFun` AS scriptTable;

> REGISTER ScriptUDF.`scriptTable` AS plusFun options
className="SumAggregation"
and udfType="udaf";

Usage

> SET data='''
{"a":1}
{"a":1}
{"a":1}
{"a":1}
''';
> LOAD jsonStr.`data` AS dataTable;

> SELECT a,plusFun(a) AS res FROM dataTable GROUP BY a AS output;
| a | res |
|===|=====|
| 1 |  4  |
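
The four methods of the UDAF above form a small lifecycle: initialize creates an empty buffer, update folds one row into a buffer, merge combines two partial buffers, and evaluate produces the final value. The plain-Python sketch below simulates that lifecycle to show why the result is 4; the split of the four rows into two partitions is hypothetical, and the code illustrates the pattern rather than the Spark API.

```python
class SumAggregation:
    """Plain-Python stand-in for the initialize/update/merge/evaluate steps."""

    def initialize(self):
        return 0

    def update(self, buffer, value):
        return buffer + value

    def merge(self, buffer1, buffer2):
        return buffer1 + buffer2

    def evaluate(self, buffer):
        return buffer


def aggregate(partitions, agg):
    """Aggregate each partition on its own, then merge the partial buffers."""
    partials = []
    for rows in partitions:
        buffer = agg.initialize()
        for value in rows:
            buffer = agg.update(buffer, value)
        partials.append(buffer)
    total = agg.initialize()
    for partial in partials:
        total = agg.merge(total, partial)
    return agg.evaluate(total)


# Hypothetical split of the four {"a": 1} rows into two partitions.
print(aggregate([[1, 1], [1, 1]], SumAggregation()))
```

The result does not depend on how the rows are partitioned, which is the property that merge has to preserve.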

 

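Java UDF

When developing a UDF in Java, specify the following in the register statement:

  • Set lang to java
  • Set udfType to udf
  • Set className to the name of the UDF class

A few additional points to note:

  • The code passed in must be a Java class; by default the system looks for an apply() method to run as the UDF
  • className/methodName must be declared (as in the example); if the class name is not specified, the Java UDF cannot be compiled.
  • Package names are not supported for now

Example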

REGISTER ScriptUDF.`` AS echoFun WHERE 
and lang="java"
and udfType="udf"
and className="Test"
and methodName="test"
and code='''
import java.util.HashMap;
import java.util.Map;
public class Test {
    public Map<String, String> test(String s) {
        Map<String, String> m = new HashMap<>();
        m.put(s, s);
        return m;
    }
}
''';

Usage:

SET data='''{"a":"a"}''';
LOAD jsonStr.`data` AS dataTable;

SELECT echoFun(a) AS res FROM dataTable AS output;