Introduction to Byzer-python

csdndevpressbyzer · Published 2023-12-11 12:23:33

Byzer supports Python code through the built-in Byzer-python extension.

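With Byzer-python, users can run ETL in Python, for example by converting a Byzer table into a distributed DataFrame on Dask and operating on it. They can also use machine learning frameworks such as TensorFlow, Sklearn, and PyTorch.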

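To Byzer, the user's Python script is a black box. The script reads table data through a fixed API and turns its Python output back into a table through another fixed API, so the result can be processed further with SQL.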
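One way to picture this black-box contract is as a function from rows to rows. The sketch below is plain Python and does not use any Byzer API: `run_black_box`, `greet`, and the `Row` alias are hypothetical names introduced only for illustration, and a table is modeled as a list of dictionaries, which is an assumption about the row shape.

```python
from typing import Callable, Dict, Iterable, List

# Assumption for this sketch: a table is a list of rows, each row a dict.
Row = Dict[str, object]


def run_black_box(
    input_rows: Iterable[Row],
    script: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Hypothetical stand-in for the engine side of the contract.

    It hands the input table's rows to the script and treats whatever
    the script returns as the output table.
    """
    # Copy each row so the script cannot mutate the input table.
    rows = [dict(row) for row in input_rows]
    return script(rows)


def greet(rows: List[Row]) -> List[Row]:
    """Hypothetical user script: rewrite the hello column of every row."""
    for row in rows:
        row["hello"] = "Byzer-Python"
    return rows


table1 = [{"hello": "world"}]
new_table = run_black_box(table1, greet)
print(new_table)
```

The engine never looks inside `greet`; it only controls what goes in and what comes out, which is why the output can be treated as an ordinary table afterwards.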

Hello World

-- Byzer-python Hello World
select "world" as hello as table1;

!python conf "schema=st(field(hello,string))";
!python conf "pythonExec=/home/winubuntu/miniconda3/envs/byzerllm-desktop/bin/python";
!python conf "dataMode=model";
!python conf "runIn=driver";

run command as Ray.`` where 
inputTable="table1"
and outputTable="new_table"
and code='''
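import ray
from pyjava.api.mlsql import RayContext,PythonContext

# None as the second argument: no Ray cluster address is given.
ray_context = RayContext.connect(globals(),None)

# Read every row of the input table (table1) into a list.
rows_from_table1 = [item for item in ray_context.collect()]

# Overwrite the hello column in each row.
for row in rows_from_table1:
    row["hello"] = "Byzer-Python"

# Return the rows as the output table; they should match the schema set above.
# `context` is not defined in this script; it is expected to be supplied by Byzer at runtime.
context.build_result(rows_from_table1)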
''';

select * from new_table as output;

Here is a brief walkthrough of the code above.