Thanh Le
Thanh Le's Blog

Build a real-time dashboard using a decentralized platform - Streamr

Thanh Le
·Nov 29, 2021·

Original post from thanhle.blog/blog/build-realtime-dashboard-..

Why should you read this article?

  • A use-case of Streamr
  • An ELT flow (extract, load, transform), plus visualization of the result
  • Thoughts on real-time BI and decentralization

Demo

streamr-dex-chart.vercel.app

Story

Our team works mainly with data and on how to gain insight from it. While doing cryptocurrency research, I found some problems:

  • There is lots of data out there, but gathering it takes time, especially private data
  • There are some BI tools out there, but they are limited to the data their providers offer (Glassnode, dune.xyz, Messari, IntoTheBlock, ...)
  • They don't support real-time data. I believe that in cryptocurrency, as in many other industries, being one step ahead brings a lot of benefits

You could say that if we need a quick result, we can use Glassnode or Dune.xyz. But our team wants to build a tool that lets anyone get the data they want with little or no technical knowledge, and the dashboard must be REAL-TIME.

Then we came up with the expected data flow:

[Image: diagram of the expected data flow]

But this is only the starting point and we don't want to build everything from scratch, so we are going to build a really simple flow:

[Image: diagram of the simple data flow]

FAQ

What is Prefect? Why is it needed here? https://www.prefect.io/ - The easiest way to build, run, and monitor data pipelines at scale.

After trying many ETL tools, we think Prefect is a perfect fit: it is clean, easy to write, and has a really good community, and we can also scale the infrastructure easily (they are building a serverless Agent in the future).

Why not send data directly to Streamr? I want to, but Streamr doesn't have a Python SDK yet and we don't have much time to build one. Still, I think a Python client is really important, since most data tools are written in Python.

What is Airbyte? Why is it needed here? Another open-source data ingestion tool. It is the fastest-growing data ingestion tool and supports many data sources and destinations.

We got funding from Streamr to #BUIDL

As you can see, if Streamr can connect to Airbyte, it opens up a huge amount of data that can flow into its platform, from Google Analytics, HubSpot, and Salesforce to MySQL, BigQuery, Postgres, ...

With the Airbyte connector, you can now publish tons of data into Streamr with very little effort.

github.com/devmate-cloud/streamr-airbyte-co..

With support from the Streamr Data Fund, our team can focus on building the hardest part of the pipeline, and this integration also gives Streamr the potential to acquire more users. This is a win-win relationship! You can apply to the Data Fund to build awesome things with Streamr at streamr.network/fund

With sponsorship for the hardest part, we can now start building:

github.com/devmate-cloud/streamr-dex-chart

github.com/thanhlmm/prefect_cmc

...on building a real-time BI tool

This is a really hard journey, but we have taken the very first step. There are still lots of problems we need to solve.

Building a crawling tool for anyone is hard

The internet is an open world where we can get a lot of data, but doing that at scale is hard. Most of that data is an asset of the websites that host it, so you end up either building a crawling tool to get it or buying it from them via an API. Selling that data is their business, so they make their sites hard to crawl.

Our team is thinking of building a SQL-like language to crawl data.

The idea is that we can crawl most website data via SQL syntax.

E.g.:

SELECT a[href], img[alt] FROM "https://coinmarketcap.com" WHERE a[class="cmc-link"];

SELECT div[innerText] FROM "https://streamr.network" WHERE div CHILD OF ".Hero__Inner";
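To make the idea a bit more concrete, here is a minimal sketch in Python of how such a query could be parsed into a crawl plan. The language does not exist yet, so the grammar, the `CrawlQuery` type, and the `parse_crawl_query` function are all assumptions made up for illustration. The sketch only parses the query; it does not fetch or crawl anything.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical grammar (an assumption, not a real language):
#   SELECT tag[attr], ... FROM "url" [WHERE <filter>];
QUERY_RE = re.compile(
    r'^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+"(?P<url>[^"]+)"'
    r'(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$',
    re.IGNORECASE | re.DOTALL,
)
# A selected field looks like tag[attribute], e.g. a[href].
FIELD_RE = re.compile(r'^(?P<tag>[A-Za-z][\w-]*)\[(?P<attr>[\w-]+)\]$')


@dataclass
class CrawlQuery:
    url: str                       # page to crawl
    fields: List[Tuple[str, str]]  # (tag, attribute) pairs to extract
    where: Optional[str]           # raw filter text, left uninterpreted here


def parse_crawl_query(query: str) -> CrawlQuery:
    """Split a SQL-like crawl query into its url, fields, and filter."""
    match = QUERY_RE.match(query)
    if match is None:
        raise ValueError(f"not a valid crawl query: {query!r}")
    fields = []
    for raw in match.group("fields").split(","):
        field = FIELD_RE.match(raw.strip())
        if field is None:
            raise ValueError(f"expected tag[attribute], got {raw.strip()!r}")
        fields.append((field.group("tag"), field.group("attr")))
    return CrawlQuery(
        url=match.group("url"),
        fields=fields,
        where=match.group("where"),
    )


plan = parse_crawl_query(
    'SELECT div[innerText] FROM "https://streamr.network" '
    'WHERE div CHILD OF ".Hero__Inner";'
)
print(plan)
```

A real implementation would still need to translate the `WHERE` part into something like CSS selectors and run it against the fetched page, which is where most of the hard work is.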

Realtime BI tool

There is a lack of tools for visualizing real-time data. Most BI tools handle batch data by querying a data warehouse, but supporting real-time data is a hard problem.

{% twitter 1461719755555946500 %}

Kafka is the only tool we know of for building a real-time data pipeline, and so far we have only seen https://metatron.app/ support Kafka.

Is Streamr a fit?

Not at the moment. So far, Streamr supports most of the basic cases: we can subscribe to real-time data, get past data by time, ... but it doesn't support the most important case:

  • Processing the data. Imagine you have an input stream of DEX volume data every 30 minutes and you need an output stream of DEX volume data every 4 hours.
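As a rough illustration of that kind of processing, here is a minimal Python sketch that sums 30-minute volume points into 4-hour windows. It is plain Python and does not use any Streamr API. The `resample_volume` name, the `(timestamp, volume)` point shape, and the choice to align windows to the Unix epoch in UTC are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional, Tuple

# (start of the interval in UTC, volume traded in that interval)
Point = Tuple[datetime, float]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def resample_volume(
    points: Iterable[Point],
    window: timedelta = timedelta(hours=4),
) -> Iterator[Point]:
    """Sum volume points into fixed windows aligned to the Unix epoch.

    Assumes points arrive in time order with timezone-aware timestamps.
    A window is emitted when the first point of a later window arrives;
    the last (possibly partial) window is emitted when the input ends.
    """
    current_start: Optional[datetime] = None
    total = 0.0
    for ts, volume in points:
        # Floor the timestamp to the start of the window it falls in.
        start = ts - (ts - EPOCH) % window
        if current_start is not None and start != current_start:
            yield current_start, total
            total = 0.0
        current_start = start
        total += volume
    if current_start is not None:
        yield current_start, total


# Sixteen 30-minute points of volume 1.0, i.e. eight hours of input.
first = datetime(2021, 11, 29, tzinfo=timezone.utc)
half_hourly = [(first + timedelta(minutes=30 * i), 1.0) for i in range(16)]

for window_start, volume in resample_volume(half_hourly):
    print(window_start.isoformat(), volume)
```

With the sample input, this prints two 4-hour windows, each with a total volume of 8.0. On a live stream the input never ends, so each window would only be emitted once the next one starts.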

We also think Streamr can use its nodes to run lots of potential use cases, not just serve as infrastructure for streaming data. For example:

  • Support smart contracts so we can turn a node into a data crawling point. By doing so, we can leverage the many nodes on the Streamr Network to get data.
  • Support smart contracts so they can process streams as in my example above: stream as input → smart contract → stream as output
  • Our team doesn't want to expose our private key when integrating Streamr on the frontend. We can hide it by running a Streamr node on our own server, but setting up and operating that node takes effort.

This is a lot of work, but our team believes that, with the industry growing this fast, we can have those capabilities in the near future.

 