We just released a new open source boilerplate template to help any Spark user run spark-submit commands smoothly, including loading dependencies, project source code and more.
TLDR: Here is an open source template to help you get started
At Soluto, as part of our day-to-day work as data scientists, we create ETL (Extract, Transform, Load) jobs. Our main tool for this is Spark, specifically PySpark, launched with spark-submit.
Spark is used for distributed computing on large-scale datasets. spark-submit is the command that launches your application on your cluster.
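To make the launch step concrete, here is a minimal sketch of how a spark-submit command line can be assembled. The helper name `build_spark_submit` and the file names (`main.py`, `src.zip`, `libs.zip`) are illustrative assumptions and not part of the template; `--master` and `--py-files` are standard spark-submit options.

```python
import shlex


def build_spark_submit(script, py_files=(), master=None, job_args=()):
    """Return the argv list for a spark-submit invocation (hypothetical helper)."""
    cmd = ["spark-submit"]
    if master:
        cmd += ["--master", master]
    if py_files:
        # --py-files takes a comma-separated list of .zip, .egg or .py files
        cmd += ["--py-files", ",".join(py_files)]
    # The application script comes after the spark-submit options...
    cmd.append(script)
    # ...and everything after the script is passed to the job itself.
    cmd += list(job_args)
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        "main.py",                        # assumed entry-point script
        py_files=["src.zip", "libs.zip"],  # assumed zipped source and pip modules
        master="yarn",
        job_args=["--input-path", "events.json"],
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

The order matters: options placed before the script name are read by spark-submit, while anything after it reaches your own argument parser.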
Here are some examples of jobs we run daily at Soluto:
- Creating offline content recommendations for users
- Aggregating single events into more logical tables: as part of our service we offer tech support via chat messaging. Instead of having multiple message events for a single support session, we create SessionsTable with one session entity that holds all the aggregated information of a single chat session
Some of the basic needs when using Spark for ETL jobs:
- Passing arguments
- Creating a Spark context and SQL context
- Loading your project source code (src directory)
- Loading pip modules (with a simple requirements file)
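The needs listed above can be sketched in a single job entry point. This is a rough illustration under stated assumptions, not the template's actual code: the function names, the `--job-name`, `--input-path` and `--py-zip` arguments, and the JSON input are all hypothetical. It assumes the src directory and the pip modules from the requirements file have already been zipped. Note that `SQLContext` still exists in Spark 3 but is deprecated in favor of `SparkSession`.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments for the job (argument names are assumptions)."""
    parser = argparse.ArgumentParser(description="Example PySpark ETL job")
    parser.add_argument("--job-name", default="etl-job")
    parser.add_argument("--input-path", required=True)
    # Zipped project source and zipped pip modules to ship to the executors
    parser.add_argument("--py-zip", nargs="*", default=[])
    return parser.parse_args(argv)


def create_contexts(app_name, py_zips=()):
    """Create the Spark context and SQL context and register dependencies."""
    # Imported lazily so argument parsing works without Spark installed
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName(app_name)
    sc = SparkContext.getOrCreate(conf=conf)
    for path in py_zips:
        # Makes the zipped modules importable on the driver and the executors
        sc.addPyFile(path)
    return sc, SQLContext(sc)


def main(argv=None):
    args = parse_args(argv)
    sc, sql_context = create_contexts(args.job_name, args.py_zip)
    try:
        # Placeholder ETL step: load the input and report how many rows it has
        events = sql_context.read.json(args.input_path)
        print(events.count())
    finally:
        sc.stop()


# When saved as a script for spark-submit, finish the file with:
#     if __name__ == "__main__":
#         main()
```

Keeping argument parsing separate from context creation means the parsing logic can be checked locally, without a cluster or a Spark installation.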
We created a simple template that can help you get started running ETL jobs using PySpark (with both spark-submit and the interactive shell), create a Spark context and SQL context, use simple command line arguments, and load all your dependencies (your project source code and third-party requirements).
So if you're starting a new Spark project, "Fork" it on GitHub and enjoy Sparking it up!
Please feel free to share any thoughts, open issues and contribute code!
Get A Quick Start With PySpark And Spark-Submit was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.