A platform for collaborative data science at scale authored by Cloudera: https://www.cloudera.com/products/data-science-and-engineering/data-science-workbench.html
Questions tagged [cdsw]
16 questions
2
votes
1 answer
pyspark read format jdbc generates ORA-00903: invalid table name Error
With pyspark running on a remote server, I am able to connect to an Oracle database on another server with JDBC, but any valid query I run returns an ORA-00903: invalid table name error.
I am able to connect to the database from my local machine…

Col Bates - collynomial
- 643
- 5
- 29
1
vote
0 answers
How to use DatabaseConnector to connect to Hive in R in CDSW
I am trying to connect to Hive using DatabaseConnector in R within CDSW, but I am unable to do so. Can anyone suggest how to accomplish this?
Please note that when using the driver and URL, I am able to connect to Hive and query the same…

Fierymech
- 11
- 3
1
vote
1 answer
Installing Python 3.9 on Cloudera CDSW without sudo
I am trying to install Python 3.9 on Linux 4.4 in Cloudera Data Science Workbench (CDSW). I do not have sudo rights and I won't be able to connect to any websites.
The current version of Python is 3.6.
Following the procedure as mentioned…

Amy Jack
- 55
- 10
1
vote
0 answers
Cloudera Workbench string encoding problem
I am pulling changes from a git repo where my coworker pushed R code from his local Windows machine:
word <- gsub("=gesellschaftmitbeschränkterhaftung=","",fixed = T,x = word)
The code contains non-ASCII characters, such as German umlauts, e.g., "ä" in the…

safex
- 2,398
- 17
- 40
1
vote
0 answers
How to pass a StructType in a CSV file
I have around 300 variables and I am trying to pass a custom schema via a CSV file.
Below is the sample code I am using.
However, when I upload the schema via CSV files, the output doesn't contain the column list:
Output :…

Amy Jack
- 55
- 10
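For the schema-from-CSV question above, one possible approach is to turn the CSV rows into a DDL-style schema string, which PySpark's `DataFrameReader.schema` accepts in place of a `StructType`. The sketch below is a minimal illustration under assumptions not stated in the question: the CSV is taken to hold one `column_name,type` pair per row, and the helper name `ddl_from_csv` and the sample columns are hypothetical.

```python
import csv
import io


def ddl_from_csv(csv_text):
    """Build a DDL-style schema string from CSV rows of (column_name, type).

    Assumed layout (hypothetical): one column per row, e.g. "amount,double".
    """
    fields = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        name, dtype = row[0].strip(), row[1].strip()
        # Backticks protect column names containing spaces or special characters.
        fields.append(f"`{name}` {dtype.upper()}")
    return ", ".join(fields)


# Hypothetical sample standing in for the real 300-column schema file.
sample = "id,int\nname,string\namount,double\n"
schema_ddl = ddl_from_csv(sample)
print(schema_ddl)
```

With a Spark session available, the resulting string could then be passed as `spark.read.schema(schema_ddl).csv(path)`, where `path` is a placeholder for the data file.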
1
vote
1 answer
Relative imports on CDSW
I have a project on CDSW organized as follows:
/home/cdsw/my_project_v2.1
|_>input
|_>output
|_>scr
|_>__init__.py
|_>main.py
|_>utils
|_>__init__.py
|_>helpers.py
in my current code, I use…

Steven
- 14,048
- 6
- 38
- 73
1
vote
1 answer
Get job status in CDSW
I have some R and Python scripts in CDSW (Cloudera Data Science Workbench). I created a shell script to run them with curl -v -XPOST.
How can I get the status of a job from the CDSW API?

Zied Hermi
- 229
- 1
- 2
- 11
0
votes
1 answer
Deploy a Flask app using Cloudera Application
I have been using the following Python 3 script in a CDSW session, which runs just fine as long as the session is not killed.
I am able to click on the top-right grid and select my app
hello.py
from flask import Flask
import os
app =…

legends1337
- 69
- 1
- 7
0
votes
0 answers
Pyspark not creating SparkContext (Yarn). bad gateway or network traffic blocked?
Here is some context on my installation of the pyspark binary.
In my company, we use Cloudera Data Science Workbench (CDSW). When we create a session for a new project, I'm guessing it's an image built from a specific Dockerfile. And inside this Dockerfile is…

BeGreen
- 765
- 1
- 13
- 39
0
votes
2 answers
Writing dictionary addition function in R
I need to write the equivalent of the following code in R but I'm not quite sure how to go about it:
def add(args):
    result = args["a"] + args["b"]
    return result
The reason is that for the platform I am using (Cloudera Data Science…

M00N KNIGHT
- 137
- 1
- 11
0
votes
1 answer
ERROR: You must give at least one requirement to install - CDSW
I am trying to install packages in my CDSW environment.
I have placed the packages in my cd /home/ folder
and I am running the command below:
pip install --no-index…

Amy Jack
- 55
- 10
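On the pip question above: pip reports "You must give at least one requirement to install" when the command line names no package, archive, or requirements file. The sketch below assembles an offline install command with the real `--no-index` and `--find-links` flags and refuses to build one without a requirement. The helper name `offline_pip_args`, the directory `/home/cdsw/packages`, and the package name `requests` are hypothetical and not taken from the question.

```python
def offline_pip_args(package_dir, requirements):
    """Return the argument list for an offline pip install.

    package_dir: directory holding the downloaded wheels or archives.
    requirements: package names to install; must not be empty.
    """
    if not requirements:
        # Mirrors the condition under which pip raises its own error.
        raise ValueError("pip needs at least one requirement to install")
    return [
        "pip", "install",
        "--no-index",                    # do not contact any package index
        f"--find-links={package_dir}",   # look for packages in this directory
        *requirements,
    ]


# Hypothetical directory and package name, for illustration only.
args = offline_pip_args("/home/cdsw/packages", ["requests"])
print(" ".join(args))
```

The list can be handed to `subprocess.run(args, check=True)` or joined into a single shell command.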
0
votes
0 answers
f-string results in error with line breaks on CDSW/linux
I have a strange issue in Python (3.6.1):
a = 3
f"""a= {a}""" # works
But this does not work on the Cloudera Data Science Workbench (a Unix system):
f"""a=
{a}""" # error
Engine, line 1
"
^
SyntaxError: EOL while scanning string literal
On Windows…

safex
- 2,398
- 17
- 40
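For the f-string question above, a triple-quoted f-string that spans several lines is valid Python 3.6+ when the code is run from a script file. The sketch below shows that form next to an equivalent single-line spelling with an explicit `\n`, which may serve as a workaround if the error comes from the way the interactive engine submits code line by line. That cause is an assumption; the excerpt does not confirm it.

```python
a = 3

# Multi-line triple-quoted f-string: valid in Python 3.6+ when run as a script.
multi = f"""a=
{a}"""

# Equivalent single-line form with an explicit newline escape.
single = f"a=\n{a}"

print(repr(multi))
print(multi == single)
```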
0
votes
0 answers
Converting dataframe to CSV throws error in PySpark
I have a huge dataframe (around 7 GB of records).
I am trying to get the count of the dataframe and download it as a CSV file.
Both result in the error below.
Is there any other way of downloading the dataframe without multiple…

Eden T
- 57
- 1
- 8
0
votes
2 answers
Object not callable error | Where function
I am trying to run the query below:
df3 = df1.join(df2, df1["DID"] == df2["JID"],'inner')\
.select(df1["DID"],df1["amt"]-df2["amt"]\
.where(df1["DID"]== "BIG123")).show()
I get the error shown below:
TypeError: 'Column' object is…

Nick Ryan
- 19
- 5
0
votes
1 answer
RJDBC hive, connect failed
I followed multiple tutorials to try to connect to Hive with RJDBC, without success.
Here is what I have:
library(DBI)
library(rJava)
library(RJDBC)
driver <- JDBC('org.apache.hive.jdbc.HiveDriver',
classPath =…

BeGreen
- 765
- 1
- 13
- 39