Questions tagged [cdsw]

A platform for collaborative data science at scale authored by Cloudera: https://www.cloudera.com/products/data-science-and-engineering/data-science-workbench.html

16 questions
2
votes
1 answer

pyspark read format jdbc generates ORA-00903: invalid table name Error

With PySpark running on a remote server, I am able to connect to an Oracle database on another server via JDBC, but any valid query I run returns an ORA-00903: invalid table name error. I am able to connect to the database from my local machine…
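The excerpt is truncated, so the cause is not confirmed. One frequent trigger of ORA-00903 with Spark's JDBC source is passing a bare SQL query as the `dbtable` option. Spark places the `dbtable` value after `FROM`, so a query has to be wrapped in parentheses with an alias. Oracle also rejects `AS` before a table alias. The helper `as_dbtable` is hypothetical and only builds the option string. The URL, schema and table names in the comments are placeholders.

```python
def as_dbtable(query: str, alias: str = "q") -> str:
    """Wrap a SELECT so it can be passed as Spark's JDBC `dbtable` option.

    The subquery gets parentheses and an alias. Oracle does not accept AS
    before a table alias, so the alias follows the closing parenthesis.
    """
    # A trailing semicolon would be invalid inside the subquery.
    body = query.strip().rstrip(";").strip()
    return f"({body}) {alias}"


# Hypothetical usage (needs a SparkSession and the Oracle JDBC driver jar):
# df = (spark.read.format("jdbc")
#       .option("url", "jdbc:oracle:thin:@//host:1521/service")
#       .option("dbtable", as_dbtable("SELECT * FROM MY_SCHEMA.MY_TABLE"))
#       .option("user", "...")
#       .option("password", "...")
#       .option("driver", "oracle.jdbc.OracleDriver")
#       .load())
```

Oracle folds unquoted identifiers to upper case. It is also worth checking that the schema prefix and the table-name case match what the JDBC user can see.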
1
vote
0 answers

How to use DatabaseConnector to connect to Hive in R in CDSW

I am trying to connect to Hive using DatabaseConnector but am unable to do so in R within CDSW. Can anyone please suggest how to accomplish this? Please note that when using the driver and URL, I am able to connect to Hive and query the same…
Fierymech
1
vote
1 answer

Installing python 3.9 on Cloudera CDSW without sudo

I am trying to install Python 3.9 on Linux 4.4 in Cloudera Data Science Workbench (CDSW). I do not have sudo rights and I won't be able to connect to any websites. The current version of Python is 3.6. Following the procedure as mentioned…
Amy Jack
1
vote
0 answers

Cloudera Workbench string encoding problem

I am pulling changes from a Git repo where my coworker pushed R code from his local Windows machine: word <- gsub("=gesellschaftmitbeschränkterhaftung=","",fixed = T,x = word) The code contains unusual letters, such as German umlauts, e.g., "ä" in the…
safex
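Here is one possible cause, assumed and not confirmed by the question. Windows may have saved the files as Windows-1252, and CDSW then reads them as UTF-8. In that case each umlaut such as "ä" arrives as a single byte that is not valid UTF-8. A minimal Python sketch of a one-off conversion to UTF-8 follows. `reencode_bytes` and `reencode_file` are hypothetical helper names.

```python
from pathlib import Path


def reencode_bytes(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode `data` with the assumed source encoding and return UTF-8 bytes.

    Raises UnicodeDecodeError if `data` is not valid in `source_encoding`.
    """
    return data.decode(source_encoding).encode("utf-8")


def reencode_file(path, source_encoding: str = "cp1252") -> None:
    """Rewrite a text file in place as UTF-8.

    Running this on a file that is already UTF-8 would garble it, so check
    the real encoding first.
    """
    p = Path(path)
    p.write_bytes(reencode_bytes(p.read_bytes(), source_encoding))
```

In cp1252, "ä" is the single byte 0xE4. In UTF-8 it is the two bytes 0xC3 0xA4, which is why the same file displays differently on the two systems.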
1
vote
0 answers

How to pass a StructType in a CSV file

I have around 300 variables and I am trying to pass a custom schema via CSV. Below is the sample code I am using. However, on uploading the schema via CSV files, the output doesn't contain the column list: Output:…
Amy Jack
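The question's own file is not shown, so this assumes a hypothetical layout. The schema CSV has a header row `name,type` and one Spark SQL type name per column. With that layout, one option is to turn the file into a DDL string instead of building a `StructType` by hand. PySpark's `DataFrameReader.schema()` and the `schema=` argument of `spark.read.csv()` accept DDL strings in Spark 2.3 and later. That is worth confirming for the Spark version on your CDSW. `ddl_from_schema_csv` is a hypothetical helper.

```python
import csv


def ddl_from_schema_csv(lines) -> str:
    """Build a Spark DDL schema string such as "`id` int, `name` string".

    `lines` is any iterable of CSV lines (e.g. an open file) with an assumed
    header row `name,type`.
    """
    fields = []
    for row in csv.DictReader(lines):
        name = row["name"].strip().replace("`", "``")  # escape backticks in names
        dtype = row["type"].strip()
        fields.append(f"`{name}` {dtype}")
    if not fields:
        raise ValueError("schema CSV contained no column rows")
    return ", ".join(fields)


# Hypothetical usage with PySpark:
# with open("schema.csv", newline="") as fh:
#     ddl = ddl_from_schema_csv(fh)
# df = spark.read.csv("data.csv", header=True, schema=ddl)
```

Backtick-quoting every name keeps columns with spaces or reserved words valid in the DDL string.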
1
vote
1 answer

relative imports on CDSW

I have a project on CDSW organized as follows: /home/cdsw/my_project_v2.1 |_>input |_>output |_>scr |_>__init__.py |_>main.py |_>utils |_>__init__.py |_>helpers.py In my current code, I use…
Steven
1
vote
1 answer

Get job status in CDSW

I have some R and Python scripts in CDSW (Cloudera Data Science Workbench). I created a shell script to run them with curl -v -XPOST. How can I get the status of a job from the CDSW API?
Zied Hermi
0
votes
1 answer

Deploy a Flask app using Cloudera Applications

I have been using the following Python 3 script in a CDSW session, which runs just fine as long as the session is not killed. I am able to click on the grid at the top right and select my app. hello.py: from flask import Flask import os app =…
legends1337
0
votes
0 answers

PySpark not creating SparkContext (YARN): bad gateway or network traffic blocked?

Here is some context on my installation of the pyspark binary. At my company, we use Cloudera Data Science Workbench (CDSW). When we create a session for a new project, I'm guessing it's an image built from a specific Dockerfile. And inside this Dockerfile is…
BeGreen
0
votes
2 answers

Writing a dictionary addition function in R

I need to write the equivalent of the following code in R, but I'm not quite sure how to go about it: def add(args): result = args["a"] + args["b"] return result The reason is that for the platform I am using (Cloudera Data Science…
M00N KNIGHT
0
votes
1 answer

ERROR: You must give at least one requirement to install - CDSW

I am trying to install packages in my CDSW environment. I have placed the packages in my /home/ folder and I am running the command below: pip install --no-index…
Amy Jack
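The rest of the command is truncated, so this is only one way the message can arise. pip prints "ERROR: You must give at least one requirement to install" when it receives options such as `--no-index` but no package names or requirements file. Below is a minimal sketch that builds an offline install command and rejects an empty package list up front. The helper `offline_pip_command`, the directory `/home/cdsw/packages` and the package name are hypothetical.

```python
import sys


def offline_pip_command(package_dir, packages):
    """Build an argv list that installs `packages` from a local directory only.

    --no-index stops pip from contacting PyPI; --find-links points it at the
    directory holding the downloaded wheels or sdists.
    """
    if not packages:
        # pip itself would fail with "You must give at least one requirement".
        raise ValueError("give at least one package name")
    return [sys.executable, "-m", "pip", "install",
            "--no-index", "--find-links", str(package_dir), *packages]


# Hypothetical directory and package; to actually run it:
# import subprocess
# subprocess.run(offline_pip_command("/home/cdsw/packages", ["pandas"]), check=True)
```

Using `sys.executable -m pip` installs into the same interpreter the session is running, rather than whichever `pip` is first on `PATH`.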
0
votes
0 answers

f-string with line breaks results in an error on CDSW/Linux

I have a strange issue in Python (3.6.1): a = 3 f"""a= {a}""" # works But this does not work on the Cloudera Data Science Workbench (a Unix system): f"""a= {a}""" # error Engine, line 1 " ^ SyntaxError: EOL while scanning string literal On Windows…
safex
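The excerpt does not establish why the line break fails on CDSW. This workaround assumes the console is what mishandles a raw newline inside the triple-quoted literal. It keeps each literal on one physical line and writes the break as the escape `\n`. The variable names are illustrative only.

```python
a = 3

# One physical line: the line break is the escape sequence \n, so the
# literal itself contains no raw newline for the console to split on.
s1 = f"a=\n{a}"

# Same result without an f-string, also on one line.
s2 = "\n".join(["a=", str(a)])
```

Both produce the same two-line string that a triple-quoted f-string spanning two lines would.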
0
votes
0 answers

Converting DataFrame to CSV throws error in PySpark

I have a huge DataFrame, around 7GB of records. I am trying to get the count of the DataFrame and download it as CSV. Both operations result in the error below. Is there any other way of downloading the DataFrame without multiple…
Eden T
0
votes
2 answers

Object not callable error | Where function

I am trying to run the query below: df3 = df1.join(df2, df1["DID"] == df2["JID"],'inner')\ .select(df1["DID"],df1["amt"]-df2["amt"]\ .where(df1["DID"]== "BIG123")).show() I get the error shown below: TypeError: 'Column' object is…
Nick Ryan
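In the query, `.where(...)` is chained onto `df1["amt"]-df2["amt"]`, which is a Column rather than a DataFrame. As far as I know, PySpark's `Column.__getattr__` treats an unknown attribute name as a nested-field reference and returns another Column. So `.where` resolves to a Column, and calling it raises `'Column' object is not callable`. The hypothetical `ToyColumn` class below is not PySpark. It reproduces only that lookup behaviour, so it runs without Spark. The commented lines show the usual fix: call `where` on the DataFrame. The alias `amt_diff` is an illustrative name.

```python
class ToyColumn:
    """Toy stand-in for pyspark.sql.Column; NOT the real class.

    Mimics one behaviour: an unknown attribute name is treated as a
    nested-field lookup and returns another column instead of raising.
    """

    def __init__(self, expr):
        self.expr = expr

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for "where".
        if name.startswith("__"):
            raise AttributeError(name)
        return ToyColumn(f"{self.expr}.{name}")


def misplaced_where(col, cond):
    # col.where is itself a ToyColumn; calling it raises
    # TypeError: 'ToyColumn' object is not callable
    return col.where(cond)


# Corrected PySpark shape (needs a running SparkSession, so shown as comments):
# df3 = (df1.join(df2, df1["DID"] == df2["JID"], "inner")
#           .where(df1["DID"] == "BIG123")
#           .select(df1["DID"], (df1["amt"] - df2["amt"]).alias("amt_diff")))
# df3.show()  # show() prints and returns None, so keep it off the assignment
```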
0
votes
1 answer

RJDBC Hive connection failed

I followed multiple tutorials to try to connect to Hive with RJDBC, without success. Here is what I have: library(DBI) library(rJava) library(RJDBC) driver <- JDBC('org.apache.hive.jdbc.HiveDriver', classPath =…
BeGreen