Questions tagged [vaex]

Vaex is a Python library for lazy, out-of-core DataFrames (similar to pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid at more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted).

181 questions
12
votes
1 answer

Arrow IPC vs Feather

What is the difference between Arrow IPC and Feather? The official Arrow documentation says: Version 2 (V2), the default version, which is exactly represented as the Arrow IPC file format on disk. V2 files support storing all Arrow data types as…
tsorn
  • 3,365
  • 1
  • 29
  • 48
8
votes
2 answers

Import vaex error: PydanticImportError: `BaseSettings` has been moved to the `pydantic-settings` package

I am using a SageMaker notebook, and when importing vaex I get the error below. The version of vaex I'm using is 4.16.0. PydanticImportError: BaseSettings has been moved to the pydantic-settings package. See…
Kailash M S
  • 81
  • 1
  • 2
8
votes
4 answers

How to quickly compare two text files and get unique rows?

I have 2 text files (*.txt) that contain unique strings in the format: udtvbacfbbxfdffzpwsqzxyznecbqxgebuudzgzn:refmfxaawuuilznjrxuogrjqhlmhslkmprdxbascpoxda ltswbjfsnejkaxyzwyjyfggjynndwkivegqdarjg:qyktyzugbgclpovyvmgtkihxqisuawesmcvsjzukcbrzi The…
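One common way to approach a comparison like this is a set difference over the lines of each file. The following is a minimal standard-library sketch, not an answer taken from the thread: the function name `unique_lines` and the sample file contents are hypothetical, and it assumes both files fit in memory.

```python
import os
import tempfile


def unique_lines(path_a, path_b):
    """Return (only_in_a, only_in_b) as sets of lines without trailing newlines."""
    with open(path_a, encoding="utf-8") as fa:
        lines_a = {line.rstrip("\n") for line in fa}
    with open(path_b, encoding="utf-8") as fb:
        lines_b = {line.rstrip("\n") for line in fb}
    # Set difference keeps the rows that appear in one file but not the other.
    return lines_a - lines_b, lines_b - lines_a


# Hypothetical sample data written to temporary files for illustration.
tmp_dir = tempfile.mkdtemp()
path_a = os.path.join(tmp_dir, "a.txt")
path_b = os.path.join(tmp_dir, "b.txt")
with open(path_a, "w", encoding="utf-8") as f:
    f.write("k1:v1\nk2:v2\nk3:v3\n")
with open(path_b, "w", encoding="utf-8") as f:
    f.write("k2:v2\nk3:v3\nk4:v4\n")

only_a, only_b = unique_lines(path_a, path_b)
print(sorted(only_a))
print(sorted(only_b))
```

For files too large to hold in memory, this approach would need to be replaced by a sorted merge or an out-of-core tool.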
8
votes
2 answers

Drop duplicate rows in python vaex

I am working with python vaex, and I don't know how I can drop duplicate rows in a dataframe. For example in pandas there exists the method drop_duplicates(). Does there exist any similar function in vaex?
rootware
  • 81
  • 1
  • 3
5
votes
1 answer

What is the Vaex function to parse a string to datetime64, equivalent to pandas to_datetime, that allows a custom format?

I have a date as a string (example: 3/24/2020) that I would like to convert to datetime64[ns] format: df2['date'] = pd.to_datetime(df1["str_date"], format='%m/%d/%Y') Using pandas to_datetime on a vaex dataframe results in an error: ValueError: time data…
Haha TTpro
  • 5,137
  • 6
  • 45
  • 71
5
votes
1 answer

python vaex groupby with custom function

Is there a way to apply a custom function to a group using the groupby function of a vaex DataFrameArray? I can do: df_vaex.groupby(['col_x1','col_x2','col_x3','col_x4'], agg=vaex.agg.mean(df_vaex['col_y'])) But is there a way to do pandas:…
hatRat
  • 55
  • 4
5
votes
2 answers

error: command 'cmake' failed: No such file or directory

Getting an error while installing vaex in PyCharm with Python 3.8. I installed the following beforehand on my Win-10 64-bit machine: - cmake v3.15.3 - pep517 v0.8.1 - pip v19.3.1 Error logs: running build_ext creating build\temp.win-amd64-3.8 creating…
user1222006
  • 159
  • 1
  • 3
  • 11
5
votes
0 answers

Time series decimation benchmark: Dask vs Vaex

I currently use Vaex to generate binned data for histograms and to decimate big time-series data. Essentially I reduce millions of time series points into a number of bins and compute the mean & max & min for each bin. I would like to compare Vaex…
DougR
  • 3,196
  • 1
  • 28
  • 29
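The decimation described in this question (reducing many points to a per-bin mean, max, and min) can be sketched in plain Python. This is only an illustration of the technique under stated assumptions, not the asker's code and not how Vaex or Dask implement it: the function name `decimate`, its parameters, and the sample data are hypothetical, and the bins are assumed to be equal-width over a half-open time range.

```python
import math


def decimate(times, values, t_min, t_max, n_bins):
    """Reduce (time, value) points to per-bin mean, min, and max.

    Bins are equal-width over [t_min, t_max); points outside are ignored.
    Empty bins yield None for every statistic.
    """
    width = (t_max - t_min) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    mins = [math.inf] * n_bins
    maxs = [-math.inf] * n_bins
    for t, v in zip(times, values):
        if not (t_min <= t < t_max):
            continue
        # Clamp guards against floating-point rounding at the upper edge.
        i = min(int((t - t_min) / width), n_bins - 1)
        sums[i] += v
        counts[i] += 1
        mins[i] = min(mins[i], v)
        maxs[i] = max(maxs[i], v)
    result = []
    for i in range(n_bins):
        if counts[i]:
            result.append({"mean": sums[i] / counts[i], "min": mins[i], "max": maxs[i]})
        else:
            result.append({"mean": None, "min": None, "max": None})
    return result


# Hypothetical sample series: six points reduced to three bins.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
values = [1.0, 3.0, 2.0, 6.0, 5.0, 7.0]
bins = decimate(times, values, 0.0, 6.0, 3)
for b in bins:
    print(b)
```

A pure-Python loop like this is far slower than a vectorized or out-of-core implementation; it is meant only to make the per-bin reduction explicit.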
4
votes
2 answers

Vaex: How to add/append rows to a Vaex DataFrame

How do you add data to a Vaex DataFrame? I can see there is add_column(), but no add/append_row() I'm looking to use Vaex instead of Pandas.
sten
  • 7,028
  • 9
  • 41
  • 63
4
votes
2 answers

How to drop duplicates in Vaex?

I have some entries from users and how many interactions each user had on my website... I have 340k rows and 70+ columns, and I want to use Vaex, but I'm having trouble doing simple things like dropping duplicates. Could someone help me on how to do…
Leonardo Ferreira
  • 673
  • 1
  • 6
  • 22
4
votes
1 answer

Convert a column in vaex dataframe from String to Float or int

I tried this solution, but it didn't really solve my problem: x = ['a', 'b', 'c', 'd', 'e', 'f'] y = np.array(['10', '20', '30', '40', '50', '60']) z = np.array(['x', 'y', 'z', 'f', 'b', 's']) df_vaex = vaex.from_arrays(x=x, y=y, z=z) df_vaex.y =…
salwaen
  • 81
  • 1
  • 6
4
votes
1 answer

Python Vaex data type conversion: string to datetime

I'm utilizing the Vaex library in Python for a project; I'm still very new to Vaex so I apologize if this is elementary. I'm having an issue with a data type conversion. One of my columns 'Paid_at' has a datatype of str, and it should be a…
mqn
  • 53
  • 2
  • 5
4
votes
1 answer

Convert large hdf5 dataset written via pandas/pytables to vaex

I have a very large dataset I write to hdf5 in chunks via append like so: with pd.HDFStore(self.train_store_path) as train_store: for filepath in tqdm(filepaths): with open(filepath, 'rb') as file: frame =…
sobek
  • 1,386
  • 10
  • 28
3
votes
0 answers

How to Delete a Vaex df from memory?

I am not able to find out how to delete all the redundant vaex df from memory. For a pandas dataframe I am using: def delete_df(): dflist=[var for var in dir() if isinstance(eval(var),pd.core.frame.DataFrame)] del [[dflist]] …
Gamerarms
  • 111
  • 4
3
votes
2 answers

Virtual column with calculation in Vaex

I want to set a virtual column to a calculation using another column in Vaex. I need to use an if statement inside this calculation. In general I want to call df['calculation_col'] = log(df['original_col']) if df['original_col'] == 0 else -4 I then…
afriedman111
  • 1,925
  • 4
  • 25
  • 42