Questions tagged [dataset]

A dataset is a collection of data, generally represented in tabular form, with columns signifying different variables and rows signifying different members of the set. If you are looking for a freely available dataset for any purpose, please consider asking your question on https://opendata.stackexchange.com.
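As a rough illustration of the tabular layout described above, here is a minimal sketch in plain Python. The names and values are hypothetical toy data, not taken from any real dataset, and a list of dictionaries is only one of several reasonable ways to model rows and columns.

```python
# Hypothetical toy data: each dictionary is one row (a member of the set),
# and each key is one column (a variable).
dataset = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Linus", "age": 28, "city": "Helsinki"},
    {"name": "Grace", "age": 45, "city": "New York"},
]

# The column names (variables) are the keys of the first row.
columns = list(dataset[0].keys())

# Reading one column means collecting the same variable from every row.
ages = [row["age"] for row in dataset]

# Reading one row means picking a single member by position.
second_member = dataset[1]

print(columns)
print(ages)
print(second_member["name"])
```

Libraries such as pandas or ADO.NET's DataTable wrap the same idea in richer types, but the row-per-member, column-per-variable shape stays the same.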

11414 questions
575
votes
5 answers

A simple explanation of Naive Bayes Classification

I am finding it hard to understand the process of Naive Bayes, and I was wondering if someone could explain it with a simple step by step process in English. I understand it takes comparisons by times occurred as a probability, but I have no idea…
Aeonitis
  • 5,887
  • 3
  • 14
  • 8
216
votes
12 answers

Should I Dispose() DataSet and DataTable?

DataSet and DataTable both implement IDisposable, so, by conventional best practices, I should call their Dispose() methods. However, from what I've read so far, DataSet and DataTable don't actually have any unmanaged resources, so Dispose() doesn't…
mbeckish
  • 10,485
  • 5
  • 30
  • 55
168
votes
28 answers

How to convert a Scikit-learn dataset to a Pandas dataset

How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame? from sklearn.datasets import load_iris import pandas as pd data = load_iris() print(type(data)) data1 = pd. # Is there a Pandas method to accomplish this?
SANBI samples
  • 2,058
  • 2
  • 14
  • 20
142
votes
7 answers

Datatable vs Dataset

I currently use a DataTable to get results from a database which I can use in my code. However, many examples on the web show using a DataSet instead and accessing the table(s) through the collections method. Is there any advantage, performance wise…
GateKiller
  • 74,180
  • 73
  • 171
  • 204
141
votes
5 answers

Sample datasets in Pandas

When using R, it's handy to load "practice" datasets using data(iris) or data(mtcars). Is there something similar for Pandas? I know I can load using any other method, just curious if there's anything built in.
canyon289
  • 3,355
  • 4
  • 33
  • 41
117
votes
11 answers

Sort columns of a dataframe by column name

This is possibly a simple question, but I do not know how to order columns alphabetically. test = data.frame(C = c(0, 2, 4, 7, 8), A = c(4, 2, 4, 7, 8), B = c(1, 3, 8, 3, 2)) # C A B # 1 0 4 1 # 2 2 2 3 # 3 4 4 8 # 4 7 7 3 # 5 8 8 2 I like to…
John Clark
  • 2,639
  • 5
  • 19
  • 13
108
votes
3 answers

What is the difference between "LINQ to Entities", "LINQ to SQL" and "LINQ to Dataset"

I've been working for quite a while now with LINQ. However, it remains a bit of a mystery what the real differences are between the mentioned flavours of LINQ. The successful answer will contain a short differentiation between them. What is the…
Marcel
  • 15,039
  • 20
  • 92
  • 150
99
votes
6 answers

How to delete the first row of a dataframe in R?

I have a dataset with 11 columns with over 1000 rows each. The columns were labeled V1, V2, V11, etc. I replaced the names with something more useful to me using the "c" command. I didn't realize that row 1 also contained labels for each column…
akz
  • 1,865
  • 2
  • 16
  • 13
96
votes
4 answers

What does batch, repeat, and shuffle do with TensorFlow Dataset?

I'm currently learning TensorFlow but I came across a confusion in the below code snippet: dataset = dataset.shuffle(buffer_size = 10 * batch_size) dataset = dataset.repeat(num_epochs).batch(batch_size) return…
blue
  • 1,695
  • 3
  • 10
  • 17
95
votes
2 answers

How to check if two data frames are equal

Say I have large datasets in R and I just want to know whether two of them are the same. I use this often when I'm experimenting with different algorithms to achieve the same result. For example, say we have the following datasets: df1 <-…
Waldir Leoncio
  • 10,853
  • 19
  • 77
  • 107
91
votes
6 answers

Data Augmentation in PyTorch

I am a little bit confused about the data augmentation performed in PyTorch. Now, as far as I know, when we are performing data augmentation, we are KEEPING our original dataset, and then adding other versions of it (flipping, cropping, etc.). But…
Fawaz
  • 1,253
  • 2
  • 11
  • 9
91
votes
7 answers

How I can filter a Datatable?

I use a DataTable with information about users, and I want to search for a user or a list of users in this DataTable. I tried it, but it doesn't work :( Here is my C# code: public DataTable GetEntriesBySearch(string username, string location, DataTable table) …
Tarasov
  • 3,625
  • 19
  • 68
  • 128
90
votes
4 answers

How to view a DataTable while debugging

I'm just getting started using ADO.NET and DataSets and DataTables. One problem I'm having is it seems pretty hard to tell what values are in the data table when trying to debug. What are some of the easiest ways of quickly seeing what values have…
Eric Anastas
  • 21,675
  • 38
  • 142
  • 236
81
votes
3 answers

Pillow in Python won't let me open image ("exceeds limit")

Just having some problems running a simulation on some weather data in Python. The data was supplied in a .tif format, so I used the following code to try to open the image to extract the data into a numpy array. from PIL import Image im =…
Tom Heeley
  • 978
  • 1
  • 7
  • 8
76
votes
5 answers

Select method in List Collection

I have an asp.net application, and now I am using datasets for data manipulation. I recently started to convert this dataset to a List collection. But, in some places it doesn't work. One is that in my old version I am using datarow[] drow = …
MAC
  • 6,277
  • 19
  • 66
  • 111