Questions tagged [dataframe]

A data frame is a two-dimensional tabular data structure. Usually, rows are observations and columns are variables, and the columns may be of different types (unlike in an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

Each section below covers one language that uses this term and is written for readers familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch; see ?S3). Several R packages, including data.table and tibble, extend or modify base R data frames to create new data structures. For further reading, see the paragraph on data frames in the CRAN manual Intro to R.

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is essentially a rectangular array like a 2D numpy ndarray, but with an associated index on each axis that can be used for alignment. As in R, from an implementation perspective, columns are somewhat prioritized over rows: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
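The two properties described above, dictionary-like column access and alignment on index labels, can be seen in a short sketch. The Series names a and b and their labels are made up for illustration; only standard pandas calls are used.

```python
import pandas as pd

# A frame built from a dict of columns, as in the example above.
df = pd.DataFrame({"x": list("abcde"), "y": range(1, 6)})

# Dictionary-like access: a column name maps to a Series.
col = df["y"]
is_series = isinstance(col, pd.Series)

# Alignment: arithmetic matches values on index labels, not on position.
a = pd.Series([1, 2, 3], index=["p", "q", "r"])
b = pd.Series([10, 20, 30], index=["r", "q", "p"])
total = a + b  # p: 1 + 30, q: 2 + 20, r: 3 + 10

print(is_series)
print(total.to_dict())
```

Because the addition pairs values by label, the order in which b lists its labels does not affect the result.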

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A data frame is a list of variables, known as DataSeries, displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name, or boolean.

When printed, data frames resemble matrices in that they are displayed as a rectangular grid, but a key difference is that the first row holds the column (variable) names and the first column holds the row (individual) names. This row and column are treated as header meta-information and are not part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

143674 questions

3945 votes, 33 answers

How to iterate over rows in a DataFrame in Pandas

I have a pandas dataframe, df: c1 c2 0 10 100 1 11 110 2 12 120 How do I iterate over the rows of this dataframe? For every row, I want to access its elements (values in cells) by the name of the columns. For example: for row in…

python pandas dataframe loops

asked May 10 '13 at 07:04

Roman

124,451
167
349
456

3394 votes, 17 answers

How do I select rows from a DataFrame based on column values?

How can I select rows from a DataFrame based on values in some column in Pandas? In SQL, I would use: SELECT * FROM table WHERE column_name = some_value

python pandas dataframe

asked Jun 12 '13 at 17:42

szli

36,893
11
32
40

2886 votes, 36 answers

Renaming column names in Pandas

I want to change the column labels of a Pandas DataFrame from ['$a', '$b', '$c', '$d', '$e'] to ['a', 'b', 'c', 'd', 'e']

python pandas replace dataframe rename

asked Jul 05 '12 at 14:21

user1504276

28,955
3
15
7

2168 votes, 22 answers

Delete a column from a Pandas DataFrame

To delete a column in a DataFrame, I can successfully use: del df['column_name'] But why can't I use the following? del df.column_name Since it is possible to access the Series via df.column_name, I expected this to work.

python pandas dataframe

asked Nov 16 '12 at 06:26

John

41,131
31
82
106

1829 votes, 19 answers

How do I get the row count of a Pandas DataFrame?

How do I get the number of rows of a pandas dataframe df?

python pandas dataframe

asked Apr 11 '13 at 08:14

yemu

26,249
10
32
29

1700 votes, 24 answers

Selecting multiple columns in a Pandas dataframe

How do I select columns a and b from df, and save them into a new dataframe df1? index a b c 1 2 3 4 2 3 4 5 Unsuccessful attempt: df1 = df['a':'b'] df1 = df.ix[:, 'a':'b']

python pandas dataframe select indexing

asked Jul 01 '12 at 21:03

user1234440

22,521
18
61
103

1579 votes, 41 answers

How to change the order of DataFrame columns?

I have the following DataFrame (df): import numpy as np import pandas as pd df = pd.DataFrame(np.random.rand(10, 5)) I add more column(s) by assignment: df['mean'] = df.mean(1) How can I move the column mean to the front, i.e. set it as first…

python pandas dataframe

asked Oct 30 '12 at 22:22

Timmie

15,995
3
14
7

1483 votes, 19 answers

Sort (order) data frame rows by multiple columns

I want to sort a data frame by multiple columns. For example, with the data frame below I would like to sort by column 'z' (descending) then by column 'b' (ascending): dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), levels = c("Low",…

r sorting dataframe r-faq

asked Aug 18 '09 at 21:33

Christopher DuBois

42,350
23
71
93

1475 votes, 16 answers

Change column type in pandas

I created a DataFrame from a list of lists: table = [ ['a', '1.2', '4.2' ], ['b', '70', '0.03'], ['x', '5', '0' ], ] df = pd.DataFrame(table) How do I convert the columns to specific types? In this case, I want to convert…

python pandas dataframe types casting

asked Apr 08 '13 at 23:53

user1642513

1402 votes, 15 answers

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

I have this DataFrame and want only the records whose EPS column is not NaN: >>> df STK_ID EPS cash STK_ID RPT_Date 601166 20111231 601166 NaN NaN 600036 20111231 600036 NaN 12 600016 20111231 600016 …

python pandas dataframe nan

asked Nov 16 '12 at 09:17

bigbug

55,954
42
77
96

1358 votes, 32 answers

Create a Pandas Dataframe by appending one row at a time

How do I create an empty DataFrame, then add rows, one by one? I created an empty DataFrame: df = pd.DataFrame(columns=('lib', 'qty1', 'qty2')) Then I can add a new row at the end and fill a single field with: df = df._set_value(index=len(df),…

python pandas dataframe append

asked May 23 '12 at 08:12

PhE

15,656
4
23
21

1354 votes, 21 answers

How to deal with SettingWithCopyWarning in Pandas

Background I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now, the application is popping out many new warnings. One of them like this: E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a…

python pandas dataframe chained-assignment pandas-settingwithcopy-warning

asked Dec 17 '13 at 03:48

bigbug

55,954
42
77
96

1331 votes, 24 answers

Get a list from Pandas DataFrame column headers

I want to get a list of the column headers from a Pandas DataFrame. The DataFrame will come from user input, so I won't know how many columns there will be or what they will be called. For example, if I'm given a DataFrame like this: y gdp …

python pandas dataframe list header

asked Oct 20 '13 at 21:18

natsuki_2002

24,239
21
46
50

1296 votes, 8 answers

Use a list of values to select rows from a Pandas dataframe

Let’s say I have the following Pandas dataframe: df = DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]}) df A B 0 5 1 1 6 2 2 3 3 3 4 5 I can subset based on a specific value: x = df[df['A'] == 3] x A B 2 3 …

python pandas dataframe

asked Aug 23 '12 at 16:31

zach

29,475
16
67
88

1295 votes, 32 answers

How to add a new column to an existing DataFrame?

I have the following indexed DataFrame with named columns and rows not- continuous numbers: a b c d 2 0.671399 0.101208 -0.181532 0.241273 3 0.446172 -0.243316 0.051767 1.577318 5 0.614758 0.075793 -0.451460…

python pandas dataframe chained-assignment

asked Sep 23 '12 at 19:00

tomasz74

16,031
10
37
51
