How to create a DataFrame from custom values

Question

I am reading a text file in which each line contains multiple values. I parse each line according to my requirements using a function parse.

def parse(line):
    ......
    ......
    return line[0],line[2],line[5]

I want to create a DataFrame with each line as a row and the three returned values as columns.

df = pd.DataFrame()

with open('data.txt') as f:
    for line in f:
        df.append(list(parse(line)))

When I run the above code, I get all values in a single column. Is it possible to get them in a proper tabular format?

Possible duplicate of [add one row in a pandas.DataFrame](https://stackoverflow.com/q/10715965/1278112) — Shihe Zhang, Oct 28 '17 at 02:17

juanpa.arrivillaga · Accepted Answer · 2017-10-27T18:13:43.660

4

You shouldn't call .append on a DataFrame in a loop; every call copies the whole frame, which is very inefficient. Do something like:

import pandas as pd

colnames = ['col1','col2','col3'] # or whatever you want
with open('data.txt') as f:
    # each tuple returned by parse() becomes one row
    df = pd.DataFrame([parse(l) for l in f], columns=colnames)
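
A self-contained version of the snippet above, as a minimal sketch. The body of `parse` is elided in the question, so the whitespace-splitting `parse` below is a hypothetical stand-in, and `io.StringIO` with made-up sample lines replaces `data.txt` so the example runs anywhere.

```python
import io

import pandas as pd


def parse(line):
    # Hypothetical stand-in: the real parse() is elided in the question.
    # Assumes whitespace-separated fields and keeps fields 0, 2 and 5.
    fields = line.split()
    return fields[0], fields[2], fields[5]


# Made-up sample data standing in for data.txt.
sample = io.StringIO("a b c d e f\ng h i j k l\n")

colnames = ["col1", "col2", "col3"]
# Each tuple returned by parse() becomes one row of the frame.
df = pd.DataFrame([parse(line) for line in sample], columns=colnames)
print(df)
```

Building the full list of rows first and constructing the frame once avoids growing the frame row by row.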

Note that the fundamental problem is that pd.DataFrame.append expects another DataFrame and appends the rows of that DataFrame. It interprets a flat list as a series of single-value rows. So if you structure your list as a list of rows, it works as intended. But you shouldn't be using .append here anyway:

In [6]: df.append([1,2,3])
Out[6]:
   0
0  1
1  2
2  3

In [7]: df = pd.DataFrame()

In [8]: df.append([[1, 2, 3]])
Out[8]:
   0  1  2
0  1  2  3
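
The same distinction between a flat list and a list of rows applies to the `pd.DataFrame` constructor. That is worth knowing because `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0. A small sketch, assuming a recent pandas; `pd.concat` is shown as the replacement for the rare case where rows really do arrive as a separate frame:

```python
import pandas as pd

# A flat list is read as three rows of one column.
flat = pd.DataFrame([1, 2, 3])
print(flat.shape)

# A list of lists is read as rows, so this is one row of three columns.
rows = pd.DataFrame([[1, 2, 3]])
print(rows.shape)

# Combining two frames without DataFrame.append:
combined = pd.concat([rows, pd.DataFrame([[4, 5, 6]])], ignore_index=True)
print(combined)
```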

edited Oct 27 '17 at 18:13

answered Oct 27 '17 at 18:02

Is there a way to rename the columns? – Oct 27 '17 at 18:13

@ankitbiradar yes, the easiest way is to pass the names to the constructor using `..., columns=['name1','name2','name3']` – juanpa.arrivillaga Oct 27 '17 at 18:14
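
If the frame already exists, the names can also be changed afterwards. A minimal sketch with made-up data and placeholder names: `DataFrame.rename` returns a new frame, while assigning to `df.columns` replaces every name in place.

```python
import pandas as pd

df = pd.DataFrame([("a", "c", "f"), ("g", "i", "l")])  # default names 0, 1, 2

# Rename selected columns; returns a new DataFrame and leaves df unchanged.
renamed = df.rename(columns={0: "name1", 2: "name3"})
print(list(renamed.columns))

# Replace all names at once; the list length must match the column count.
df.columns = ["name1", "name2", "name3"]
print(list(df.columns))
```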

score 0 · Answer 2 · answered Oct 27 '17 at 18:13

A quick way to do this (TL;DR):

Creating the new column:

  `df['com_zeros'] = '0'`

Applying the condition:

for b in df.itertuples():
    # .loc avoids chained assignment; single-digit values (< 10) get a leading zero
    df.loc[b.Index, 'com_zeros'] = '0'+str(b.battles) if b.battles<10 else str(b.battles)

Result:

df
     regiment company deaths  battles size com_zeros
0  Nighthawks     1st    kkk        5    l        05
1  Nighthawks     1st     52       42   ll        42
2  Nighthawks     2nd     25        2    l        02
3  Nighthawks     2nd    616        2    m        02

See the example at https://repl.it/JHW6.

Note: the example on repl.it may seem to hang, but it has not; loading pandas on repl.it always takes a while.

To suppress warnings in a Jupyter notebook:

import warnings
warnings.filterwarnings('ignore')

Also, this is definitely *not* a quick way to do this (*uma forma rápida de fazer isso*). You probably just want `df['com_zeros'] = df.battles.astype(str).str.zfill(2)` – juanpa.arrivillaga, Oct 27 '17 at 18:20
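
The vectorized version from the comment above can be sketched as follows. Only the `battles` column matters here; its values are taken from the result table in this answer.

```python
import pandas as pd

df = pd.DataFrame({"battles": [5, 42, 2, 2]})

# Convert to strings, then left-pad with zeros to a width of 2.
df["com_zeros"] = df["battles"].astype(str).str.zfill(2)
print(df)
```

This pads the whole column in one call instead of assigning cell by cell in a Python loop.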

score 0 · Answer 3 · answered Oct 27 '17 at 18:15

In addition to @juanpa.arrivillaga's answer:

It seems that you have a structured file and just need the 1st, 3rd and 6th items on each line (indices 0, 2 and 5).

Load it and use drop:

df = pd.read_csv('file')

df = df.drop([columns], axis=1)  # drop returns a new DataFrame, so assign the result
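
If the file really is a regular delimited file, another option is to read only the wanted columns instead of dropping the rest. A sketch under assumptions: the file is comma-separated with no header row, the sample text is made up, and positions 0, 2 and 5 mirror the indices returned by `parse` in the question.

```python
import io

import pandas as pd

# Made-up sample standing in for the real file.
sample = io.StringIO("a,b,c,d,e,f\ng,h,i,j,k,l\n")

# header=None: the file has no header row; usecols keeps positions 0, 2 and 5.
df = pd.read_csv(sample, header=None, usecols=[0, 2, 5])
df.columns = ["col1", "col2", "col3"]
print(df)
```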

answered Oct 27 '17 at 18:15

kaihami
