3

I am reading in a text file in which each line contains multiple values. I parse each line according to my requirements using a function parse:

def parse(line):
    ......
    ......
    return line[0],line[2],line[5]

I want to create a DataFrame with each line as a row and the three returned values as columns:

df = pd.DataFrame()

with open('data.txt') as f:
    for line in f:
       df.append(line(parse(line)))

When I run the above code, I get all the values in a single column. Is it possible to get them in a proper tabular format?

juanpa.arrivillaga

3 Answers

4

You shouldn't call .append on a DataFrame in a loop; each call copies the whole frame, which is very inefficient. Do something like:

colnames = ['col1','col2','col3'] # or whatever you want
with open('data.txt') as f:
    df = pd.DataFrame([parse(l) for l in f], columns=colnames)
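
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The body of parse and the sample data are assumptions made up for illustration (the question elides the real parsing logic); only the final DataFrame construction mirrors the snippet above.

```python
import io

import pandas as pd


def parse(line):
    # Hypothetical stand-in for the question's parse(): split on whitespace
    # and keep the fields at positions 0, 2 and 5.
    fields = line.split()
    return fields[0], fields[2], fields[5]


# In-memory stand-in for data.txt (made-up contents).
sample = io.StringIO("a b c d e f\ng h i j k l\n")

colnames = ['col1', 'col2', 'col3']

# Each tuple returned by parse() becomes one row; the frame is built once.
df = pd.DataFrame([parse(l) for l in sample], columns=colnames)
print(df)
```

Building the list first and constructing the frame once avoids growing the DataFrame row by row.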

Note that the fundamental problem is that pd.DataFrame.append expects another DataFrame and appends the rows of that other DataFrame. It interprets a flat list as a series of single-value rows, so if you structure your list as a list of "rows" it works as intended. But you shouldn't be using .append here anyway:

In [6]: df.append([1,2,3])
Out[6]:
   0
0  1
1  2
2  3

In [7]: df = pd.DataFrame()

In [8]: df.append([[1, 2, 3]])
Out[8]:
   0  1  2
0  1  2  3
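
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the session above only runs on older versions. As a rough sketch, the same flat-list versus list-of-rows distinction can be seen with the constructor, and pd.concat can be used to stack frames; the variable names here are illustrative only.

```python
import pandas as pd

# A flat list is read as three single-value rows: 3 rows x 1 column.
flat = pd.DataFrame([1, 2, 3])

# A list containing one list is read as a single row: 1 row x 3 columns.
nested = pd.DataFrame([[1, 2, 3]])

# pd.concat stacks whole frames and works on current pandas versions.
stacked = pd.concat([nested, pd.DataFrame([[4, 5, 6]])], ignore_index=True)

print(flat.shape, nested.shape, stacked.shape)
```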
juanpa.arrivillaga
0

A quick way to do this (TL;DR):

Creating the new column:

  `df['com_zeros'] = '0'`

Applying the condition:

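for b in df.itertuples():
    # Use .loc to avoid chained assignment; pad single-digit values (0-9) with a leading zero
    df.loc[b.Index, 'com_zeros'] = '0' + str(b.battles) if b.battles < 10 else str(b.battles)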

Result:

df
     regiment company deaths  battles size com_zeros
0  Nighthawks     1st    kkk        5    l        05
1  Nighthawks     1st     52       42   ll        42
2  Nighthawks     2nd     25        2    l        02
3  Nighthawks     2nd    616        2    m        02

See the example at https://repl.it/JHW6.

Note: the example on repl.it may appear to hang, but it has not; loading pandas on repl.it always takes a while.

To suppress warnings on jupyter notebook:

import warnings
warnings.filterwarnings('ignore')
  • Also, this is definitely *not* *a quick way to do this*. You probably just want `df['com_zeros'] = df.battles.astype(str).str.zfill(2)` – juanpa.arrivillaga Oct 27 '17 at 18:20
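
A minimal sketch of the vectorised approach suggested in the comment above, assuming a frame with an integer battles column like the one in the result table (the other columns are left out for brevity):

```python
import pandas as pd

# Only the battles column from the answer's table is reproduced here.
df = pd.DataFrame({'battles': [5, 42, 2, 2]})

# Convert to strings and left-pad with zeros to width 2 in one vectorised step.
df['com_zeros'] = df['battles'].astype(str).str.zfill(2)
print(df)
```

This avoids the Python-level loop over itertuples entirely.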
0

In addition to @juanpa.arrivillaga's answer:

It seems that you have a structured file and just need the 1st, 3rd and 6th items of each line (the values your parse function returns).

Load it and use drop:

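df = pd.read_csv('file')

# drop returns a new DataFrame, so assign the result; [columns] stands for the labels to discard
df = df.drop([columns], axis=1)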
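
As a variation on the idea above (not the drop call itself), read_csv can also select the wanted fields directly through its usecols parameter. The sketch below assumes a comma-separated file with no header row; the sample contents and column names are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for the file (assumed comma-separated, no header row).
sample = io.StringIO("a,b,c,d,e,f\ng,h,i,j,k,l\n")

# Keep only the fields at positions 0, 2 and 5, matching what parse() returns.
df = pd.read_csv(sample, header=None, usecols=[0, 2, 5])
df.columns = ['col1', 'col2', 'col3']
print(df)
```

Selecting columns at read time means the unwanted fields are never loaded, so no drop step is needed afterwards.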

kaihami