In my code I get a result like this:

A B C
1 1 1
A B C
2 2 2
A B C
3 3 3

I need to merge those DataFrames into one big DataFrame, like this:

 A B C
 1 1 1
 2 2 2
 3 3 3

Merging DataFrames that come from different files is easy, e.g. pd.merge(df1, df2), but how can I do it when the DataFrames all come from one file? Thanks in advance!

EDIT: To get my data I convert each line of my dataset into a DataFrame, so the output contains one DataFrame per line. My code:

from io import StringIO
import pandas as pd

def coordinates():
    with open('file.txt') as file:
        for lines in file:
            lines = StringIO(lines[35:61])  # I need only those fields in each line
            abc = pd.read_csv(lines, sep=' ', header=None)
            abc.columns = ['A', 'B', 'C', 'D', 'E', 'F']
            print(abc)  # prints a separate one-row DataFrame for each line

coordinates()

EDIT2: The suggestion from s_vishnu only works for a prepared file that contains the same header repeated several times. In my case, multiple DataFrames are generated, and the row after each header has index 0. There are many DataFrames and each has only one row.
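
For illustration, a minimal sketch of the situation described here, assuming the one-row DataFrames are collected in a list instead of being printed. The list name `frames` and the values are made up for the example; `pd.concat` with `ignore_index=True` stacks the rows and renumbers the index, so every row no longer has index 0.

```python
import pandas as pd

# Hypothetical stand-ins for the one-row DataFrames produced per line
frames = [pd.DataFrame([[n, n, n]], columns=["A", "B", "C"]) for n in (1, 2, 3)]

# Stack all frames vertically; ignore_index=True renumbers the rows 0, 1, 2
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Unlike pd.merge, which joins on keys or columns, pd.concat simply appends rows that share the same columns, which is what the desired output above looks like.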

EDIT3: My file.txt contains a large number of lines, each about 80 characters long, like this:

AAA S S SSDAS ASDJAI A 234 33 43 234 2342999 2.31 22 33 SSS S D W2UUQ Q231WQ A 222 11 23 123 1231299 2.31 22 11

From each line I need only part of the information, which is why I used lines =StringIO(lines[35:61]) to take it. In this example I would need characters [30:55], and I want to create a DataFrame from them with columns=['A', 'B', 'C','D','E','F'] and sep=' '.

Pawe
  • Maybe the answer to this one will help you : https://stackoverflow.com/questions/44715393/how-to-concatenate-multiple-pandas-dataframes-without-running-into-memoryerror – Tbaki Jun 23 '17 at 11:12

2 Answers


my_test.csv:

A, B, C
1, 1 ,1
A, B, C
2, 2, 2
A, B, C
3, 3, 3

Use slicing with a step.

import pandas as pd
df = pd.read_csv("my_test.csv")
df=df[::2]
print(df)

output:

   A    B   C
0  1   1    1
2  2    2   2
4  3    3   3

df=df[::2] uses extended slicing. The 2 in df[::2] is the step: starting from position 0, it keeps every second row, which drops the repeated header rows.

But note the index values: they are also in steps of 2, i.e. 0, 2, 4, ... To renumber the index, do this:

import pandas as pd
df = pd.read_csv("my_test.csv")
df=df[::2]

df.index = range(len(df['A']))
print(df)

output:

   A    B   C
0  1   1    1
1  2    2   2
2  3    3   3

So you get the values you desire.
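
As an alternative sketch of the same idea, the index can also be renumbered with reset_index(drop=True). The example below assumes an in-memory stand-in for my_test.csv (the `csv_text` variable is hypothetical, with whitespace normalised) and additionally converts the values to numbers, since the repeated header rows cause every column to be read as strings.

```python
import io
import pandas as pd

# Stand-in for my_test.csv (assumption: same layout, spacing normalised)
csv_text = "A, B, C\n1, 1, 1\nA, B, C\n2, 2, 2\nA, B, C\n3, 3, 3\n"

# skipinitialspace=True strips the blank after each comma
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

df = df[::2]                    # keep every second row, dropping repeated headers
df = df.reset_index(drop=True)  # renumber the index 0, 1, 2
df = df.apply(pd.to_numeric)    # values were read as strings; convert to numbers
print(df)
```

Note that this only works when a header row follows every single data row, as in my_test.csv.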

void
  • Hello. My output is still the same; it doesn't work in my case with my code. I am still getting: ` A B C 0 1 1 1 A B C 0 2 2 2` etc. – Pawe Jun 23 '17 at 12:01
  • `In my code i have received result like this one`: this is what you specified, right? How did you get this? Apply my code once you have the output you mentioned and it will work. – void Jun 23 '17 at 12:09
  • Please take a look at the code in my question. It's not one frame with multiple headers; it's multiple DataFrames, each with the same header. Maybe that's why it doesn't work on my dataset? – Pawe Jun 24 '17 at 12:33
  • Okay, I will look. Can you post what's in your `file.txt`? – void Jun 24 '17 at 13:48
  • I have edited my question. I think I can improve my code at the beginning, where I take characters from the lines. If someone could help me create only one DataFrame, I would not need to merge the DataFrames I have right now. Thanks – Pawe Jun 26 '17 at 06:37

I have found the solution. I changed the code at the beginning, and that helped:

import pandas as pd

def coordinates():
    with open('file.txt') as abc:
        lines = abc.readlines()
    for line in lines:
        # I just cut the lines from the beginning and from the end,
        # so I don't need to take data from the middle
        abc2 = line[20:-7]
        abc3 = abc2.split()
        pd.DataFrame(abc3)  # note: this result is not stored
        print(abc3)

coordinates()
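
To end up with a single DataFrame rather than one printed list per line, the split fields can be collected first and the DataFrame built once at the end. The following is only a sketch under assumptions: the function name `coordinates_frame`, its `start`/`stop` parameters, and the two sample lines (taken from the example in EDIT3 and treated as two separate lines) are illustrative, and the [30:55] slice is the one mentioned in the question.

```python
import io
import pandas as pd

COLUMNS = ["A", "B", "C", "D", "E", "F"]

def coordinates_frame(handle, start=30, stop=55):
    """Build one DataFrame from a fixed character slice of every line."""
    rows = []
    for line in handle:
        fields = line[start:stop].split()
        # Skip lines whose slice does not contain exactly six fields
        if len(fields) == len(COLUMNS):
            rows.append(fields)
    frame = pd.DataFrame(rows, columns=COLUMNS)
    # Fields are strings after split(); convert each column to numbers
    return frame.apply(pd.to_numeric)

# Assumed sample input standing in for file.txt
sample = io.StringIO(
    "AAA S S SSDAS ASDJAI A 234 33 43 234 2342999 2.31 22 33\n"
    "SSS S D W2UUQ Q231WQ A 222 11 23 123 1231299 2.31 22 11\n"
)
df = coordinates_frame(sample)
print(df)
```

With a real file, an open file object can be passed in place of `sample`, e.g. inside `with open('file.txt') as handle:`. Building the DataFrame once from a list of rows avoids creating and merging many one-row DataFrames.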
Pawe