0

I have an incredibly obvious question, but I cannot find a pandas solution (I found this, which is close).

My text file has no row delimiter. Thus, pandas reads the file as a data frame with a single row and a ton of columns. It looks as follows:

header_1, header_2, header_3, 1, jan, 600, 2, feb, 900, 3, jan, 678

I want to read three elements per row. The end result should look as follows:

header_1  header_2  header_3
       1       jan       600
       2       feb       900
       3       jan       678

How can I tell the file to read every third comma as a line break?

Arturo Sbr
  • 5,567
  • 4
  • 38
  • 76

4 Answers4

1

Maybe standard str.split() will be enough:

txt = '''header_1, header_2, header_3, 1, jan, 600, 2, feb, 900, 3, jan, 678'''

txt = txt.split(',')
df = pd.DataFrame([*zip(txt[3::3], txt[4::3], txt[5::3])], columns=[*txt[0:3]])

print(df)

Prints:

  header_1  header_2  header_3
0        1       jan       600
1        2       feb       900
2        3       jan       678
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
1

Use **Split function** followed by slicing to create a dataframe

a="header_1, header_2, header_3, 1, jan, 600, 2, feb, 900, 3, jan, 678"
a=a.split(",")
a1=[]
a2=[]
a3=[]
for i in range(0,len(a)):
    if(i%3==2):
        a1.append(a[i])
    elif(i%3==1):
        a2.append(a[i])
    else:
        a3.append(a[i])
    data={a3[0]:a3[1:],a2[0]:a2[1:],a1[0]:a1[1:]}
df=pd.DataFrame(data)
    

Output

    header_1       header_2           header_3
0   1              jan                    600
1   2              feb                    900
2   3              jan                    678
Zesty Dragon
  • 551
  • 3
  • 18
1

You don't need pandas to do this, you could just use native python. If the file is nicely formatted, i.e. as you've written

header_1, header_2, header_3, 1, jan, etc

You can just use string methods such as split and reorganize the list into the 2D array shape you want. From there you can create your dataframe if you want to use pandas later.

xzkxyz
  • 527
  • 3
  • 4
1

Assumption is that the first couple of entries are the column names. You could split the text, find the number of headers, and partition the rest of the data to match the number of headers :

data = """header_1, header_2, header_3, 1, jan, 600, 2, feb, 900, 3, jan, 678"""

header = [head.strip() for head in data.split(',') if head.strip().startswith('header')]
rest = [head.strip() for head in data.split(',') if not head.strip().startswith('header')]
length = len(header)
rest = [rest[n:n+length] for n in range(0, len(rest), length)]
pd.DataFrame(rest, columns = header)



 header_1   header_2    header_3
0   1         jan         600
1   2         feb         900
2   3         jan         678
sammywemmy
  • 27,093
  • 4
  • 17
  • 31