
I want to read a .txt file with pandas. The problem is that the separator/delimiter consists of a number followed by at least two blanks.

I already tried something similar to this code (from "How to make separator in pandas read_csv more flexible wrt whitespace?"):

pd.read_csv("whitespace.txt", header=None, delimiter=r"\s+")

This only works if the delimiter is one or more blanks. So I adjusted it to the following:

delimiter=r"\d\s\s+"

But this splits my dataframe wherever it sees two or more blanks, whereas I strictly need the number followed by at least two blanks. Does anyone have an idea how to fix it?

My data looks as follows:

I am an example of a dataframe
I have Problems to get read
100,00
So How can I read it
20,00

So the first row should be: I am an example of a dataframe I have Problems to get read 100,00, followed by the second row: So How can I read it 20,00

PV8

1 Answer


I'd try it like this.

I'd manipulate the text file before attempting to parse it into a dataframe, as follows:

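import pandas as pd
import re

# Read the whole file and join its lines into one string
with open("whitespace.txt", "r") as f:
    g = f.read().replace("\n", " ")

# Append a marker character (@) after every number such as 100,00
prepared_text = re.sub(r'(\d+,\d+)', r'\1@', g)

# Split on the marker so that each piece becomes one row
df = pd.DataFrame({'My columns': prepared_text.split('@')})
print(df)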

This gives the following:

                                          My columns
0  I am an example of a dataframe I have Problems...
1                         So How can I read it 20,00
2 

I guess this would suffice as long as the input file isn't too large, but using the re module and substitution gives you the control you seek.

The parentheses in (\d+,\d+) mark a group that we want to match. We're basically matching any of the numbers in your text file. Then we use \1 in the replacement, which is a backreference to the text matched by that group. So every match of \d+,\d+ is replaced by the matched number followed by @.
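To see the backreference on its own, here is a minimal sketch run on a string built from the sample data in the question (the variable names sample and marked are only for illustration):

```python
import re

sample = "I have Problems to get read 100,00 So How can I read it 20,00"

# (\d+,\d+) captures each number; \1 puts the captured text back, then @ is appended
marked = re.sub(r'(\d+,\d+)', r'\1@', sample)
print(marked)
# I have Problems to get read 100,00@ So How can I read it 20,00@
```

The surrounding text is left untouched; only the marker is added after each number.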

Then we use the inserted character as a delimiter.
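Splitting on the marker leaves leading blanks and an empty last piece, which is where the empty final row in the output comes from. One possible way to tidy that up is sketched below; it assumes the text itself never contains an @ character, and the string text is a stand-in for the file contents after the newlines were replaced:

```python
import re

# Stand-in for the file contents after newlines were replaced by spaces
text = ("I am an example of a dataframe I have Problems to get read 100,00 "
        "So How can I read it 20,00 ")

# Same substitution as in the answer: mark the end of each row with @
prepared_text = re.sub(r'(\d+,\d+)', r'\1@', text)

# Split on the marker, trim surrounding blanks, and skip empty pieces
rows = [piece.strip() for piece in prepared_text.split('@') if piece.strip()]
print(rows)
```

The resulting list can then be passed to pd.DataFrame({'My columns': rows}) in the same way as before.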

There are some good examples here:

https://lzone.de/examples/Python%20re.sub

Paula Livingstone
  • It is working. Could you explain what exactly the syntax after re.sub means? – PV8 May 23 '19 at 09:18