I have a very large text file that I am trying to load into a Jupyter notebook for analysis.

But I can't find a way to separate the columns. So far I have only worked with HDF5 and CSV files, which are relatively easy to get the hang of.

The data is available at the link below:

https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-022-04496-5/MediaObjects/41586_2022_4496_MOESM3_ESM.txt

df1 = pd.read_csv('41586_2022_4496_MOESM3_ESM.txt', delimiter='\t')
print(df1.head(2))

result

       1    331.581577     -1.512106  17.774   2.143  -0.828   0.132     104.93    1092.57      45.54     7.355     1.359    -1.468     267695571003410291                   20111024-F5902-01-061    26.9  5520.3    40.0    3.951    0.116    1.581    0.430    2.296    0.188    0.339    0.041
0       2    332.300352     -1.566708   6.780   0...                                                                                                                                                                                                                                              
1       3    331.985497     -1.371940  18.426   1...                                                                                                                                                                                                                                              

Thanks in advance :)

Constantin Hong
  • *How* to separate the columns? Which columns? – doneforaiur Jul 02 '23 at 18:14
  • I mean that once I load the data in, I'm expecting to see 26 distinguished columns: each corresponding to a particular parameter like age, age uncertainty etc... but all of it is bunched up in a single column. – OverflownOverflow Jul 02 '23 at 19:25
  • You probably didn't specify the delimiter. What's the delimiter in your `csv` file? You can specify one like this: `pd.read_csv('paths.txt', delimiter="|")`. I can't open your link since it's broken. You should edit your question and state that there should be 26 columns. Large file, Jupyter, etc. are irrelevant to your question. – doneforaiur Jul 02 '23 at 19:26
  • I get a ParserError when I try to specify a delimiter, and sorry! Here is the link that should work: – OverflownOverflow Jul 02 '23 at 20:47
  • https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-022-04496-5/MediaObjects/41586_2022_4496_MOESM3_ESM.txt – OverflownOverflow Jul 02 '23 at 20:47
  • There's a bunch of commented lines. Delete them by hand and try again with `delimiter="\t"`, since there seems to be a tab between the columns. – doneforaiur Jul 02 '23 at 20:49
  • Please include text as text, not images and especially not links to images. And please include a sample of the data format within the post; SO posts have to be self-contained. – cafce25 Jul 02 '23 at 21:00
  • Ahh yes, I've tried that :( I edited the question and attached an image of what I get – OverflownOverflow Jul 02 '23 at 21:01
  • @OverflownOverflow the next time you ask a question, please post the code and the result as text, as I did when editing your question. – Constantin Hong Jul 02 '23 at 21:28

1 Answer

There are no tabs in your file; the columns are separated by runs of spaces. Change the delimiter.

import pandas as pd

# https://stackoverflow.com/a/19633103/20307768
# r'\s+' matches one or more whitespace characters; each match is as long as possible.
# The raw string keeps Python from treating '\s' as an invalid escape sequence.
df1 = pd.read_csv('41586_2022_4496_MOESM3_ESM.txt', delimiter=r'\s+')
df1.head(2)
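
Two further options may be worth trying. In the output shown in the question, the first data row was consumed as the column header, which suggests the file has no header row, so `header=None` is probably needed. The comments also mention commented lines in the original download; the sketch below assumes they start with `#`, which is not confirmed by the question. The sample text is a hypothetical stand-in, using the first four values of each row from the question's output, so the snippet runs without the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: one comment line (the '#'
# marker is an assumption) followed by space-aligned columns.
sample = """\
# hypothetical comment line
     1    331.581577     -1.512106  17.774
     2    332.300352     -1.566708   6.780
     3    331.985497     -1.371940  18.426
"""

df = pd.read_csv(
    io.StringIO(sample),  # replace with the path to the real file
    sep=r"\s+",           # split on runs of whitespace
    header=None,          # the first row is data, not column names
    comment="#",          # skip comment lines (assumed marker)
)
print(df.shape)
print(df.head(2))
```

With `header=None`, pandas labels the columns with integers starting at 0; descriptive names can be supplied through the `names` parameter once the meaning of each column is known.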
Constantin Hong