
I am trying to import a .dat file into a pandas dataframe for analysis.

A row in the .dat file contains 2 observations, each with three variables (year, population, and crime), and looks like this:

1960  179323175  3384200  1961  182992000  3488000

Marcin's solution was very helpful; however, my file has multiple observations on one row (that is how the .dat file is structured). Is there an equivalent to the @@ option in SAS that lets pandas know how many columns each observation has (or a better solution)? Thank you.

# Importing a .dat file into pandas
import pandas as pd

with open('Data_Exercises/CHAPTER4/DATA for Exercises 4.1 and 4.4.dat', 'r') as f:
    # next(f)  # uncomment to skip the first row
    df = pd.DataFrame(l.rstrip().split() for l in f)

print(df)

This is a view of the printout. I am not allowed to embed images yet.

mustaccio
user2529589
  • You might want to explain what "the @@ option in SAS" does that you can't do in Python; I suspect many Python experts are not familiar with SAS syntax. – mustaccio Mar 22 '19 at 02:32
  • Thank you, mustaccio. The @@ option will allow for one row of data to contain multiple entries. It uses the inputs (i.e. column names) to know how many observations are included (e.g. if there are 3 inputs and 9 pieces of delimited data, then there are 3 observations). – user2529589 Mar 22 '19 at 02:36
  • Probably it's just me, but this explanation doesn't help much. Regardless, you should edit your question and add it there, not in the comments. – mustaccio Mar 22 '19 at 02:49
  • You seem to be looking for a way to produce a union of columns 0-2 and 3-5. – mustaccio Mar 22 '19 at 02:51
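
The @@ behaviour described in the comments (the number of inputs decides how many observations a row holds) can be approximated by treating the whole file as one stream of whitespace-separated tokens and cutting it into groups. This is only a sketch: `read_wrapped_records` is a hypothetical helper, not a pandas function, and the column names are assumed from the question's description. The sample input is the single row shown in the question.

```python
import io

import pandas as pd


def read_wrapped_records(handle, names):
    """Regroup whitespace-delimited tokens into rows of len(names) values.

    Hypothetical helper: line breaks are treated like any other whitespace,
    so it does not matter how many observations share a physical line.
    """
    tokens = handle.read().split()
    width = len(names)
    if len(tokens) % width:
        raise ValueError(
            f"{len(tokens)} tokens do not divide evenly into {width} columns"
        )
    rows = [tokens[i:i + width] for i in range(0, len(tokens), width)]
    # Tokens are strings at this point; convert each column to a numeric dtype.
    return pd.DataFrame(rows, columns=names).apply(pd.to_numeric)


# Stand-in for the .dat file: the one row quoted in the question.
sample = io.StringIO("1960  179323175  3384200  1961  182992000  3488000\n")
df = read_wrapped_records(sample, ["year", "population", "crime"])
print(df)
```

With a real file, the `io.StringIO` object would be replaced by the handle from `open(...)`. Because the tokens are regrouped in reading order, the observations stay in the order they appear in the file.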

1 Answer

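# Relabel columns 3-5 as 0-2 first; pd.concat aligns on column labels and would otherwise fill with NaN.
right = df.iloc[:, 3:].set_axis(df.columns[:3], axis=1)
pd.concat([df.iloc[:, :3], right], axis=0, ignore_index=True)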
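
A small runnable sketch of the stacking idea, using only the row shown in the question. `pd.concat` aligns on column labels, so the right-hand block is relabelled to match the left-hand one before stacking. The column names are an assumption taken from the question's description, and the final sort by year is optional.

```python
import pandas as pd

# The single row quoted in the question, split on whitespace as in the question's code.
line = "1960  179323175  3384200  1961  182992000  3488000"
df = pd.DataFrame([line.split()])

left = df.iloc[:, :3]
# Give columns 3-5 the labels 0-2 so the two blocks line up when stacked.
right = df.iloc[:, 3:].set_axis(left.columns, axis=1)

stacked = pd.concat([left, right], axis=0, ignore_index=True)
stacked.columns = ["year", "population", "crime"]  # assumed names

# Values are still strings after split(); convert, then restore chronological order.
stacked = stacked.apply(pd.to_numeric).sort_values("year").reset_index(drop=True)
print(stacked)
```

Stacking the left block on top of the right one puts every first-of-row observation before every second-of-row one, which is why the sort is there for files with more than one line.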
BallpointBen