I have 7z files that I want to convert to CSV so I can preprocess the data with pandas. I am using Python 2.7.

I tried this:

import pandas as pd
data = pd.read_csv('train_2011_2012_2013.7z.002', header = None)
print data

I got this error:

CParserError                              Traceback (most recent call last)
<ipython-input-9-74098fd0c476> in <module>()
      1 
----> 2 data = pd.read_csv('train_2011_2012_2013.7z.001', header = None)
      3 print data

/root/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    560                     skip_blank_lines=skip_blank_lines)
    561 
--> 562         return _read(filepath_or_buffer, kwds)


CParserError: Error tokenizing data. C error: Expected 1 fields in line 17, saw 2

What is going wrong here?

heisen
  • `pd.read_csv()` can take a file handle or `StringIO`. So if you can open a file and read it, then you can pass it to pandas. – chrisaycock Dec 16 '16 at 20:31
  • It doesn't work. – heisen Dec 16 '16 at 20:34
  • `data = pd.read_csv('train_2011_2012_2013.7z.002', header = None)` `print data` – heisen Dec 16 '16 at 20:34
  • That won't work because you are passing the file's *name*. You need to pass the file's *handle*. See [this question](http://stackoverflow.com/q/32797851/478288) for how to open the file. – chrisaycock Dec 16 '16 at 20:35
  • To be clear, you can't use the file's name when reading in compressed files. Otherwise (if reading in an uncompressed CSV file), the file name works just fine as the first argument to `pandas.read_csv`. –  Jan 04 '17 at 16:47
  • compressed AND multipart files! pandas is good but still :) – Jean-François Fabre Jan 04 '17 at 19:40

1 Answer

Install pyunpack and patool:

pip install pyunpack

pip install patool

After that, run the following code:

from pyunpack import Archive
# Raw string, so that '\a' in the Windows path is not read as an escape sequence.
Archive(r'Downloads\asdfg.7z').extractall("output path")

In the output path you will find the extracted folder containing your files.
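Since the files in the question are split volumes (`.7z.001`, `.7z.002`), a minimal sketch of the whole path from parts to DataFrames might look like the following. It assumes the volumes are plain byte-splits of one archive (as 7-Zip produces when splitting), that patool can find a 7z executable on the system, and that the archive contains `.csv` files. `join_parts` and `extract_and_load` are hypothetical helper names, and the code targets Python 3 rather than the question's Python 2.7.

```python
import glob
import os
import shutil


def join_parts(first_part, joined_path):
    """Concatenate name.7z.001, name.7z.002, ... into one archive file.

    Assumption: the volumes are raw byte-splits of a single .7z archive.
    Returns the list of part paths in the order they were joined.
    """
    if not first_part.endswith(".001"):
        raise ValueError("expected a path ending in .001")
    base = first_part[:-len(".001")]
    # Three-digit, zero-padded suffixes sort correctly as plain strings.
    parts = sorted(glob.glob(glob.escape(base) + ".[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError("no parts found for " + base)
    with open(joined_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def extract_and_load(archive_path, out_dir):
    """Extract the archive, then read every .csv found into a DataFrame."""
    # Third-party imports are kept local so join_parts works without them.
    from pyunpack import Archive
    import pandas as pd

    os.makedirs(out_dir, exist_ok=True)  # create the output directory first
    Archive(archive_path).extractall(out_dir)
    pattern = os.path.join(out_dir, "**", "*.csv")
    csv_paths = sorted(glob.glob(pattern, recursive=True))
    # header=None mirrors the question; drop it if the files have a header row.
    return {path: pd.read_csv(path, header=None) for path in csv_paths}
```

For the files in the question this would be roughly `join_parts('train_2011_2012_2013.7z.001', 'train_2011_2012_2013.7z')` followed by `extract_and_load('train_2011_2012_2013.7z', 'extracted')`. Passing a `.7z` part straight to `pd.read_csv` fails because pandas then tokenizes the compressed bytes as if they were CSV text, which is where the "Expected 1 fields" error comes from.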

cherry