
I have converted a PDF file to a text file, and then converted that text file to a CSV file. My problem is that the contents of the CSV file are spread across multiple columns (A, B, C, D, E), whereas I want everything in a single column, i.e. column A. How can I write the contents of these columns into only one column?

I've tried the merge, concatenate, and join functions, but none of them helped.

Here's my code:

import os.path
import csv
import pdftotext
# Load your PDF
with open("crimestory.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# Save all text to a txt file.
with open('crimestory.txt', 'w') as f:
    f.write("\n\n".join(pdf))

save_path = "/home/mayureshk/PycharmProjects/NLP/"

completeName_in = os.path.join(save_path, 'crimestory' + '.txt')
completeName_out = os.path.join(save_path, 'crimestoryycsv' + '.csv')

file1 = open(completeName_in)
In_text = csv.reader(file1, delimiter=',')

file2 = open(completeName_out, 'w')
out_csv = csv.writer(file2)

file3 = out_csv.writerows(In_text)

file1.close()
file2.close()

The expected output is a CSV file with all the information in column A and the rest of the columns empty.

cerebral_assassin
  • Are you trying to produce something like what `pd.melt()` would do, or something like [this](https://stackoverflow.com/a/35850749/4350650) answer? – Mayeul sgc Sep 23 '19 at 11:29
  • Sort of. For example, the data in the CSV file is scattered everywhere from column A to column F. I want the data to be written only in column A. @Mayeulsgc – cerebral_assassin Sep 23 '19 at 11:35
  • But do you want to keep the data of column A in a single cell, or put Col A, Col B and Col C directly in Col A, becoming (A+B+C)? – Mayeul sgc Sep 23 '19 at 11:52
  • I want to keep it directly in column A, becoming (A+B+C). – cerebral_assassin Sep 23 '19 at 11:58

1 Answer


You can use this answer to merge all columns into one.

# Dummy DataFrame
df = pd.DataFrame({
    'ColA': ['value_A1', 'value_A2', 'value_A3', 'value_A4'],
    'ColB': ['value_B1', 'value_B2', 'value_B3', 'value_B4'],
    'ColC': ['value_C1', 'value_C2', 'value_C3', 'value_C4'],
})

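I'll use pandas to load your CSV:

import pandas as pd

# save_path is the directory defined in the question;
# the file name matches the one the question's code writes
df = pd.read_csv(save_path + 'crimestoryycsv.csv', sep=',')
df = df.astype(str)
col = df.columns
# Join the first column with all remaining columns, separated by '|'
df['All'] = df[col[0]].str.cat(df[col[1:]], sep='|')
df.drop(col, axis=1, inplace=True)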

Results:

All
0 value_A1|value_B1|value_C1
1 value_A2|value_B2|value_C2
2 value_A3|value_B3|value_C3
3 value_A4|value_B4|value_C4
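
When the merged column is saved, note that `to_csv` writes the row index as an extra first column unless `index=False` is passed, and writes the column name as a first row unless `header=False` is passed. The snippet below is a minimal sketch of that save step on a shortened dummy DataFrame; the in-memory `buffer` is only a stand-in for a real output path, and the variable names are illustrative.

```python
import io

import pandas as pd

# Shortened dummy data, same shape as the example above
df = pd.DataFrame({
    'ColA': ['value_A1', 'value_A2'],
    'ColB': ['value_B1', 'value_B2'],
    'ColC': ['value_C1', 'value_C2'],
})

df = df.astype(str)
col = df.columns
# Join the first column with all remaining columns, separated by '|'
merged = df[col[0]].str.cat(df[col[1:]], sep='|')

# index=False drops the 0, 1, 2, ... column;
# header=False drops the row holding the column name
buffer = io.StringIO()  # stand-in for a real file path (assumption)
merged.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, the same call would take a path instead of `buffer`. If the input CSV has no header row, passing `header=None` to `pd.read_csv` may also be needed so that the first line is not consumed as column names.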

Mayeul sgc
  • I checked it by saving the result to a file with the to_csv method, just to make sure everything was right, but the solution you provided doesn't work at all. It's not concatenating, and it's adding a new column with the indexes (which I never wanted). – cerebral_assassin Sep 24 '19 at 10:43
  • It is working on my dummy DataFrame; I'll update the answer with my DataFrame and the results. – Mayeul sgc Sep 24 '19 at 12:16
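
A possible alternative that avoids the split in the first place: in the question's code, `csv.reader(file1, delimiter=',')` breaks every text line at its commas, and each piece then lands in its own column. Writing each line as a one-element row keeps it in column A. This is only a sketch under that assumption; the helper name `lines_to_single_column` is hypothetical, and in-memory buffers stand in for the real text and CSV files.

```python
import csv
import io


def lines_to_single_column(text_lines, out_file):
    """Write each non-empty text line as a one-field CSV row."""
    writer = csv.writer(out_file)
    for line in text_lines:
        line = line.strip()
        if line:
            # A one-element list produces exactly one cell;
            # commas inside the text are quoted, not split
            writer.writerow([line])


# In-memory demo; the sample text is made up for illustration
sample = io.StringIO("Hello, world, again\n\nSecond line\n")
out = io.StringIO(newline='')
lines_to_single_column(sample, out)
result = out.getvalue()
print(result)
```

With real files, the same function could be called with `open('crimestory.txt')` as the input and `open(..., 'w', newline='')` as the output, where the output path is whatever CSV name is wanted.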